Introduction

In general terms, interpolation is a method for estimating new data points from a discrete set of known values. Suppose you would like to know the minimum temperature across your home city on a particular day. Visiting every location in your study area would be difficult or expensive, so instead of taking thousands of measurements, you can measure the phenomenon at strategically dispersed sample locations and predict the values at all other locations.

This process creates a continuous surface (a raster dataset) that, in this scenario, represents the temperature across the full extent of your home city, whether or not a measurement was taken at a given location. As you may expect, this method is widely used for estimation in science and engineering.

Building on the above, this StoryMap shows how deterministic interpolation methods can be applied to a real-world scenario and how their parameters need to be managed, since they can strongly influence the results.

Context and Data

In the summer of 2022, the world faced a record heat period that extended as far north as the Arctic Circle and as far south as the Middle East. For example, on June 18th, France's average temperature rose to 81.3° Fahrenheit. Hundreds of records were also broken in Spain, Germany, Austria and the Czech Republic.

Japan was no exception. On June 25th, temperatures rose above 104° Fahrenheit for the first time on record for the month of June: in Isesaki, the mercury soared to 104.4° Fahrenheit. It was not the only city affected, as "dozens" of locations in Japan set monthly high-temperature records that day. In Tokyo, for example, the temperature climbed to 95.7° Fahrenheit.

This heat wave is a well-suited scenario for applying interpolation methods and discussing the results. For that purpose, the maximum temperature records (in Fahrenheit) for Japan on June 25th were downloaded from the National Centers for Environmental Information (NCEI - NOAA), yielding data from 219 weather stations across Japan (Map 1).

Weather Stations

Map 1 - Weather Stations to be used in the Interpolation analyses.

Using the weather stations above and their Maximum Temperature records, IDW and Trend interpolation methods will be carried out and their results discussed. Moreover, specific questions will be answered for a complete understanding of those methods.

Questions

The following questions will be answered during the lab:

  1. Create a temperature map for Japan in ArcGIS Online using the data collected through NCEI-NOAA, carefully choosing presentation details like classification and symbols. Discuss your results.
  2. In ArcGIS Pro, use the Japan data set and create four (4) IDW interpolation results: two (2) with different power values and two (2) with different neighbor settings. Compare the results within each pair (including quantitative statistics) and discuss the different outcomes.
  3. In ArcGIS Pro, use the Japan data set and create two (2) Trend interpolation results with different Polynomial Order settings. Compare the two results (including quantitative statistics) and discuss the different outcomes.
  4. Briefly discuss how you suggest approaching the quality assessment of interpolation results. For example, if you create IDW outputs based on the 12 vs. 18 closest sample points, how can you judge one to be the preferable option?

Temperature Map

Before running the "Interpolate Points" tool in ArcGIS Online, the maximum temperature records of the weather stations were visualized using the "Counts and Amounts" symbology. As a result, higher temperatures are observed in the center of the country, near the capital (Map 2).

Map 2 - Weather Stations with "Counts and Amounts" symbology for the Maximum Temperature.

In ArcGIS Online, the "Interpolate Points" tool was used to generate two raster surfaces. This tool performs the interpolation using the "Empirical Bayesian Kriging" method. For the first surface (Swipe 1 Left-Right), the "Default" optimization and 10 classes were selected as parameters. For the second (Swipe 1 Right-Left), the "Speed" optimization and 20 classes were selected. The "Speed" optimization was chosen because the greater number of classes was making ArcGIS Online crash.

Swipe 1: Interpolation Surfaces created in ArcGIS Online.

According to the ArcGIS Online documentation, interpolation surfaces created with the "Default" option simulate 100 semivariograms (Power) and use 10 neighbors, whereas surfaces created with the "Speed" option simulate 30 semivariograms and use 8 neighbors.

As a result, when a greater number of classes is selected, the surface is more detailed because each class covers a narrower range. However, working with more classes makes it more difficult for the software (ArcGIS Online) to create surfaces that honor the measured points. In our case study, the 20-class surface contained some areas that, when compared with the measurements, were assigned to a class that does not honor the maximum temperature values. The effect of the different number of neighbors is not easy to identify visually, since the number of classes was the dominant parameter.

IDW Interpolation

For this exercise, the "IDW" interpolation tool was used in ArcGIS Pro. This method determines cell values using a linearly weighted combination of a set of sample points. The weight is a function of inverse distance. The method assumes that the variable being mapped decreases in influence with distance from its sampled location (Esri).

Two parameters influence the result the most: Power and Number of Points. The Power parameter controls the significance of known points on the interpolated values based on their distance from the output point. It is a positive real number with a default value of 2. With high power values, nearby data have the most influence and, as a result, less smooth surfaces are created. With low power values, points farther away have more influence, which results in smoother surfaces.
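The effect of both parameters can be illustrated with a minimal sketch of inverse distance weighting in Python with NumPy. This is not the ArcGIS Pro implementation; the function name idw_predict, its arguments, and the two sample stations are hypothetical and chosen only for illustration. It assumes planar coordinates and a fixed number of nearest neighbors.

```python
import numpy as np

def idw_predict(xy_known, values, xy_query, power=2.0, n_neighbors=12):
    """Inverse-distance-weighted mean of the nearest samples (illustrative sketch)."""
    xy_known = np.asarray(xy_known, dtype=float)
    values = np.asarray(values, dtype=float)
    xy_query = np.atleast_2d(np.asarray(xy_query, dtype=float))
    k = min(n_neighbors, len(values))

    # Pairwise distances between query and sample points, shape (n_query, n_known).
    dist = np.linalg.norm(xy_query[:, None, :] - xy_known[None, :, :], axis=2)

    # Keep only the k nearest samples for each query point, closest first.
    nearest = np.argsort(dist, axis=1)[:, :k]
    d = np.take_along_axis(dist, nearest, axis=1)
    v = values[nearest]

    predictions = np.empty(len(xy_query))
    for i in range(len(xy_query)):
        if d[i, 0] == 0.0:
            # The query point coincides with a sample: return the measured value.
            predictions[i] = v[i, 0]
        else:
            # Weights fall off with distance raised to the chosen power.
            w = 1.0 / d[i] ** power
            predictions[i] = np.sum(w * v[i]) / np.sum(w)
    return predictions

# Hypothetical example: two stations measuring 90 and 100 degrees Fahrenheit.
stations = np.array([[0.0, 0.0], [10.0, 0.0]])
temps = np.array([90.0, 100.0])
for p in (2, 6):
    print(p, idw_predict(stations, temps, [2.0, 0.0], power=p)[0])
```

In this sketch, the query point lies closer to the 90-degree station, so raising the power pulls the estimate toward 90, which mirrors the sharper, less smooth surfaces described above.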

Power values between 0.5 and 3 are commonly considered the most reasonable. However, for this exercise, the first two IDW surfaces were created using Power values of 4 and 6 respectively (Swipe 2), to make the influence of the Power parameter easier to see.

Swipe 2: IDW Surfaces created in ArcGIS Pro (Different Powers).

As a result, the IDW surface created with Power 4 (Swipe 2 Left-Right) is smoother than the IDW surface created with Power 6 (Swipe 2 Right-Left). Additionally, on the Power 6 surface, similar measurements that were spatially close (in the north-eastern part of Japan) were assigned to the same class.

Visual inspection alone cannot determine which power value is better. However, using the "Cross Validation" tool, the Root-Mean-Square (RMS) error can be calculated to check how closely the model predicts the measured values. For the Power 4 surface, the RMS error was 5.65, and for the Power 6 surface, it was 5.75. This shows how the Power can influence not only the resulting interpolation surface but also the accuracy of the model.

The other important parameter mentioned above is the number of neighbors, or points, used for the interpolation. Limiting the number of input points can speed up the process. Additionally, this number is usually reduced when points far from the cell where the prediction is being made may have poor spatial correlation with it.

To visualize the influence of the number of points parameter, two interpolation surfaces were created (with Power 2): the first considering 15 points (Swipe 3 Left-Right) and the second considering 25 points (Swipe 3 Right-Left).

Swipe 3: IDW Surfaces created in ArcGIS Pro (Different Numbers of Neighbors).

Regarding the best number of points, using fewer points is recommended if the phenomenon has a great amount of variation.

In our case study, it is difficult to visually identify large differences beyond slight changes in smoothness. However, the RMS error gives a clearer picture. For the 15-neighbor surface, the RMS error was 5.1518, while for the 25-neighbor surface, it was 5.1514. In this case, since the values in the study area do not vary strongly, both surfaces have practically the same prediction accuracy.

Trend Interpolation

Finally, for this exercise, two interpolation surfaces were created using the Trend method. This method uses a global polynomial interpolation that fits a smooth surface defined by a mathematical function (a polynomial) to the input sample points (Esri). Trend interpolation creates smooth surfaces based on the polynomial order. This must be an integer between 1 and 12, where a value of 1 will fit a flat plane to the points, and a higher value will fit a more complex surface (Esri). The most commonly used polynomial orders are between 1 and 3. However, to better illustrate the effect of the polynomial order, higher orders are used in this case study.
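The idea of fitting a global polynomial to the samples can be sketched as an ordinary least-squares problem in Python with NumPy. This is a simplified illustration under stated assumptions, not the algorithm used by the ArcGIS Pro Trend tool: the function names poly_terms, fit_trend and predict_trend and the sample grid are hypothetical, and the sketch assumes that an order-n surface contains every term x^i * y^j with i + j <= n.

```python
import numpy as np

def poly_terms(x, y, order):
    """Design matrix with one column per term x**i * y**j where i + j <= order."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    cols = [x**i * y**j for i in range(order + 1) for j in range(order + 1 - i)]
    return np.column_stack(cols)

def fit_trend(x, y, z, order=1):
    """Least-squares coefficients of the polynomial surface through the samples."""
    design = poly_terms(x, y, order)
    coeffs, *_ = np.linalg.lstsq(design, np.asarray(z, dtype=float), rcond=None)
    return coeffs

def predict_trend(coeffs, x, y, order=1):
    """Evaluate the fitted surface at the given coordinates."""
    return poly_terms(x, y, order) @ coeffs

# Hypothetical samples on a 3 x 3 grid that lie exactly on a tilted plane.
gx = np.array([0.0, 1.0, 2.0, 0.0, 1.0, 2.0, 0.0, 1.0, 2.0])
gy = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 2.0, 2.0, 2.0])
gz = 2.0 + 3.0 * gx - gy

plane = fit_trend(gx, gy, gz, order=1)
print(predict_trend(plane, [0.5], [0.5], order=1))
```

An order of 1 produces a plane with three coefficients, while an order of 2 already needs six. With real coordinates and high orders, the design matrix can become poorly conditioned, so rescaling the coordinates before fitting is usually advisable.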

The Trend interpolation method is used mainly in two scenarios: when the surface varies gradually from region to region, or when global trends are to be examined or removed.

For our study case, the first Interpolation surface was created using a Polynomial order of 2 (Swipe 4 Left-Right), while for the creation of the second surface, a Polynomial order of 6 (Swipe 4 Right-Left) was instead selected.

Swipe 4 - Trend Interpolation Surfaces (Different Polynomial Order)

First, as you may notice, neither of the resulting interpolation surfaces explains the temperature phenomenon well. The reason is that the sample values do not vary gradually, or the phenomenon is not being examined at a global scale, so the Trend interpolation method is not the best option here.

When visually comparing the Polynomial Order 6 and Polynomial Order 2 surfaces, it can be observed that a higher polynomial order does result in a more complex surface, which in this case fits the samples better. This can be confirmed by checking the RMS error values: for the Polynomial Order 6 surface, the error was 4.79, while for the Polynomial Order 2 surface, it was 5.88.

Quality Assessment

As we saw in this lab, the accuracy of the resulting interpolation surfaces depends first on the samples you are working with and how well they fit the assumptions made by each interpolation method. For example, we observed that the Trend method may not be the best option if your measurements do not vary gradually or the phenomenon is not global. In short, knowing in advance how well your data fit the assumptions of a specific model can give you an idea of how that model will perform.

Although the previous approach is useful, the best way to know how your interpolation method is performing is the "Root Mean Square" (RMS) error, which indicates how closely your model predicts the measured values. The lower the error value, the better your model explains your phenomenon.

In ArcGIS Pro, the "Cross Validation" tool is used to generate cross-validation statistics.
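To make the RMS error concrete, the following Python sketch computes it with leave-one-out cross-validation: each sample is removed in turn, predicted from the remaining samples, and compared with its measured value. This is only an illustration and does not reproduce the ArcGIS Pro tool; the names idw_point and loo_rmse and the three sample points are hypothetical, and the simple all-points IDW predictor is an assumption made to keep the example self-contained.

```python
import numpy as np

def idw_point(xy_known, values, xy_query, power=2.0):
    """Predict one location with IDW over all remaining samples (illustrative)."""
    d = np.linalg.norm(xy_known - xy_query, axis=1)
    if np.any(d == 0):
        # The query coincides with a sample: return that sample's value.
        return float(values[np.argmin(d)])
    w = 1.0 / d**power
    return float(np.sum(w * values) / np.sum(w))

def loo_rmse(xy, values, predict):
    """Leave-one-out RMS error of predict(xy_known, values_known, xy_query)."""
    xy = np.asarray(xy, dtype=float)
    values = np.asarray(values, dtype=float)
    errors = []
    for i in range(len(values)):
        keep = np.arange(len(values)) != i  # hide sample i from the model
        errors.append(predict(xy[keep], values[keep], xy[i]) - values[i])
    return float(np.sqrt(np.mean(np.square(errors))))

# Hypothetical samples along a line, compared for two power values.
pts = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
obs = np.array([0.0, 10.0, 20.0])
for p in (2, 6):
    print(p, loo_rmse(pts, obs, lambda k, v, q: idw_point(k, v, q, power=p)))
```

Running the same function with two candidate parameter settings, such as 12 versus 18 neighbors, and keeping the setting with the lower error is one way to choose between them, provided that the difference is large enough to be meaningful.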

Conclusions

In conclusion, interpolation methods are a fundamental part of today's science and engineering and of thousands of simulation and modeling processes. To a user without a computational or statistical background, the interpolation workflow may not seem too difficult. However, the mathematical and statistical background behind it is fundamental, and becoming familiar with it is recommended if you want to know how your model works and whether it is the best option for your needs.

During this lab, we discussed two of the simplest deterministic interpolation methods, IDW and Trend interpolation. Additionally, we were able to create temperature surfaces for Japan on June 25th, 2022, and check in which zones of the country the temperature values were critical.

The performance of each interpolation method depends on how well your data fit the assumptions of the model and on which parameters you apply when running it. It is recommended to first check the related documentation or theory in order to apply the methods properly and create the best explanatory surfaces with the lowest RMS error values.

Today's invitation is to learn about more advanced interpolation methods like Kriging and Spline, and to continue modeling world phenomena as continuous surfaces.