Predicting unknown from known
Spatial Interpolation
When analyzing real-world phenomena, it is not practical to collect data for every location in the area of interest. Sooner or later, a value has to be estimated or predicted. Accurate prediction can seem like magic in itself, but how is this magic done, and what needs to be considered when performing it?
The answer might be: "Take what you have at known locations and perform the magic called interpolation on it."
To study the behaviour of simple deterministic interpolation, which creates surfaces from measured points based on the extent of similarity between them, and the parameters that affect the method, I arbitrarily chose Belgium, in Europe, as my study area. The data for the study was acquired from the Global Surface Summary of the Day (GSOD).
Comparative Discussion of the Number of Closest Points
I learned that predicting an unknown value requires neighbouring sample points, which act as the training dataset. Inverse Distance Weighting (IDW) uses a predefined formula to interpolate values at unsampled locations. It follows the rule that sample values closer to the prediction location have more influence than sample values farther away.
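To make the rule concrete, here is a minimal sketch of an IDW estimate at a single location. It assumes plain Euclidean distance and the common weight 1 / distance^power; the function name `idw_predict`, its parameters, and the station coordinates and temperatures are made up for illustration and are not the GSOD data or the GIS tool used in this study.

```python
import numpy as np

def idw_predict(sample_xy, sample_values, target_xy, power=2.0, k=None):
    """Estimate the value at target_xy with inverse distance weighting."""
    sample_xy = np.asarray(sample_xy, dtype=float)
    sample_values = np.asarray(sample_values, dtype=float)
    # Euclidean distance from every sample point to the prediction location
    dist = np.linalg.norm(sample_xy - np.asarray(target_xy, dtype=float), axis=1)
    # Optionally keep only the k closest samples
    if k is not None:
        nearest = np.argsort(dist)[:k]
        dist, sample_values = dist[nearest], sample_values[nearest]
    # A sample exactly at the target is returned directly (avoids division by zero)
    if np.any(dist == 0):
        return float(sample_values[dist == 0][0])
    # Closer samples get larger weights
    weights = 1.0 / dist ** power
    return float(np.sum(weights * sample_values) / np.sum(weights))

# Hypothetical stations (x, y) and temperatures, for illustration only
stations = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (4.0, 4.0)]
temps = [8.0, 9.0, 10.0, 12.0]
print(idw_predict(stations, temps, (1.0, 1.0), power=2))
```

The three stations closest to the prediction location dominate the estimate, while the distant fourth station only nudges it upward.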
One factor that affected the predicted temperature at an unknown location was the number of neighbouring points used. Initially, I chose the 12 nearest neighbours to determine the value at the centre of Brussels.
When taking only those 12 points into consideration and interpolating, the output was as shown in the map (represented with Geometric Interval classification for better visualization).
Similarly, I took 18 points, kept the unknown location in exactly the same place as before, and interpolated again. I received a map....
......I would say a better map than the previous one: the random peaks and troughs in the data were reduced, but the surface represented the more global phenomenon and lost the local phenomenon. Points close to one another are more alike than those that are farther apart, so as locations get farther away, their measured values have little relationship to the value at the prediction location.
Thus, to speed up calculations and get more accurate results, we can exclude the more distant points, which have little influence on the prediction.
So the number of points largely depends on our use: if we want to preserve local phenomena, we should take a smaller number, but if we need a smoother result and do not care about local changes, then we should use a larger number.
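The effect of the neighbour count can be tried out with a small sketch like the one below. It assumes synthetic stations with a gentle west-to-east trend plus noise (not the GSOD data), and the helper name `idw_knn` is hypothetical. It only shows the mechanics of restricting IDW to the k nearest points, so the printed numbers are not results from this study.

```python
import numpy as np

def idw_knn(xy, values, target, k, power=2.0):
    """IDW estimate at `target` using only the k nearest samples."""
    dist = np.linalg.norm(xy - target, axis=1)
    nearest = np.argsort(dist)[:k]          # indices of the k closest samples
    d = np.maximum(dist[nearest], 1e-12)    # guard against a zero distance
    w = 1.0 / d ** power
    return float(np.sum(w * values[nearest]) / np.sum(w))

# Synthetic stations: a west-to-east trend plus local noise (assumed, not GSOD)
rng = np.random.default_rng(0)
xy = rng.uniform(0.0, 100.0, size=(40, 2))
values = 8.0 + 0.02 * xy[:, 0] + rng.normal(0.0, 0.3, size=40)
target = np.array([50.0, 50.0])

# Compare the two neighbour counts discussed in the text
for k in (12, 18):
    print(k, round(idw_knn(xy, values, target, k), 3))
```

With k = 1 the estimate is simply the value of the nearest station; as k grows, more distant stations are averaged in and the estimate becomes less sensitive to any single local reading.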
Temperature Map generated
The temperature map prepared from the interpolated GSOD points is shown in the map view below. I believe that a single series of colours shows the temperature changes more effectively. Temperature values in Belgium fall in a low range, so I preferred a series running from dark blue to bright. The darker the colour, the lower the temperature.
Here, I classified the results using Geometric Interval and avoided equal area and equal interval, because temperature changes with height. Equal-area classes might suggest that each temperature range covers an equal area, which might not be the real case, and the same concern applies to equal interval.
Temperature Map
Here, the temperature at Brussels was interpolated as 8.66 from the GSOD dataset, with a standard deviation of 0.42. By standard deviation here, I mean that the best estimate at the desired location is 8.66, but the true value could be as high as 8.66 + (0.42*2) = 9.50 or as low as 8.66 - (0.42*2) = 7.82. We can see from the results that the temperature of Belgium changes as we move from east to west.
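The range around the Brussels estimate is simple arithmetic, sketched below with the two figures quoted above. Treating two standard deviations on either side as the plausible range follows the text; reading that range as roughly a 95% interval would be an extra assumption (approximately normal errors) that the study does not confirm.

```python
predicted = 8.66   # interpolated value at Brussels, from the text
std_dev = 0.42     # reported standard deviation, from the text

# Two standard deviations on either side of the estimate
lower = predicted - 2 * std_dev
upper = predicted + 2 * std_dev
print(f"{lower:.2f} to {upper:.2f}")  # prints "7.82 to 9.50"
```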
Comparative Discussion of IDW Output with Different Power Values
To determine the effect of the power parameter on the deterministic interpolation output, I ran IDW with power 2 and power 4. From the outputs shown in the images below, we can say that the result of IDW is affected by power.
Interpolated Map for 12 points with power 2
When performing the IDW interpolation with power 2, treating power as the weight, or emphasis, given to neighbouring points, I obtained the output shown in the image on the right: a smoother surface with a lower standard deviation.
Interpolated Map for 12 points with power 4
Similarly, when performing the IDW interpolation with power 4, treating power as the weight, or emphasis, given to neighbouring points, I obtained the output shown in the image on the right: a rougher surface with a higher standard deviation.
Difference of IDW with power 2 from power 4
The results show that power controls the significance of the surrounding points: if we assign a high power, distant points have less influence, and vice versa. A higher power means that the nearest points have more effect on the result, and the surface will be less smooth. The standard deviation for power 2 is lower than that for power 4, which suggests that the values obtained from IDW with power 2 are more stable than when the power is increased to 4. However, the mean and median values were almost the same in both cases.
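How power shifts influence toward the nearest points can be seen by looking at the weights alone. The sketch below assumes the usual IDW weight 1 / distance^power, normalised so the weights sum to 1; the three distances and the helper name `idw_weights` are hypothetical.

```python
import numpy as np

def idw_weights(distances, power):
    """Normalised IDW weights (they sum to 1) for the given distances."""
    raw = 1.0 / np.asarray(distances, dtype=float) ** power
    return raw / raw.sum()

distances = [1.0, 2.0, 4.0]  # hypothetical distances to three sample points
for p in (2, 4):
    print(p, np.round(idw_weights(distances, p), 3))
```

For these distances, the nearest point carries about 76% of the total weight at power 2 and about 94% at power 4, while the farthest point drops from about 5% to well under 1%.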
The map in the figure is the raster for power 2 subtracted from the raster for power 4. The main shortcoming of this approach is its “bull’s eye” effect: the highest weights are assigned to points near the sampled locations and the effect of points located far away is neglected, which produces concentric patterns around the samples.
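A difference raster of this kind can be reproduced in outline with NumPy. The sketch below is only an assumption-laden stand-in for the GIS workflow: it uses 12 random synthetic points rather than the GSOD stations, all points contribute to every cell, and the function name `idw_grid` is hypothetical.

```python
import numpy as np

def idw_grid(xy, values, grid_x, grid_y, power):
    """Interpolate onto a regular grid with IDW using all samples."""
    gx, gy = np.meshgrid(grid_x, grid_y)
    cells = np.column_stack([gx.ravel(), gy.ravel()])
    # Distance from every grid cell to every sample, shape (cells, samples)
    dist = np.linalg.norm(cells[:, None, :] - xy[None, :, :], axis=2)
    dist = np.maximum(dist, 1e-12)  # avoid division by zero at sample locations
    w = 1.0 / dist ** power
    surface = (w * values).sum(axis=1) / w.sum(axis=1)
    return surface.reshape(gx.shape)

# Synthetic sample points and values (assumed, not GSOD data)
rng = np.random.default_rng(1)
xy = rng.uniform(0.0, 10.0, size=(12, 2))
values = rng.uniform(6.0, 11.0, size=12)
axis = np.linspace(0.0, 10.0, 50)

surface_p2 = idw_grid(xy, values, axis, axis, power=2)
surface_p4 = idw_grid(xy, values, axis, axis, power=4)
# Power 2 raster subtracted from power 4 raster
difference = surface_p4 - surface_p2

for name, s in (("power 2", surface_p2), ("power 4", surface_p4)):
    print(name, "mean", round(float(s.mean()), 3), "std", round(float(s.std()), 3))
print("largest absolute difference", round(float(np.abs(difference).max()), 3))
```

Plotting `difference` as an image would show where the two power settings disagree most.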
One of the most common problems in spatial data is missing data; with spatial interpolation, we can estimate values for the missing observations.