POINT PATTERN ANALYSIS
Topic 5

Analyzing Point Patterns
Do you think landscapes have order, or are they merely a random assortment of objects in space? Consider the map below that displays coffee shops in Portland, Oregon.
Portland Coffee Shops
One of the main characteristics of someone who is spatially literate is the ability to look at landscapes and describe the spatial pattern of objects. That is, spatially literate individuals can clearly articulate how objects are spatially distributed using specific terminology. Here are three main ways to describe the spatial pattern of objects.
- Clustered: occurs when objects exist in close proximity to one another.
- Dispersed: occurs when objects are spread out from one another.
- Random: occurs when objects exist in neither a clustered nor a dispersed pattern. This is what we also refer to as a "hypothetical" or "normative" pattern.
It should be somewhat obvious that the spatial distribution of coffee shops in the map above represents a clustered pattern. Coffee shops exist in close proximity to one another, which is not uncommon in many different types of cities.
Question 4.1: Why do you think the pattern of coffee shops is clustered?
One explanation for this is something called Hotelling's Law, which states that businesses will locate themselves where they can gain the greatest share of the market, which typically results in being close to other businesses that are similar to them (learn more about Hotelling's Law here). Hotelling's Law helps us understand why similar businesses appear clustered in space. What about natural phenomena, though?
Look at the spatial distribution of landslides in Oregon in the map below, which displays data provided by the State of Oregon's Department of Geology and Mineral Industries. How would you describe the spatial pattern of landslides? Most if not all of us would agree that landslides exhibit a clustered pattern as well, and we will all likely agree that we cannot explain this spatial pattern by simply applying Hotelling's Law. Instead, we need to look at the topography of the earth's surface to be able to explain why landslides are clustered. Zoom into different areas across Oregon (hold down the shift button and draw a box where you want to zoom in) where landslides have occurred and examine the human-made and natural features that may be responsible for the clustered spatial pattern of landslides that we see.
Landslides in Oregon
Question 4.2: Why does the pattern of landslides in Oregon appear to be clustered?
Now, let's look at the next map below, which displays nuclear facilities across the United States. Each location on the map, which was provided by the United States Nuclear Regulatory Commission, represents a single facility that is responsible for generating nuclear power. How would you describe the spatial distribution of facilities? While the eastern seaboard appears to have a high density of facilities that might lead one to state the pattern is clustered, we get a different picture if we look at the data at a slightly finer scale. Zoom in to the map a little and look at the pattern of facilities in the eastern part of the country. How would you describe the spatial pattern? An interesting observation is that some, but certainly not all, facilities appear to have a relatively similar distance to their nearest neighbouring facility. In this sense, we might decide to state that the spatial pattern of nuclear facilities in the eastern United States is dispersed. Why would that be?
Eastern US Nuclear Facilities
Question 4.3: Why does the pattern of nuclear power facilities in the eastern U.S. appear dispersed at finer scales?
By examining the three maps above on this page and by answering the relevant questions, you have begun to train your brain to become spatially literate. Examining spatial patterns of objects, describing how these patterns look, and then explaining why such patterns exist, is collectively a single process that the spatially literate mind undertakes when looking at the world.
What if we want to go beyond just visualizing spatial patterns, and actually measure patterns and have some level of certainty attached to our observations? That's when we apply various types of spatial analyses, first starting with the nearest neighbour analysis.
Nearest Neighbour Analysis
Nearest Neighbour Analysis is a simple and popular approach to characterizing the spatial arrangement of points in a study area. Here's how it works: first, you measure the distance between each point and its nearest neighbour. Distance is most commonly Euclidean, but it can also be measured along a winding road or river. Next, all the distances are summed and divided by the number of points in the study area, which gives us a measure called the mean nearest neighbour distance (mean NND). The equation for the mean NND is:
$$\overline{NND} = \frac{\sum_{i=1}^{n} NND_i}{n}$$
where $NND_i$ is the nearest neighbour distance for point $i$ and $n$ is the number of points.
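To make the steps above concrete, here is a minimal Python sketch that finds each point's nearest neighbour using Euclidean distance and then averages those distances. The function names and the four coordinates are made up for illustration; they are not the points used in the exercises later on this page.

```python
import math


def nearest_neighbour_distances(points):
    """Return the Euclidean distance from each point to its nearest neighbour."""
    distances = []
    for i, (xi, yi) in enumerate(points):
        # Compare point i against every other point and keep the shortest distance.
        nearest = min(
            math.hypot(xi - xj, yi - yj)
            for j, (xj, yj) in enumerate(points)
            if j != i
        )
        distances.append(nearest)
    return distances


def mean_nnd(points):
    """Sum the nearest neighbour distances and divide by the number of points."""
    distances = nearest_neighbour_distances(points)
    return sum(distances) / len(distances)


# Hypothetical coordinates (assumed for this sketch only).
example_points = [(0, 0), (3, 0), (3, 4), (6, 4)]
example_mean_nnd = mean_nnd(example_points)  # every point is 3 units from its nearest neighbour
```

This brute-force comparison checks every pair of points, which is fine for a handful of observations; a larger dataset would normally call for a spatial index instead.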
Next, we can compare our mean NND value to the mean NND of a random pattern for an area with the same spatial density of points. If our mean NND is smaller than that of the random pattern, we can conclude that our pattern is clustered (imagine if all observations were clustered together on the exact same location; we would have a mean NND of zero). Conversely, if our mean NND is larger than that of a random pattern, we would conclude that the pattern is dispersed.
Where in the world do we find a random pattern for whatever it is that we are examining, especially one that has the same density of points? This is one of the times when statistics comes in really handy. Instead of searching for a random pattern in the real world, we can estimate a hypothetical, or "idealistic", value of the mean NND for a random pattern by just using the density of points in our study site. The equation for doing this is:
$$\overline{NND}_R = \frac{1}{2\sqrt{\text{Density}}}$$
where Density is the number of points divided by the area of the study site.
Therefore, values below the mean NND for a random distribution are clustered, and values above are dispersed.
We know that a perfectly clustered pattern has a mean NND of 0, which we write as:
$$\overline{NND}_C = 0$$
We can also figure out what the mean NND would be for a perfectly dispersed pattern by using our density of points (the numerator is just a constant number that helps us with this formula):
$$\overline{NND}_D = \frac{1.07453}{\sqrt{\text{Density}}}$$
We now have a spectrum of mean NND values to help us evaluate our spatial pattern, which means we can make some conclusions about the spatial pattern of our observations. However, there are two issues here that we need to address. First, we can't really compare patterns from one study site to another because the mean NND values for random and dispersed distributions depend on the density of points. What if one study site is larger than the other, or what if one has more points? This would change our estimate of the mean NND for a perfectly dispersed pattern. So what do we do? You got it, we standardize our mean NND value. We do this by dividing the mean NND by the corresponding value for a random distribution with the same point density:
$$R = \frac{\overline{NND}}{\overline{NND}_R}$$
We can then use this value for one study area to compare against another study area and make a statement regarding their spatial patterns.
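As a rough sketch of how these reference values fit together, the Python below computes the mean NND expected for clustered, random, and dispersed patterns from the point density, and then the standardized value. It assumes the commonly used forms 1 / (2 * sqrt(density)) for a random pattern and 1.07453 / sqrt(density) for a perfectly dispersed pattern; the function names and the example numbers (16 points in 64 square units) are hypothetical.

```python
import math


def nnd_reference_values(n_points, area):
    """Mean NND expected for perfectly clustered, random, and dispersed patterns."""
    density = n_points / area  # points per unit area
    return {
        "clustered": 0.0,
        "random": 1 / (2 * math.sqrt(density)),
        "dispersed": 1.07453 / math.sqrt(density),
    }


def standardized_nn_index(observed_mean_nnd, n_points, area):
    """Divide the observed mean NND by the value expected for a random pattern."""
    return observed_mean_nnd / nnd_reference_values(n_points, area)["random"]


# Hypothetical study area (assumed for this sketch only).
reference = nnd_reference_values(n_points=16, area=64)
index = standardized_nn_index(observed_mean_nnd=0.5, n_points=16, area=64)
```

Because the standardized value is a ratio against the random expectation, a value of 1 means the observed mean NND matches that of a random pattern regardless of the study area's size or point count.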
Second, how do we know if our spatial pattern is significantly clustered, dispersed, or random? Well, just like the Z-test in the previous section that we used to determine if a sample is significantly different than a population, we can conduct a Z-test to determine if a pattern is significantly different than random using the following formula:
$$Z_n = \frac{\overline{NND} - \overline{NND}_R}{\sigma_{\overline{NND}}}$$
where the denominator represents the standard error of the mean NND, which can be estimated using the formula:
$$\sigma_{\overline{NND}} = \frac{0.26136}{\sqrt{n \cdot \text{Density}}}$$
We can now use our Zn value and the Normal Table to conduct either a one- or two-tailed hypothesis test about our spatial pattern, just as we would with the difference of means test we conducted in the last section. And that's it! You are now ready to conduct your first inferential spatial analysis.
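Here is a minimal Python sketch of that Z-test, assuming the standard error is estimated as 0.26136 / sqrt(n * density) and that we run a two-tailed test at the 95% confidence level (critical value 1.96). The function names and input values are hypothetical and are not taken from the exercises below.

```python
import math


def nearest_neighbour_z(observed_mean_nnd, n_points, area):
    """Z statistic comparing the observed mean NND with the random expectation."""
    density = n_points / area
    random_mean_nnd = 1 / (2 * math.sqrt(density))
    standard_error = 0.26136 / math.sqrt(n_points * density)
    return (observed_mean_nnd - random_mean_nnd) / standard_error


def classify_pattern(z, critical_value=1.96):
    """Two-tailed decision; 1.96 corresponds to a 95% confidence level."""
    if z <= -critical_value:
        return "significantly clustered"
    if z >= critical_value:
        return "significantly dispersed"
    return "random"


# Hypothetical inputs (assumed for this sketch only).
z_value = nearest_neighbour_z(observed_mean_nnd=0.5, n_points=16, area=64)
decision = classify_pattern(z_value)  # z is about -3.83, so "significantly clustered"
```

A negative Z means the observed mean NND is smaller than the random expectation, and a positive Z means it is larger; the critical value decides whether that difference is large enough to call significant.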
Your Turn
Part A:
Below is a study area with seven points. The map is georeferenced on a simple cartesian grid that shows the study area as 8 x 8 units.
The table below contains information on the nearest neighbour and nearest neighbour distance for each point.
Perform a nearest neighbour analysis to quantify the point pattern in the above map (please note that the below questions are worth 5 points each for a correct answer versus previous questions that are worth 0.5 for participation).
Question 4.4: The spatial pattern of points in the map above is (use a 95% confidence level):
- significantly dispersed
- random
- significantly clustered
Part B
Below is a study area with seven points at different locations from Part A. The map is georeferenced on the same cartesian grid. The table below contains the new information on nearest neighbour distance.
Question 4.5: The spatial pattern of points in the map above is (use a 95% confidence level):
- significantly dispersed
- random
- significantly clustered
Part C
Below is a study area with seven points at different locations from Part A and Part B. The map is georeferenced on the same cartesian grid. The table below contains the new information on nearest neighbour distance.
Question 4.6: The spatial pattern of the points in the map above is (use a 95% confidence level):
- significantly dispersed
- random
- significantly clustered
Quadrat Analysis
Below is a map with point observations. How would you describe this pattern?
Question 4.7: The pattern above is:
- random
- clustered
- dispersed
What if we place a grid on top of this map?
Question 4.8: How would you describe the relationship between the locations of points and cells in the grid?
Here's another pattern.
Question 4.9: How would you describe the pattern above?
- random
- clustered
- dispersed
Let's place the same grid on top.
Question 4.10: How would you describe the relationship between the locations of points and cells in the grid?
What you've just accomplished is pretty much what a quadrat analysis does. A quadrat analysis is an alternative way of testing whether or not a spatial pattern is significantly different than a random spatial pattern. In the first pattern above, all points were in one cell (or quadrat), so there is a high degree of variation regarding the number of points per cell. Conversely, the second pattern had the exact same number of points per cell, so the variance is zero as there is no difference between a cell and any other cell. Given this logic, answer the following two questions:
Question 4.11: A high variance of points in a grid of cells represents a pattern that is:
- random
- clustered
- dispersed
Question 4.12: A low variance of points in a grid of cells represents a pattern that is:
- random
- clustered
- dispersed
But how do we calculate variance for quadrat analysis? The VAR equation looks rather complex, but all you actually need are two variables: the number of points per cell (x), and the frequency (f) of cells that contain x number of points.
$$\text{VAR} = \frac{\sum f_i x_i^2 - \dfrac{\left(\sum f_i x_i\right)^2}{m}}{m - 1}$$
where m is the number of cells or quadrats. Below is a dataset of wildfires in New Mexico between 2000 and 2009 (this exercise is taken from Statistical Applications for Geographers by McGrew and Monroe, 2014). Imagine a map with 2509 points spread out across the state, each one representing a wildfire, sort of like our wildfire map of BC in the introductory chapter. Now imagine a 10 x 12 grid placed on top of that map of points, and from this someone has counted the number of points per cell (x), and the frequency (f) of cells that contain x points. These values have been entered in the table below.
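If it helps to see the arithmetic spelled out, here is a small Python sketch of the variance calculation from a frequency table. It assumes the computational form VAR = [sum(f * x^2) - (sum(f * x))^2 / m] / (m - 1). The function name and the tiny frequency table are invented for illustration and are not the New Mexico wildfire data.

```python
def quadrat_variance(frequency_table):
    """Variance of points per cell.

    frequency_table maps x (points per cell) to f (number of cells with x points).
    """
    m = sum(frequency_table.values())  # total number of cells
    sum_fx = sum(f * x for x, f in frequency_table.items())
    sum_fx2 = sum(f * x ** 2 for x, f in frequency_table.items())
    return (sum_fx2 - sum_fx ** 2 / m) / (m - 1)


# Hypothetical 10-cell grid: 5 empty cells, 3 cells with 1 point,
# 1 cell with 2 points, and 1 cell with 5 points.
example_table = {0: 5, 1: 3, 2: 1, 5: 1}
example_var = quadrat_variance(example_table)  # (32 - 100 / 10) / 9 = 22 / 9
```

In a spreadsheet, the same calculation needs only two helper columns, one for f * x and one for f * x^2, whose totals feed the formula.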
See if you can calculate the VAR directly in the Google Sheets document below. Otherwise, copy and paste the values into another spreadsheet editor such as Excel and calculate VAR.
Quadrat Analysis
Question 4.13: What is the VAR for the dataset above? Your calculated answer should at least be close to one of the answers below.
- 1.4
- 18.9
- 31.98
- 410.04
And there you have it; you just calculated the variance of cell frequencies for a quadrat analysis. So what do we do with this value? We can't use VAR as an effective measure on its own because it is heavily influenced by the density of points, i.e. the mean number of points per cell. Think back to our discussion on descriptive statistics when we wanted to use the standard deviation as a useful measure of variability, but we ran into problems comparing the standard deviations of different samples because they depend on the size of the values being used. If you recall, we simply divided the standard deviation by the sample mean to get the coefficient of variation, which provided us with a standardized measure to compare samples. Similarly, we can standardize the variance in cell frequencies by the mean number of points per cell:
$$\text{MEAN} = \frac{n}{m}$$
where n is the number of points in the dataset, and m is the number of cells. To compare the VAR and the MEAN, we simply divide one by the other to get a variance-mean ratio (VMR):
$$\text{VMR} = \frac{\text{VAR}}{\text{MEAN}}$$
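As a quick sketch in Python, the mean number of points per cell and the variance-mean ratio can be computed as follows. The function names are made up for illustration. The first example uses the wildfire counts given above (2509 points, 10 x 12 = 120 cells); the VAR value in the second example is hypothetical and is not the answer to Question 4.13.

```python
def mean_points_per_cell(n_points, m_cells):
    """MEAN: total number of points divided by the number of cells."""
    return n_points / m_cells


def variance_mean_ratio(var, n_points, m_cells):
    """VMR: variance of points per cell divided by the mean points per cell."""
    return var / mean_points_per_cell(n_points, m_cells)


# New Mexico wildfire example from the text: 2509 points on a 10 x 12 grid.
wildfire_mean = mean_points_per_cell(n_points=2509, m_cells=10 * 12)

# Hypothetical small example: VAR of 22 / 9 with 10 points in 10 cells.
example_vmr = variance_mean_ratio(var=22 / 9, n_points=10, m_cells=10)
```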
Question 4.14: A relatively high VMR value indicates a pattern that is:
- clustered
- dispersed
For a perfectly random distribution, our VAR and MEAN are equal. This is kind of like our mean nearest neighbour distance being equal to a mean nearest neighbour distance for a random distribution.
Question 4.15: For a perfectly random spatial point pattern, the VMR is:
- relatively high
- relatively low
Finally, we can also perform an inferential test to determine if the spatial pattern is indeed significantly different than random. However, rather than conducting a z-test like we did for nearest neighbour analysis, we need to use a chi-square test because we are operating under the assumption that our data follow a Poisson distribution, as we are counting the number of points per area. It is not crucial to understand this last point, but if you do want to become more knowledgeable in why we use a Poisson distribution, I recommend that you find some resources online to help you do so, and of course a good place to start is here. The chi-square test statistic is as follows:
$$\chi^2 = \text{VMR} \times (m - 1)$$
Just like the nearest neighbour analysis, the null hypothesis is that there is no difference between the observed distribution of points and a distribution resulting from a random pattern (i.e. the VMR is equal to 1). But rather than looking at the normal table, we have to obtain our p-value from a chi-square table. To do this, I recommend using an online p-value calculator as most tables won't contain all the information we need. To get the p-value for a chi-square test, you need two things:
- chi-square value
- number of degrees of freedom (m - 1)
Plug those values in and see what your p-value is. Is it small enough that you can reject the null hypothesis?
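The two inputs for the p-value lookup can be sketched in Python as shown below, assuming the test statistic takes the form chi-square = VMR * (m - 1) with m - 1 degrees of freedom. The function name and the example VMR are hypothetical.

```python
def quadrat_chi_square(vmr, m_cells):
    """Return the chi-square statistic and its degrees of freedom."""
    degrees_of_freedom = m_cells - 1
    chi_square = vmr * degrees_of_freedom
    return chi_square, degrees_of_freedom


# Hypothetical values (assumed for this sketch only): a VMR of 22 / 9 on a 10-cell grid.
chi_square, degrees_of_freedom = quadrat_chi_square(vmr=22 / 9, m_cells=10)
```

These are the two numbers an online p-value calculator asks for. If SciPy happens to be installed, `scipy.stats.chi2.sf(chi_square, degrees_of_freedom)` returns the upper-tail probability, which is the relevant tail when testing for clustering.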
Question 4.16: Given the results of your test:
- you accept the null hypothesis
- you reject the null hypothesis
Something to Consider
Look at the two figures below. Both have the same VMR; is this a problem?