April Demonstration Census Data
Spatial Analysis
Total Population by Census Tract
This map shows Census Tract population counts. Click on a tract to see the 2010 SF1 value, the 2010 Differential Privacy (DP) estimated value (Privacy Loss Budget = 12.1), and the population density.
Map legend
Since mapping population counts by census tract can present a misleading view of where people live, here's a map showing people per square mile. Click the map to see tract data.
Map legend
For example, clicking tract 103.22 in Idaho shows there were 6,910 people in 2010 (SF1 value), but only 6,872 people after differential privacy is applied (DP value), a difference of 38 people. The population density for this tract is 3,889.28 people per square mile.
Map popup
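The noise figure used throughout this analysis can be reproduced from the two popup counts. This is a minimal sketch using only the values quoted above; the variable names are illustrative and not taken from the map's data schema.

```python
# Values quoted from the map popup for Idaho tract 103.22.
sf1_population = 6910  # published 2010 SF1 count
dp_population = 6872   # differentially private (DP) estimate

# Injected noise as defined in this analysis: DP minus SF1.
# A negative value means the tract loses people under DP.
dp_noise = dp_population - sf1_population

# Size of the noise relative to the published count, in percent.
noise_share_pct = abs(dp_noise) / sf1_population * 100

print(f"DP noise: {dp_noise} people")
print(f"Noise as a share of the SF1 count: {noise_share_pct:.2f}%")
```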
A scatterplot of total population (SF1) against the injected DP noise (DP - SF1) shows a slight negative relationship (R² = 0.14). Interpretation: As a tract's population increases, the DP noise usually decreases, so the most populated tracts tend to report fewer people after DP is applied.
Notice the two extreme cases highlighted in the chart. A tract with 3,961 people gets 143 additional people with DP. Another tract with 7,540 people loses 211 people with DP.
So what? Why do we care? Ideally, the noise added by differential privacy will be random. The number of people in a tract should not influence whether DP noise is positive or negative.
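One way to check whether noise depends on tract size is to fit a straight line to noise against SF1 population and look at the slope and R². The sketch below is a plain least-squares fit, not the charting tool used for the scatterplot. The five tract counts are made up for illustration and are not census values.

```python
def fit_noise_against_population(sf1, dp):
    """Least-squares fit of noise = a + b * SF1.

    Returns (slope, r_squared). Assumes at least two tracts and that
    neither the SF1 counts nor the noise values are all identical.
    """
    noise = [d - s for s, d in zip(sf1, dp)]
    n = len(sf1)
    mean_x = sum(sf1) / n
    mean_y = sum(noise) / n
    sxx = sum((x - mean_x) ** 2 for x in sf1)
    syy = sum((y - mean_y) ** 2 for y in noise)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(sf1, noise))
    slope = sxy / sxx
    # For a one-predictor fit, R^2 is the squared Pearson correlation.
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, r_squared


# Hypothetical counts, for illustration only.
sf1_counts = [1000, 2000, 3000, 4000, 5000]
dp_counts = [1010, 2005, 2990, 4000, 4980]

slope, r_squared = fit_noise_against_population(sf1_counts, dp_counts)
print(f"slope = {slope:.4f}, R^2 = {r_squared:.2f}")
```

If the noise were unrelated to tract size, the slope would be close to zero and R² would be close to zero as well.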
This hot spot map shows where the DP overestimates are spatially concentrated (red) and where the DP underestimates are spatially concentrated (blue). The analysis evaluates the differences (DP - SF1). Keep in mind that a hot spot (red) isn't just a single large difference; it is a large difference surrounded by other large differences. Red identifies areas where the DP values overestimate the SF1 Total Population values. Blue identifies areas where the DP values underestimate them.
Map legend
Parameters used in ArcGIS Pro to obtain the map results shown
So what? Why do we care? Ideally, the spatial placement of positive or negative noise will exhibit a random spatial pattern. When the pattern is random, there are no statistically significant clusters of either positive or negative values. Here, however, we DO see statistically significant clustering. Regions with clustering of negative noise (around Miami, for example) could translate to less representation or underfunding for programs based on population counts.
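The idea of "a large difference surrounded by other large differences" can be sketched with the Getis-Ord Gi* statistic, which is the statistic behind the Hot Spot Analysis tool in ArcGIS Pro. The code below is a simplified illustration, not a reproduction of the tool run shown above: it assumes planar point coordinates, binary weights over each location and its k nearest neighbors, and no multiple-testing correction. The coordinates and noise values are hypothetical.

```python
import math


def k_nearest(points, i, k):
    """Indices of the k points closest to points[i], excluding i itself."""
    xi, yi = points[i]
    others = [j for j in range(len(points)) if j != i]
    others.sort(key=lambda j: (points[j][0] - xi) ** 2 + (points[j][1] - yi) ** 2)
    return others[:k]


def gi_star(points, values, k):
    """Getis-Ord Gi* z-score for each location.

    Uses binary weights: 1 for the location itself and its k nearest
    neighbors, 0 for everything else. Assumes values are not all equal
    and that k + 1 is smaller than the number of locations.
    """
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum(v * v for v in values) / n - mean * mean)
    scores = []
    for i in range(n):
        hood = [i] + k_nearest(points, i, k)
        w_sum = len(hood)     # sum of the binary weights
        w_sq_sum = len(hood)  # sum of squared binary weights (1 squared is 1)
        local_sum = sum(values[j] for j in hood)
        numerator = local_sum - mean * w_sum
        denominator = std * math.sqrt((n * w_sq_sum - w_sum ** 2) / (n - 1))
        scores.append(numerator / denominator)
    return scores


# Hypothetical tract centroids along a line, with hypothetical DP noise:
# overestimates cluster at one end, underestimates at the other.
centroids = [(float(x), 0.0) for x in range(10)]
noise = [40, 35, 38, -2, 3, 1, -4, -36, -41, -39]

z_scores = gi_star(centroids, noise, k=2)
for z in z_scores:
    label = "hot" if z > 1.96 else "cold" if z < -1.96 else "not significant"
    print(f"{z:6.2f}  {label}")
```

A z-score above 1.96 or below -1.96 corresponds to the conventional 95% confidence threshold. Random noise should leave most locations in the "not significant" band.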
This map shows mean absolute differences (|DP-SF1|) between the estimated and published total population values, summarized for each tract and its nearest 50 neighbors. The darkest blue areas are places with the largest distortions due to differential privacy.
Map legend
There are actually 64 tracts where the mean absolute difference is larger than 40, but they're small and difficult to see (south of Phoenix, and near Houston and Colorado Springs, for example). Zoom in if you want to identify these.
The parameters in ArcGIS Pro used to create this map result
So what? Why do we care? Ideally, differential privacy will impact locations similarly. In the map, the mean value associated with each tract is obtained by averaging the absolute (DP-SF1) values for the tract and its 50 closest neighbors. Large mean values indicate a concentration of poor estimates. It would be unfair if some regions of the country (shown in darker blue) had larger census count distortions than other regions.
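The neighborhood summary described here can be sketched as follows. This is a brute-force illustration under stated assumptions, not the ArcGIS Pro tool: tracts are represented by planar centroid coordinates, neighbors are found by straight-line distance, and the function name and sample values are hypothetical.

```python
def neighborhood_mean_abs_diff(points, sf1, dp, k=50):
    """Mean of |DP - SF1| over each tract and its k nearest neighbors."""
    abs_diff = [abs(d - s) for s, d in zip(sf1, dp)]
    means = []
    for i, (xi, yi) in enumerate(points):
        # Rank every other tract by squared distance to tract i.
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: (points[j][0] - xi) ** 2 + (points[j][1] - yi) ** 2,
        )
        hood = [i] + others[:k]  # the tract itself plus its k nearest neighbors
        means.append(sum(abs_diff[j] for j in hood) / len(hood))
    return means


# Hypothetical centroids and counts; k is reduced to 2 so the toy data works.
centroids = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (10.0, 0.0)]
sf1_counts = [100, 200, 300, 400]
dp_counts = [110, 190, 330, 400]

mean_abs = neighborhood_mean_abs_diff(centroids, sf1_counts, dp_counts, k=2)
print([round(m, 2) for m in mean_abs])
```

Note that the last tract has no error of its own, yet it still receives a nonzero mean because its neighbors do. That is the intended behavior: the map highlights concentrations of poor estimates rather than individual tracts.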
Large mean absolute differences for tracts with thousands of people will be less impactful than for tracts with only a handful of people. To see the tracts that are impacted most because of few people and large absolute errors, compute the percent absolute error:
(|DP-SF1|/SF1) * 100
To avoid division by zero, the percent absolute error is set to zero for tracts where both SF1 and DP are zero (201 tracts fall into this category). For tracts where SF1 is zero but DP is larger than zero (13 tracts), use the following formula to compute percent error:
((|DP-SF1|)/0.49) * 100
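The two formulas and the zero-handling rule can be combined into one function. This is a minimal sketch of the rules as stated above; the function and parameter names are illustrative.

```python
def percent_absolute_error(sf1, dp, zero_substitute=0.49):
    """Percent absolute error of a DP count relative to the SF1 count.

    - SF1 and DP both zero: the error is defined as 0.
    - SF1 zero, DP above zero: divide by 0.49 instead of 0.
    - Otherwise: (|DP - SF1| / SF1) * 100.
    """
    if sf1 == 0 and dp == 0:
        return 0.0
    denominator = sf1 if sf1 != 0 else zero_substitute
    return abs(dp - sf1) / denominator * 100


# A tract with 22 people (SF1) that gains 138 people under DP.
print(f"{percent_absolute_error(22, 22 + 138):.1f}%")

# Both counts zero: defined as zero error.
print(f"{percent_absolute_error(0, 0):.1f}%")

# Hypothetical tract with SF1 = 0 and DP = 5, using the 0.49 substitute.
print(f"{percent_absolute_error(0, 5):.1f}%")
```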
The map shows the tracts with large percent absolute errors, using the legend below (based on standard deviations):
Map legend
Some findings: There are 185 tracts where the absolute percent error is more than 100%. The largest percent difference is 10,000%. All of the tracts with percent errors larger than 100 are overestimates. The largest error adds 138 people to a location with only 22 people (SF1).
So what? Why do we care? Errors larger than 100% would seem to border on data fabrication, and could result in misappropriated funds or unjustified representation for programs that base allocation on population totals.
Ideally, the distortion associated with the percent absolute errors would be randomly distributed across the country. To see if there are regions where the error congregates, compute the mean percent absolute error for each tract and its nearest 50 neighbors. The dark blue areas on the map are associated with the largest mean relative distortion of the SF1 total population values. Notice that large areas of Texas are unduly impacted.
Map legend
Clicking a tract south of Dallas, for example (tract 103), indicates the mean percent absolute difference for that tract and its nearest 50 neighbors is 145.22%.
Map popup
So what? Why do we care? It doesn't seem right (fair) that some regions of the country would experience more distortion of their data than other regions.
Data Relationships
For programs that base representation or funding on population counts, DP creates both winners and losers. Regions that lose population will potentially lose some funding or representation. Ideally, the positive and negative noise will not cluster spatially; if they do, they will create regions of advantage and disadvantage. In addition, there shouldn't be any statistically significant relationships between the DP noise and other variables. It would be tragic, for example, if decreases in population were only associated with communities of color.
This is where spatial analysis is especially important.
While the global relationship between the DP injected noise (DP-SF1) and BIPOC (Black, Indigenous, and other people of color in 2010) is negative and weak (R² = 0.10), a large number of statistically significant regions emerge when each tract is assessed within the context of its nearest 50 neighbors. Most of these statistically significant local relationships are negative, indicating that the DP noise increases (adding people) as the number of BIPOC goes down. Indeed, communities of color appear to be losers with differential privacy.
Map legend
Click on the map to examine the data and local scatterplots.
ArcGIS Pro tool parameters to create the map results shown here
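To illustrate how a weak global relationship can hide stronger local ones, the sketch below fits a separate least-squares slope of DP noise on BIPOC count within each tract's neighborhood. This is a deliberately simplified stand-in and not the ArcGIS Pro tool used for the map: it assumes planar centroids, performs no significance testing, and uses hypothetical data and names.

```python
def local_slopes(points, bipoc, noise, k=50):
    """Least-squares slope of noise on BIPOC count in each neighborhood.

    Each neighborhood is the tract plus its k nearest neighbors.
    Returns None where BIPOC counts in the neighborhood are all equal.
    """
    slopes = []
    for i, (xi, yi) in enumerate(points):
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: (points[j][0] - xi) ** 2 + (points[j][1] - yi) ** 2,
        )
        hood = [i] + others[:k]
        xs = [bipoc[j] for j in hood]
        ys = [noise[j] for j in hood]
        mean_x = sum(xs) / len(xs)
        mean_y = sum(ys) / len(ys)
        sxx = sum((x - mean_x) ** 2 for x in xs)
        sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        slopes.append(sxy / sxx if sxx > 0 else None)
    return slopes


# Two hypothetical clusters of tracts, far apart from each other.
centroids = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0),
             (100.0, 0.0), (101.0, 0.0), (102.0, 0.0), (103.0, 0.0)]
bipoc_counts = [10, 20, 30, 40, 10, 20, 30, 40]
dp_noise = [5, 3, 1, -1, 0, 1, 2, 3]

# k=3 keeps every neighborhood inside its own cluster of four tracts.
slopes = local_slopes(centroids, bipoc_counts, dp_noise, k=3)
print([round(s, 3) for s in slopes])
```

In this toy data the first cluster has a negative local slope (noise falls as the BIPOC count rises) while the second has a positive one, a contrast that a single global fit would blur.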
If the negative relationships are only associated with tracts that have few BIPOC, there is less concern. Explore the maps to see if that's the case.
Left side map legend
Right side map legend
It's actually difficult to visually identify where the largest communities of color correspond to strong local relationships. This bivariate map helps. Southern California, broad regions in Arizona and New Mexico, Nevada, and Florida stand out.
So what? Why do we care? The darker areas of the map are regions of concern because they are the Differential Privacy "losers". If differential privacy artificially decreases population counts primarily in these communities of color, it could inadvertently increase racial inequities.
Summary
The spatial analyses presented here only look at the Total Population variable, and at Total Population DP noise (DP-SF1) in relation to communities of color. Repeating these analyses for other variables, and definitely repeating them with the upcoming August demonstration data, is recommended.
Contact LGriffin@esri.com with questions or comments.
Thank you!