April Demonstration Census Data

Spatial Analysis

Total Population by Census Tract

This map shows Census Tract population counts. Click on a tract to see the 2010 SF1 value, the 2010 Differential Privacy (DP) estimated value (Privacy Loss Budget = 12.1), and the population density.

Since mapping population counts by census tract can present a misleading view of where people live, here's a map showing people per square mile. Click the map to see tract data.

Clicking on a tract in Idaho, for example (tract 103.22), shows there were 6,910 people in 2010 (SF1 value), but only 6,872 people after differential privacy is applied (DP value). The population density for this tract is 3,889.28 people per square mile.

A scatterplot of total population (SF1) against the injected DP noise (DP - SF1) shows a slight negative relationship (R² = 0.14). Interpretation: As a tract's population increases, the DP noise tends to decrease, so the most populated tracts tend to report fewer people after DP is applied.

Notice the two extreme cases highlighted in the chart. A tract with 3,961 people gets 143 additional people with DP. Another tract with 7,540 people loses 211 people with DP.

So what? Why do we care? Ideally, the noise added by differential privacy will be random. The number of people in a tract should not influence whether DP noise is positive or negative.
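One way to check for this kind of dependence is an ordinary least squares fit of the noise (DP - SF1) on the SF1 count. What follows is a minimal sketch, not the method used to produce the chart. The function name `noise_trend` is hypothetical, and the sample list combines the two extreme tracts mentioned above with three made-up tracts, so the printed numbers will not match the reported R².

```python
from statistics import mean

def noise_trend(sf1, dp):
    """Fit noise = intercept + slope * sf1; return (slope, intercept, r_squared)."""
    noise = [d - s for s, d in zip(sf1, dp)]  # DP - SF1 for each tract
    x_bar, y_bar = mean(sf1), mean(noise)
    sxx = sum((x - x_bar) ** 2 for x in sf1)
    syy = sum((y - y_bar) ** 2 for y in noise)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(sf1, noise))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # With a single predictor, R^2 is the squared Pearson correlation.
    r_squared = (sxy ** 2) / (sxx * syy) if syy > 0 else 0.0
    return slope, intercept, r_squared

# Illustrative values only: tracts 3 and 5 are the extreme cases from the
# text (+143 and -211); the other three tracts are invented.
sf1 = [1200, 2500, 3961, 5100, 7540]
dp = [1215, 2498, 4104, 5080, 7329]
slope, intercept, r2 = noise_trend(sf1, dp)
print(f"slope={slope:.4f}, R^2={r2:.2f}")
```

If the noise were independent of population size, the slope and R² from the full tract table would both be close to zero.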

This hot spot map shows where the DP overestimates are spatially concentrated (red) and where the DP underestimates are spatially concentrated (blue). The analysis evaluates the differences (DP - SF1). Keep in mind that a hot spot (red) isn't just a single large difference; it is a large difference surrounded by other large differences. Red identifies areas where the DP values overestimate the SF1 Total Population values. Blue identifies areas where the DP values underestimate them.

Parameters used in ArcGIS Pro to obtain the map results shown

So what? Why do we care? Ideally, the spatial placement of positive or negative noise will exhibit a random spatial pattern. When the pattern is random, there are no statistically significant clusters of either positive or negative values. Here, however, we DO see statistically significant clustering. Regions with clustering of negative noise (around Miami, for example) could translate to less representation or underfunding for programs based on population counts.
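Hot spot maps of this kind are typically built on the Getis-Ord Gi* statistic, which compares the sum of values in a neighborhood with what would be expected under a random arrangement. The tool parameters used for the map are not reproduced in the text, so the sketch below makes assumptions: each neighborhood is a tract plus its k nearest neighbors with binary weights, and the function name, coordinates, and noise values are hypothetical. It omits the multiple-testing correction a production tool would apply.

```python
import math

def gi_star(values, coords, k):
    """Getis-Ord Gi* z-score for each feature.

    Assumes binary weights over the feature itself plus its k nearest
    neighbors. Requires k + 1 < len(values) and non-constant values.
    """
    n = len(values)
    x_bar = sum(values) / n
    s = math.sqrt(sum(v * v for v in values) / n - x_bar ** 2)
    scores = []
    for i in range(n):
        others = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: math.dist(coords[i], coords[j]),
        )
        hood = [i] + others[:k]          # Gi* includes the feature itself
        w_sum = len(hood)                # binary weights: sum(w) == sum(w^2)
        local_sum = sum(values[j] for j in hood)
        denom = s * math.sqrt((n * w_sum - w_sum ** 2) / (n - 1))
        scores.append((local_sum - x_bar * w_sum) / denom)
    return scores

# Made-up noise values (DP - SF1) for six tracts along a line.
coords = [(0, 0), (1, 0), (2, 0), (10, 0), (11, 0), (12, 0)]
noise = [50, 40, 45, -30, -40, -35]
for z in gi_star(noise, coords, k=1):
    print(round(z, 2))
```

Large positive scores mark clusters of overestimates (hot spots) and large negative scores mark clusters of underestimates (cold spots).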

This map shows mean absolute differences (|DP - SF1|) between the estimated and published total population values, summarized for each tract and its nearest 50 neighbors. The darkest blue areas are places with the largest distortions due to differential privacy.

There are actually 64 tracts where the mean absolute difference is larger than 40, but they're small and difficult to see (south of Phoenix, and near Houston and Colorado Springs, for example). Zoom in if you want to identify these.

The parameters in ArcGIS Pro used to create this map result

So what? Why do we care? Ideally, differential privacy will impact locations similarly. In the map, the value associated with each tract is the mean of the absolute differences (|DP - SF1|) for the tract and its 50 closest neighbors. Large mean values indicate a concentration of poor estimates. It would be unfair if some regions of the country (shown in darker blue) had larger census count distortions than other regions.
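The neighborhood summary described here can be sketched as follows. This is an illustration under stated assumptions, not the ArcGIS Pro workflow: neighbors are found by brute-force straight-line distance between hypothetical tract centroids, and the function name and sample values are made up. For the full set of census tracts a spatial index would be needed in place of the brute-force search.

```python
import math

def neighborhood_mean_abs_diff(sf1, dp, coords, k=50):
    """Mean of |DP - SF1| over each tract and its k nearest neighbors."""
    abs_diff = [abs(d - s) for s, d in zip(sf1, dp)]
    n = len(abs_diff)
    means = []
    for i in range(n):
        # Rank every other tract by distance to tract i (brute force).
        others = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: math.dist(coords[i], coords[j]),
        )
        hood = [i] + others[:k]  # the tract itself plus up to k neighbors
        means.append(sum(abs_diff[j] for j in hood) / len(hood))
    return means

# Four invented tracts; k=1 keeps the example readable.
coords = [(0, 0), (1, 0), (10, 0), (11, 0)]
sf1 = [100, 100, 100, 100]
dp = [110, 90, 100, 140]
print(neighborhood_mean_abs_diff(sf1, dp, coords, k=1))
```

Averaging over a neighborhood smooths out isolated errors, so a large value signals that poor estimates are concentrated in that area.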

Large mean absolute differences for tracts with thousands of people will be less impactful than for tracts with only a handful of people. To see the tracts that are impacted most because of few people and large absolute errors, compute the percent absolute error:

(|DP-SF1|/SF1) * 100

To avoid division by zero, the percent absolute error is set to zero for tracts where both SF1 and DP are zero (201 tracts fall into this category). For tracts where SF1 is zero but DP is larger than zero (13 tracts), use the following formula to compute percent error:

((|DP-SF1|)/0.49) * 100
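The three cases above can be combined into one function. This is a minimal sketch; the function and parameter names are hypothetical, and the 0.49 substitute denominator is taken directly from the formula above.

```python
def percent_abs_error(sf1, dp, zero_substitute=0.49):
    """Percent absolute error of the DP count relative to the SF1 count."""
    if sf1 == 0 and dp == 0:
        return 0.0  # both counts are zero: treated as no error
    # When SF1 is zero but DP is not, divide by the substitute value
    # to avoid division by zero.
    denominator = sf1 if sf1 > 0 else zero_substitute
    return abs(dp - sf1) / denominator * 100

print(percent_abs_error(22, 160))  # 138 people added to a tract of 22
print(percent_abs_error(0, 0))
print(percent_abs_error(0, 5))
```

A tract with 22 people in SF1 and 160 after DP has an error of roughly 627%, since 138 / 22 is about 6.27.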

The map shows the tracts with large percent absolute errors, with legend classes based on standard deviations.

Some findings: There are 185 tracts where the percent absolute error is more than 100%. The largest percent absolute error is 10,000%. All of the tracts with percent absolute errors larger than 100% are overestimates. The largest error adds 138 people to a location with only 22 people (SF1).

So what? Why do we care? Errors larger than 100% would seem to border on data fabrication, and could result in misappropriated funds or unjustified representation for programs that base allocation on population totals.

Ideally, the distortion associated with the percent absolute errors would be randomly distributed across the country. To see if there are regions where the error congregates, compute the mean percent absolute error for each tract and its nearest 50 neighbors. The dark blue areas on the map are associated with the largest mean relative distortion of the SF1 total population values. Notice that large areas of Texas are unduly impacted.

Clicking a tract south of Dallas, for example (tract 103), indicates the mean percent absolute difference for that tract and its nearest 50 neighbors is 145.22%.

So what? Why do we care? It doesn't seem right (fair) that some regions of the country would experience more distortion of their data than other regions.

Data Relationships

For programs that base representation or funding on population counts, DP creates both winners and losers. Regions that lose population will potentially lose some funding or representation. Ideally, the positive and negative noise will not cluster spatially; if they do, they create regions of advantage and disadvantage. In addition, there shouldn't be any statistically significant relationships between the DP noise and other variables. It would be tragic, for example, if decreases in population were only associated with communities of color.

This is where spatial analysis is especially important.

While the global relationship between the DP injected noise (DP - SF1) and BIPOC (Black, Indigenous, and other people of color in 2010) is negative and weak (R² = 0.10), a large number of statistically significant regions emerge when each tract is assessed within the context of its nearest 50 neighbors. Most of these statistically significant local relationships are negative, indicating that the DP noise increases (adding people) as the number of BIPOC goes down. Indeed, communities of color appear to be losers with differential privacy.
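To see how a relationship can be weak globally yet strong locally, consider a correlation computed separately within each tract's neighborhood. The sketch below is a simplification and is not the ArcGIS Pro tool used for the map: it reports a plain Pearson correlation per neighborhood with no significance test, and the function names, coordinates, and values are all hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either variable is constant."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def local_correlations(bipoc, noise, coords, k=50):
    """Correlation of BIPOC count vs. DP noise within each tract's
    neighborhood (the tract plus its k nearest neighbors)."""
    n = len(bipoc)
    result = []
    for i in range(n):
        others = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: math.dist(coords[i], coords[j]),
        )
        hood = [i] + others[:k]
        result.append(pearson([bipoc[j] for j in hood],
                              [noise[j] for j in hood]))
    return result

# Two invented clusters with opposite local trends.
coords = [(0, 0), (1, 0), (2, 0), (100, 0), (101, 0), (102, 0)]
bipoc = [10, 20, 30, 10, 20, 30]
noise = [5, 0, -5, -5, 0, 5]
print(pearson(bipoc, noise))                          # global view
print(local_correlations(bipoc, noise, coords, k=2))  # local view
```

In this toy data the two clusters cancel out in the global correlation, while each neighborhood on its own shows a strong relationship.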

Click on the map to examine the data and local scatterplots.

ArcGIS Pro tool parameters to create the map results shown here

If the negative relationships are only associated with tracts that have few BIPOC, there is less concern. Explore the maps to see if that's the case.

It's actually difficult to visually identify where the largest communities of color correspond to strong local relationships. This bivariate map helps. Southern California, broad regions in Arizona and New Mexico, Nevada, and Florida stand out.

So what? Why do we care? The darker areas of the map are regions of concern because they are the Differential Privacy "losers". If differential privacy artificially decreases population counts primarily in these communities of color, it could inadvertently increase racial inequities.

Summary

The spatial analyses presented here only look at the Total Population variable, and at Total Population DP noise (DP - SF1) in relation to communities of color. Repeating these analyses for other variables, and especially with the upcoming August demonstration data, is recommended.

Contact LGriffin@esri.com with questions or comments.

Thank you!