Census data reliability

Exploration and takeaways

Denver's City Park with the downtown skyline in the background.

Introduction

Like many federal and state agencies, peer metropolitan planning organizations and local jurisdictions, the Denver Regional Council of Governments (DRCOG) is focused on improving transportation outcomes for marginalized communities. Much of this work relies on data products from the U.S. Census Bureau to map populations such as people of color and people with low income. Environmental justice and equity analyses play an important role in the transportation planning and funding process. Understanding the underlying data source, including its limitations, is essential to developing a robust, defensible and meaningful analysis. Environmental justice and equity analysis at DRCOG supports foundational programs, such as the 2050 Metro Vision Regional Transportation Plan, Transportation Improvement Program and Metro Vision.

The information presented here details our investigation into census data reliability as it pertains to selecting scale, variables and visualization options for data. Best practices are derived from comprehensive data analysis exercises, past DRCOG practices, academic papers and federal guidelines. 

A person holds a magnifying glass, looking at statistic figures.

Source

To create a dataset for environmental justice and equity analyses, the first step is identifying the data source. The best publicly available data source for DRCOG’s purposes is the Five-Year American Community Survey (ACS) from the U.S. Census Bureau, which balances reliability, breadth of demographic statistics and fine geographic scale. The five-year ACS samples roughly 3.5 million households per year over a five-year period and provides estimates for sub-county geographies like tracts and block groups. By contrast, one-year ACS estimates use only 12 months of collected data, resulting in larger margins of error than the five-year ACS estimates, and are not available for sub-county geographies. [1] The decennial census does not include the same breadth of variables, and its 10-year update interval is too infrequent for most DRCOG purposes. Metropolitan planning organizations typically use the five-year ACS products as their primary data source.

A cover image of the American Community Survey from the United States Census Bureau, including several small images of people.

 U.S. Census Bureau 

The primary limitation of the five-year ACS is the inconsistent reliability of the estimates. In the DRCOG region, many estimates have high margins of error, especially in areas with low total population or for estimates referring to highly specific subsets of the population (e.g., total adults over 65 with a disability). Margins of error for the five-year ACS data are provided at a 90% confidence level. For example, for an estimate of 75 people of color with a margin of error of 25, the true count of people of color is expected to fall between 50 and 100 in 90% of cases. [2] The reliability of five-year ACS data should not be taken for granted.
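The margin-of-error example above can be expressed as a short calculation. The following is a minimal sketch, not DRCOG's code: the function name `moe_interval` is hypothetical, and clamping the lower bound at zero (because a population count cannot be negative) is an assumption added for illustration.

```python
def moe_interval(estimate, moe):
    """Return the (lower, upper) bounds implied by an ACS estimate and its
    published 90% margin of error."""
    # Assumption: a count cannot be negative, so clamp the lower bound at 0.
    lower = max(estimate - moe, 0)
    upper = estimate + moe
    return lower, upper


# The example from the text: 75 people of color, margin of error of 25.
print(moe_interval(75, 25))  # (50, 100)
```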

DRCOG uses the most recent available version of the five-year ACS data, while remaining aware of its limitations and making decisions that maximize reliability where possible.

Scale

The five-year ACS data is available at multiple scales; block groups and tracts are the most popular for metropolitan planning organization use. Census tracts “generally have a population size between 1,200 and 8,000 people, with an optimum size of 4,000 people.” [3] Block groups are divisions of tracts, “generally defined to contain between 600 and 3,000 people.” [4] As of 2020, the DRCOG modeling area contains roughly 2,318 block groups and 808 tracts. Data scale affects reliability, level of detail available and visualization options.

In general, tract-level data is more reliable than block group-level data, while block groups provide more geographic detail. Particularly in rural areas, tracts can be relatively large. In both and Arapahoe Counties, for instance, one tract makes up over half of the county’s land area. On the other hand, , excluding the airport, consists of many smaller, similarly sized tracts.

The DRCOG region includes both densely populated urban areas and sparsely populated rural areas, so block group and tract areas vary widely across the region. Whether block groups or tracts are used, neighborhoods do not follow census boundaries.

Across other metropolitan planning organizations, the scale of analysis varies between block groups and tracts. Some metropolitan planning organizations use transportation analysis zones (TAZs). DRCOG has previously crosswalked census data from block group or tract geographies to TAZs. [5] However, unlike estimates, margins of error cannot be computed for TAZs, so TAZ geographies should be used only for projects where they are necessary.

For environmental justice-specific analyses, DRCOG can produce block group scale data as the required variables for the statute are satisfactorily reliable at that scale. DRCOG will crosswalk data to TAZs only when necessary, since margins of error cannot be translated. In general, DRCOG will default to producing tract scale data due to its better reliability.

Reliability

Reliability is measured as the estimate’s margin of error divided by the estimate itself, or the proportion of the margin of error to the estimate. For example, for a block group with an estimate of 100 people with low income and margin of error of 50, the reliability score for that estimate is 50/100 or 0.5. The lower the reliability score, the better the reliability of the estimate.
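The reliability score defined above is a single division. The sketch below is illustrative only: the function name `reliability_score` is hypothetical, and returning infinity for a zero estimate is an assumption, since the text does not say how such cases are handled.

```python
import math


def reliability_score(estimate, moe):
    """Margin of error divided by the estimate; lower means more reliable."""
    if estimate == 0:
        # Assumption: treat a zero estimate as maximally unreliable.
        return math.inf
    return moe / estimate


# The example from the text: 100 people with low income, margin of error of 50.
print(reliability_score(100, 50))  # 0.5
```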

The U.S. Census Bureau recommends calculating the coefficient of variation to determine reliability. [6] The coefficient of variation is the ratio of the standard error to the estimate, which yields results very similar to the reliability score above but is harder to interpret. Furthermore, many environmental justice and equity variables require combining several estimates. To approximate combined margins of error following guidance from the U.S. Census Bureau, DRCOG staff took the square root of the sum of the squared margins of error for each estimate in the combination. [7]* Thus, most reliability scores presented below equal the square root of the sum of squared margins of error divided by the combined estimate.
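The two calculations described above can be sketched as follows. This is a minimal illustration under stated assumptions, not DRCOG's code: the function names are hypothetical, the input values are made up for the example, and the factor 1.645 is the standard divisor for converting a 90% margin of error to a standard error.

```python
import math

Z_90 = 1.645  # converts a 90% margin of error to a standard error


def combined_moe(moes):
    """Approximate margin of error for a sum of estimates: the square root
    of the sum of the squared margins of error."""
    return math.sqrt(sum(m ** 2 for m in moes))


def combined_reliability(estimates, moes):
    """Combined margin of error divided by the combined estimate."""
    return combined_moe(moes) / sum(estimates)


def coefficient_of_variation(estimate, moe):
    """Standard error divided by the estimate."""
    return (moe / Z_90) / estimate


# Hypothetical values: two estimates combined into one variable.
estimates = [100, 60]
moes = [30, 40]
print(combined_moe(moes))                     # 50.0
print(combined_reliability(estimates, moes))  # 0.3125
```

Because the coefficient of variation is the reliability score divided by a constant, both measures rank geographies in the same order; the reliability score simply keeps the published margin of error as is.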

DRCOG staff examined reliability scores through histograms as well as maps. The histograms plot the number of geographies in the region that fall within each range of reliability scores. The x-axis spans reliability scores from 0 to 1 or more; the y-axis shows the number of geographies with a reliability score in each range.
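The binning behind such a histogram can be sketched as below. The number of bins is an assumption (the text does not state the bin width), the function name `bin_scores` is hypothetical, and the sample scores are made up for illustration. The final bin collects every score of 1 or more.

```python
def bin_scores(scores, n_bins=10):
    """Count scores in n_bins equal-width bins covering 0 to 1, plus one
    final overflow bin for scores of 1 or more."""
    counts = [0] * (n_bins + 1)
    for score in scores:
        # Multiply rather than divide by the bin width so that scores on a
        # bin boundary (such as 0.3) are not pushed into the lower bin by
        # floating-point rounding.
        index = min(int(score * n_bins), n_bins)
        counts[index] += 1
    return counts


# Hypothetical reliability scores for seven geographies.
sample_scores = [0.05, 0.12, 0.18, 0.5, 0.95, 1.0, 2.4]
print(bin_scores(sample_scores))
```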

 *Producing exact margins of error for combined estimates requires variance replication tables provided by the U.S. Census Bureau; however, these tables are not available for all of the variables explored during this project.  

Takeaways

Assessing reliability scores revealed several key lessons that DRCOG will incorporate into future data decisions. The overarching takeaway is that the more people included in an estimate, the more reliable the data.

The scatterplot below demonstrates this lesson. Each point represents one block group and is placed according to its estimated number of people with low income and the associated reliability score. The orange points are block groups where the estimate is relatively low (below 50); these block groups also tend to have high reliability scores (less reliable).
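The same comparison can be made numerically by splitting geographies at the threshold of 50 and comparing median reliability scores. This is a sketch only: the function name and the sample records are hypothetical and are not DRCOG data.

```python
from statistics import median

LOW_ESTIMATE_THRESHOLD = 50  # the "relatively low" cutoff used in the text


def median_scores_by_size(records, threshold=LOW_ESTIMATE_THRESHOLD):
    """Return (median score for estimates below the threshold, median score
    for the rest). Each record is an (estimate, margin_of_error) pair;
    zero estimates are skipped."""
    low = [moe / est for est, moe in records if 0 < est < threshold]
    high = [moe / est for est, moe in records if est >= threshold]
    return (median(low) if low else None, median(high) if high else None)


# Hypothetical block groups: (estimate, margin of error).
sample = [(20, 18), (40, 30), (200, 60), (800, 120)]
low_median, high_median = median_scores_by_size(sample)
print(low_median, high_median)
```

In this made-up sample, the small estimates have a much higher median score (less reliable) than the large ones, which is the pattern the scatterplot illustrates.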

1. Tracts are more reliable than block groups

The table below shows the differences in reliability between the block group and tract scales for several variables. Every variable has a lower reliability score (better reliability) at the tract scale compared to its block group counterpart. In some cases, such as people with limited English proficiency or people with a disability, this difference is significant.

 Table 1: Median reliability (margin of error/estimate) scores for selected variables in the DRCOG region, 2016-2020 ACS 

 * Block group level data only includes people with disabilities ages 20-64; the tract data counts people of all ages 

The accompanying histograms compare people of color reliability data at the block group and tract scales. The more blue bars, the more reliable the data.

2. Individual estimates are more reliable than household estimates

The five-year ACS includes poverty data for both households and individuals above or below the poverty line. Although in the past DRCOG used households below the poverty line, household data is less reliable than population or individual data.

3. Some variables are more reliable than others

Instead of using the poverty line as the threshold, many metropolitan planning organizations use a ratio of income to the poverty line to “get a broader count of people experiencing economic hardship” since “many people who are not officially living in poverty still have a hard time making ends meet.” [8] A common ratio to use is income at or below 200% of the poverty line, which is used by the Metropolitan Transportation Commission in the Bay Area, Puget Sound Regional Council and the San Diego Association of Governments, among others. Denver’s cost of living is higher than the national average, which further reduces the effectiveness of the federal poverty line. [9]

The histograms below demonstrate that poverty variables become more reliable as the ratio of income to the poverty line increases, while still effectively counting people with low incomes.

Conclusion

While this discussion focuses on numbers and data points, it is important to remember that census data provides estimates of real people. It can be tempting to dismiss data outliers as errors, but sometimes outliers represent true situations.

For instance, in the course of this work, DRCOG staff identified a block group with a notably high estimate of people with low income that was also highly unreliable based on its reliability score. After further research, however, staff discovered that the block group contained a prison facility, where there is indeed a large population of people with low income. The poor reliability was also valid, due to frequent turnover of the population.

Going forward

Based on the lessons learned from investigating data reliability, DRCOG will incorporate reliability into future data decisions, using tract-scale data when possible and intentionally selecting variables that capture larger portions of the population where appropriate.

Denver skyline is viewed from Red Rocks at dawn.

The Denver Regional Council of Governments logo and address: 1001 17th St. Denver, CO 80202.