2017-2021 American Community Survey

Esri Methodology Statement, June 2023

Introduction

The 2017-2021 data from the American Community Survey (ACS) for the United States and Puerto Rico is available from Esri. Esri provides reports, data enrichment, and thematic mapping for ACS estimates in standard geographies, current ZIP Codes, and user-defined polygons. Reports include three summary profiles: Population, Housing, and Key Population and Household Facts. Esri's reports and maps are designed to simplify the data and enhance its usability with reliability thresholds. Online help is also provided to explain the data.

The ACS is one of the most frequently used public datasets for research about the country’s population and housing characteristics. Since the survey’s introduction in 2005, it has become the standard publicly available data source for understanding neighborhood-level demographics. It is a continuous, nationwide survey that provides timely information for a variety of demographic and socioeconomic characteristics. ACS data is published annually and covers a range of detailed topics. Before 2010, many of these topics were available only every 10 years, from the larger household sample survey administered alongside the complete-count enumeration of the decennial census.

If your analysis requires public data on income and poverty status, school enrollment, journey to work, household type and relationships, languages spoken, migration, citizenship, disability, health insurance, ancestry, military service, or housing characteristics, the American Community Survey is the source to use.[1]


What's new

Although the 2017-2021 ACS data was collected across two decades, it is summarized using the new 2020 Census geographic boundaries. Additional information on age and enrollment status by computer ownership/internet subscription status and poverty status of people in housing units has been added to Esri’s ACS release this year. Because of data collection challenges caused by the COVID-19 pandemic in 2020, the ACS was able to collect only approximately two-thirds of the data normally collected in a year. According to the United States Census Bureau, significant nonresponse bias was also present in the data. Users can expect to find inflated margins of error.[2]


The survey

While decennial census data is a complete enumeration of the population for a single point-in-time reference period (April 1 of a census year), the ACS is an ongoing, rolling survey with data published annually. The decennial census and ACS have different purposes, with the former largely intended to provide counts for congressional seat apportionment and redistricting, while the latter provides more current information on detailed subject matter. The continuous data collection of the ACS, along with annual data releases, provides a timelier and more comprehensive view of a changing demographic and socioeconomic landscape.

The continuous data collection of the ACS necessitates differences in variable definitions compared to the decennial census. For example, residency rules are different. The ACS defines a resident by a two-month rule in which a person is considered a resident if they are currently living at an address for more than two months. Alternatively, the decennial census defines a resident based on their “usual residence” on April 1. Hence, ACS period estimates may include seasonal populations in addition to year-round residents.

The ACS program collects data across the country on a nearly daily basis, surveying approximately 3.5 million addresses per year, with an address having an approximately 1-in-40 chance of being selected each year. Over the course of a five-year period, no address is surveyed more than once. Approximately 295,000 addresses across the country receive an initial survey each month, with addresses from every county included in the ACS sample each year.

Two separate samples are established, one for housing unit addresses and another for group quarters facilities. Data is collected through mail questionnaires[3], in-person visits, and internet response (internet response not available in Puerto Rico). The ACS response rate generally exceeds 90 percent, though it was lower in 2020 due to the pandemic.

Due to the relatively small yearly sample sizes, the ACS pools 60 months of data to produce suitable estimates for small areas. These are referred to as period estimates because they represent an interval of time, unlike the single-day reference period from the decennial census. Multiyear estimates released in consecutive years consist mostly of overlapping years and shared data, so data users must be cautious when comparing them over time.
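To illustrate why consecutive five-year releases are difficult to compare, the following minimal sketch counts the collection years that two periods share. The function name `shared_years` is hypothetical and is used here only for illustration.

```python
def shared_years(period_a, period_b):
    """Return the sorted calendar years that two (start, end) periods share."""
    years_a = set(range(period_a[0], period_a[1] + 1))
    years_b = set(range(period_b[0], period_b[1] + 1))
    return sorted(years_a & years_b)


previous_release = (2016, 2020)
current_release = (2017, 2021)

overlap = shared_years(previous_release, current_release)
share = len(overlap) / 5  # fraction of a five-year period that is shared

print(overlap)  # [2017, 2018, 2019, 2020]
print(share)    # 0.8
```

Four of the five collection years are common to both releases, so most of any apparent change between them comes from replacing a single year of data.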

The American Community Survey uses official Census Bureau estimates from the Population Estimates Program (PEP) as survey controls to align ACS estimates for data that are also part of the PEP. Due to challenges that the Census Bureau has faced in recent years, the release of the intercensal estimates was delayed and the 2020 Census could not be fully incorporated into the vintage 2021 population estimates. This resulted in changes to controls that were used for ACS releases after the 2020 Census. The 2016-2020 ACS five-year estimates used the vintage 2020 evaluation postcensal estimates as controls, which are based on the 2010 Census and did not incorporate the 2020 Census results.[4]

Because of these delays, the 2017-2021 ACS also had to use a different approach than the one used for the 2007-2011 ACS. The years 2020 and 2021 used the vintage 2021 blended base population estimates as controls. The vintage 2021 blended base population controls were established by incorporating Census 2020, 2020 Demographic Analysis estimates, and vintage 2020 postcensal population estimates, given the delay in the decennial data.[5] Moreover, since the 2010-2020 intercensal estimates were not available at the time, an alternative method also had to be established for the other years of the ACS five-year estimate (2017, 2018, and 2019). A revised version of the vintage 2020 population estimates, consistent with vintage 2021, was used for these years instead.[6]


Margin of error

The margin of error (MOE) allows data users to measure the range of uncertainty around each estimate. This range can be calculated with 90 percent confidence by taking the estimate plus or minus the MOE. For example, if the ACS reports an estimate of 100 +/- 20, there is a 90 percent chance that the value for the total population falls between 80 and 120. The larger the MOE, the lower the precision of the estimate and the less confidence one should have that the estimate is close to the true population value.
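The range described above can be sketched in a few lines. This is a minimal illustration rather than Esri's implementation; the function name `confidence_interval` is hypothetical, and the floor of zero on the lower bound follows the confidence interval definition in this document's glossary.

```python
def confidence_interval(estimate, moe):
    """Return the (lower, upper) bounds of the 90 percent confidence interval.

    The lower bound is floored at zero, as described in the glossary entry
    for confidence intervals.
    """
    lower = max(estimate - moe, 0)
    upper = estimate + moe
    return lower, upper


print(confidence_interval(100, 20))  # (80, 120)
print(confidence_interval(10, 25))   # (0, 35)
```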

The MOE measures the variability of an estimate due to sampling error. Sampling error occurs when only part of the population is surveyed to estimate the total population. There will always be differences between the sample and the total. Sampling error is directly related to sample size: the larger the sample size, the smaller the sampling error. Different areas are sampled at different rates to make the sample representative of the total population. Due to these complex sampling techniques, estimates in some areas have more sampling error than estimates in other areas. All MOEs are approximations of the true sampling error in an area and should not be considered exact. In addition, MOEs do not account for nonsampling error in the data and therefore should be thought of as a lower bound of the total error in a survey estimate.

The ACS reports MOEs with estimates for most standard census geographies. ACS estimates of total population and collapsed age, sex, and Hispanic origin estimates are controlled to annual estimates from the PEP for counties or groups of less populous counties. For the ACS in Puerto Rico, estimates are controlled by age and sex only.[7] Since these estimates are directly controlled to independent estimates, there is no sampling error, and MOEs are zero. However, controlling a period estimate to the average of five point estimates imparts additional errors in the data that are not measured by MOEs.

In some areas, missing values are prevalent for medians and the aggregate estimates used to calculate averages. When estimates are zero, the Census Bureau models the MOE calculation by comparing ACS estimates to the most recent census counts and deriving average weights for states and the country.[8] At the state, county, tract, or block group level, state-specific MOEs for zero estimates will be the same regardless of the base of the table.


Geography

Most ACS geography corresponds to boundaries as of January 1, 2021. ACS geography is generally consistent with 2020 geography and the areas available with Esri’s 2023 Updated Demographics; however, there are differences. For example, the 2017-2021 ACS data is released for places as of January 1, 2021, while Esri’s 2023 Updated Demographics data reflects the more recent TIGER 2022 boundaries. For congressional districts, the 2017-2021 ACS data was released for the 116th Congress, while Esri’s 2023 Updated Demographics reflects the districts of the 118th Congress. To make ACS data comparable with Esri’s estimates, ACS data was converted into the boundaries used for Esri’s 2023 Updated Demographics.

Additionally, Esri has made ACS data available for designated market areas (DMAs), ZIP Codes, and user-defined polygons. ACS data for ZIP Codes is not provided by the Census Bureau, but Esri has created ZIP Code data by aggregating the block-group-level ACS data using a block-to-block group apportionment methodology. ZIP Code boundaries are current as of Q3/2022, and the source is HERE. For the United States only, Esri produces ACS data for DMAs, representing the 2022–2023 definitions from Nielsen; this data is not provided by the Census Bureau either.


Esri and ACS

Clearly, ACS data is a valuable tool to understand social and economic characteristics. However, it is important to interpret the data appropriately. To help data users utilize the data fully, Esri provides reports, thematic mapping, and online help. All products include the display of MOEs for the estimates. The reports include the Population, Housing, and Key Population and Household Facts summary profiles for the United States and Puerto Rico. 

Esri's reports and maps are designed to simplify the data and enhance its usability, including the following:

■  Enhanced geographic coverage: user-defined polygons and ZIP Codes

■  Reliability thresholds to simplify interpretation of MOEs in summary profiles and mapping

Esri provides the ability to query ACS data for the most popular geographies—user-defined polygons and ZIP Codes. Since these areas are not available from the Census Bureau, there are no tabulated MOEs. Estimating data for these custom areas requires aggregation of ACS estimates and recalculation of MOEs. Esri has developed algorithms to calculate MOEs using guidelines from the Census Bureau. These algorithms account for full and partial areas within the custom area.

There are several considerations to note when viewing MOEs for custom areas. As the number of estimates involved in the sum of a derived estimate increases, the approximate MOE becomes increasingly different from the MOE that would be derived directly from ACS microdata. The direction of this difference (positive or negative) is based on the correlation and covariance of the estimates. In addition, MOEs are not scalable. MOEs at smaller geographic levels do not add up to MOEs at larger levels. Therefore, analyses should always make use of the largest standard geographic unit possible. For example, if your area of interest includes 90 percent of a county, the MOE for the total county will be more accurate than the MOE derived from county parts.
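The document does not publish Esri's algorithms, so the following is only a minimal sketch of the general Census Bureau approximation for a sum of estimates: the MOE of the sum is the square root of the sum of the squared component MOEs. It assumes whole component areas and ignores covariance between estimates, which is the source of the differences described above. The function name `aggregate_estimates` and the sample values are hypothetical.

```python
import math


def aggregate_estimates(components):
    """Sum (estimate, moe) pairs and approximate the MOE of the total.

    Uses the root-sum-of-squares approximation and ignores covariance
    between the component estimates, so the result is approximate.
    """
    total = sum(estimate for estimate, _ in components)
    moe = math.sqrt(sum(moe ** 2 for _, moe in components))
    return total, moe


# Two hypothetical block groups that make up a custom area.
block_groups = [(120, 30), (200, 40)]

total, moe = aggregate_estimates(block_groups)
print(total, moe)  # 320 50.0
```

The approximate MOE of 50 is smaller than the simple sum of the component MOEs (70), which shows why MOEs at smaller geographic levels do not add up to MOEs at larger levels.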


Medians and averages

A median represents the middle of a distribution. Many variables are reported as distributions with median values, such as contract rent, year householder moved in, or year structure built. The Census Bureau estimates medians from standard distributions that are not released to the public.[9] Therefore, the bureau's estimated medians will differ from medians that are calculated from the reported tables. For standard geographic areas, Esri displays the medians that are reported by the Census Bureau with its calculations of MOEs. Note that there are missing medians in the Census Bureau's tables, primarily for smaller areas such as tracts and block groups. It is possible to find a distribution reported for a given variable, even if the median is missing. If the median is not reported by the Census Bureau for a standard geographic area, Esri reports display N/A, or not available.

Medians are shown for nonstandard areas such as ZIP Codes and polygons, which are not available from the Census Bureau. For these areas, Esri calculates the medians from the reported distributions. However, MOEs are not available.
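The document does not state how Esri derives a median from a reported distribution. One common approach, shown here purely as an assumption, is linear interpolation within the interval that contains the midpoint of the distribution. The function name `interpolated_median`, the interval bounds, and the counts are hypothetical, and the sketch does not handle open-ended intervals.

```python
def interpolated_median(bins):
    """Estimate a median from (lower, upper, count) intervals.

    The intervals must be sorted by lower bound and closed-ended. Returns
    None when the distribution is empty.
    """
    total = sum(count for _, _, count in bins)
    if total == 0:
        return None
    half = total / 2
    cumulative = 0
    for lower, upper, count in bins:
        if count > 0 and cumulative + count >= half:
            # Assume cases are spread evenly within the interval.
            fraction = (half - cumulative) / count
            return lower + fraction * (upper - lower)
        cumulative += count
    return None


# Hypothetical contract rent distribution for a custom area.
rent_distribution = [(0, 500, 10), (500, 1000, 40), (1000, 1500, 30)]
print(interpolated_median(rent_distribution))  # 875.0
```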

Averages are commonly calculated from the aggregate value of a variable, such as the sum of all contract rent paid or the total number of vehicles reported, divided by the total number of cases (for example, renter-occupied housing units or households). Aggregates may also be tabulated as missing by the Census Bureau, even if a distribution is reported for the area. This is particularly the case for smaller geographic areas and is due to suppression rules that the ACS uses.[10] If an aggregate value is missing, an Esri-calculated average cannot be determined and will be displayed as N/A whether for standard or nonstandard areas.
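The rule above can be sketched as follows. The function names `average` and `display` are hypothetical, and treating a zero or missing case count as N/A is an added assumption that the document does not state.

```python
def average(aggregate, cases):
    """Return aggregate / cases, or None when the average cannot be determined.

    A missing aggregate yields None, as described above. Treating a missing
    or zero case count the same way is an assumption of this sketch.
    """
    if aggregate is None or not cases:
        return None
    return aggregate / cases


def display(value):
    """Format a value the way a report might show it, with N/A for missing."""
    return "N/A" if value is None else f"{value:,.0f}"


# Hypothetical aggregate contract rent over renter-occupied housing units.
print(display(average(1_250_000, 1_000)))  # 1,250
print(display(average(None, 1_000)))       # N/A
```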


Summary profiles/mapping: Reliability of ACS data

The summary reports display MOEs for the estimates plus an additional column that Esri has included to help data users interpret the MOEs relative to the estimates. Decisions about the quality of an estimate based on the MOE alone can be difficult. A reliability symbol is displayed on the reports to give the user some perspective on the MOE. The symbol is based on an estimate's coefficient of variation (CV) and is meant to be used as a quick reference to gauge the usability of an ACS estimate.

The CV is a measure of relative error in the estimate. It measures the amount of sampling error in the estimate relative to the size of the estimate itself. A large amount of sampling error in a small estimate will generally discount the usefulness of the estimate; however, a small amount of sampling error in a large estimate shows that the estimate is reliable.

The reliability is based on thresholds that Esri has established based on the usability of the estimates. Users should be aware that these are generalized thresholds:

■  High Reliability: Small CVs (less than or equal to 12 percent) are flagged green to indicate that the sampling error is small relative to the estimate, and the estimate is reasonably reliable.[11]

■  Medium Reliability: Estimates with CVs greater than 12 percent and less than or equal to 40 percent are flagged yellow and should be used with caution.

■  Low Reliability: Large CVs (over 40 percent) are flagged red to indicate that the sampling error is large relative to the estimate. The estimate is considered unreliable. 

■  Some estimates do not indicate reliability. In these cases, either the estimate or MOE is missing, or the estimate is zero.

The amount of acceptable error in an estimate depends on the analysis at hand. Data users can compute a CV directly from the MOE; the CV is calculated as the ratio of the standard error to the estimate itself. To get the standard error, divide the MOE by 1.645 (for a 90 percent confidence interval). To calculate a CV, use the following equation:

$$\mathrm{CV} = \frac{\mathrm{MOE} / 1.645}{\text{estimate}} \times 100$$

The CV is commonly expressed as a percentage. For example, if you have an estimate of 80 +/- 20, the CV for the estimate is 15.2 percent. This estimate should be used with caution since the sampling error represents more than 15 percent of the estimate.
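The CV calculation and the reliability thresholds described in this section can be combined in a short sketch. The function names and the "not rated" label are hypothetical; the thresholds (12 and 40 percent) and the 1.645 divisor come from the text above.

```python
Z_90 = 1.645  # divisor that converts a 90 percent MOE to a standard error


def coefficient_of_variation(estimate, moe):
    """Return the CV as a percentage, or None when it cannot be computed."""
    if estimate is None or moe is None or estimate == 0:
        return None
    standard_error = moe / Z_90
    return standard_error / estimate * 100


def reliability(cv):
    """Map a CV (in percent) to the reliability classes described above."""
    if cv is None:
        return "not rated"
    if cv <= 12:
        return "high"
    if cv <= 40:
        return "medium"
    return "low"


cv = coefficient_of_variation(80, 20)
print(round(cv, 1), reliability(cv))  # 15.2 medium
```

An estimate of 80 +/- 20 therefore falls in the medium reliability class, consistent with the advice to use it with caution.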


Summary

The American Community Survey is a product of its design. Data users will have to balance the benefits of timely data with the drawbacks of estimate quality. To do this effectively, data users will have to make use of tools to evaluate the quality of ACS data, such as MOEs, CVs, and tests for significant differences between samples.

In addition to statistical tools, data users can employ larger areas of analysis or collapse some of the distributions if the reliability of the estimates is a problem. When comparing areas, the Census Bureau recommends focusing on percentages of distributions rather than estimate values.

Changes to the sample size, time frame, data collection, and survey methodology make ACS data something completely different from the sample data previously collected from the decennial census. When the Census Bureau reports sampling error with the survey estimates, it's time to pay attention to the differences.


Glossary

ACS estimates incorporate new definitions that emphasize the importance of the statistical tools that are unique to survey estimates---and key to effective use of the data.

Coefficient of variation (CV)---The CV measures the amount of sampling error relative to the size of the estimate, expressed as a percentage. A large amount of sampling error in a small estimate will generally discount the usefulness of the estimate; however, a small amount of sampling error in a large estimate shows that the estimate is reliable.

Confidence interval---The confidence interval is another way to measure the uncertainty of an estimate. The upper bound is the estimate plus the margin of error; the lower bound is the estimate minus the margin of error. (If the lower bound is negative, zero is assumed for the lower bound.) Confidence intervals for ACS estimates represent a 90 percent certainty that the interval around the estimate includes the true population value.

Margin of error (MOE)---The MOE is a measure of the variability of the estimate due to sampling error. MOEs allow the data user to measure the range of uncertainty for each estimate with 90 percent confidence. The range of uncertainty is called the confidence interval, and it is calculated by the estimate plus or minus the MOE. For example, if the ACS reports an estimate of 100 with an MOE of +/- 20, you can be 90 percent certain the value for the estimate falls between 80 and 120.

Nonsampling error---All other survey errors that are not sampling errors are collectively classified as nonsampling error. This type of error includes errors from interviewers, respondents, coverage, nonresponse, imputation, and processing. Nonsampling error also includes unchecked methodological errors from controlling ACS estimates to independent population estimates.

Period estimates---These are estimates based on data collected over a period of time. ACS five-year data is collected monthly over 60 months and is sometimes referred to as a rolling survey.

Point estimates---Point estimates are based on data collected at a single point in time. The decennial census refers to April 1 and captures a snapshot of the population at that time.

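Reliability---These symbols represent threshold values that Esri has established from the coefficients of variation to designate the usability of the estimates: high reliability (CV of 12 percent or less), medium reliability (CV greater than 12 percent and up to 40 percent), and low reliability (CV over 40 percent).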

Residence rules---These rules are used to establish a primary residence to reduce duplication. The ACS defines a resident by a two-month rule. The census rule is "usual place of residence" or wherever a person spends most of the year. ACS data may include seasonal populations in addition to year-round residents.

Sampling error---Errors that occur from making inferences about the whole population from only a sample of the population are collectively referred to as sampling error. Sampling error measures the variability within each sample as well as the variability between all possible samples. All survey data has sampling error.

Statistical significance---Tests for statistical significance are used to determine whether the difference between two survey estimates is real or likely due to sampling error alone. Statistical significance is shown at the 90 percent confidence level. Therefore, if estimate differences are statistically significant, there is less than a 10 percent chance that the difference is due to sampling error.
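As a minimal sketch of such a test, the difference between two estimates can be divided by the standard error of the difference and compared with 1.645, the critical value for the 90 percent confidence level. The sketch assumes the two estimates are independent, which does not hold for overlapping multiyear periods. The function name `is_significant` is hypothetical.

```python
import math

Z_90 = 1.645  # critical value for the 90 percent confidence level


def is_significant(estimate_a, moe_a, estimate_b, moe_b):
    """Test whether two independent estimates differ at the 90 percent level."""
    se_a = moe_a / Z_90
    se_b = moe_b / Z_90
    se_difference = math.sqrt(se_a ** 2 + se_b ** 2)
    if se_difference == 0:
        # Both MOEs are zero (for example, controlled estimates), so the
        # test statistic is undefined.
        raise ValueError("cannot test estimates that have no sampling error")
    z_score = abs(estimate_a - estimate_b) / se_difference
    return z_score > Z_90


print(is_significant(100, 20, 150, 30))  # True
print(is_significant(100, 20, 120, 30))  # False
```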


End notes

[1] General information about the American Community Survey is summarized here. For more detailed information about the survey, refer to the ACS handbooks.

[8] U.S. Census Bureau, "Variance Estimation," Design and Methodology American Community Survey (Washington, DC: U.S. Government Printing Office, 2010), 12-4–12-5.

[9] For more information on the standard distributions, see the Census Bureau's documentation, Appendix A.

[11] National Research Council, Using the American Community Survey: Benefits and Challenges (Washington, DC: The National Academies Press, 2007).


Data resources

Learn more about Esri's ACS data or contact sales at 1-800-447-9778.


Esri, the global leader in geographic information system (GIS) software, offers the most powerful mapping and spatial analytics technology available.

Since 1969, Esri has helped customers unlock the full potential of data to improve operational and business results. Today, Esri software is deployed in more than 350,000 organizations including the world's largest cities, most national governments, 75 percent of Fortune 500 companies, and more than 7,000 colleges and universities. Esri engineers the most advanced solutions for digital transformation, the Internet of Things (IoT), and location analytics to inform the most authoritative maps in the world. Visit us at esri.com.


Contact Esri

380 New York Street Redlands, California 92373-8100 USA

1 800 447 9778 | T 909 793 2853 | F 909 793 5923

About Esri's Data Development Team

Led by chief demographer Kyle R. Cassal, Esri's Data Development Team has more than 40 years of experience in market intelligence. The team's economists, statisticians, demographers, geographers, and analysts produce independent small-area demographic and socioeconomic estimates and forecasts for the United States. The team develops exclusive demographic models and methodologies to create market-proven datasets, many of which are now industry benchmarks, such as Tapestry™ Segmentation, Consumer Spending, Market Potential, and annual Updated Demographics. Esri® demographics power ArcGIS® through dynamic web maps, data enrichment, reports, and infographics.

The information contained in this document is the exclusive property of Esri. This work is protected under United States copyright law and other international copyright treaties and conventions. No part of this work may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying and recording, or by any information storage or retrieval system, except as expressly permitted in writing by Esri. All requests should be sent to Attention: Contracts and Legal Services Manager, Esri, 380 New York Street, Redlands, CA 92373-8100 USA.

The information contained in this document is subject to change without notice.

Esri, the Esri globe logo, The Science of Where, Tapestry, ArcGIS, esri.com, and @esri.com are trademarks, service marks, or registered marks of Esri in the United States, the European Community, or certain other jurisdictions. Other companies and products or services mentioned herein may be trademarks, service marks, or registered marks of their respective mark owners.

All rights reserved

Copyright © 2023 Esri

Printed in the United States of America