Assessing the Area Deprivation Index

UW Medicine - Dr. Sophia Hayes

Background

Chronic Obstructive Pulmonary Disease (COPD) is a leading cause of morbidity and mortality worldwide, characterized by persistent respiratory symptoms and airflow limitation. This chronic condition places a significant burden on both individuals and healthcare systems, driving up healthcare costs and utilization rates. Disparities in COPD prevalence, diagnosis, access to treatment, and health outcomes are evident across various demographic and socioeconomic groups, underscoring the need for a deeper understanding of the social determinants of health.

The Area Deprivation Index (ADI) is a critical tool in public health research that quantifies neighborhood-level socioeconomic disadvantage through indicators such as income, education, employment, and housing quality. The ADI provides valuable insights into the socioeconomic factors that influence health outcomes, including those related to COPD. However, the ADI’s reliability and validity, particularly in the context of COPD research, have come under scrutiny. This project aims to address these concerns and evaluate the utility of the ADI in understanding COPD disparities.

Our project has three primary objectives: 

  • Assess the reliability and validity of the ADI’s composite variables.
  • Explore geographic variations in neighborhood deprivation using the ADI.
  • Investigate urban-rural differences in COPD outcomes.

Within these objectives, the core task of this project is to assess the reliability and collinearity of the 17 variables that make up the ADI. Our scope covers reviewing the ADI methodology, evaluating its validity and reliability, identifying collinear measures, and recommending variables to exclude from a new index structure.

The ADI comprises 17 variables. The 16 included in our analysis are:

  1. Percentage of population aged ≥ 25 years with < 9 years of education
  2. Percentage of population aged ≥ 25 years with at least a high school diploma
  3. Percentage of employed persons ≥ 16 years in white-collar occupations
  4. Median family income
  5. Income disparity
  6. Median home value
  7. Median gross rent
  8. Median monthly mortgage
  9. Percent of owner-occupied housing units (homeownership rate)
  10. Percentage of civilian labor force population aged ≥ 16 years unemployed (unemployment rate)
  11. Percentage of families below the poverty level
  12. Percentage of single-parent households with children aged < 18 years
  13. Percentage of occupied housing units without a motor vehicle
  14. Percentage of occupied housing units without a telephone
  15. Percentage of occupied housing units without complete plumbing (log)
  16. Percentage of occupied housing units with > 1 person per room (crowding)

The 17th variable, the percentage of the population below 150% of the poverty threshold, was excluded from our analysis due to a lack of accessible data.

Coefficient of Variation

Statistical Analysis

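To assess reliability, we chose the Coefficient of Variation (CV), defined as the ratio of the standard deviation to the mean. This measure is particularly appropriate for American Community Survey (ACS) data, as each ACS estimate is published with a margin of error from which the standard error can be derived. The CV provides a normalized measure of variability, making it easier to compare the reliability of different variables.

CV (%) = (Margin of Error / 1.645) / Estimate × 100

Here 1.645 is the z-score for the 90% confidence level at which ACS margins of error are published, so Margin of Error / 1.645 is the standard error of the estimate. The CV is expressed as a percentage, which is the scale used for the thresholds below.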

Considerations

  • Block groups with a CV under 12% - High Reliability
  • Block groups with a CV between 12% and 40% - Moderate Reliability
  • Block groups with a CV over 40% - Low Reliability
  • Block groups with missing data are highly unreliable and often have CVs that trend toward positive infinity - Very Low Reliability
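
The calculation and thresholds above can be sketched in a few lines of Python. This is a minimal illustration, not the code used in the project: the function names and the example estimates are hypothetical, and it assumes that a missing or zero estimate is treated as an infinite CV.

```python
import math

Z_90 = 1.645  # z-score for the 90% confidence level of ACS margins of error


def coefficient_of_variation(estimate, margin_of_error):
    """Return the CV as a percentage, or math.inf when it cannot be computed."""
    # Assumption: missing or zero estimates are treated as infinitely unreliable.
    if estimate is None or margin_of_error is None or estimate == 0:
        return math.inf
    standard_error = margin_of_error / Z_90
    return abs(standard_error / estimate) * 100


def reliability_category(cv):
    """Map a CV (in percent) onto the reliability tiers listed above."""
    if math.isinf(cv):
        return "Very Low"
    if cv < 12:
        return "High"
    if cv <= 40:
        return "Moderate"
    return "Low"


# Hypothetical block-group values: (estimate, margin of error)
examples = [(52000, 8000), (12.5, 4.0), (3.0, 4.5), (None, None)]
for estimate, moe in examples:
    cv = coefficient_of_variation(estimate, moe)
    print(estimate, moe, reliability_category(cv))
```

Taking the absolute value keeps the CV non-negative even if an estimate is negative, which is a design choice of this sketch rather than something stated in the analysis.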

Establishing Collinearity

Collinearity refers to the phenomenon where two or more predictor variables in a multiple regression model are highly correlated, meaning they contain similar information about the variance in the dependent variable. High collinearity can inflate the variance of coefficient estimates and make the model's predictions unreliable.

To establish collinearity among our variables, we used two measures: a collinearity matrix and the Variance Inflation Factor (VIF).

Collinearity matrix

Purpose of the Collinearity Matrix

The collinearity matrix is a critical tool used to evaluate the interrelationships among the 16 variables that we are analyzing. By examining these relationships, we aim to identify redundant variables that may introduce bias and affect the reliability of the ADI in measuring neighborhood-level socioeconomic disadvantage.

Values closer to 1 or -1 indicate stronger collinearity between two variables, shown as dark blue or dark red in the matrices below. Values closer to 1 indicate a positive correlation: as one variable increases, the other tends to increase. Values closer to -1 indicate a negative correlation: as one variable increases, the other tends to decrease.

Very highly collinear variables affect the relationships observed among the other variables in the matrix. Removing them allows the matrix to reveal new relationships. The removal process can be seen below.
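
As a rough sketch of how such a matrix can be computed, the example below uses NumPy's `corrcoef` on synthetic data. The variable names, the synthetic values, and the 0.8 cutoff for flagging a pair are all illustrative assumptions, not values taken from our analysis.

```python
import numpy as np


def correlation_matrix(data):
    """Pearson correlations; rows are block groups, columns are variables."""
    return np.corrcoef(np.asarray(data, dtype=float), rowvar=False)


def strongly_correlated_pairs(names, corr, threshold=0.8):
    """Return (name_a, name_b, r) for pairs with |r| at or above the threshold."""
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(corr[i, j]) >= threshold:
                pairs.append((names[i], names[j], round(float(corr[i, j]), 2)))
    return pairs


# Synthetic stand-in data, not ACS values.
rng = np.random.default_rng(0)
income = rng.normal(60_000, 15_000, 500)
home_value = 4 * income + rng.normal(0, 20_000, 500)   # rises with income
poverty = 40 - income / 3_000 + rng.normal(0, 2, 500)  # falls as income rises
no_phone = rng.normal(2, 1, 500)                       # unrelated to the rest

names = ["income", "home_value", "poverty", "no_phone"]
corr = correlation_matrix(np.column_stack([income, home_value, poverty, no_phone]))
for pair in strongly_correlated_pairs(names, corr):
    print(pair)
```

In this toy data, income and home value are positively correlated, income and poverty are negatively correlated, and the unrelated column is not flagged.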

The Variance Inflation Factor

For evaluating collinearity, we selected the Variance Inflation Factor (VIF). The VIF measures the degree of multicollinearity among independent variables in a linear regression model. A high VIF indicates that a variable is highly collinear with other variables in the set, which can distort the results of regression analyses. Identifying heavily collinear variables is crucial, as these can disproportionately influence the regression, leading to misleading conclusions.

Higher VIF values indicate that a variable is more strongly collinear with the other variables in the set.

All Variables Included

VIF values below 5 indicate low collinearity.

VIF values from 5 to 10 indicate moderate collinearity.

VIF values above 10 indicate high collinearity with other variables.
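
For reference, the VIF of variable j is 1 / (1 - R_j²), where R_j² comes from regressing variable j on all the other variables; this equals the j-th diagonal entry of the inverse correlation matrix. The sketch below uses that identity with NumPy. The function names and the synthetic data are hypothetical, and the cutoffs simply mirror the ranges listed above.

```python
import numpy as np


def variance_inflation_factors(data):
    """VIF per column, read off the diagonal of the inverse correlation matrix."""
    corr = np.corrcoef(np.asarray(data, dtype=float), rowvar=False)
    return np.diag(np.linalg.inv(corr))


def collinearity_level(vif):
    """Classify a VIF using the ranges listed above."""
    if vif > 10:
        return "high"
    if vif >= 5:
        return "moderate"
    return "low"


# Synthetic stand-in data: c is nearly a + b, d is independent of the rest.
rng = np.random.default_rng(1)
a = rng.normal(size=1000)
b = rng.normal(size=1000)
c = a + b + rng.normal(scale=0.1, size=1000)
d = rng.normal(size=1000)

vifs = variance_inflation_factors(np.column_stack([a, b, c, d]))
for name, vif in zip("abcd", vifs):
    print(name, round(float(vif), 1), collinearity_level(vif))
```

Note that a, b, and c all receive high VIFs here even though only c was constructed from the others, which is why a high VIF alone does not say which variable should be dropped.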

VIF values are influenced by outliers.

To investigate further, we removed the high school graduate and median monthly mortgage variables and reran our calculations.

Removal of High School Grad

With those variables removed, a new outlier emerged: Median Family Income.

Going one layer deeper, once Median Family Income was removed, three more variables rose above a VIF of 10.

These are the percentage of white-collar workers, Income Disparity, and Median Gross Rent.

After these were also removed, the remaining ten variables have low collinearity values.

In other words, these ten variables are the least correlated with each other.
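
A stepwise removal of this kind can be approximated with a simple greedy loop: recompute the VIFs, drop the variable with the highest value, and stop once every VIF is at or below the cutoff. This is a simplified sketch and not the exact procedure we followed, since we removed variables in batches after inspecting the results. The function names, the cutoff of 10, and the synthetic data are assumptions.

```python
import numpy as np


def vifs(data):
    """VIF per column from the diagonal of the inverse correlation matrix."""
    corr = np.corrcoef(data, rowvar=False)
    return np.diag(np.linalg.inv(corr))


def prune_by_vif(names, data, threshold=10.0):
    """Greedily drop the highest-VIF column until all VIFs are <= threshold."""
    names = list(names)
    data = np.asarray(data, dtype=float)
    removed = []
    while data.shape[1] > 1:
        scores = vifs(data)
        worst = int(np.argmax(scores))
        if scores[worst] <= threshold:
            break
        removed.append((names.pop(worst), round(float(scores[worst]), 1)))
        data = np.delete(data, worst, axis=1)
    return names, removed


# Synthetic stand-in data: c is nearly a + b, d is independent of the rest.
rng = np.random.default_rng(2)
a = rng.normal(size=1000)
b = rng.normal(size=1000)
c = a + b + rng.normal(scale=0.1, size=1000)
d = rng.normal(size=1000)

kept, removed = prune_by_vif(["a", "b", "c", "d"], np.column_stack([a, b, c, d]))
print("kept:", kept)
print("removed:", removed)
```

Because VIFs are recomputed after every removal, dropping one variable can lower the scores of the others, which matches the way new relationships surfaced at each step of our analysis.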

Remaining 10 Variables

Lack of collinearity should not be misconstrued as variable validity.

In practice, the highly collinear variables may have been correlated precisely because they track the same socioeconomic patterns, which is what the ADI is intended to measure.

RUCA Code Analysis

Dr. Hayes underscored the significance of quantifying the disparities between rural and urban census block groups. To explore potential urban-rural disparities in the reliability and applicability of the ADI, we conducted an in-depth analysis utilizing Rural-Urban Commuting Area (RUCA) codes. These codes categorize census tracts based on population density, urbanization levels, and commuting patterns, offering a nuanced perspective for examining geographic variations. The 10 codes are as follows:

  1. Metropolitan area core: primary flow within an urbanized area (UA)
  2. Metropolitan area high commuting: primary flow 30% or more to a UA
  3. Metropolitan area low commuting: primary flow 10% to 30% to a UA
  4. Micropolitan area core: primary flow within an Urban Cluster of 10,000 to 49,999 (large UC)
  5. Micropolitan high commuting: primary flow 30% or more to a large UC
  6. Micropolitan low commuting: primary flow 10% to 30% to a large UC
  7. Small town core: primary flow within an Urban Cluster of 2,500 to 9,999 (small UC)
  8. Small town high commuting: primary flow 30% or more to a small UC
  9. Small town low commuting: primary flow 10% to 30% to a small UC
  10. Rural areas: primary flow to a tract outside a UA or UC

We grouped the RUCA codes into four categories:

  1. Metropolitan Areas (Codes 1-3): Representing urban cores and areas with high and low commuting levels to urbanized areas.
  2. Micropolitan Areas (Codes 4-6): Encompassing core areas and high/low commuting regions within urban clusters of 10,000 to 49,999 residents.
  3. Small Towns (Codes 7-9): Capturing core areas and high/low commuting zones within urban clusters of 2,500 to 9,999 residents.
  4. Rural Areas (Code 10): Encompassing regions with primary commuting flows outside urban areas or urban clusters.

Variability in Reliability Measures

Our analysis of the reliability of ADI variables, measured by the Coefficient of Variation (CV), revealed considerable variability across RUCA code groups. To assess reliability, we binned each census tract within our thresholds and then calculated the counts for each reliability category. These counts were then turned into ratios to better understand the distribution of reliability across different geographic settings. However, no consistent pattern emerged suggesting a clear urban-rural divide in reliability.
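
The grouping and ratio calculation described above can be sketched as follows. This is an illustrative outline using only the Python standard library; the function names and example records are hypothetical, and it assumes each record carries a primary RUCA code (an integer from 1 to 10) and an already assigned reliability category.

```python
from collections import Counter, defaultdict


def ruca_group(code):
    """Collapse a primary RUCA code (1-10) into one of the four categories."""
    if 1 <= code <= 3:
        return "Metropolitan"
    if 4 <= code <= 6:
        return "Micropolitan"
    if 7 <= code <= 9:
        return "Small Town"
    if code == 10:
        return "Rural"
    raise ValueError(f"unexpected RUCA code: {code}")


def reliability_shares(records):
    """Turn (ruca_code, reliability_category) records into per-group ratios."""
    counts = defaultdict(Counter)
    for code, category in records:
        counts[ruca_group(code)][category] += 1
    shares = {}
    for group, counter in counts.items():
        total = sum(counter.values())
        shares[group] = {cat: n / total for cat, n in counter.items()}
    return shares


# Hypothetical records for one ADI variable.
records = [(1, "High"), (2, "High"), (3, "Low"), (5, "Moderate"), (10, "Low")]
for group, share in reliability_shares(records).items():
    print(group, share)
```

Converting counts to ratios matters because the four groups contain very different numbers of areas, so raw counts would not be comparable across them.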

Key Findings

  • Unemployment Rate: Challenges in estimating unemployment rates were observed, particularly in small towns, where the median CV was the highest at 79%. This suggests potential difficulties in precisely estimating unemployment in these areas.
  • Housing Conditions Variables: Variables related to housing conditions, such as the percentage of occupied housing units without complete plumbing, exhibited exceptionally high CV values across all RUCA groups, indicating low reliability. This highlights the complexity of urban-rural disparities in the ADI's applicability, with housing-related variables showing limited reliability regardless of geographic classification.
  • Income-Related Variables: Variables related to income, such as median rent and mortgage, generally showed moderate reliability. However, rural and micropolitan areas exhibited slightly higher CVs for mortgage-related variables, reflecting potential differences in housing markets across urban-rural gradients.
  • Transportation Needs: The percentage of households without a car exhibited acceptable reliability across most groups, with slightly higher CVs observed in rural areas. This may reflect transportation needs in less dense areas, contributing to variability in this indicator.

While some variables demonstrated reliable estimates across geographies, others displayed considerable variability and limited reliability, particularly those related to housing conditions. Overall, the reliability of ADI component variables varies significantly across different geographic settings, with urban areas generally exhibiting higher reliability for certain variables like homeownership and mortgage data, while rural areas show higher reliability for variables like car ownership and educational attainment. Understanding these reliability patterns is crucial for accurately measuring and addressing urban-rural disparities in social determinants of health.

Notably, the urban-rural divide did not consistently predict variability, suggesting a complex relationship influenced by socioeconomic factors, housing conditions, and local demographics. These findings underscore the need for nuanced interpretations and targeted efforts to enhance the ADI's sensitivity to local socioeconomic conditions and housing characteristics across diverse geographic areas.

Median CV Across ADI Variables

Map: ADI Reliability

This map shows the median coefficient of variation for each block group across the 16 variables we analyzed. It further illustrates that the urban-rural divide has little observable or consistent impact on variable reliability at the block group level. The map helps visualize the tabular data above in our analysis of the relationship between RUCA codes and reliability.
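
The per-block-group value shown on the map can be sketched as a median over each block group's CVs. The example below is a minimal illustration with hypothetical block group IDs and values; it assumes missing CVs are recorded as `None` and skipped, which may differ from how missing data were handled in the mapped layer.

```python
import math
from statistics import median


def median_cv(cvs):
    """Median of the available CVs for one block group; inf if none exist."""
    available = [cv for cv in cvs if cv is not None]
    return median(available) if available else math.inf


def median_cv_by_block_group(cv_table):
    """cv_table maps a block group ID to its list of per-variable CVs."""
    return {block_group: median_cv(cvs) for block_group, cvs in cv_table.items()}


# Hypothetical block groups with CVs (in percent) for a few variables.
cv_table = {
    "BG-A": [10.0, 20.0, 30.0],
    "BG-B": [5.0, None, 50.0, 100.0],
    "BG-C": [None, None],
}
print(median_cv_by_block_group(cv_table))
```

The median is less sensitive than the mean to a single variable with an extremely large CV, which is one plausible reason to summarize block groups this way.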

Recommendations

Our evaluation of the ADI revealed variations in the reliability of the 16 component variables we analyzed. Several variables, like the percentage of households without a telephone, exhibited low reliability due to technological changes, indicating a need to update the index.

We recommend removing the following 5 variables to enhance the ADI's reliability and relevance:

  • Percentage of adults with <9 years of education
    • This variable may not accurately capture educational attainment levels, as it only considers those with less than 9 years of education, overlooking higher levels of education.
  • Percentage of households without a telephone
    • With the widespread adoption of mobile phones and advanced communication technologies, this variable has become increasingly obsolete and unreliable.
  • Percentage of households without complete plumbing
    • While this variable may have been relevant in the past, it is now less representative of housing quality in most areas, especially urban regions, and may skew the index's overall reliability.
  • Percentage of crowded households
    • This variable's reliability may be affected by cultural differences in household size preferences and living arrangements, potentially introducing bias in certain contexts.
  • Median monthly mortgage
    • Our analysis revealed a high degree of collinearity between this variable and median home value, suggesting redundancy and potential inflation of variance in the index.

To address geographic variations and urban-rural disparities, we recommend:

  • Conducting in-depth analyses of the ADI's performance in specific geographic regions and communities, considering local socioeconomic contexts and housing conditions.
  • Incorporating alternative housing quality indicators that may be more relevant and reliable in certain geographic contexts, such as access to basic amenities, housing affordability, or infrastructure quality.
  • Exploring the inclusion of additional socioeconomic variables that capture unique aspects of rural or urban deprivation, such as access to healthcare, transportation, or employment opportunities.
  • Developing region-specific weighting strategies or tailored ADI versions that account for the varying importance of certain variables in different geographic contexts.
  • Collaborating with local stakeholders, community organizations, and policymakers to gather insights on the most relevant and culturally appropriate indicators of deprivation in their respective areas.

Additionally, our work highlights the need for greater transparency and clarity regarding the ADI's composition, calculations, and intended applications. Many sources of ADI data lack sufficient explanations and descriptions, which may lead to misinterpretation and misapplication by users who do not fully understand its nuances. The ADI was originally designed to study mortality gradients from 1969 to 1998, not for broad application to current healthcare research. Its 1990 Census-based variables may now be outdated.

These findings underscore the importance of critically evaluating composite indices like the ADI and ensuring their continued relevance, reliability, and applicability in various research contexts. Periodic updates, the incorporation of new variables, the removal of outdated or unreliable measures, and improved documentation and transparency are crucial steps to enhance the utility and validity of such indices in understanding and addressing health disparities.
