Analyzing Chicagoland COVID-19 Mortality:
Data Limitations and Solutions Project (07.31.2020 data)
Introduction
The progression of the COVID-19 public health crisis prompted the UIC-SPH-PHGIS team to introduce an ArcGIS “story map” that presents updated information and detailed visualizations of the pandemic. One objective of this project is to identify how the magnitude of Long-Term Care Facility (LTCF) losses affects the methods used to analyze and visualize pandemic data in the Chicago area. After identifying these limitations, the project proposes practical solutions for deriving reliable, data-driven decision-support information. Identifying and overcoming these limitations is critical for public health policymakers and government officials, who require practical and reliable information to implement mitigation measures (e.g., expected hospital bed utilization per region, disease clusters) and to allocate resources to serve the affected public. Location is critical for many mitigation measures, and, as we will see, failing to account for LTCF distorts the identification of high-mortality clusters.(1)
The results from this project and the ensuing publications support the recommendation that public health agencies report health outcomes in a way that accounts for LTCF-related mortality. These findings apply to the Chicago area; however, given that high LTCF-related mortality is widespread globally, these recommendations and findings likely have broad applicability as well.
Data Limitations
A major finding of this project is that a data-quality/limitations section should be an essential part of every COVID-19 study. The inherent limitations in analyzing COVID-19 mortality data are temporal and spatial. Depending on the study area, spatial analysis to detect interactions is not reliable if the pandemic is not yet fully established there. Most COVID-19 online portals from public health agencies provide overall population (OP) information and do not directly separate the COVID-19 health indicators for the population living in residential community settings from those for the vulnerable residents of LTCF. Consequently, most published work is likely based on overall population morbidity and/or mortality records without considering the spatial stability of the phenomenon, which was critical during the first few months (see Figure 1). To flag these data limitations, public health agencies disseminating LTCF-related data tag it with disclaimers such as “Information provided below is provisional, subject to change, and updated weekly. Facilities report data to their local health departments, which in-turn report to IDPH, so lag time and discrepancies are to be expected.”(6)
The principal data source for this project is the Medical Examiner (ME) Case Archive of COVID-19-related Deaths.(2) This archive is organized as a searchable online database and contains information about “deaths that occurred in Cook County that were under the ME’s jurisdiction.” Some records in this database were not from Cook County; these were removed. The ME database (ME-DB) has no specific identifier for LTCF-related deaths, so this information was derived using data-mining techniques. The overall quality of the ME-DB was assessed by comparison with similar databases.(3) In addition, deriving the ME-DB involved a comprehensive data-quality process that achieved an address/coordinate matching rate close to 99%. This metric reflects the data quality needed for an accurate spatial analysis of the pandemic; however, most published studies do not report it. For the objectives of this project, the derived ME-DB was found to be superior to the available alternatives.(3,4)
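The text does not say which data-mining techniques were used to identify LTCF-related deaths. One simple possibility, shown here only as a hypothetical sketch, is to normalize each decedent's residence address and match it exactly against a list of LTCF addresses (for example, facilities on the IDPH outbreak list). The record fields `residence_address`, `lat`, and `lon`, the abbreviation table, and the definition of the matching rate (share of records carrying coordinates) are assumptions, not the ME-DB schema or the authors' procedure.

```python
import re

# Hypothetical abbreviation table for normalizing street addresses (illustrative, not exhaustive).
ABBREVIATIONS = {
    "STREET": "ST", "AVENUE": "AVE", "ROAD": "RD", "BOULEVARD": "BLVD",
    "DRIVE": "DR", "NORTH": "N", "SOUTH": "S", "EAST": "E", "WEST": "W",
}

def normalize_address(address):
    """Upper-case, replace punctuation with spaces, collapse whitespace, abbreviate common words."""
    tokens = re.sub(r"[^\w\s]", " ", address.upper()).split()
    return " ".join(ABBREVIATIONS.get(token, token) for token in tokens)

def flag_ltcf(records, ltcf_addresses):
    """Return one boolean per record: True if its residence address matches a known LTCF address."""
    known = {normalize_address(a) for a in ltcf_addresses}
    return [normalize_address(r["residence_address"]) in known for r in records]

def coordinate_match_rate(records):
    """Fraction of records that carry both a latitude and a longitude (hypothetical definition)."""
    if not records:
        return 0.0
    matched = sum(1 for r in records if r.get("lat") is not None and r.get("lon") is not None)
    return matched / len(records)
```

Exact matching after normalization will miss typos and unit-number variants, so a production pipeline would likely add fuzzy matching and manual review of unmatched records.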
Figure 1. Five-day interval evolution of the pandemic in Cook County, depicted by the ratio of new deaths to the number of ZIP codes registering losses.
To assess the spatial evolution of the pandemic and identify a stabilization threshold, the ratio of new deaths to the number of ZIP codes registering losses was calculated at five-day intervals for the overall population (OP). As seen in Figure 1, this OP-based metric (red) enters a plateau phase, with the number of ZIP codes registering losses stabilizing at about 150 by the end of June and mortality at 5 deaths per day. OP mortality was selected to demonstrate the progression of the phenomenon because this outcome was commonly used in studies even before the pandemic peaked.
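A minimal sketch of the Figure 1 metric follows. It assumes that “ZIP codes registering losses” means the cumulative number of distinct ZIP codes with at least one death by the end of each five-day interval; the text does not define this precisely, and the function name and input format (a list of `(death_date, zip_code)` pairs) are hypothetical.

```python
from datetime import date, timedelta

def five_day_ratios(deaths, start, n_bins, days=5):
    """Ratio of new deaths to ZIP codes registering losses, per interval.

    deaths: iterable of (death_date, zip_code) pairs for the overall population.
    start:  first day of the first interval (e.g., the date of the first death).
    Assumption: "ZIP codes registering losses" = cumulative count of distinct
    ZIP codes with at least one death by the end of each interval.
    Returns a list of (interval_start, new_deaths, zip_count, ratio) tuples.
    """
    deaths = list(deaths)
    seen_zips = set()
    results = []
    for b in range(n_bins):
        lo = start + timedelta(days=b * days)
        hi = lo + timedelta(days=days)
        in_bin = [z for d, z in deaths if lo <= d < hi]  # ZIP code of each death in this interval
        seen_zips.update(in_bin)
        ratio = len(in_bin) / len(seen_zips) if seen_zips else 0.0
        results.append((lo, len(in_bin), len(seen_zips), ratio))
    return results

# Hypothetical usage with three deaths in two ZIP codes.
sample = [(date(2020, 3, 16), "60623"), (date(2020, 3, 17), "60649"), (date(2020, 3, 22), "60623")]
for row in five_day_ratios(sample, date(2020, 3, 16), n_bins=2):
    print(row)
```

A plateau in this ratio, as in Figure 1, indicates that new deaths are no longer spreading into new ZIP codes faster than they accumulate in existing ones.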
1.0 IMPLICATIONS FOR MORTALITY BASED ON RACE
By accounting for the two distinct populations, those living in LTCF and the household population (HP), we can see that COVID-19 mortality by race in the Chicago area differs from the commonly reported overall population mortality.
As seen in Table 1, Black household residents, rather than LTCF residents, have the highest number of COVID-19-related deaths. This tabulation reverses the commonly reported ranking based on the overall population, which is significant given that approximately 30% of Chicago residents are Black. This finding is an example of what is known as Simpson’s paradox (5), in which a trend that appears in aggregated data reverses or disappears when the data are separated into groups. For the White population, the majority of losses occurred in LTCF settings. The Latinx HP had nearly the same percent mortality as the White HP; however, at the OP level, the Latinx mortality rate is 20% lower than that of the White population because of the substantially lower number of Latinx COVID-19 deaths in LTCF.
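The kind of ranking reversal described above can be reproduced with a few lines of code. The counts below are hypothetical and chosen only for illustration; they are not the Table 1 values. One group has more deaths overall because of its LTCF losses, while the other has more household deaths.

```python
# Hypothetical death counts for two groups, for illustration only (NOT the Table 1 values).
deaths = {
    "Group A": {"HP": 100, "LTCF": 300},
    "Group B": {"HP": 200, "LTCF": 50},
}

def rank(groups, measure):
    """Return group names ordered from the highest to the lowest value of measure(counts)."""
    return sorted(groups, key=lambda name: measure(groups[name]), reverse=True)

op_rank = rank(deaths, lambda c: c["HP"] + c["LTCF"])  # overall population (OP)
hp_rank = rank(deaths, lambda c: c["HP"])              # household population (HP) only

print("OP ranking:", op_rank)  # Group A first: 400 vs. 250 deaths
print("HP ranking:", hp_rank)  # Group B first: 200 vs. 100 deaths
```

Whenever LTCF losses are distributed very unevenly across groups, the OP ranking can mask the HP ranking in exactly this way.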
2.0 ANOTHER DIMENSION OF THE CRISIS
Analyzing COVID-19 fatalities by race/ethnicity among household residents, as opposed to the overall population, is likely to add another dimension to the health inequity crisis occurring in major metropolitan centers like Chicago. The distribution of deaths per ZIP code is highly skewed, with most ZIP codes registering few fatalities.(1)
To identify patterns and summarize outcome characteristics, the 10 ZIP codes with the highest race-specific mortality rates were selected for comparison (see Table 2). These include ZIP codes with race-specific mortality above approximately the 94th percentile for each category. In this table, the highest death toll for the White HP is lower than the tenth-highest death toll for the Black HP. The averages in Table 2 underline the disparities in health outcomes between races. In addition, this HP-based analysis revealed that the highest death toll for all races occurs in a predominantly Black/Latinx area encompassing the South Lawndale and Little Village communities (i.e., the 60623 ZIP code).
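The link between a top-10 selection and an approximately 94th-percentile threshold depends on the number of ZIP codes. The hypothetical sketch below selects the k highest values and reports the percentile rank of the k-th one; the 160-ZIP-code example is an assumed count for illustration, not the number of ZIP codes in the study area.

```python
def top_k_with_percentile(values_by_zip, k=10):
    """Select the k ZIP codes with the highest values and report the percentile
    rank of the k-th one (percent of all ZIP codes with a strictly lower value)."""
    ranked = sorted(values_by_zip.items(), key=lambda item: item[1], reverse=True)
    top = ranked[:k]
    cutoff = top[-1][1]
    below = sum(1 for v in values_by_zip.values() if v < cutoff)
    return top, 100.0 * below / len(values_by_zip)

# Hypothetical example: 160 ZIP codes with distinct values 0..159.
top10, pct = top_k_with_percentile({f"Z{i:03d}": i for i in range(160)})
print(top10[0], round(pct, 2))
```

With these assumptions the example prints `('Z159', 159) 93.75`: ten of 160 ZIP codes is 6.25% of the total, which places the cutoff close to the 94th percentile.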
Comparing the ZIP code rankings in the overall population column with those in the race-specific columns reveals complete discordance for all races. For example:
- Two of the top ZIP codes in the overall column (60714 and 60626) do not appear in any of the HP race-specific rankings;
- One of the highest-mortality overall population ZIP codes (60649; 95% Black), with 143 fatalities, is listed only for the Black HP, with 51 fatalities.
These findings demonstrate the importance of using LTCF-related mortality data in race-specific studies, and the inherent tendency of overall population records to mask potentially critical findings. In addition, the findings reveal the distinctiveness of the two populations at the ZIP code level, which is further confirmed in the visualization section of this project.
Table 3 starkly shows the alarming reality for the older LTCF population in the Chicago area. By comparison, relying on overall population numbers conceals that, for the White and Black populations, another inequitable public health crisis lies within LTCF.(1) Notably, most LTCF-related COVID-19 deaths are not found within the top HP ZIP codes. The picture becomes far more alarming if the population living in “group quarters” is used as the denominator for the LTCF-related mortality rate. LTCF residents belong to this group designation, which the US Census Bureau enumerates.(6) The resulting rates are presented in the PM.GQ column, where PM.GQ is LTCF-related mortality expressed as a percent of the people living in group quarters. Given that the overall mortality rate for the City of Chicago is approximately 0.1%, the LTCF rates in Table 3 reveal the alarming scale of this disparity.
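PM.GQ, as defined above, divides LTCF-related deaths by the group-quarters population. The sketch below shows that calculation and its ratio to the approximately 0.1% overall Chicago mortality rate cited in the text. The function names and the example counts are hypothetical, not values from Table 3.

```python
def pm_gq(ltcf_deaths, group_quarters_population):
    """LTCF-related deaths as a percent of the group-quarters population (PM.GQ)."""
    return 100.0 * ltcf_deaths / group_quarters_population

def times_overall_rate(rate_pct, overall_rate_pct=0.1):
    """How many times a rate (in %) exceeds the overall mortality rate (~0.1% per the text)."""
    return rate_pct / overall_rate_pct

# Hypothetical example: 50 LTCF deaths among 1,000 group-quarters residents.
rate = pm_gq(50, 1_000)
print(f"PM.GQ = {rate:.1f}%, about {times_overall_rate(rate):.0f}x the overall rate")
```

For these hypothetical inputs the script prints `PM.GQ = 5.0%, about 50x the overall rate`.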
3.0 SOCIOECONOMIC CONDITIONS AND HEALTH OUTCOMES
The magnitude of the toll on LTCF residents is likely to distort the association between the socioeconomic characteristics of communities and disease outcomes.(1) This finding has significant implications for the methodology used to develop social vulnerability models for the pandemic, since the core of that methodology relies on a preselected set of variables that are assumed to define the vulnerability construct. This topic is examined in a forthcoming publication.(4)
4.0 VISUALIZING THE IMPLICATIONS FOR THE SPATIAL DISTRIBUTION OF MORTALITY
As the preceding sections show, residency status is likely to affect the spatial distribution of COVID-19-related mortality within the study area. From a geographic point of view, the LTCF and HP groups are distinct. LTCF mortality is reported as a point variable (with coordinates), whereas HP mortality is, for practical reasons, aggregated within polygons (per ZIP code, per census tract, etc.). Overlaying point and polygon layers to visualize a causal relationship is a common practice, widely used in recent publications. Such maps often overlay mortality as points on layers depicting the socioeconomic status of geographic areas. Conceptually, this is a valid approach; however, it presupposes that each point represents a single value. Disregarding the differences between the two populations distorts such visualizations of mortality or infection, because many of the points are in reality mortality “clusters”: LTCF with multiple deaths. These clusters are displayed on maps as single points, as if each signified an individual loss, whereas LTCF points often represent more than 40 deaths per facility.(6) Without accounting for LTCF-related mortality, spatial autocorrelation measures such as Moran’s I may yield questionable results.(1,3,7) Significance testing for such an estimate relies on the overall mean and the sample size, both of which are distorted if OP records are used. In addition, a distance weights matrix based on polygon centroids is problematic because the LTCF clusters invalidate the assumption of a random spatial distribution of mortality.(3)
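To make the dependence on the overall mean and sample size concrete, here is a minimal NumPy sketch of global Moran’s I with an inverse-distance weights matrix built from polygon centroids. The inverse-distance form and its `power` parameter are assumptions for illustration; the text does not specify the weighting scheme. Every deviation is taken from the overall mean, so a single facility contributing dozens of deaths to one ZIP code shifts the mean and therefore every term of the statistic.

```python
import numpy as np

def inverse_distance_weights(centroids, power=1.0):
    """Inverse-distance weights from polygon centroids (e.g., ZIP code centroids).

    centroids: (n, 2) array of projected x/y coordinates.
    Pairs at zero distance, including each region with itself, get weight 0.
    """
    pts = np.asarray(centroids, dtype=float)
    dist = np.sqrt(((pts[:, None, :] - pts[None, :, :]) ** 2).sum(axis=-1))
    with np.errstate(divide="ignore"):  # 1/0 on the diagonal is replaced below
        inv = 1.0 / dist ** power
    return np.where(dist > 0, inv, 0.0)

def morans_i(values, weights):
    """Global Moran's I; deviations are taken from the overall mean of all n regions."""
    x = np.asarray(values, dtype=float)
    w = np.asarray(weights, dtype=float)
    n = x.size
    z = x - x.mean()
    return (n / w.sum()) * (w * np.outer(z, z)).sum() / (z ** 2).sum()

def expected_morans_i(n):
    """Expected value of Moran's I under the null hypothesis of no spatial autocorrelation."""
    return -1.0 / (n - 1)
```

Comparing `morans_i` computed on OP counts with the same statistic computed on HP counts alone is one simple way to see how much the LTCF clusters move the estimate.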
4.1 Visualization of differences for overall population
Figure 2. Comparison of the spatial distribution of mortality in long-term care facilities (LTCF) and household population (HP).
To visualize the differences in spatial distribution, we use the percent of deaths by ZIP code for the LTCF and HP groups. If the two groups were not geographically distinct, their outcomes would be distributed randomly with respect to each other. As seen in Figure 2, the relatively high-mortality ZIP codes for the HP group are, in most cases, located in different parts of Cook County from those for the LTCF group.
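The percent-of-deaths comparison behind Figure 2 can be sketched as follows. Each group's counts are converted to percent shares by ZIP code, and the top-k ZIP codes of the two groups are intersected; a small overlap is consistent with the geographic separation described above. The value of k and the function names are hypothetical choices for illustration.

```python
def percent_by_zip(counts):
    """Convert death counts per ZIP code into each ZIP code's percent of the group total."""
    total = sum(counts.values())
    return {z: 100.0 * c / total for z, c in counts.items()}

def top_zips(shares, k):
    """Return the set of the k ZIP codes with the largest percent share."""
    return set(sorted(shares, key=shares.get, reverse=True)[:k])

def common_top_zips(ltcf_counts, hp_counts, k=10):
    """ZIP codes that appear among the top k for both the LTCF and HP groups."""
    ltcf_top = top_zips(percent_by_zip(ltcf_counts), k)
    hp_top = top_zips(percent_by_zip(hp_counts), k)
    return ltcf_top & hp_top
```

Using percent shares rather than raw counts puts the two groups on the same scale, since total LTCF and HP deaths differ.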
Figure 3. Difference of Long-Term Care Facility (LTCF) and Household Population (HP)-Related Mortalities per ZIP Code as a Function of the Number of HP Mortalities (as of July 31, 2020). Shape and color of points indicate total population.
Figure 3 displays these differences in distribution. If the difference between LTCF and HP mortality were randomly distributed across ZIP codes, we would expect the difference values to scatter around zero (the horizontal axis). Instead, Figure 3 demonstrates that ZIP codes with high HP mortality are likely to have low LTCF mortality (i.e., the difference is large and negative). The size and color of each ZIP code's point are a function of its total population, showing that high HP mortality is likely to occur in relatively high-population ZIP codes. Based on this analysis, high LTCF mortality is likely to occur in ZIP codes with relatively low populations (fewer than 80,000 residents).
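The two patterns read from Figure 3 can be quantified with a short, hypothetical sketch: whether ZIP codes with many HP deaths tend to have few LTCF deaths (a negative correlation), and whether LTCF deaths concentrate in ZIP codes below the 80,000-resident threshold mentioned above. Inputs are assumed to be dictionaries keyed by ZIP code; the function names are illustrative.

```python
def _mean(xs):
    """Arithmetic mean, or NaN for an empty list."""
    return sum(xs) / len(xs) if xs else float("nan")

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = _mean(xs), _mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def ltcf_hp_correlation(ltcf_counts, hp_counts):
    """Correlate LTCF and HP death counts across the ZIP codes present in both dicts."""
    zips = sorted(set(ltcf_counts) & set(hp_counts))
    return pearson([ltcf_counts[z] for z in zips], [hp_counts[z] for z in zips])

def mean_by_population(values, population, threshold=80_000):
    """Mean value for ZIP codes below vs. at or above a population threshold."""
    low = [v for z, v in values.items() if population[z] < threshold]
    high = [v for z, v in values.items() if population[z] >= threshold]
    return _mean(low), _mean(high)
```

The sketch correlates LTCF counts directly with HP counts rather than correlating the difference with HP counts, because the difference contains the HP count itself and would trend downward with HP even if the two groups were unrelated.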
4.2 Visualization of impact on races
An additional contrast can be seen when comparing mortality by race.
Figure 4. Comparison of mortality clusters (hot spots) for the overall White population and the residential White population by ZIP code.
Figure 4 contrasts the clustering (or “hot spots”) of mortality in the White OP (left) with that in the White HP (right). This spatial difference shows the importance of the locations of White LTCF mortality, whose distribution more closely matches the map on the left. (Clusters were determined using the Getis-Ord Gi* local spatial autocorrelation statistic in the GeoDa software.)(8) This completely different spatial pattern of mortality clusters is presumed to be due to the significant White LTCF losses (24.9% versus 15.0% for the residential White population; see Table 1).
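The Getis-Ord Gi* statistic used for the hot-spot maps can be sketched in NumPy as below. This computes the analytic z-score form of Gi*; GeoDa assesses significance with conditional permutations, so its p-values and cluster maps may differ from this sketch. Setting each region's self-weight to 1, as is usual for binary-weight Gi*, is an assumption here, and the adjacency matrix in the example is hypothetical.

```python
import numpy as np

def getis_ord_gi_star(values, weights):
    """Analytic Getis-Ord Gi* z-scores for each region.

    values:  length-n array (e.g., deaths or rates per ZIP code).
    weights: (n, n) spatial weights matrix. Gi* includes each region in its own
             neighborhood, so the diagonal is set to 1 on a copy of the matrix.
    Positive z-scores suggest hot spots, negative z-scores cold spots.
    """
    x = np.asarray(values, dtype=float)
    w = np.array(weights, dtype=float)  # copy so the caller's matrix is untouched
    np.fill_diagonal(w, 1.0)
    n = x.size
    x_bar = x.mean()
    s = np.sqrt((x ** 2).mean() - x_bar ** 2)
    w_sum = w.sum(axis=1)
    w_sq_sum = (w ** 2).sum(axis=1)
    numerator = w @ x - x_bar * w_sum
    denominator = s * np.sqrt((n * w_sq_sum - w_sum ** 2) / (n - 1))
    return numerator / denominator

# Hypothetical example: four ZIP codes along a line, deaths concentrated in the first.
adjacency = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]])
print(getis_ord_gi_star([10, 0, 0, 0], adjacency))
```

Because Gi* also uses the overall mean and standard deviation, running it on OP values that include LTCF clusters and on HP values alone can produce different hot-spot maps, which is the contrast shown in Figure 4.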
Figure 5. Comparison of mortality clusters (hot spots) for the overall Black population and the residential Black population by ZIP code.
For the Black population, these clusters are distorted to a lesser degree, as seen in Figure 5. A more thorough analysis of these patterns is presented in a forthcoming article.(3)
Table 1 and this section make clear that spatial analysis of race-related mortality data cannot be based on overall population mortality numbers. From a practical point of view, using OP-based spatial clusters can create serious areal confinement errors.
More sections related to visualization will be added.
REFERENCES
1. Blaser M., Cailas M.D., Canar J., Cooper B., Geraci P., Osiecki K., Sambanis A. Analyzing COVID-19 Mortality Within the Chicagoland Area: Data Limitations and Solutions. Research Brief No. 117. Policy, Practice and Prevention Research Center, University of Illinois Chicago. Chicago, IL. July 2020. https://p3rc.uic.edu/wp-content/uploads/sites/561/2020/08/Analyzing_COVID-19_Methods_508-1.pdf .
2. Medical Examiner Case Archive - COVID-19 Related Deaths. https://datacatalog.cookcountyil.gov/Public-Safety/Medical-Examiner-CaseArchive-COVID-19-Related-Dea/3trz-enys . Accessed 20 July 2020.
3. Canar J., Osiecki K., Sambanis A., Cooper B., Roth C., Blaser M., Cailas M.D. A comprehensive analytic framework for COVID-19 mortality applicable to major metropolitan centers. BMC Public Health, Springer Nature, submitted.
4. Sambanis A., Canar J., Blaser M., Cooper B., Osiecki K., Cailas M.D. A methodology for COVID-19 vulnerability: advantages and applications. Journal of Emergency Management: Special Issue on COVID-19 and the 2020 Pandemic, in preparation.
5. Hernán M., Clayton D., Keiding N. The Simpson’s paradox unraveled. International Journal of Epidemiology. 2011; DOI:10.1093/ije/dyr041.
6. Long-Term Care Facility Outbreaks COVID-19. http://dph.illinois.gov/covid19/long-term-care-facility-outbreaks-covid-19 . Accessed 20 July 2020.
7. Goovaerts P. Geostatistics for Natural Resources Evaluation. Applied Geostatistics Series. New York: Oxford University Press; 1997.
8. Anselin L., Syabri I., Kho Y. GeoDa: An Introduction to Spatial Data Analysis. Geographical Analysis. 2006;38(1):5-22.
POINT OF CONTACT: Dr. Michael D. Cailas (mihalis@uic.edu)
For publication reasons, this Story Map is "static", i.e., only updates based on data through 07.31.2020 will be added here.
© October 2020 John Canar and Michael D. Cailas.
This is an open-access document distributed under the terms of the Creative Commons Attribution NonCommercial-NoDerivatives 3.0 Unported International License (CC BY-NC-ND 3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original authors and source (UIC-SPH-PHGIS program) are notified and credited. See http://creativecommons.org/licenses/by-nc-nd/3.0/ .
Suggested citation:
Blaser M., Cailas M.D., Canar J., Cooper B., Geraci P., Osiecki K., and Sambanis A. Analyzing Chicagoland COVID-19 Mortality: Data Limitations and Solutions Project. UIC-SPH-PHGIS program story map. Original publication date: 20 July 2020.
The order of the authors is alphabetical.