
When Big Data Falls Short
The Case of the Economic Tracker Employment Data
What follows is a report of our analysis of the Economic Tracker employment series, first published in early April 2021.
Please visit our When Big Data Falls Short Updates page to follow our ongoing review of the Opportunity Insights Economic Tracker.
Since the beginning of the COVID-19 recession, the Opportunity Insights Economic Tracker has been a go-to data source for policymakers. Its creators claim that their novel Economic Tracker (ET) employment series, which is constructed from proprietary data, closely follows existing government statistics. In many cases, however, it does not. We have found a concerning number of large deviations from comparable government-produced data in both the current and prior vintages of the ET employment series. In addition, large revisions and methodological changes to the ET employment data over the past eight months raise important questions about its suitability for policy design and assessment. These revisions could easily go undetected by outside researchers and policymakers, as there has been no public communication about them from the Opportunity Insights team.
The Economic Tracker data portal has shaped policy responses at the state, local, and federal levels. According to a National Public Radio article, the Opportunity Insights team “has briefed a range of lawmakers in both political parties” as well as the Senate Finance Committee and then-candidates Joe Biden and Kamala Harris. An accompanying paper on the ET website reports that policymakers in Maine, Kansas, Missouri, and Texas have also incorporated ET data into their responses to the ongoing crisis. More recently, an analysis of the $600 stimulus payments sent out in January 2021 that used ET data argued for limiting subsequent stimulus payments to low-income households.
What explains the appeal of the ET data? Developed under the guidance of Harvard economist and Opportunity Insights Director Raj Chetty, the ET's menu of data series is available at a much higher frequency than comparable government statistics. With access to a steady stream of data from several large businesses, the Opportunity Insights team needs only a few days to process and publish the ET data series to their website. Comparable government statistics on employment, such as those published by the Bureau of Labor Statistics (BLS), are collected through a well-established survey procedure. This takes time. After collecting and processing survey responses, the BLS generates monthly employment level estimates, which are released with a one-month lag. A quicker turnaround combined with access to high-frequency private sector data means that the ET series are, potentially at least, available much closer to real time than most comparable government statistics.
If the ET provides the same insights as BLS figures but sooner, it could have an edge over government statistics for policy purposes, because policymakers would detect the economic changes to which they are supposed to respond more rapidly. Indeed, Chetty et al. believe that their “data permit rapid policy evaluation - often within three weeks of implementation - opening a path to fine-tuning policy responses in an evidence-based manner” (Chetty et al. 2020b: 41). This is arguable. In any case, the value of the ET data's higher frequency for policy depends on how well the ET data track existing government statistics. We leave aside here the question of whether daily series are suitable for analyzing labor market policies. As researchers focused on labor markets, we have been following the evolution of the ET employment series since summer 2020. Although Chetty et al. stress that the ET employment series “provides a good representation of employment rates across sectors, wage groups, and geographic areas” (2020b: 25), we are less convinced.
An important aspect of the ET project is to provide novel public data for studying the ongoing crisis “in a timely, publicly verifiable manner” (Chetty et al. 2020a: 7). To this end, Chetty and the Opportunity Insights team have made the ET data series publicly available, and it is in this spirit that we have compiled this page. We have saved each vintage of the ET data since September 2020 and present a few of our findings and concerns below (a detailed paper will follow). The series have undergone significant revisions that would easily go undetected, since Chetty and the Economic Tracker team do not publicly communicate or explain them.
Discrepancies
The current ET employment series contain a number of discrepancies with comparable Current Employment Statistics (CES) data published by the BLS. Figure 1 below compares the percent change in Wyoming’s employment in the ET and CES data. According to the latest ET vintage, Wyoming’s private sector employment as of February 12, 2021 is twelve percent higher than in January 2020. The ET series also reports that private sector employment in Wyoming had fully rebounded by summer 2020 and has continued to grow since. Yet this growth is completely absent from the CES figures. Instead, the latest CES data indicate that Wyoming’s private sector employment in February 2021 remains below its January 2020 level. In fact, the CES figures suggest that the number of workers employed in Wyoming’s private sector is currently at levels not seen since 2006 (excluding spring 2020).
Kentucky’s employment patterns diverged sharply across the ET and CES data at the beginning of 2021. As Figure 2 shows, the ET series indicates a complete recovery in private sector employment, which as of February 2021 was up nearly four percent over January 2020 levels. CES data, however, suggest that Kentucky’s private sector employment remains depressed relative to the pre-pandemic period.
Similar discrepancies can be found amongst states with larger populations. Figures 3 and 4 below contrast the ET employment series with CES data for Florida and Washington.
In the ET data, both states appeared to have essentially climbed back to their pre-pandemic employment levels by February 12, 2021. Once again, this recovery is absent from comparable CES data.
The Economic Tracker Employment Series
The Opportunity Insights team uses data from three companies (Paychex, Earnin, and Intuit) to construct the ET employment series. Paychex and Intuit are payroll processing firms whose customer bases consist largely of small- and medium-sized businesses. Earnin is a cash-advance mobile phone application used primarily by workers earning very low wages. Additional data from Kronos, a workforce management service provider, are used to generate ET series forecasts but not to construct the ET series itself. Because of the nature of these data sources, the ET employment series is designed to track private sector, non-seasonally adjusted employment. To protect confidential business information, the ET series reports changes in employment relative to January 2020 rather than employment levels directly. For more details, see Appendix D of the ET technical documentation.
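To illustrate why publishing changes relative to January 2020 keeps the underlying levels confidential, here is a minimal Python sketch of indexing a level series to a baseline window. It assumes employment counts keyed by date and a simple unweighted mean over the baseline window. The helper `index_to_baseline` and all numbers are hypothetical. The ET's actual baseline definition, weighting, and smoothing are set out in its technical documentation and are not reproduced here.

```python
from datetime import date
from math import isclose
from statistics import mean


def index_to_baseline(levels: dict[date, float], start: date, end: date) -> dict[date, float]:
    """Return each level as a percent change from the mean level over the
    baseline window [start, end], inclusive. Hypothetical helper, not the ET code."""
    base_values = [v for d, v in levels.items() if start <= d <= end]
    if not base_values:
        raise ValueError("no observations fall inside the baseline window")
    base = mean(base_values)
    return {d: 100.0 * (v - base) / base for d, v in levels.items()}


# Made-up headcounts for one state (not actual Paychex, Intuit, or Earnin data).
levels = {
    date(2020, 1, 6): 200.0,
    date(2020, 1, 13): 210.0,
    date(2020, 4, 13): 168.0,
}
jan_start, jan_end = date(2020, 1, 1), date(2020, 1, 31)
indexed = index_to_baseline(levels, jan_start, jan_end)
print(round(indexed[date(2020, 4, 13)], 2))  # -18.05

# Multiplying every level by the same factor leaves the index unchanged, so
# the published percent changes do not reveal the underlying headcounts.
scaled = {d: 3.7 * v for d, v in levels.items()}
rescaled = index_to_baseline(scaled, jan_start, jan_end)
print(all(isclose(rescaled[d], indexed[d], abs_tol=1e-9) for d in levels))  # True
```

By construction, the indexed baseline observations average to zero. Every later value is read as a deviation from January 2020.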
Discrepancies Across Economic Tracker Vintages
The anomaly that first caught our attention was the ET series reporting that private-sector employment in several states was higher at the end of July 2020 than six months earlier, before the onset of the COVID-19 recession.
While revisions are a normal part of any work of this sort, the entire series has been overhauled several times over the past several months, which is concerning from a policy-making perspective. Some of these revisions stem from methodological changes in the treatment of outliers in the raw data, while others are less easily explained.
The first major revision occurred in the ET data released on October 10, 2020. In this vintage, most state-level ET data were revised downward (see Figure 5). The next major revision came more recently: the data released on March 27, 2021 revised many state-level series upward. Figure 6 shows the substantial revision between the March 20 and March 27 vintages. Revisions of this magnitude matter for two important reasons. First, they imply that policymakers using the data in real time may be seeing a completely different picture from the one revealed by later revisions. Second, the data from prior ET vintages are unavailable. Nor, unlike for the CES data, does the technical documentation refer to the magnitude or even the existence of these revisions. As a result, the current series makes it appear that the ET employment data tracked CES figures much more closely in real time than was actually the case.
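Because prior vintages are not archived on the ET website, the only way to measure revisions is to save each vintage and compare the saved copies directly. A minimal Python sketch of that comparison follows. It assumes each vintage has been reduced to one number per state: the percent change in employment from January 2020 to a fixed reference date. The function names, the 5-percentage-point threshold, and the state values are hypothetical and are not the ET figures shown in our maps.

```python
def revisions(old: dict[str, float], new: dict[str, float]) -> dict[str, float]:
    """Percentage-point revision (new minus old) for every state in both vintages."""
    return {state: new[state] - old[state] for state in old.keys() & new.keys()}


def large_revisions(old: dict[str, float], new: dict[str, float],
                    threshold_pp: float = 5.0) -> list[tuple[str, float]]:
    """States whose absolute revision exceeds threshold_pp, largest first."""
    flagged = [(s, r) for s, r in revisions(old, new).items() if abs(r) > threshold_pp]
    return sorted(flagged, key=lambda item: abs(item[1]), reverse=True)


# Hypothetical percent changes since January 2020 in two saved vintages.
earlier_vintage = {"A": 2.0, "B": 1.0, "C": -3.0}
later_vintage = {"A": -9.0, "B": -6.5, "C": -4.0}
print(large_revisions(earlier_vintage, later_vintage))
# [('A', -11.0), ('B', -7.5)]
```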
The two maps in Figure 5 below compare the differences between ET and CES estimates of the percent change in employment from January 15 to July 15, 2020. The left pane shows the percentage point difference between the September 2020 ET vintage and CES estimates. The right pane shows the difference between the October 10, 2020 ET vintage and CES data. Positive values, shown in shades of blue, indicate that the ET estimate for the given state was above the CES estimate. Negative values, shown in shades of red and orange, indicate that the ET estimate of the percent change in employment from January 15 to July 15, 2020 for the given state was below the CES estimate.
Figure 5: Mapping the Difference between ET and CES State Estimates Sept. to Oct. 2020
The high number of blue-shaded states in the left pane of Figure 5 shows that the September 2020 vintage portrayed a less dire employment situation across many states than CES data indicated. The prevalence of red and orange states in the right pane starkly shows the magnitude of the downward revision in the October 10, 2020 vintage. In the September vintage, only a handful of state employment estimates for July 15, 2020 came in lower than the CES figures. In the October 10 vintage, the reverse held: only a few came in higher.
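The color coding in Figure 5 reduces to the sign of a simple difference. The Python sketch below is a minimal illustration. It assumes each source has been reduced to one percent change per state for January 15 to July 15, 2020. The helper names, the zero tolerance, and the numbers are hypothetical.

```python
def et_ces_gap(et: dict[str, float], ces: dict[str, float]) -> dict[str, float]:
    """ET minus CES percent change, in percentage points, for each state in both."""
    return {state: et[state] - ces[state] for state in et.keys() & ces.keys()}


def split_by_sign(gap: dict[str, float], tol: float = 0.0) -> tuple[list[str], list[str]]:
    """States where ET is above CES (blue in the maps) and below CES (red/orange)."""
    above = sorted(s for s, g in gap.items() if g > tol)
    below = sorted(s for s, g in gap.items() if g < -tol)
    return above, below


# Hypothetical percent changes, January 15 to July 15, 2020.
et_vintage = {"A": 1.5, "B": 0.5, "C": -8.0}
ces = {"A": -4.0, "B": -5.0, "C": -7.0}
above, below = split_by_sign(et_ces_gap(et_vintage, ces))
print(len(above), len(below))  # 2 1
```

Running this comparison for two saved ET vintages against the same CES release isolates the ET revision. The CES term cancels when the two gaps are subtracted.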
Figure 6: Mapping the Difference between ET and CES State Estimates Mar. 20-27, 2021
Figures 7-10 below compare different ET state-level vintages and CES data. These comparisons are the most relevant for real-time policy analysis, design, and evaluation.
In the March 27, 2021 ET vintage, South Dakota’s private-sector employment at the beginning of February 2021 came in over twenty-five percent higher than in January 2020. Employment growth of twenty-five percent over twelve months is unheard of in ‘normal’ times, let alone over the course of a steep recession. Yet this growth does not show up in the CES figures. As of April 6, 2021, no employment data for South Dakota are available on the ET website.
In the September 2020 vintage, private-sector employment in both Arkansas and Missouri had already exceeded its January 2020 level by the summer. This was a much rosier picture of the employment situation in these states than the CES figures showed. A methodological change in the treatment of outliers in the underlying data produced large downward revisions in the vintages released throughout October and November 2020 (details in our forthcoming paper). These revised vintages suggested that private sector employment declines in Arkansas and Missouri were much more severe than the CES figures indicated. Vintages released on February 19 and March 20, 2021 reported that private sector employment in Arkansas was even lower in January 2021 than in April and May of 2020! Only in the vintage released on March 27, 2021 do the employment series for Arkansas and Missouri generally track the CES data. The different Utah vintages tell a similar story.
Figures 11 and 12 show that large swings in the ET employment series persisted across vintages for states such as Mississippi and Oregon. The swings largely disappeared only in the most recent vintages, long after policymakers would have needed the data.
Large revisions are also evident in the ET industry-level (2-digit NAICS) employment data. The West Virginia Education & Healthcare series in Figure 13 is an extreme example of these industry-level differences across vintages. Figure 14 reveals similar patterns in the ET employment data for the Leisure & Hospitality industry in Nebraska.
Correlation Between Economic Tracker and CES Estimates
Although the examples above point to a number of issues with the ET employment data, we have not yet considered how closely the state-level data track the CES in general. In their most recent paper, Chetty et al. plot the change in state-level employment from January to April 2020 in the ET data against the corresponding change in the CES data. Reporting a correlation coefficient of 0.99, they conclude that “in almost all states (excluding North Dakota and Hawaii), employment changes from January-April in our combined series align very closely with changes in the CES” (Chetty et al. 2020b: 25). However, these calculations were made with a November 2020 vintage, which, as Figures 6 and 7 above show, is now quite out of date.
In Figures 15-27 below, we make similar calculations using the most recent ET and CES data. The latest ET vintage was released on April 6, 2021, and the most recent CES figures in March 2021. In each figure, we calculate the correlation coefficient between the two series' percent changes in employment from January 2020 to the month specified in the figure title. Beginning with the January-February 2020 period, we compute correlations for each of the following twelve months.
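The calculation behind Figures 15-27 works as follows. For each end month, take every state's percent change in employment since January 2020 in both sources, then compute the correlation across states. The Python sketch below assumes an unweighted Pearson coefficient, with each state counted once and only states present in both sources included. Population weighting or other choices would change the numbers. The data structures and values are hypothetical.

```python
from math import sqrt


def pearson(x: list[float], y: list[float]) -> float:
    """Unweighted Pearson correlation coefficient of two equal-length samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / sqrt(sxx * syy)


def monthly_correlations(et: dict[str, dict[str, float]],
                         ces: dict[str, dict[str, float]]) -> dict[str, float]:
    """For each end month, correlate ET and CES percent changes since January 2020
    across the states that appear in both sources."""
    result = {}
    for month in sorted(et.keys() & ces.keys()):
        states = sorted(et[month].keys() & ces[month].keys())
        result[month] = pearson([et[month][s] for s in states],
                                [ces[month][s] for s in states])
    return result


# Hypothetical values for three states and two end months.
et = {"2020-04": {"A": -10.0, "B": -20.0, "C": -30.0},
      "2021-02": {"A": 5.0, "B": -5.0, "C": 0.0}}
ces = {"2020-04": {"A": -12.0, "B": -22.0, "C": -32.0},
       "2021-02": {"A": -5.0, "B": 5.0, "C": 0.0}}
print(monthly_correlations(et, ces))  # {'2020-04': 1.0, '2021-02': -1.0}
```

In the first month, the hypothetical ET values sit two percentage points above CES in every state, yet the coefficient is exactly 1.0. The correlation measures co-movement across states, not agreement in magnitude.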
The correlation coefficients for the months prior to April 2020 are effectively zero. As expected, the coefficient over the January-April 2020 period is much higher. At 0.87, the January-April coefficient calculated from the most recent ET data is large, but not as high as the 0.99 that Chetty et al. reported using an earlier ET vintage. Throughout the summer and fall of 2020, however, the correlation coefficients generally decline from one month to the next. Figure 27 compares the percent change in employment from January 2020 to February 2021 and finds a coefficient of 0.36.
While the ET data appear to have captured the largest employment decline on record just as well as government figures, this was no guarantee that they would continue to do so in the ensuing months. Though not negligible, a coefficient of 0.36 is a far cry from the value reported for April 2020. The Opportunity Insights team cites that April value as evidence that their data closely track comparable government employment statistics. The correlation coefficients over the past five months range from 0.30 to 0.46. These values raise doubts about how reliably the ET series follow CES data, and therefore about the place of the ET numbers in policy formation.
Conclusion
Policymakers should seriously reconsider their use of the ET employment data. While it is commendable that the Opportunity Insights team makes their data available for exactly the type of evaluation undertaken here, we believe that the large revisions across vintages and ongoing discrepancies with government statistics make the ET employment data problematic for policy use. As new vintages are released, the data shown here will become outdated and likely unavailable on the ET website. We hope that this brief piece, along with a forthcoming paper in which we present our claims in more detail, contributes to the wider discussion on the reliability of proprietary data as a guide for public policy and research.
References
Chetty, Raj, John H. Friedman, Nathaniel Hendren, Michael Stepner, and the Opportunity Insights Team. 2020a. “The Economic Impacts of COVID-19: Evidence from a New Public Database Built from Private Sector Data.” Opportunity Insights Economic Tracker. September 2020.
Chetty, Raj, John H. Friedman, Nathaniel Hendren, Michael Stepner, and the Opportunity Insights Team. 2020b. “The Economic Impacts of COVID-19: Evidence from a New Public Database Built Using Private Sector Data.” Opportunity Insights Economic Tracker. November 2020. https://opportunityinsights.org/wp-content/uploads/2020/05/tracker_paper.pdf