Real-Time Air Monitoring

Data Ingestion and Analysis with ArcGIS Velocity

PurpleAir's public-facing map

PurpleAir creates easy-to-deploy, low-cost air monitoring sensors that allow anyone to collect hyper-local air quality data and share it with the public. As of 2021, there were more than 9,000 PurpleAir sensors deployed in California. Their widespread distribution makes them a potentially valuable data source for understanding localized air pollution.

So much data, but how do we access it?

Ingest and Alert

We created a real-time analytic in ArcGIS Velocity, the real-time and big data processing and analysis capability of ArcGIS Online. This real-time analytic connects to the PurpleAir API, which provides a way to interact with PurpleAir data programmatically.

PurpleAir API documentation page.

This allows us to leverage real-time data being streamed back by thousands of publicly shared PurpleAir sensors.
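As a rough illustration of what an ingest step has to do with an API response, the sketch below reshapes a columnar payload into one record per sensor. It assumes the response carries a `fields` list of column names and a `data` list of rows. The field names and values in the sample are invented for illustration. ArcGIS Velocity handles this parsing through its feed configuration rather than through code like this.

```python
from typing import Any


def rows_to_records(payload: dict[str, Any]) -> list[dict[str, Any]]:
    """Turn a columnar payload ({"fields": [...], "data": [[...], ...]}) into dicts."""
    fields = payload["fields"]
    return [dict(zip(fields, row)) for row in payload["data"]]


# Hypothetical sample payload; the shape and values are assumptions for illustration.
sample_payload = {
    "fields": ["sensor_index", "latitude", "longitude", "pm2.5"],
    "data": [
        [101, 37.77, -122.42, 8.4],
        [102, 37.80, -122.27, 61.2],
    ],
}

records = rows_to_records(sample_payload)
for record in records:
    print(record["sensor_index"], record["pm2.5"])
```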

ArcGIS Velocity provides us with a workspace and tools to perform real-time data ingestion and analysis. It has a similar look and feel to ModelBuilder and allows us to create entire workflows with minimal coding.

Our real-time analytic ingests PurpleAir data, and outputs a series of datasets for streaming, archiving, and visualization.

Real-time analytic for PurpleAir data ingestion and incident detection.

The US EPA maintains a nationwide network of air monitoring stations called AirNow. These stations receive regular maintenance and calibration and may be useful in validating the sensor results coming from PurpleAir.

We incorporate the US EPA AirNow sensors into our real-time analytic and perform a spatial join between a PurpleAir sensor and an AirNow sensor if they are located within a certain distance of each other. This join allows us to compare real-time PM 2.5 readings from PurpleAir to PM 2.5 readings from AirNow.
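The distance-based join can be sketched outside of ArcGIS Velocity as follows. This is a minimal illustration, not the tool's implementation: it pairs each PurpleAir sensor with its nearest AirNow station inside a search radius, using great-circle distance. The 2-mile radius, the record keys, and the sample coordinates are all assumptions made for the example.

```python
import math

EARTH_RADIUS_MILES = 3958.8


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/lon points, in miles."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def join_nearest_within(purpleair, airnow, max_miles):
    """Pair each PurpleAir sensor with its nearest AirNow station within max_miles."""
    pairs = []
    for p in purpleair:
        best, best_dist = None, max_miles
        for a in airnow:
            d = haversine_miles(p["lat"], p["lon"], a["lat"], a["lon"])
            if d <= best_dist:
                best, best_dist = a, d
        if best is not None:
            pairs.append({
                "purpleair_id": p["id"],
                "airnow_id": best["id"],
                "distance_miles": best_dist,
                "pm25_purpleair": p["pm25"],
                "pm25_airnow": best["pm25"],
            })
    return pairs


# Hypothetical sensors; IDs, coordinates, and readings are invented.
purpleair = [
    {"id": "PA-1", "lat": 37.7749, "lon": -122.4194, "pm25": 62.0},
    {"id": "PA-2", "lat": 38.5800, "lon": -121.4900, "pm25": 14.0},
]
airnow = [{"id": "AN-1", "lat": 37.7650, "lon": -122.3990, "pm25": 58.0}]

pairs = join_nearest_within(purpleair, airnow, max_miles=2.0)
```

Only `PA-1` is joined here, because `PA-2` has no AirNow station inside the search radius.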

PurpleAir sensors (in purple) and AirNow sensors (in blue). Sensors with a red halo have recorded elevated PM 2.5 concentration values within the last 8 hours.

We added elementary schools using a layer from the ArcGIS Living Atlas and used the Incident Detection tool in ArcGIS Velocity to identify when certain criteria have been met. In our case, we are interested in times when the PM 2.5 concentrations recorded by both the PurpleAir sensor and the joined AirNow sensor are at an unhealthy level and that PurpleAir sensor is within 0.5 miles of an elementary school.

If these conditions are met, an automated email is triggered. This information could help schools decide when students should limit outdoor activities.
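The alert condition can be written as a small predicate. This is a sketch only: in ArcGIS Velocity the rule is configured in the Incident Detection tool. The 55.5 µg/m³ threshold is an assumption (the lower bound of the US EPA "Unhealthy" category for 24-hour PM 2.5), since the exact value used in the analytic is not stated here, and the function and parameter names are hypothetical.

```python
# Assumed threshold in µg/m³: lower bound of the US EPA "Unhealthy" PM 2.5 category.
UNHEALTHY_PM25 = 55.5
# Distance criterion stated in the text above.
SCHOOL_RADIUS_MILES = 0.5


def should_alert(pm25_purpleair, pm25_airnow, miles_to_nearest_school,
                 threshold=UNHEALTHY_PM25, radius=SCHOOL_RADIUS_MILES):
    """Return True when both sensors read unhealthy and a school is within the radius."""
    both_unhealthy = pm25_purpleair >= threshold and pm25_airnow >= threshold
    near_school = miles_to_nearest_school <= radius
    return both_unhealthy and near_school


print(should_alert(80.0, 60.0, 0.3))  # both unhealthy, school nearby
print(should_alert(80.0, 40.0, 0.3))  # AirNow reading does not confirm
print(should_alert(80.0, 60.0, 0.8))  # no school within 0.5 miles
```

Requiring both sensors to agree means a single noisy low-cost sensor reading does not trigger an email on its own.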

Analyze

We can also use ArcGIS Velocity to perform big data analysis. Big data analysis within ArcGIS Velocity is performed against stored, or historical, data. Big data analytics can be run ad hoc (on button click) or on a schedule, and can be run against millions or hundreds of millions of features.

Big data analytic to process historical PurpleAir data from CSVs stored in an S3 bucket.

We downloaded historical PurpleAir data from over 1,000 sensors in the San Francisco Bay Area for July 1-8, 2021. The data came in a series of CSVs that we added to an Amazon S3 bucket. The historical data contains PM 2.5 concentrations averaged every 30 minutes. This one-week period was chosen because of its proximity to the 4th of July, a holiday in the United States with heavy fireworks usage. Research has been conducted into the effects of fireworks on short-term PM 2.5 air pollution. We use a big data analytic to process our input CSVs, perform some data cleaning and filtering, and establish an output dataset that is ready for additional analysis.
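To make the cleaning and filtering step concrete, here is a minimal sketch of the kind of logic involved, written with the Python standard library rather than ArcGIS Velocity tools. The column names (`sensor_id`, `created_at`, `pm25`), the timestamp format, and the sample rows are assumptions. The actual PurpleAir CSV layout and the specific cleaning rules used in the analytic may differ.

```python
import csv
import io
from datetime import datetime

WINDOW_START = datetime(2021, 7, 1)
WINDOW_END = datetime(2021, 7, 9)  # exclusive bound, so all of July 8 is kept


def clean_rows(csv_text):
    """Keep rows with a parseable timestamp inside the window and a non-negative PM 2.5 value."""
    cleaned = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        try:
            sensor_id = row["sensor_id"]
            ts = datetime.strptime(row["created_at"], "%Y-%m-%d %H:%M:%S")
            pm25 = float(row["pm25"])
        except (KeyError, TypeError, ValueError):
            continue  # missing column or unparseable value
        if pm25 < 0 or not (WINDOW_START <= ts < WINDOW_END):
            continue  # physically impossible reading or outside the study week
        cleaned.append({"sensor_id": sensor_id, "timestamp": ts, "pm25": pm25})
    return cleaned


# Hypothetical sample; the values are invented for illustration.
sample_csv = """sensor_id,created_at,pm25
101,2021-07-04 21:00:00,48.3
101,2021-07-04 21:30:00,
102,2021-06-30 23:30:00,7.1
102,2021-07-05 00:00:00,-1
103,2021-07-08 23:30:00,12.0
"""

cleaned = clean_rows(sample_csv)
print(len(cleaned), "rows kept")
```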

Big data analytic used to compute a magnitude-per-unit area for PM 2.5

Here we used the Calculate Density tool to calculate a magnitude-per-unit area for PM 2.5 values per hex bin. Our input dataset is time-enabled and the Calculate Density tool is able to compute results based on our data's time intervals. This allows us to visualize change over time.
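The idea of a time-stepped magnitude-per-unit area can be illustrated with a much simpler stand-in for the Calculate Density tool. The sketch below is not the tool's algorithm: it uses square cells instead of hex bins, applies no neighborhood weighting, and assumes sensor positions are already projected to planar `x`/`y` coordinates in miles. All names and sample values are hypothetical.

```python
import math
from collections import defaultdict
from datetime import datetime


def density_by_cell_and_hour(readings, cell_size_miles):
    """Sum PM 2.5 per (square cell, hour) and divide by the cell area."""
    cell_area = cell_size_miles ** 2
    totals = defaultdict(float)
    for r in readings:
        cell = (math.floor(r["x"] / cell_size_miles), math.floor(r["y"] / cell_size_miles))
        hour = r["timestamp"].replace(minute=0, second=0, microsecond=0)
        totals[(cell, hour)] += r["pm25"]
    return {key: total / cell_area for key, total in totals.items()}


# Hypothetical readings around the evening of July 4, 2021.
readings = [
    {"x": 0.5, "y": 0.5, "timestamp": datetime(2021, 7, 4, 20, 0), "pm25": 10.0},
    {"x": 1.5, "y": 1.0, "timestamp": datetime(2021, 7, 4, 20, 30), "pm25": 30.0},
    {"x": 0.5, "y": 0.5, "timestamp": datetime(2021, 7, 4, 21, 0), "pm25": 100.0},
    {"x": 3.0, "y": 0.5, "timestamp": datetime(2021, 7, 4, 21, 15), "pm25": 8.0},
]

density = density_by_cell_and_hour(readings, cell_size_miles=2.0)
for (cell, hour), value in sorted(density.items(), key=lambda kv: (kv[0][1], kv[0][0])):
    print(hour.isoformat(), cell, value)
```

Keying the result by both cell and time step is what makes an animation possible: each hour yields its own map of cell values.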

This animation shows data from 8 AM on July 4, 2021 to 5 AM on July 5, 2021. The darker the orange, the higher the magnitude-per-unit area for PM 2.5. Notice the changes that occur around 9 PM. Could this PM 2.5 air pollution be caused by fireworks related to 4th of July celebrations?

Magnitude-per-unit area of PM 2.5 concentrations over time.

We also use ArcGIS Velocity to better understand the nature of poor air quality conditions in this region from July 1-8. We set up a Big Data Analytic and configured the Incident Detection tool to identify observations that exceeded a certain threshold and then categorized those incidents as Unhealthy for Sensitive Groups, Unhealthy, Very Unhealthy, or Hazardous based on US EPA PM 2.5 pollution breakpoints.
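The categorization step amounts to a lookup against breakpoints. The sketch below assumes the US EPA 24-hour PM 2.5 breakpoints that were in effect in 2021 (35.5, 55.5, 150.5, and 250.5 µg/m³ as the lower bounds of the four categories). The exact values configured in the analytic are not stated here, and EPA revised some of these breakpoints in 2024. The function name is hypothetical.

```python
# Assumed lower bounds in µg/m³ (US EPA PM 2.5 breakpoints in effect in 2021),
# ordered from most to least severe so the first match wins.
BREAKPOINTS = [
    (250.5, "Hazardous"),
    (150.5, "Very Unhealthy"),
    (55.5, "Unhealthy"),
    (35.5, "Unhealthy for Sensitive Groups"),
]


def categorize_incident(pm25):
    """Return the incident category for a PM 2.5 value, or None if below all breakpoints."""
    for lower_bound, label in BREAKPOINTS:
        if pm25 >= lower_bound:
            return label
    return None


for value in (12.0, 40.0, 60.0, 200.0, 300.0):
    print(value, categorize_incident(value))
```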

Big data analytic to understand the nature of air pollution incidents in the SF Bay Area from July 1-8, 2021.

We use the Join Features tool in ArcGIS Velocity to enrich our PurpleAir data with attributes from CalEnviroScreen 4.0, a dataset created by the California Office of Environmental Health Hazard Assessment. This allows us to better understand the populations that may be adversely affected by poor air quality.

Elevated PM 2.5 air pollution incidents in SF Bay Area from July 1-8, 2021. PurpleAir data has been enriched with data from CalEnviroScreen 4.0 (CA OEHHA).

What's Next?

This analysis was a proof of concept for ingesting PurpleAir data in real time, performing incident detection, and conducting big data analysis against historical PurpleAir data. There are a number of opportunities for future analysis, such as:

  • Expanding the historical data analysis to determine whether the PM 2.5 air pollution patterns detected by PurpleAir the week of July 4th are unique compared with other time periods.
  • Analyzing the distribution of PurpleAir sensors: where are the gaps in sensor deployment, and what communities could benefit from additional sensors?
  • Analyzing PurpleAir sensors that are installed indoors during severe air pollution incidents, such as wildfires.
