Water Quality Classification Inference Using GM/P90 Values

Non-linear regression neural network to infer bacterial scores and classification in coastal Maine

Brandon Engstrom

March 8, 2025

The Goal

P90 (90th percentile) and GM (geometric mean) scores of fecal coliform counts are used to evaluate water quality. They indicate whether a waterbody can be used for shellfish harvesting or recreation.

Create a neural network that can infer each station's P90 and GM scores for a given year from all data collected in previous years. Then use the inferred values to classify water quality at every station.
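The goal implies a walk-forward setup: when predicting year Y, the model only sees data from years before Y. The sketch below is a minimal illustration of building (features, target) pairs for one target year from per-station yearly scores. Pairs for earlier target years could serve as training data, and pairs for Y as evaluation data. The `history` layout, the `build_pairs` helper, and the three summary features (last score, mean score, number of prior years) are hypothetical. They are not the features used in this project.

```python
from statistics import mean


def build_pairs(history, target_year):
    """Build (features, target) pairs for one target year.

    history: hypothetical dict mapping station ID -> {year: score}.
    Only years strictly before target_year are used as inputs.
    """
    features, targets = [], []
    for station, scores in sorted(history.items()):
        if target_year not in scores:
            continue  # no actual score to learn from or evaluate against
        past = [scores[y] for y in sorted(scores) if y < target_year]
        if not past:
            continue  # no earlier data to infer from
        # Hypothetical summary features of the station's prior years
        features.append([past[-1], mean(past), len(past)])
        targets.append(scores[target_year])
    return features, targets


history = {
    "A": {2013: 10.0, 2014: 12.0, 2015: 14.0},
    "B": {2014: 40.0, 2015: 35.0},
    "C": {2013: 5.0},
}
X, y = build_pairs(history, 2015)
print(X)  # [[12.0, 11.0, 2], [40.0, 40.0, 1]]
print(y)  # [14.0, 35.0]
```

Station C is skipped because it has no 2015 score to compare against.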

The Standards

The standards for Approved classification are a geometric mean of 14 CFU per 100 mL or less and a P90 of 31 CFU per 100 mL or less.
The standards for Restricted classification are a geometric mean of 88 CFU per 100 mL or less and a P90 of 163 CFU per 100 mL or less.
A station is Prohibited if its geometric mean exceeds 88 CFU per 100 mL or its P90 exceeds 163 CFU per 100 mL.
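These thresholds translate directly into a decision rule for turning inferred scores into a class. The sketch below assumes a station must meet both limits to qualify for Approved or Restricted. It ignores the sanitary surveys that also factor into DMR's decisions. The function name `classify` is illustrative.

```python
def classify(gm, p90):
    """Classify a station from its geometric mean and P90 (CFU per 100 mL).

    Approved and Restricted require BOTH scores to meet the limits;
    exceeding either Restricted limit makes the station Prohibited.
    """
    if gm <= 14 and p90 <= 31:
        return "Approved"
    if gm <= 88 and p90 <= 163:
        return "Restricted"
    return "Prohibited"


print(classify(10, 25))   # Approved
print(classify(10, 40))   # Restricted (P90 exceeds 31)
print(classify(90, 100))  # Prohibited (GM exceeds 88)
```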

The Data

There are 11 datasets, one for each year from 2013 to 2023, acquired from the Maine GeoLibrary Data Catalog.

The Department of Marine Resources (DMR) Bureau of Public Health collects surface water grab samples at approximately 1,400 water quality stations along the Maine coast year-round. For each station, after at least 30 samples have been collected under systematic random sampling and analyzed for fecal coliforms, DMR scientists calculate final scores which, along with sanitary surveys of the area, help determine whether the water quality is acceptable for harvesting shellfish.
https://dmr-maine.opendata.arcgis.com/datasets/c17c113d275b49bebb48ced5c6d3bf72_0/about
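The sketch below shows how the two scores can be computed from a station's fecal coliform results. It assumes the NSSP-style estimate for systematic random sampling. Under that method, the GM is computed in log10 space and the P90 is the antilog of the log10 mean plus 1.28 times the log10 standard deviation. DMR's exact procedure may differ, including how many recent samples it uses and how non-detects are handled. The helper names and sample values are hypothetical.

```python
import math
from statistics import mean, stdev


def geometric_mean(counts):
    """Geometric mean of positive counts, computed in log10 space."""
    logs = [math.log10(c) for c in counts]
    return 10 ** mean(logs)


def estimated_p90(counts):
    """Estimated 90th percentile, assuming log-normally distributed counts.

    1.28 approximates the z-score of the 90th percentile of a standard
    normal distribution; stdev() is the sample (n - 1) standard deviation.
    """
    logs = [math.log10(c) for c in counts]
    return 10 ** (mean(logs) + 1.28 * stdev(logs))


samples = [2, 4, 8, 16, 32]  # hypothetical CFU/100 mL results
print(round(geometric_mean(samples), 2))  # 8.0
print(round(estimated_p90(samples), 1))
```

Because the P90 adds a multiple of the spread, two stations with the same GM can land in different classes if one has more variable results.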

Visualizing the Data

Neural Network for the Data

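import torch.nn as nn

############################################
################## MODEL ###################
####for stations with at least 2 samples####
############################################

class StationP90Model(nn.Module):
    """Regression network with two hidden layers, batch norm, and dropout."""

    def __init__(self, input_size):
        super().__init__()
        self.network = nn.Sequential(
            # Hidden layer 1: 32 units, batch norm, 20% dropout
            nn.Linear(input_size, 32),
            nn.ReLU(),
            nn.BatchNorm1d(32),
            nn.Dropout(0.2),
            # Hidden layer 2: 16 units, batch norm, 10% dropout
            nn.Linear(32, 16),
            nn.ReLU(),
            nn.BatchNorm1d(16),
            nn.Dropout(0.1),
            # Output layer: one predicted score per station
            nn.Linear(16, 1)
        )

    def forward(self, x):
        # x: (batch_size, input_size) -> (batch_size, 1)
        return self.network(x)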


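############################################
############## SIMPLER MODEL ###############
###for stations with fewer than 2 samples###
############################################

class SimpleModel(nn.Module):
    """Regression network with a single hidden layer for stations with little data."""

    def __init__(self, input_size):
        super().__init__()
        # No BatchNorm1d here: in training mode it cannot normalize a batch of one sample
        self.network = nn.Sequential(
            nn.Linear(input_size, 16),
            nn.ReLU(),
            nn.Linear(16, 1)
        )

    def forward(self, x):
        # x: (batch_size, input_size) -> (batch_size, 1)
        return self.network(x)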

ArcPy Workflow for P90/GM Values

The goal of the script is to identify stations with significant differences between actual and predicted P90/GM values.


1. Reads a .csv file containing the P90/GM predictions created by the neural network.
2. Creates a table view from the .csv file.
3. Converts the table view to a point feature class for the predictions.
4. Performs a spatial join between the actual and predicted P90/GM values.
5. Calculates the difference between actual and predicted P90/GM values.
6. Selects stations with significant differences and creates a new feature class.
7. Exports the selected stations to a new .csv file.
8. Confirms the completion of the process.
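The join, difference, select, and export steps (4 to 7) can be sketched without ArcPy using Python's standard `csv` module. This swaps in a non-spatial technique: it joins actual and predicted rows on a shared station ID instead of performing a spatial join. The field names (`station_id`, `p90`), the `flag_differences` helper, and the default threshold of 20 are hypothetical, because the cutoff for a "significant" difference is not stated here.

```python
import csv
import io


def flag_differences(actual_csv, predicted_csv, out_csv, field="p90", threshold=20.0):
    """Join actual and predicted rows by station ID and export large differences.

    Returns the number of stations written to out_csv.
    """
    # Index predicted values by station ID (stands in for the spatial join)
    predicted = {row["station_id"]: float(row[field]) for row in csv.DictReader(predicted_csv)}

    writer = csv.writer(out_csv)
    writer.writerow(["station_id", "actual", "predicted", "difference"])
    flagged = 0
    for row in csv.DictReader(actual_csv):
        sid = row["station_id"]
        if sid not in predicted:
            continue  # no prediction to compare against
        actual = float(row[field])
        diff = actual - predicted[sid]
        if abs(diff) > threshold:  # keep only differences above the threshold
            writer.writerow([sid, actual, predicted[sid], diff])
            flagged += 1
    return flagged


actual = io.StringIO("station_id,p90\nA,25\nB,180\nC,40\n")
pred = io.StringIO("station_id,p90\nA,28\nB,120\n")
out = io.StringIO()
print(flag_differences(actual, pred, out))  # 1
```

With real files, open them with `newline=""`, as the `csv` module documentation recommends.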

Results

P90 Predictions (2015 - 2023)

Approximately 92.1% accuracy for 2015 predictions using data through 2014 (112 inaccurate stations out of 1416)
Approximately 93.8% accuracy for 2016 predictions using data through 2015 (88 inaccurate stations out of 1427)
Approximately 92.5% accuracy for 2017 predictions using data through 2016 (105 inaccurate stations out of 1393)
Approximately 93.6% accuracy for 2018 predictions using data through 2017 (94 inaccurate stations out of 1479)
Approximately 94.7% accuracy for 2019 predictions using data through 2018 (80 inaccurate stations out of 1503)
Approximately 96.7% accuracy for 2020 predictions using data through 2019 (50 inaccurate stations out of 1516)
Approximately 95.8% accuracy for 2021 predictions using data through 2020 (64 inaccurate stations out of 1528)
Approximately 95.7% accuracy for 2022 predictions using data through 2021 (67 inaccurate stations out of 1543)
Approximately 95.1% accuracy for 2023 predictions using data through 2022 (77 inaccurate stations out of 1560)

GM Predictions (2015 - 2023)

Approximately 95.0% accuracy for 2015 predictions using data through 2014 (71 inaccurate stations out of 1416)
Approximately 98.3% accuracy for 2016 predictions using data through 2015 (24 inaccurate stations out of 1427)
Approximately 98.9% accuracy for 2017 predictions using data through 2016 (16 inaccurate stations out of 1393)
Approximately 98.9% accuracy for 2018 predictions using data through 2017 (17 inaccurate stations out of 1479)
Approximately 99.1% accuracy for 2019 predictions using data through 2018 (13 inaccurate stations out of 1503)
Approximately 99.7% accuracy for 2020 predictions using data through 2019 (5 inaccurate stations out of 1516)
Approximately 99.3% accuracy for 2021 predictions using data through 2020 (11 inaccurate stations out of 1528)
Approximately 99.2% accuracy for 2022 predictions using data through 2021 (12 inaccurate stations out of 1543)
Approximately 99.4% accuracy for 2023 predictions using data through 2022 (10 inaccurate stations out of 1560)
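Each accuracy figure is the share of stations not counted as inaccurate, out of all stations evaluated that year. The sketch below is a quick check of that arithmetic. The helper name `accuracy_pct` is illustrative.

```python
def accuracy_pct(total, inaccurate):
    """Percentage of stations predicted accurately, rounded to one decimal."""
    return round(100 * (total - inaccurate) / total, 1)


print(accuracy_pct(1416, 112))  # 92.1 (P90, 2015)
print(accuracy_pct(1427, 24))   # 98.3 (GM, 2016)
```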

P90/GM Predictions and Classification for 2024

2024 Inferred Water Quality Classification

Sources

1) https://www.maine.gov/geolib/catalog.html

MaineDMR Public Health - 2013 P90 Scores
MaineDMR Public Health - 2014 P90 Scores
MaineDMR Public Health - 2015 P90 Scores
MaineDMR Public Health - 2016 P90 Scores
MaineDMR Public Health - 2017 P90 Scores
MaineDMR Public Health - 2018 P90 Scores
MaineDMR Public Health - 2019 P90 Scores
MaineDMR Public Health - 2020 P90 Scores
MaineDMR Public Health - 2021 P90 Scores
MaineDMR Public Health - 2022 P90 Scores
MaineDMR Public Health - 2023 P90 Scores