Agriculture and Water Impairment in Minnesota
Examining the correlations between livestock, landcover, and causes of stream impairment throughout the state.

This study sought to identify and characterize relationships within watersheds between different landcover categories, density of livestock, and stream impairment by asking the following questions:
How do rates of impairment vary by watershed?
How do those rates of impairment correlate with both land use and livestock density within watersheds?
All stream data were obtained from the United States Environmental Protection Agency (EPA). In accordance with the Clean Water Act, states are required to assess their "navigable waters" for pollutants and determine whether they meet state and federal quality standards. Waters that do not are placed on the section 303(d) list of impaired water bodies with the cause of impairment indicated. While these regulations extend to lakes and other bodies of water, my study focuses on stream data.


The United States Geological Survey (USGS) has delineated the entire country into a set of watershed basins at graduated levels of specificity, each identified by a hydrologic unit code (HUC). These boundaries are released as the Watershed Boundary Dataset. The codes range from two to twelve digits, with more digits indicating greater specificity and a smaller area. To optimize the accuracy of my data and their correlations, I used the twelve-digit (HUC-12) delineations and removed basins in which the majority of the area lies outside the state boundaries of Minnesota. The final set, shown here, includes 2,345 basins. You may notice the discontiguous Northwest Angle at the top of the image - this is the only part of the contiguous United States north of the 49th parallel, and it can be reached by land only through Canada.

This map represents all watersheds containing an assessed stream, 1,463 HUC-12 basins in total. Note the many gaps - these are watersheds that did not contain an assessed stream in the dataset. Also note the concentration of higher rates of impairment in the southern and western regions of the state.
I used the Spatial Autocorrelation tool to produce a Global Moran's Index of the impairment rate data. The high z-score and high Moran's Index indicate that the spatial distribution of watershed impairment is non-random: watersheds with similar rates of impairment tend to sit near one another. As can be seen in the map above, rates of impairment in the data tend to follow geographic patterns.
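To make the statistic concrete, here is a minimal sketch of how a Global Moran's Index can be computed by hand. It is not the ArcGIS tool's implementation: the binary adjacency weights, the four-watershed layout, and the impairment rates are all invented for illustration, and the z-score the tool also reports is not computed here.

```python
def morans_i(values, weights):
    """Global Moran's I for a list of values and a square spatial weights matrix."""
    n = len(values)
    mean = sum(values) / n
    deviations = [v - mean for v in values]
    weight_total = sum(sum(row) for row in weights)
    # Weighted cross-products of deviations for every pair of locations.
    cross = sum(
        weights[i][j] * deviations[i] * deviations[j]
        for i in range(n)
        for j in range(n)
    )
    variance_sum = sum(d * d for d in deviations)
    return (n / weight_total) * cross / variance_sum


# Hypothetical layout: four watersheds in a row, each adjacent to the next.
adjacency = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
clustered = [0.9, 0.8, 0.2, 0.1]    # similar rates sit next to each other
alternating = [0.9, 0.1, 0.9, 0.1]  # high and low rates alternate

print(morans_i(clustered, adjacency))    # positive: neighbors resemble each other
print(morans_i(alternating, adjacency))  # negative: neighbors differ
```

A positive index means neighboring watersheds tend to have similar values, a negative index means they tend to differ, and a value near zero suggests no spatial pattern.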
Minnesota spans several major biomes. The southern and western regions were historically prairie grasslands and are now largely agricultural. The undeveloped areas in the center of the state and along its eastern border on the Mississippi River are covered by hardwood forests of ash, elm, oak, maple, and aspen. The north-central and northeastern areas of the state are largely undeveloped, with large portions of wilderness characterized by boreal forests, wetlands, and lakes.

This image shows 30-meter resolution classified raster data of landcover in Minnesota. This was derived from Landsat imagery, classified, and distributed by the United States Department of Agriculture (USDA) National Agricultural Statistics Service (NASS). I have deliberately left out the lengthy legend here - simply note the general regional differences.
Using the Zonal Statistics tool, I applied the raster data above to the HUC-12 boundaries to identify each watershed's "majority" landcover value. In the tool's language, "majority" is equivalent to the statistical mode - the pixel value with the most occurrences. This method is imperfect for this study, as it reduces areas with varied landcover to a single value. The value assigned may itself represent less than half of the area. For example, a watershed with landcover that was 40% corn, 20% soybeans, 10% mixed forest, and 10% open water, with the remainder split among smaller classes, would simply be identified as "corn". Since landcover values are categorical, other options that might have been used (such as the statistical mean) would not work.
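The limitation described above can be shown with a small sketch. This is only an illustration of taking a statistical mode, not the Zonal Statistics tool itself: the function name, the 100-pixel watershed, and the two placeholder classes that fill the remaining 20% are assumptions made for the example.

```python
from collections import Counter


def majority_class(pixels):
    """Return the most frequent class and the share of pixels it covers."""
    counts = Counter(pixels)
    # most_common(1) gives the modal class; ties go to the class seen first.
    label, count = counts.most_common(1)[0]
    return label, count / len(pixels)


# Hypothetical watershed of 100 pixels matching the percentages in the text.
# "other_a" and "other_b" are placeholders for the unlisted remainder.
pixels = (
    ["corn"] * 40
    + ["soybeans"] * 20
    + ["mixed forest"] * 10
    + ["open water"] * 10
    + ["other_a"] * 10
    + ["other_b"] * 10
)

label, share = majority_class(pixels)
print(label, share)
```

Reporting the share alongside the label makes it visible when the "majority" class covers less than half of a watershed.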
This table shows the count of assessed watersheds in each landcover class as well as the mean impairment rate of those watersheds. It is sorted by watershed count in descending order - note that of the five most prevalent landcover classes, the two definitively agricultural classes have noticeably higher rates of impairment (0.55 for corn and 0.57 for soybeans).
This table shows the same data, but sorted in descending order by mean impairment. Note that all developed and agricultural landcover classes have higher mean rates of impairment than any of the undeveloped classes.
The Minnesota Pollution Control Agency (MPCA) requires all feedlots above a specified minimum capacity to be registered and to adhere to regulations that aim to mitigate soil, water, and air pollution. MPCA feedlot data record both animal count by species and "animal units" by species at each location. An "animal unit" is a quantity of livestock that produces manure equivalent to that of a 1,000-pound steer - for example, a sheep is listed as 0.1 animal units. Conveniently, the data also include the HUC-12 as a listed field. I used the Summary Statistics tool to identify the total animal units per watershed, then joined the output table to the HUC-12 shapefile to create this map.
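The summarizing step amounts to a grouped sum. The sketch below shows the idea in plain Python rather than with the Summary Statistics tool. The HUC-12 codes and head counts are made up, and only the two conversion factors mentioned in the text (1.0 for a 1,000-pound steer, 0.1 for a sheep) are used; the real MPCA data already carry animal units for each species.

```python
from collections import defaultdict

# Animal units per head; only the two factors stated in the text are included.
ANIMAL_UNITS_PER_HEAD = {"steer": 1.0, "sheep": 0.1}


def total_animal_units(feedlots):
    """Sum animal units per HUC-12 from (huc12, species, head_count) records."""
    totals = defaultdict(float)
    for huc12, species, head_count in feedlots:
        totals[huc12] += head_count * ANIMAL_UNITS_PER_HEAD[species]
    return dict(totals)


# Hypothetical feedlot records; the HUC-12 codes are invented for illustration.
feedlots = [
    ("000000000001", "steer", 150),
    ("000000000001", "sheep", 200),
    ("000000000002", "sheep", 50),
]

totals = total_animal_units(feedlots)
for huc12, units in sorted(totals.items()):
    print(huc12, round(units, 1))
```

The resulting dictionary plays the role of the output table that is joined back to the watershed boundaries by HUC-12.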
As with impairment rates, the Spatial Autocorrelation tool produced values strongly indicating a non-random spatial distribution of livestock density. This is evident in the map.
I joined all of the variables mentioned (HUC-12, livestock density, impairment rate, and classname) into a master table, then used the Table to Excel tool to export it to Excel. I then imported that table into RStudio to calculate correlation coefficients and to create graphs. This is a plot of livestock density and impairment. It is interesting to note how many watersheds are either fully impaired (1.0) or fully unimpaired (0). The correlation coefficient between the two variables is 0.21 - a modest positive relationship, but lower than I expected.
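The text does not say which correlation coefficient was calculated. Assuming it was Pearson's r, which is the default method of R's `cor` function, the calculation can be sketched in Python as follows. The sample values are invented and are not the study's data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and the root sum of squares for each variable.
    covariation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariation / (spread_x * spread_y)


# Hypothetical values: livestock density and impairment rate for five watersheds.
density = [0.0, 120.0, 300.0, 800.0, 1500.0]
impairment = [0.0, 1.0, 0.5, 0.0, 1.0]

print(round(pearson_r(density, impairment), 2))
```

Because many impairment rates sit at exactly 0 or 1, a linear coefficient like this one can understate a relationship that is real but not linear.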
This chart adds a fitted line to better visualize the correlation. With a perfect positive correlation (coefficient of 1.0), every point would fall exactly on a straight, upward-sloping line.
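The text does not state how the line in the chart was fitted. Assuming an ordinary least-squares fit, which is a common choice for this kind of plot, the slope and intercept could be computed as in this sketch. The function name and sample points are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sum_xy / sum_xx
    # The fitted line always passes through the point of means.
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical points that lie exactly on a line (a correlation of 1.0).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

slope, intercept = fit_line(xs, ys)
print(slope, intercept)
```

In this made-up example the points are perfectly correlated, yet the fitted line has a nonzero intercept, so it does not start at the origin.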
A big lesson that I am taking from this study is the importance of structuring a question and planning a workflow that follows from it. For much of the project, I felt like I was wandering through different variables without a clear plan for how to tie them together, and I still feel that I lack a solid conclusion that does so. With more time, I think it would be interesting to carry out a proper regression analysis and to incorporate additional information, such as cause of impairment, into the study.