Machine Learning of Wildfire Fuel Types
From BC VRI Dataset
Introduction
Forests are widely acknowledged as one of the most important components of the Canadian natural environment, and they also form one of the main economic pillars of British Columbia. In recent years, however, wildfire has become a pressing concern.
Our society still lacks a full understanding of wildfires. What we currently know about fire is far from sufficient to help us control these fires, whatever their cause. In the summer of 2017, British Columbia experienced one of the largest wildfire seasons in its history. According to records from the B.C. Wildfire Service, 2,344 fire events occurred across the province over the year, burning 1,216,053 hectares of land and causing $563,000,000 in economic loss (B.C. Wildfire Service, 2020). On July 7th alone, lightning strikes led to as many as 594 fire events (B.C. Wildfire Service, 2020). But what causes megafires? The fire triangle combines fuel, oxygen, and heat. During the summer of 2017, a long-lasting drought created extremely dry conditions favourable for ignition. Combined with abundant fuels from B.C.’s massive store of woody material, these conditions resulted in widespread fire.
In 2017 and 2018, the entire interior of British Columbia saw significant fire hazards during the fire seasons. These events demonstrated the presence of fire hazards on the land base and their large impacts on both communities and high-value assets under extreme weather conditions (Pickell et al., 2020).
New technologies such as imaging spectroscopy and LiDAR make it possible to use fuel characteristics to assist in classifying fuels and in predicting landscape-scale fire behaviour (Stavros et al., 2018). Remote sensing has therefore proven its value in updating fuel maps over large areas (Alonso-Benito et al., 2016), as it draws on Earth-observing satellites and other remotely sensed data (Xiao-rui et al., 2005; Arroyo et al., 2008). Hence, multiple remotely sensed datasets have been used as predictor variables in fuel classification workflows (Pickell et al., 2020). To assess how well fuel types can be characterized from remote sensing data at different spatial scales, well-established classification and spectral mixture analysis have been adopted as two approaches for fuel type mapping (Lasaponara and Lanorte, 2006).
Popular approaches incorporated into fuel type mapping include object-based image analysis, the maximum likelihood algorithm, clustering and classification, ground-truth validation, and regression. In addition, a main approach is to apply machine learning algorithms that use remotely sensed datasets as training and validation data to solve the fuel type mapping problem (Pickell et al., 2020).
The purpose of this study is to test the hypothesis that a machine learning algorithm can replace the current manual system for identifying wildfire fuel types, with an accuracy that meets the standards of BC FLNRO.
Study Area Description
The study area was defined as a 200 km x 200 km (40,000 km²) square grid in the NAD 1983 BC Albers projected coordinate system to ensure spatial agreement with other data. The grid has a cell size of 30 m and is centred over the 2017 Elephant Hill megafire perimeter (51° 6' 30.57" N, 121° 12' 35.7" W; Figure 1). The study area was selected for its variation in fuel types and the availability of post-fire field-based assessments. The terrain is mainly a plateau lying between the Coast Mountains and the Rocky Mountains, with elevation ranging from 81 m to 2854 m a.s.l. (Scudder and Smith, 2011). The climate of the study area has distinctly continental characteristics: according to available ClimateNA data, mean annual temperature ranges from -4° to 10°C, mean annual precipitation ranges from 219 to 1622 mm, and between 16% and 86% of precipitation falls as snow (Wang et al., 2016).
Given the terrain and climate, the study area is dominated by dry interior forests composed of Douglas-fir (Pseudotsuga menziesii var. glauca (Beissn.) Franco), ponderosa pine (Pinus ponderosa Dougl. ex Laws.), lodgepole pine (Pinus contorta var. latifolia Douglas), and Engelmann spruce (Picea engelmannii Parry ex Engelm.). Patches of mixedwood forest with co-occurring deciduous trees (Betula papyrifera Marshall and Populus species) are commonly found in the valleys. Shrubs (Artemisia tridentata Nutt. and Purshia tridentata (Pursh) DC.) and grasses (Agropyron spicatum (Pursh) Scribn. & J.G.Sm. and Stipa comata Trin. & Rupr.) cover a large proportion of the non-forested valley bottoms. There are 12 major biogeoclimatic zones that occur in the study area and contribute to the diversity of fuel types in the landscape (ordered descending by area): Interior Douglas-Fir; Engelmann Spruce-Subalpine Fir (Abies lasiocarpa (Hooker) Nutt.); Montane Spruce; Sub-Boreal Spruce; Interior Cedar-Hemlock; Bunchgrass; Sub-Boreal Pine-Spruce; Ponderosa Pine; Coastal Western Hemlock (Tsuga heterophylla (Raf.) Sarg.); Mountain Hemlock (Tsuga mertensiana (Bong.) Carr.); and Alpine Tundra (Demarchi, 2011).
Methods
Data Summary
Training samples were extracted from the current British Columbia Wildfire Service fuel layer and were then imported into a decision-tree algorithm (Perrakis et al., 2018). In addition, a series of land cover and stand attributes from the provincial Vegetation Resources Inventory (VRI) were used as training data, in particular tree species, forest height, canopy cover, and stem density (B. F. I. and A. Branch, 2011).
Input Data & Pre-Processing
The VRI datasets were first overlaid with the Elephant Hill study area perimeter and then passed through several quality control steps in ArcGIS. Quality control involved removing VRI polygons that were missing data source identifiers for attributes considered critical in the fuel typing decision tree, such as stem density (stems per hectare), stand age (years), stand height (m), and canopy cover (%). These steps were repeated for each year to collect training samples from 2013 to 2018.
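The quality control rule described above can be illustrated with a short Python sketch. It is not the ArcGIS workflow used in this study: the attribute names, the "data_sources" field, and the sample records are hypothetical and stand in for the corresponding VRI fields.

```python
# Hypothetical names for the attributes treated as critical in the
# fuel typing decision tree; actual VRI field names may differ.
CRITICAL_ATTRIBUTES = ("stem_density", "stand_age", "stand_height", "canopy_cover")


def passes_quality_control(polygon):
    """Return True if every critical attribute has a data source identifier."""
    sources = polygon.get("data_sources", {})
    return all(sources.get(name) not in (None, "") for name in CRITICAL_ATTRIBUTES)


def filter_polygons(polygons):
    """Keep only the polygons that pass the quality control rule."""
    return [p for p in polygons if passes_quality_control(p)]


# Illustrative records only (not taken from the VRI).
polygons = [
    {"id": 1, "data_sources": {"stem_density": "photo", "stand_age": "field",
                               "stand_height": "photo", "canopy_cover": "photo"}},
    {"id": 2, "data_sources": {"stem_density": None, "stand_age": "field",
                               "stand_height": "photo", "canopy_cover": "photo"}},
    {"id": 3, "data_sources": {"stem_density": "photo", "stand_age": "field",
                               "stand_height": "photo"}},
]

kept = filter_polygons(polygons)
kept_ids = [p["id"] for p in kept]
print(kept_ids)
```

Polygon 2 has an empty identifier and polygon 3 lacks one entirely, so only polygon 1 is retained.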
Modelling
Training polygon samples were collected for nine fuel types: Fire (non-fuel), C-2 (Boreal Spruce (Picea spp.)), C-3 (Mature Jack (Pinus banksiana) or Lodgepole Pine (Pinus contorta)), C-5 (Red and White Pine), C-7 (Ponderosa Pine (Pinus ponderosa) and Douglas-fir (Pseudotsuga menziesii)), D-1/2 (Aspen (Populus spp.)), M-1/2 (Mixedwood associations of boreal conifers and deciduous species), O-1a/b (Continuous grass fuels either matted in the early spring (O-1a) or standing cured in the summer (O-1b)), and S-1 (Moderate-to-heavy loading of downed, dead woody material following clear-cut harvesting). Because C-4 (regenerating Jack Pine (Pinus banksiana Lamb.) or Lodgepole Pine), S-2 (White Spruce (Picea glauca (Moench) Voss), Balsam Fir (Abies balsamea (L.) Mill.) slash), and S-3 (Coastal Cedar, Hemlock, Douglas-fir slash) were too rare to provide sufficient training polygon samples, they were combined with similar, more abundant types: C-4 was grouped with C-3 (mature Jack or Lodgepole Pine), and S-2 and S-3 were grouped with S-1 (Pine slash). A set of polygon samples was randomly chosen for each fuel type class, in proportion to the abundance of that class in each year.
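The grouping of rare classes amounts to a simple relabelling step. The following Python sketch shows one way to express it; the function name and the sample labels are illustrative assumptions, and only the three groupings themselves come from the text.

```python
# Rare fuel types mapped to the more abundant class they were grouped with.
MERGE_MAP = {"C-4": "C-3", "S-2": "S-1", "S-3": "S-1"}


def merge_rare_class(label):
    """Return the grouped label for a rare class, or the label unchanged."""
    return MERGE_MAP.get(label, label)


# Illustrative labels only.
labels = ["C-2", "C-4", "S-3", "O-1a/b", "S-2"]
merged = [merge_rare_class(label) for label in labels]
print(merged)
```

Labels that are not in the mapping pass through unchanged, so the nine retained classes are unaffected.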
The most critical step was to train the random forest algorithm on the polygon samples created above. The full sample was split into a training set (50%) and a validation (testing) set (50%). The training set was used for cross-validation assessment, and the validation set provided a further test using the predict function. A confusion matrix was then generated to obtain the overall accuracy.
After the tuning inputs were set up using the random forest library within RStudio, the random forest algorithm was applied to the input variables to predict the nine fuel types for all polygon samples from 2013 to 2017 (Figure 1). The predicted fuel type maps were exported as feature layers to ArcGIS Pro for later interpretation.
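A random 50/50 split of this kind can be sketched as follows. The study performed this step in R, so the Python below is only an illustration; the function name, the fixed seed, and the ten sample identifiers are assumptions made for the example.

```python
import random


def split_half(samples, seed=0):
    """Randomly split samples into two halves (training, validation)."""
    rng = random.Random(seed)   # fixed seed so the split is reproducible
    shuffled = list(samples)    # copy, so the input order is not modified
    rng.shuffle(shuffled)
    midpoint = len(shuffled) // 2
    return shuffled[:midpoint], shuffled[midpoint:]


# Illustrative sample identifiers only.
sample_ids = list(range(10))
training, validation = split_half(sample_ids, seed=42)
print(len(training), len(validation))
```

Every sample lands in exactly one of the two sets, so the validation set stays independent of the data used to fit the model.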
Post-Processing and Validation
Cross-validation was set up during the algorithm design stage, and the training set then went through the cross-validation process to check the performance of the model. Cross-validation was repeated multiple times to ensure the robustness and reliability of the chosen model.
Results
The dataset contains 482,916 polygon samples: 241,458 were randomly selected as the training set and the other half went to the validation set. The numbers of samples collected for 2013 to 2017 are 91,809, 111,817, 111,766, 84,397, and 83,127, respectively. To visualize the fuel types assigned by the current FBP classification system, the FBP-assigned fuel type maps are shown below to identify changes in fuel types between 2013 (Figure 3) and 2017 (Figure 4).
On the 2013 fuel type map, the polygons covering the largest area are O-1a/b, followed by C-5, C-7, and C-2. The most common fuel types within the Elephant Hill fire perimeter are conifers, mixedwood, and O-1a/b. In 2017, by contrast, the most common type is N-Fire, which can be explained by the 2017 megafire.
Applying the random forest algorithm to the training set (the polygon samples created above) automatically generates a random forest classifier confusion matrix for the 9 fuel types (Table 1). The 100-tree model shows an overall out-of-bag (OOB) estimate of error rate of 1.43%; Fire (non-fuel) has the lowest error rate at 0.3%, whereas C-5 has the highest at 13.5%. This is plausible, because the historical VRI dataset relies on human interpretation, and conifer species are hard to distinguish by interpretation. The algorithm will be further improved by repeatedly training the current model to obtain a final model that produces the prediction map with the highest accuracy.
Given that the cross-validation accuracy of the random forest algorithm reached 98.57% (Table 1), the model was further tested by applying it to the validation set. The resulting confusion matrix shows an overall accuracy of 92.35% (Table 2). M-1/2 has the highest user’s accuracy at 99.56%, while C-5 has the lowest at 65.26%. Producer’s accuracy shows a similar pattern: the highest is for M-1/2 and the lowest for C-5.
Thus, as both accuracies met the proposed assumption, the model was used to predict fuel type maps from the pre-processed VRI datasets for 2013 to 2017 (Figures 5 to 9).
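For readers unfamiliar with the accuracy measures reported here, the Python sketch below shows how overall, user's, and producer's accuracy are derived from a confusion matrix. It assumes rows are predicted classes and columns are reference classes, and it uses a toy two-class matrix rather than the values in Table 1 or Table 2.

```python
def overall_accuracy(matrix):
    """Correctly classified samples divided by all samples."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


def users_accuracy(matrix, k):
    """Share of samples predicted as class k that truly belong to class k."""
    return matrix[k][k] / sum(matrix[k])


def producers_accuracy(matrix, k):
    """Share of reference samples of class k that were predicted as class k."""
    return matrix[k][k] / sum(row[k] for row in matrix)


# Toy matrix (assumption): rows = predicted class, columns = reference class.
toy_matrix = [
    [8, 2],
    [1, 9],
]

print(overall_accuracy(toy_matrix))
print(users_accuracy(toy_matrix, 0))
print(producers_accuracy(toy_matrix, 0))
```

User's accuracy reflects errors of commission (how far a mapped class can be trusted), while producer's accuracy reflects errors of omission (how much of a true class the map captures).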
Predicted fuel type map for the 2017 VRI dataset. Areas considered invalid fuel types are shown in white.
Table 1: Cross-validation confusion matrix output by the random forest algorithm for the training set, with 100 trees and 8 variables, based on the 9 fuel types. The overall estimate of error rate is 1.43%.
Table 2: Confusion matrix output by the random forest algorithm for the validation set, with 100 trees and 8 variables, based on the 9 fuel types. The overall accuracy is 92.35%.
Discussion
A. Featured Advantages of Using Machine Learning Algorithm to Map Fuels
Machine learning algorithms offer several advantages for mapping fuels. In British Columbia, tremendous effort has gone into interpreting forest inventory attributes as fuel types, following the guidelines of the BC Wildfire Service. This has left a large pool of labelled samples available for any deep learning method, which relies heavily on a large volume of training data. Compared with the traditional labour-intensive classification of millions of features, machine learning methods are far more convenient, faster, and easier to update (Pickell et al., 2020). The random forest algorithm can classify new data and can be retrained as more training data become available. Object-based image analysis combined with a machine learning algorithm offers important benefits for updating fuel maps at any large scale, consistent with Tobler’s first law of geography: “near things are more related than distant things” (Tobler, 1970).
B. Considerations for Social Implementations of Fuel Maps
The objective of this study was to evaluate whether a machine learning algorithm could be applied with the existing fuel classification scheme to predict fuel classes from provincial VRI datasets in British Columbia. Although numerous quality control procedures and other careful evaluations were carried out within the study area, we caution against using these maps as an absolute reference at the site or stand level. If further analysis such as fire behaviour prediction is to be conducted, the maps should be used alongside other available data sources, such as remotely sensed datasets or field-collected data, for cross-reference validation.
The low user’s accuracy for the C-2 and C-5 classes may partly be explained by overfitting: the random forest algorithm may simply memorize the combinations of the selected parameters found in the training inputs rather than learn patterns that generalize. Given that both of these classes are flammable, improving their user’s accuracy should be a priority for future work. Because these are major fuel classes, their low user’s accuracy can undermine the credibility of the fuel maps. For instance, burn probability (BP) modelling requires fuel maps as inputs for its applications in direct BP examination, neighbourhood processes, fire hazard and risk, and integration with secondary models (Parisien et al., 2019). Errors in major fuel classes could therefore mislead the BP modelling process and result in seriously incorrect predictions of location-based measures of fire likelihood and fire behaviour (e.g. fire intensity, biomass consumption).
The overall accuracy obtained appears to exceed that of other studies that have used machine learning algorithms to predict or classify fuel types in the same study area. Most recently, Pickell et al. (2020) achieved 63.1% overall accuracy on 9 fuel type classes using an artificial neural network. The difference may be related to sampling design: this study used random sampling to select the training and validation sets, whereas Pickell et al. adopted a stratified sampling design.
C. How to Improve Machine Learning Fuel Mapping
1) Training Data Quality and Quantity:
The current method lacks a quality control step that cross-checks the selected parameters. Some parameters are related to each other; for instance, polygons labelled as non-vegetated fuel types should have a live stem volume of 0. Cross-checking the selected parameters would further refine the training data and help maintain high data quality standards.
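A cross-check of this kind could be sketched as below. This is a proposal illustrated in Python, not a step that was run in this study; the field names, the set of non-vegetated labels, and the sample records are all hypothetical.

```python
# Hypothetical set of labels treated as non-vegetated fuel types.
NON_VEGETATED_TYPES = {"N"}


def is_consistent(polygon):
    """Non-vegetated polygons must report a live stem volume of 0."""
    if polygon["fuel_type"] in NON_VEGETATED_TYPES:
        return polygon["live_stem_volume"] == 0
    return True  # no cross-check rule is defined for other fuel types here


# Illustrative records only.
records = [
    {"id": 1, "fuel_type": "N", "live_stem_volume": 0},
    {"id": 2, "fuel_type": "N", "live_stem_volume": 35.2},
    {"id": 3, "fuel_type": "C-7", "live_stem_volume": 120.0},
]

flagged_ids = [r["id"] for r in records if not is_consistent(r)]
print(flagged_ids)
```

Flagged polygons could then be reviewed or dropped before training, and further rules could be added for other pairs of related parameters.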
2) Integrate Additional Remote Sensing Data:
Only VRI datasets were used in this approach, which constrained the accuracy of the algorithm because VRI datasets are acquired less frequently than remotely sensed datasets such as optical imagery and some ancillary geospatial datasets (Pickell et al., 2020), which can provide rapid spectral information. In recent years, with rapid advances in unmanned aerial vehicle (UAV) technologies (Goodbody et al., 2018), 3-D point clouds can be generated from digital aerial photogrammetry, offering detailed information on species composition and forest canopy structure, which are key attributes in the current fuel type classification (Goodbody et al., 2019). Beyond optical imagery, active remote sensing such as radio detection and ranging (RADAR) or LiDAR helps characterize structural features, which supports the later process of distinguishing finer-level fuel types (Fernández-Álvarez et al., 2019).
References
G. Scudder and I. Smith, “Introduction and Summary of the Montane Cordillera Ecozone,” in Assessment of Species Diversity in the Montane Cordillera Ecozone, G. Scudder and I. Smith, Eds. 2011, pp. 1–24.
T. Wang, A. Hamann, D. Spittlehouse, and C. Carroll, “Locally Downscaled and Spatially Customizable Climate Data for Historical and Future Periods for North America,” PLoS One, vol. 11, no. 6, p. e0156720, Jun. 2016.
D. A. Demarchi, “An Introduction to the Ecoregions of British Columbia,” Victoria, BC, 2011.
T. Tachikawa et al., “ASTER Global Digital Elevation Model Version 2 – Summary of Validation Results,” 2011.
E. F. Vermote and R. E. Wolfe, “MYD09GQ MODIS/Aqua Surface Reflectance Daily L2G Global 250m SIN Grid V006 [Data set].” 2015.
D. D. B. Perrakis, G. Eade, and D. Hicks, “British Columbia Wildfire Fuel Typing and Fuel Type Layer Description,” Victoria, BC, 2018.
B. F. I. and A. Branch, “VRI - Forest Vegetation Composite Polygons and Rank 1 Layer [Data set].” Victoria, BC, 2011.
Pickell, P. D., Chavardès, R. D., Daniels, L. D. & Li, S. J. (2020 unpublished). A heuristic approach on how the spatial distribution of fuels influences fire behavior in Interior British Columbia. Transactions on Geoscience & Remote Sensing.
A. Alonso-benito, P. A. Hernandez-leal, M. Arbelo, J. A. Morenoruiz, and J. R. Garcia-lazaro, “fuels maps updating,” vol. 999821, no. October 2016, 2020.
T. Xiao-rui, D. J. Mcrae, S. H. U. Li-fu, and W. Ming-yu, “Fuel classification and mapping from satellite imagines,” vol. 16, no. 4, pp. 311–316, 2005.
L. A. Arroyo, C. Pascual, and J. Manzanera, “Fire models and methods to map fuel types: The role of remote sensing,” For. Ecol. Manage., vol. 256, no. 6, pp. 1239–1252, Sep. 2008.
S. Yu and H. Kobayashi, “Practical Implementation of an Efficient Forward – Backward Algorithm for an Explicit-Duration Hidden Markov Model,” IEEE Trans. Signal Process., vol. 54, no. 5, pp. 1947–1951, 2006.
B.C. Wildfire Service. (2020, October 08). Wildfire Season Summary. Retrieved December 13, 2020, from https://www2.gov.bc.ca/gov/content/safety/wildfire-status/about-bcws/wildfire-history/wildfire-season-summary
W. R. Tobler, “A computer movie simulating urban growth in the Detroit region,” Econ. Geogr., vol. 46, pp. 234–240, Jun. 1970.
Parisien Marc-André, Dawe Denyse A., Miller Carol, Stockdale Christopher A., Armitage O. Bradley (2019) Applications of simulation-based burn probability modelling: a review. International Journal of Wildland Fire 28, 913-926.
T. R. H. Goodbody, N. C. Coops, T. Hermosilla, P. Tompalski, and P. Crawford, “Assessing the status of forest regeneration using digital aerial photogrammetry and unmanned aerial systems,” Int. J. Remote Sens., vol. 39, nos. 15–16, pp. 5246–5264, Aug. 2018.
M. Fernández-Álvarez, J. Armesto, and J. Picos, “LiDAR-based wildfire prevention in WUI: The automatic detection, measurement and evaluation of forest fuels,” Forests, vol. 10, no. 2, p. 148, Feb. 2019.