
Detecting invisible roads using the S2DR3.0 super-resolution algorithm on Sentinel-2 imagery
Introduction
Objectives
The development of roads is a significant driver of deforestation in Indonesia, making satellite-based road monitoring crucial for identifying threats to tropical forests (Southworth et al., 2011). However, detecting small-scale forest roads with open-source remote sensing data, such as Sentinel-2 (S2), is limited by insufficient spatial resolution. The project therefore explored whether Sentinel-2 images upscaled with deep-learning-based super-resolution (S2+) could be used instead of relying on expensive high-spatial-resolution data.
This translated into the following four sub-objectives:
1. Apply the super-resolution algorithm to cloud-free S2 imagery to obtain S2+ imagery.
2. Create and prepare a road annotation dataset for the study areas.
3. Develop two deep learning models for road detection trained on S2 and S2+ imagery.
4. Assess the quality of both road detection models and compare them.
Study area
A total area of 1,543 km² in the Itci area and 1,008 km² in the Kiani area was hand-annotated. Additional annotations were incorporated from the Congo Basin (confidential data).
The road detection algorithm was trained on study areas in the Congo Basin and Indonesia. The model performance was assessed on the remaining areas in Indonesia.
The generated S2 and S2+ mosaics were then split into tiles that could be processed by the road detection model. Next, the training tiles were randomly split into training and validation datasets in an 80:20 ratio.
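The random 80:20 split can be sketched as follows (the tile file names and the fixed seed are illustrative, not taken from the project):

```python
import random

def split_tiles(tile_ids, train_fraction=0.8, seed=42):
    """Randomly split tile identifiers into training and validation sets."""
    rng = random.Random(seed)   # fixed seed so the split is reproducible
    shuffled = tile_ids[:]      # copy, so the caller's list is untouched
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Hypothetical tile names for illustration
tiles = [f"tile_{i:04d}.tif" for i in range(100)]
train, val = split_tiles(tiles)
print(len(train), len(val))  # 80 20
```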
Deep resolution algorithm
Super-resolution is the process of constructing high-resolution images from low-resolution ones. This method has been widely applied in photographic and remote-sensing imagery. Shim et al. (2022) used a Super-Resolution Generative Adversarial Network (SRGAN) for the detection of road damage, while Kang and An (2020) used deep-learning-based super-resolution to enhance Ground Penetrating Radar (GPR) profiles. Bae et al. (2020) used a deep learning super-resolution algorithm to detect cracks in bridges and found that a deep super-resolution network can significantly improve crack detectability.
We used the Sentinel-2 Deep Resolution 3.0 (S2DR3.0) super-resolution model from Gamma Earth, developed to upscale S2 imagery. This model is specifically developed to capture subtle spectral variations of soil and vegetation across all 12 multi-spectral bands of Sentinel-2 L2A. Moreover, it is capable of accurately reconstructing objects and textures with individual spatial features down to 3 meters (Akhtman, 2024), making this potentially useful for the problem of detecting small roads in tropical areas.
The Sentinel-2 image (left) and the generated Sentinel-2+ image (right) in Kiani, Indonesia.
Visual inspection
Sentinel-2+ versus Sentinel-2
Sentinel-2+ versus Airbus Pléiades
Sentinel-2+ versus PlanetScope
Road detection model
Road detection is a pixel-wise classification problem, so a semantic segmentation model was necessary. For this project, a Fully Convolutional Network Residual Network-50 (FCN ResNet-50) was used without pre-trained weights. The original FCN ResNet-50 takes a three-channel image as input. However, in this project the image tiles had five channels (red, green, blue, NIR, and NDVI), so the first convolutional layer in the encoding block had to be modified to accommodate the different input. This resulted in a model with 33 million parameters.
Two road detection models had to be trained: one on the S2 and one on the S2+ mosaicked imagery. Both models shared the same hyperparameters: a learning rate of 0.001 and the Adam optimizer. To prevent overshooting, the ReduceLROnPlateau scheduler with a patience of 3 was used to adjust the learning rate during training. Training and validation metrics were tracked throughout, specifically the binary F1 score and binary Dice coefficient, following the methodology of Sloan et al. (2024). The S2+ model was trained for 30 epochs, while the S2 model was trained for 49 epochs to ensure an equivalent number of backpropagation steps, accounting for the different numbers of training tiles and batch sizes (16 for S2+ and 32 for S2). The version of each model with the lowest validation loss was saved for the final performance assessment.
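The loss and optimizer wiring described above can be sketched as follows (a minimal illustration; the project's exact Dice-loss implementation is not given, and the single dummy parameter only stands in for the model's weights):

```python
import torch

def dice_loss(logits, target, eps=1e-6):
    """Binary Dice loss: 1 minus the Dice coefficient of predicted vs. true masks."""
    probs = torch.sigmoid(logits)
    inter = (probs * target).sum()
    union = probs.sum() + target.sum()
    return 1.0 - (2.0 * inter + eps) / (union + eps)

# Dummy parameter standing in for the FCN ResNet-50 weights.
params = [torch.zeros(1, requires_grad=True)]
optimizer = torch.optim.Adam(params, lr=0.001)
# ReduceLROnPlateau lowers the learning rate once the tracked validation
# loss has stopped improving for `patience` consecutive epochs.
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, patience=3)

# At the end of each epoch, step the scheduler with the validation loss.
val_loss = 0.5
scheduler.step(val_loss)
```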
Workflow of the data acquisition and road prediction process.
To assess the accuracy of the road detection models and to see how the super-resolution step affects road mapping capabilities, an evaluation of model performance was required. This posed the challenge of quantifying performance when the two models produce road-presence predictions at different spatial resolutions. To overcome this, a resolution-matching strategy was developed. First, the test sets were fed through both models to obtain predictions of road presence. These predictions were then either upsampled (S2 predictions, to match the S2+ grid) or downsampled (S2+ predictions, to match the S2 grid). This allowed a comparison against the ground truth road rasters at each resolution, providing insight into the differences in model performance.
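The resampling step can be sketched with `torch.nn.functional.interpolate`, assuming a 10x resolution ratio between the S2 and S2+ grids and a 0.5 threshold (both illustrative; the actual grids come from the mosaics):

```python
import torch
import torch.nn.functional as F

# Hypothetical prediction maps: S2 on a 32x32 grid, S2+ on a 320x320 grid.
s2_pred  = torch.rand(1, 1, 32, 32)
s2p_pred = torch.rand(1, 1, 320, 320)

# Upsample S2 predictions onto the S2+ grid (nearest keeps labels unblended).
s2_on_fine = F.interpolate(s2_pred, size=(320, 320), mode="nearest")

# Downsample S2+ predictions onto the S2 grid by area averaging, then threshold.
s2p_on_coarse = (F.interpolate(s2p_pred, size=(32, 32), mode="area") > 0.5).float()

print(s2_on_fine.shape, s2p_on_coarse.shape)
```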
The strategy for comparing S2+ and S2 model performance. GT - coarse ground truth road raster; GT+ - fine ground truth road raster; S2 - Sentinel-2; S2+ - Sentinel-2 super-resolution.
Results
The two road detection models, trained on different spatial resolution datasets, exhibited varying time and computer-resource requirements. The S2+ road detection model required approximately 4 hours to train, while the S2 model was trained in approximately 13 minutes. Additionally, the S2+ model required higher computing power.
The S2 model reached convergence around epoch 25, while the S2+ model was still improving towards the end of training.
The training and validation binary Dice loss per epoch.
On the test dataset, the S2 road detection model scored lower on every accuracy metric than the S2+ model, as shown below.
| Metric | S2 model | S2+ model |
|---|---|---|
| Test Dice loss | 0.53 | 0.33 |
| Test F1 score | 0.36 | 0.54 |
| Test Dice coefficient | 0.35 | 0.54 |
| Precision (at S2+ resolution) | 0.21 (S2 upscaled) | 0.75 |
| Recall (at S2+ resolution) | 0.20 (S2 upscaled) | 0.58 |
| Precision (at S2 resolution) | 0.43 | 0.67 (S2+ downscaled) |
| Recall (at S2 resolution) | 0.40 | 0.53 (S2+ downscaled) |
The test performance of road detection models (S2 and S2+).
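The reported precision, recall, and Dice/F1 values follow the standard binary definitions; a minimal sketch of how they can be computed from flattened prediction and ground-truth masks (the toy masks are illustrative):

```python
import numpy as np

def binary_metrics(pred, truth):
    """Precision, recall, and Dice/F1 for binary road masks (1 = road)."""
    pred, truth = np.asarray(pred, bool), np.asarray(truth, bool)
    tp = np.sum(pred & truth)    # road pixels correctly predicted
    fp = np.sum(pred & ~truth)   # non-road pixels predicted as road
    fn = np.sum(~pred & truth)   # road pixels missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # For binary masks, the Dice coefficient equals the F1 score.
    dice = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    return precision, recall, dice

p, r, d = binary_metrics([1, 1, 0, 0], [1, 0, 1, 0])
print(p, r, d)  # 0.5 0.5 0.5
```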
Based on the performance metrics and visual inspection (below), the S2+ model outperformed the S2 model.
Prediction accuracy of both models with S2 (left) and S2+ (right).
Discussion and conclusion
This project explored the use of deep-learning-based super-resolution techniques for road detection in tropical forests, revealing several challenges and insights. The high cloud coverage in study areas complicates monitoring, suggesting a need for combining active and passive sensors. Developing an open-source super-resolution model could improve road prediction in cloud-free imagery, but it requires significant resources and storage. The quality of road annotation has to be high, which demands human effort and high-resolution ground truth.
The S2+ model, though more resource-intensive and not fully converged, outperformed the faster-converging S2 model. Despite some limitations, the S2+ pipeline showed promise with a higher precision score and better road detail, indicating the potential benefits of super-resolution algorithms for road detection in challenging environments.
Recommendations
This leads to the following recommendations:
- Utilise the ground truth dataset generated in this project.
- Get more extensive access to the S2DR3.0 model or create a custom super-resolution model.
- Try different methods and model architectures for road detection.
- Include more bands (for example, SWIR).
- Include data from active sensors to deal with high cloud coverage.
- Explore alternatives to Google Colab's slow storage and expensive computation units.
References
Akhtman, Y. (2024). Sentinel-2 Deep Resolution 3.0. Medium. https://medium.com/@ya_71389/sentinel-2-deep-resolution-3-0-c71a601a2253
Bae, H., Jang, K., & An, Y. (2020). Deep super resolution crack network (SrcNet) for improving computer vision–based automated crack detectability in in situ bridges. Structural Health Monitoring, 20(4), 1428–1442. https://doi.org/10.1177/1475921720917227
Galar, M., Sesma, R., Ayala, C., Albizua, L., & Aranda, C. (2020). Learning super-resolution for Sentinel-2 images with real ground truth data from a reference satellite. ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences, V-1-2020, 9–16. https://doi.org/10.5194/isprs-annals-v-1-2020-9-2020
Kang, M., & An, Y. (2020). Frequency–Wavenumber Analysis of Deep Learning-based Super Resolution 3D GPR Images. Remote Sensing, 12(18), 3056. https://doi.org/10.3390/rs12183056
AidEnvironment. (2024, May 23). AidEnvironment - Not-for-profit sustainability strategy organisation. https://aidenvironment.org/
Shim, S., Kim, J., Lee, S., & Cho, G. (2022). Road damage detection using super-resolution and semi-supervised learning with generative adversarial network. Automation in Construction, 135, 104139. https://doi.org/10.1016/j.autcon.2022.104139
Slagter, B., Reiche, J., Marcos, D., Mullissa, A., Lossou, E., Peña-Claros, M., & Herold, M. (2023). Monitoring direct drivers of small-scale tropical forest disturbance in near real-time with Sentinel-1 and -2 data. Remote Sensing of Environment, 295, 113655. https://doi.org/10.1016/j.rse.2023.113655
Sloan, S., Talkhani, R. R., Huang, T., Engert, J., & Laurance, W. F. (2024). Mapping Remote Roads Using Artificial Intelligence and Satellite Imagery. Remote Sensing, 16(5), 839. https://doi.org/10.3390/rs16050839
Southworth, J., Marsik, M., Qiu, Y., Perz, S., Cumming, G., Stevens, F. R., Rocha, K., Duchelle, A., & Barnes, G. (2011). Roads as Drivers of Change: Trajectories across the Tri‑National Frontier in MAP, the Southwestern Amazon. Remote Sensing, 3(5), 1047–1066. https://doi.org/10.3390/rs3051047