the Creative Commons Attribution 4.0 License.
TSECfire v1.0: Quantifying Wildfire Drivers and Predictability in Boreal Peatlands Using a Two-Step Error-Correcting Machine Learning Framework
Rongyun Tang
Daniel M. Ricciuto
Anping Chen
Yulong Zhang
Abstract. Wildfires are becoming an increasing challenge to the sustainability of boreal peatland (BP) ecosystems and can alter the stability of boreal carbon storage. However, a quantitative understanding of natural and anthropogenic influences on the changes in BP fires remains elusive. Here, we quantified the predictability of BP fires and their primary controlling factors from 1997 to 2016 using a two-step correcting machine learning (ML) framework that combines multiple ML classifiers, regression models, and an error-correcting technique. We found that (1) the adopted oversampling algorithm effectively addressed the unbalanced data and improved the recall rate by 26.88 %–48.62 % when using multiple datasets, and the error correcting technique tackled the overestimation of fire sizes during fire seasons, (2) non-parametric models outperformed parametric models in predicting fire occurrences, and the machine learning model of Random Forest performed the best with the area under the Receiver Operating Characteristic curve ranging from 0.83 to 0.93 across multiple fire data sets, and (3) four sets of factor-control simulations consistently indicated the dominant role of temperature, air dryness, and climate extreme (i.e., frost) for boreal peatland fires, overriding the effects of precipitation, wind speed, and human activities. Our findings demonstrate the efficiency and accuracy of ML techniques in BP fire prediction and disentangle the primary factors determining BP fires, which are critical for predicting future fire risks under climate change.
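The abstract reports performance as a recall rate and as the area under the Receiver Operating Characteristic curve (AUC). As a minimal sketch of what these two metrics measure, the snippet below computes both on a small toy dataset. The labels, scores, the 0.5 decision threshold, and the function names are illustrative assumptions, not values or code from the study.

```python
def recall(y_true, y_pred):
    """Recall = TP / (TP + FN): the share of real fire records that were predicted."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if (tp + fn) else 0.0


def roc_auc(y_true, scores):
    """AUC as the probability that a randomly chosen fire record receives a
    higher score than a randomly chosen no-fire record (ties count one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one record of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy, imbalanced example (hypothetical): 1 = fire, 0 = no fire.
y_true = [0, 0, 0, 0, 0, 0, 1, 1]
scores = [0.1, 0.2, 0.3, 0.35, 0.6, 0.05, 0.4, 0.9]
# Assumed decision threshold of 0.5 for turning scores into classes.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

print("recall:", recall(y_true, y_pred))
print("AUC:", roc_auc(y_true, scores))
```

Recall depends on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is one reason both are commonly reported for imbalanced problems.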
Status: final response (author comments only)
CEC1: 'Comment on gmd-2023-14', Juan Antonio Añel, 17 Mar 2023
Dear authors,
Unfortunately, after checking your manuscript, it has come to our attention that it does not comply with our "Code and Data Policy".
https://www.geoscientific-model-development.net/policies/code_and_data_policy.html
You have archived your code on GitHub. However, GitHub is not a suitable repository. GitHub itself instructs authors to use other long-term archival and publishing alternatives, such as Zenodo. Therefore, please publish your code in one of the appropriate repositories, and reply to this comment with the relevant information (link and DOI) as soon as possible, as your manuscript should not have been accepted for the Discussions stage because of a lack of compliance with the policy. Also, please include the relevant primary input/output data.
Moreover, you must include in a potentially reviewed version of your manuscript the modified 'Code and Data Availability' section, the DOI of the code (and another DOI for the dataset if necessary).
Also, in the GitHub repository, no license is listed for the code. If you do not include a license, the code remains your property, and nobody can use it. Therefore, when uploading the model's code to the new repository, you may want to choose a free software/open-source (FLOSS) license. We recommend the GPLv3. You only need to include the file 'https://www.gnu.org/licenses/gpl-3.0.txt' as LICENSE.txt with your code. Also, you can choose other options that Zenodo provides: GPLv2, Apache License, MIT License, etc.
Please reply as soon as possible to this comment with the necessary links and DOIs, so they are available for the peer-review process, as they should be.
Be aware that failure to comply promptly with this request can result in rejecting your manuscript for publication.
Juan A. Añel
Geosci. Model Dev. Exec. Editor
Citation: https://doi.org/10.5194/gmd-2023-14-CEC1
AC1: 'Reply on CEC1', Mingzhou Jin, 25 Mar 2023
Dear Dr. Añel,
The data and code have been archived at Zenodo (https://zenodo.org/record/7754018#.ZBi62uyZPK0) along with an included GPLv3 License. On Zenodo, the GPLv2 license is also applied. The description of the code and data availability is also updated in the attached manuscript. The ESS-DIVE archive standard is quite similar to the GMDD policy, so I will also include the ESS-DIVE document here. If you have any comments or suggestions, please let us know.
Best Regards,
Ming
Mingzhou Jin
CEC2: 'Reply on AC1', Juan Antonio Añel, 26 Mar 2023
Dear authors,
Many thanks for addressing our request. We can now consider this problem solved.
Best regards,
Juan A. Añel
Geosci. Model Dev. Executive Editor
Citation: https://doi.org/10.5194/gmd-2023-14-CEC2
RC1: 'Comment on gmd-2023-14', Anonymous Referee #1, 30 Mar 2023
The authors present a two-step error-correcting machine learning framework, TSECfire v1.0, which is used to predict BP fires and their primary controlling factors from 1997 to 2016 in Boreal Peatlands. While the method has potential value, it is important to compare it to existing research, including other machine learning methods and data-driven fire models that have been developed in recent years. I recommend that the authors provide a more extensive evaluation of their method's accuracy and its unique contributions to the field in comparison to other existing methods. Additionally, the methodology section contains ambiguous expressions that require further clarification and specificity.
1. The title of the study focuses on BP fires, but the paper does not sufficiently clarify how the method used in this study differs from other approaches used to predict BP fires. Please provide additional details to highlight the unique aspects of your methodology. In the Introduction section, there is a significant discussion of smouldering fires from BPs, but there is no mention of this phenomenon in the Method and Results sections. Can the GFED or FireCCI products distinguish between peatland fires that are smouldering or flaming?
2. In Step One (the classification step), the authors presented the datasets and algorithm employed in the classification process, as well as the preprocessing methods utilized for input data. However, the authors did not provide sufficient information regarding how the prediction of fire occurrence was made. Please provide additional details regarding the methodology employed to predict fire occurrences.
3. Line 195- “Fire size that might be caused by the wrong classification (namely, no fire happens in reality) could be expressed by EPnm”. Since this analysis takes place during the prediction period, it may not be possible to determine which input variables (Xfm or Xnm) were wrongly classified if there is no real-time information on whether a fire has occurred or not. Please provide more explanation. In addition, the meaning of True Positive (TP) in Equation 9 is not clear; it is uncertain how the authors obtained the "true" data during the prediction period.
4. If you skip the fire occurrence classification from Step One and predict fire emissions or burned area directly from the training datasets listed in Table S1, do the predicted results differ significantly from the actual fire occurrences, and is that difference what motivated this approach?
5. Line 136- “The evaluation metrics from Step One, denoting the model uncertainties, are used at Step Two to correct fire size prediction uncertainties. The two-step ML framework is detailed in Figure 1”. Please explain what "model uncertainties" means in Step One.
6. Line 140- “(i.e., there are more nonoccurrence records than occurrence records)”. This sentence needs clarification.
7. Line 145- Figure 1. Please increase the size of the text for better readability (also for Figures in SI). The resolution of the images is quite low.
8. Line 149- “An oversampling algorithm called Synthetic Minority Oversampling Techniques (SMOTE) was applied onto the training dataset to address the imbalance between the two fire occurrence classes”. Please elaborate on how an imbalance between different datasets can affect the model's performance in predicting fire occurrences, and provide more details regarding procedures of using the SMOTE algorithm to address the imbalance problem.
9. Line 217- “…could be one primary source of feature collinearity,” This sentence needs clarification.
10. What is the meaning of “training and testing” in Sections 3 and 6 of the SI? Please clarify this and explain the purpose of presenting both training and testing results in the figures. Also, what are the differences between "(a-1) observed" and "(a-2) observed" in Figures S3 to S12?
11. Figure S35 lacks axis titles.
Citation: https://doi.org/10.5194/gmd-2023-14-RC1
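Point 8 of this comment asks how SMOTE addresses class imbalance. As a rough illustration of the general idea only, the sketch below creates synthetic minority (fire) samples by interpolating between a minority sample and one of its nearest minority neighbours. The function name `smote_sketch`, the choice `k=2`, and the toy feature values are assumptions made for this example; they are not the authors' implementation or settings.

```python
import random


def smote_sketch(minority, n_new, k=2, seed=0):
    """Create n_new synthetic minority samples by linear interpolation
    between a minority sample and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by squared Euclidean distance to base.
        others = [m for j, m in enumerate(minority) if j != i]
        others.sort(key=lambda m: sum((a - b) ** 2 for a, b in zip(base, m)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Toy training data (hypothetical): 3 fire records versus 10 no-fire records.
fire = [[1.0, 2.0], [1.5, 2.5], [2.0, 2.0]]
no_fire_count = 10
new_fire = smote_sketch(fire, n_new=no_fire_count - len(fire))
balanced_fire = fire + new_fire
print(len(balanced_fire), "fire records after oversampling")
```

Oversampling of this kind is normally applied to the training split only, so that the test data keep the original class ratio; the source quote likewise states that SMOTE was applied to the training dataset.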
RC2: 'Comment on gmd-2023-14', Anonymous Referee #2, 08 Jun 2023
The authors presented a two-step machine learning framework to quantify wildfire drivers and predictability in boreal peatlands: the first step uses classification models to identify fire occurrences, and the second step uses regression models to estimate the burned area and C emissions, as well as the relative importance of environmental drivers. This work tested multiple datasets and various ML methods. While the effort is appreciated, it lacks a clear rationale for the choice of methods and for what is novel compared to previous studies.
In its current form, the manuscript is a bit convoluted and hard to digest (too much jargon and too many acronyms). There are redundancies in many places. The results could be further distilled to provide more insights. Analysis on a regional scale would be interesting. Do the models perform better in some regions than others? Currently, data are randomly split for training (70%) and predicting (30%); if recent years with mega boreal fires were held out for prediction instead, how well would the models perform?
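The difference between the random 70/30 split and the year-based hold-out raised here can be sketched as follows. The record layout, the function names, and the 2011 cut-off year are hypothetical choices for illustration; only the 1997 to 2016 study period and the 70/30 ratio come from the manuscript as described on this page.

```python
import random


def random_split(records, test_fraction=0.3, seed=0):
    """Shuffle all records and hold out a fixed fraction, regardless of year."""
    rng = random.Random(seed)
    shuffled = records[:]
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def temporal_split(records, first_test_year):
    """Train on earlier years and hold out every record from first_test_year on."""
    train = [r for r in records if r["year"] < first_test_year]
    test = [r for r in records if r["year"] >= first_test_year]
    return train, test


# Toy records (hypothetical): 5 grid cells per year over 1997-2016.
records = [{"year": y, "cell": c} for y in range(1997, 2017) for c in range(5)]

rand_train, rand_test = random_split(records)
time_train, time_test = temporal_split(records, first_test_year=2011)

print(len(rand_train), len(rand_test))   # random hold-out mixes all years
print(len(time_train), len(time_test))   # temporal hold-out keeps recent years unseen
```

With the random split, test records share years with training records, so the evaluation measures interpolation within the observed period; the temporal split instead tests whether the model extrapolates to years it has never seen.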
Smoldering is a key focus of the introduction and is mentioned as a challenge in the discussion. However, it is not clear how the results are relevant to it. C emissions are mentioned, but results are not shown.
-Line 164-166: I would start this paragraph from “In Step Two…”
-Line 173-200: in my opinion, this part is unnecessarily hard to follow. The essence of all these could be summarized for better readability.
-Line 335-341: reiterates the introduction. The discussion is a bit disconnected from the work done here.
Figure 1: hard to read
Figure 2: are there regional differences in their behavior?
Figure 3: it is not a very informative plot, as they all look similar.
Figure 4: maybe focus on the best-performing models? Or use different regional subsets of samples to see if there are differences?
Figure 5: it is not clear how the results of this paper are tied to all the processes shown here (only a tiny fraction of the variables are tested)
Citation: https://doi.org/10.5194/gmd-2023-14-RC2
Viewed
- HTML: 570
- PDF: 130
- XML: 19
- Total: 719
- Supplement: 48
- BibTeX: 7
- EndNote: 7