Building a machine learning surrogate model for wildfire activities within a global Earth system model

Zhu, Qing; Li, Fa; Riley, William J.; Xu, Li; Zhao, Lei; Yuan, Kunxiaojia; Wu, Huayi; Gong, Jianya; Randerson, James

doi:https://doi.org/10.5194/gmd-15-1899-2022

Articles | Volume 15, issue 5


© Author(s) 2022. This work is distributed under
the Creative Commons Attribution 4.0 License.


Model description paper | 08 Mar 2022


Download

Final revised paper (published on 08 Mar 2022)
Supplement to the final revised paper
Preprint (discussion started on 23 Apr 2021)

Interactive discussion

Status: closed

RC1: 'Comment on gmd-2021-83', Joe Melton, 28 May 2021

Zhu et al. develop a machine learning (ML) burnt area model that can be used in place of a process-based algorithm in ELM. This approach was first used to build a surrogate of the fire model of Li et al., which was in CLM (and is now in ELM). The ML approach uses a deep neural network to reproduce the process model result (they call it Base). Then, by altering the parameters, they tuned it to match GFED4 burned area. The paper is clearly written and the results are generally well presented. I found the work interesting as this is an important problem. Present process-based fire models are not overly skillful. Much of this stems from the many complexities of fire modelling, especially anthropogenic influences. I am optimistic this paper can be published but I would like to see some careful consideration of my comments below. At present the manuscript is what I would consider an absolute bare minimum of what can be published and there are many opportunities to make this paper into a much better resource to the community. This particular approach could be valuable but I think it needs some expansion to demonstrate how useful embedding ML approaches in process models can be. As a result I would like to see some expansion of the work to better demonstrate the utility of the approach.

Main comments:

- The DNN-Fire model was subsequently tuned to match GFEDv4 but this is not the only burned area product available (e.g. Chuvieco et al. 2019). Indeed there are many other products now available and they don't agree so well (e.g. Padilla et al. 2015, Humber et al. 2019). I worry that by tuning the model to reproduce one dataset you may get a result closer to that dataset but at the expense of adopting its biases, and thereby potentially not getting as admirable an advance in accuracy as it seems. Why not consider all of the available burned area products to produce a burned area estimate that would then be less biased by a single dataset? In reality, we are most interested in increasing our predictive skill, not just reproducing an observation.

- By surrogating Base-Fire, the DNN-Fire then integrates/assumes the biases and issues apparent in ELM's simulations (e.g. too much/little biomass, too dry/wet soil, etc.) and produces a model that aims to get the right result (burned area matching GFED) potentially for the wrong reasons (based on biased inputs). Why not run an ensemble approach with different forcing datasets (e.g. met forcing of CRUJRA in addition to GSWP3, or a different land cover (if using prescribed), etc.) to try and give at least a measure of the uncertainty in these inputs to the DNN? We have found for our model (run in normal process-based mode) that the results can be surprising and have some strong impacts for certain variables. Gitta Lasslop looked at this too and found a large impact upon fire, primarily due to the wind speed differences (e.g. Fig 3 in Lasslop et al. 2014). Alternatively, an observation-based product for one of the ELM variables (Table 1), like soil wetness or above-ground biomass, could be used as another means to look at the influence of input bias.

- Around line 188 you describe the training/testing split. Doing this split randomly makes me wonder if the influence of spatial autocorrelation will result in an overly optimistic error estimate, especially as fire is likely autocorrelated. There are many papers in the literature discussing the dangers of random sampling on spatially correlated data (e.g. Roberts et al. 2017; Meyer et al. 2019; Ploton et al. 2020; Kühn and Dormann, 2012). I would suggest an alternate strategy be employed. It also wasn't clear how the results of this test/train split were integrated. I think it was just in the model score?

- What is the impact of training on such a short timeseries of fire observations when some regions have fire return intervals of >100 years? Also how representative are those years chosen? Would it matter if you instead trained on 2006 - 2015 and tested on 2001 - 2005?
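
As a rough illustration of the kind of alternate splitting strategy raised in the comment on spatial autocorrelation, the sketch below assigns whole spatial blocks, rather than individual grid cells, to the training or testing set. It is only a minimal sketch under stated assumptions: the 10 degree block size, the function names `block_id` and `spatial_block_split`, and the small example grid are all hypothetical and are not taken from the manuscript.

```python
import random

def block_id(lat, lon, block_deg=10.0):
    """Map a grid-cell centre to the index of the coarse block containing it."""
    return (int((lat + 90.0) // block_deg), int((lon + 180.0) // block_deg))

def spatial_block_split(cells, test_fraction=0.2, block_deg=10.0, seed=0):
    """Assign whole blocks (not single cells) to the training or testing set.

    Neighbouring cells in the same block always end up on the same side of
    the split, so fewer test cells have a training cell right next to them.
    """
    blocks = sorted({block_id(lat, lon, block_deg) for lat, lon in cells})
    rng = random.Random(seed)
    rng.shuffle(blocks)
    n_test = max(1, round(test_fraction * len(blocks)))
    test_blocks = set(blocks[:n_test])
    train = [c for c in cells if block_id(*c, block_deg) not in test_blocks]
    test = [c for c in cells if block_id(*c, block_deg) in test_blocks]
    return train, test

# Hypothetical grid-cell centres at 1.9 x 2.5 degree spacing (illustration only).
cells = [(-30.0 + 1.9 * i, 10.0 + 2.5 * j) for i in range(20) for j in range(20)]
train_cells, test_cells = spatial_block_split(cells)
```

Cells on either side of a block boundary can still be correlated, so a buffer between training and testing blocks may be needed in practice; the cross-validation literature cited in this comment discusses such designs in detail.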

Small comments:

- Figure 7 is the same as the years trained upon so there is little interesting information here. Basically this is showing that it can do an ok job when tested over the same training region. Why not expand this out beyond the satellite era? How does this do from say 1900 on? Yes there is no satellite data but there are other means to check results (see e.g. Arora and Melton 2018)

- Didn't GFEDv4 offer some uncertainty bounds?

- Fig 8 to make a stronger demonstration that this is a significant improvement, what about plotting the models of FireMIP as further reference points? E.g. Hantson et al. 2020.

- line 41, a more up to date reference would be Lasslop et al. 2020 as it was done with more advanced models

- l 90: A good reference could be Rabin et al. 2017 as there are some figures showing explicitly how the models differ.

- l 186 - to be clear, the 14 submodels were combined to produce the global estimates right? Would there be benefit from doing even more sub-regions? What about 20, 50, etc? Where are the diminishing returns here?

- L 276 - was this talking about the speed of creating DNN-Fire or DNN-Fire-GFED? Several minutes on a laptop? HPC?

- Fig 8 - it seems that DNN-Fire-GFED might be less variable than GFEDv4, is that correct? Is this due to the inputs to the ML or is it a result of the ML approach itself?

Lit. cited:

Chuvieco, E., Mouillot, F., van der Werf, G. R., San Miguel, J., Tanase, M., Koutsias, N., García, M., Yebra, M., Padilla, M., Gitas, I., Heil, A., Hawbaker, T. J., and Giglio, L.: Historical background and current developments for mapping burned area from satellite Earth observation, Remote Sens. Environ., 225, 45–64, 2019.

Padilla, M., Stehman, S. V., Ramo, R., Corti, D., Hantson, S., Oliva, P., Alonso-Canas, I., Bradley, A. V., Tansey, K., Mota, B., Pereira, J. M., and Chuvieco, E.: Comparing the accuracies of remote sensing global burned area products using stratified random sampling and estimation, Remote Sens. Environ., 160, 114–121, 2015.

Humber, M. L., Boschetti, L., Giglio, L., and Justice, C. O.: Spatial and temporal intercomparison of four global burned area products, Int J Digit Earth, 12, 460–484, 2019.

Lasslop, G., Thonicke, K., and Kloster, S.: SPITFIRE within the MPI Earth system model: Model development and evaluation, J. Adv. Model. Earth Syst., 6, 740–755, 2014.

Roberts, D. R., Bahn, V., Ciuti, S., Boyce, M. S., Elith, J., Guillera-Arroita, G., Hauenstein, S., Lahoz-Monfort, J. J., Schröder, B., Thuiller, W., Warton, D. I., Wintle, B. A., Hartig, F. and Dormann, C. F.: Cross-validation strategies for data with temporal, spatial, hierarchical, or phylogenetic structure, Ecography, 40(8), 913–929, 2017.

Ploton, P., Mortier, F., Réjou-Méchain, M., Barbier, N., Picard, N., Rossi, V., Dormann, C., Cornu, G., Viennois, G., Bayol, N., Lyapustin, A., Gourlet-Fleury, S. and Pélissier, R.: Spatial validation reveals poor predictive performance of large-scale ecological mapping models, Nat. Commun., 11(1), 4540, 2020.

Kühn, I. and Dormann, C. F.: Less than eight (and a half) misconceptions of spatial analysis, J. Biogeogr., 39(5), 995-998, 2012.

Meyer, H., Reudenbach, C., Wöllauer, S. and Nauss, T.: Importance of spatial predictor variable selection in machine learning applications - Moving from data reproduction to spatial prediction, Ecol. Modell., 411, 108815, 2019.

Hantson, S., Kelley, D. I., Arneth, A., Harrison, S. P., Archibald, S., Bachelet, D., Forrest, M., Hickler, T., Lasslop, G., Li, F., Mangeon, S., Melton, J. R., Nieradzik, L., Rabin, S. S., Prentice, I. C., Sheehan, T., Sitch, S., Teckentrup, L., Voulgarakis, A., and Yue, C.: Quantitative assessment of fire and vegetation properties in historical simulations with fire-enabled vegetation models from the Fire Model Intercomparison Project, Biogeosciences, https://doi.org/10.5194/gmd-2019-261, 2020.

Lasslop, G., Hantson, S., Harrison, S. P., Bachelet, D., Burton, C., Forkel, M., Forrest, M., Li, F., Melton, J. R., Yue, C., Archibald, S., Scheiter, S., Arneth, A., Hickler, T., and Sitch, S.: Global ecosystems and fire: multi-model assessment of fire-induced tree cover and carbon storage reduction, Glob. Chang. Biol., https://doi.org/10.1111/gcb.15160, 2020.

Rabin, S. S., Melton, J. R., Lasslop, G., Bachelet, D., Forrest, M., Hantson, S., Kaplan, J. O., Li, F., Mangeon, S., Ward, D. S., Yue, C., Arora, V. K., Hickler, T., Kloster, S., Knorr, W., Nieradzik, L., Spessa, A., Folberth, G. A., Sheehan, T., Voulgarakis, A., Kelley, D. I., Prentice, I. C., Sitch, S., Harrison, S., and Arneth, A.: The Fire Modeling Intercomparison Project (FireMIP), phase 1: experimental and analytical protocols with detailed model descriptions, Geoscientific Model Development, 10, 1175–1197, 2017.

Arora, V. K. and Melton, J. R.: Reduction in global area burned and wildfire emissions since 1930s enhances carbon uptake by land, Nat. Commun., 9, 1326, 2018.

Citation: https://doi.org/10.5194/gmd-2021-83-RC1
- AC1: 'Reply on RC1', Qing Zhu, 11 Aug 2021
  
  We very much appreciate the reviewers’ comments and feel that they have allowed us to substantially improve our manuscript. Please see the uploaded file of response letter.
  
  Citation: https://doi.org/10.5194/gmd-2021-83-AC1
RC2: 'Comment on gmd-2021-83', Matthias Forkel, 22 Aug 2021

Review of Zhu et al. 2021: “Building a machine learning surrogate model for wildfire activities within a global earth system model”

This study presents an approach to build a deep learning-based model to better simulate burned area as part of an Earth system model. Although several machine learning and data-driven fire models have been developed in recent years, this is a first study that directly aims to implement a deep neural network (DNN)-based fire model within an Earth system model. The paper is well written.

However, I have several questions and concerns.

1 Integration of DNN-based fire model with the Earth system model

The paper is not clear about how the DNN-based model was implemented within the Earth system model (ESM). From the title and abstract, I expect that the DNN model was implemented in the ESM. This would allow analyses of how the improved simulation of fire affects the simulated carbon fluxes and stocks in the ESM. But as the paper does not present such results, I assume that the DNN-based fire model was just applied outside of the ESM and that both models were actually not coupled. Hence, I’m wondering how the authors imagine coupling both models. In particular, the final DNN-Fire-GFED setup simulates clearly different burned area than the original BASE-Fire or DNN-Fire model setups. This implies that, for example, a much higher simulated burned area in Africa should also result in a much lower biomass in Africa and hence change the fuel load variable used as input to the DNN-Fire models. In the coupled model, the DNN-Fire-GFED model would lead to results that are inconsistent with the feature space that was initially used to train the DNN-Fire model. Ideally, the authors should do a sensitivity analysis in the coupled DNN-Fire-GFED and ESM models to see if the results are still consistent and reliable. If this is not feasible, the authors should at least discuss how they would address such inconsistencies. I assume that only a joint optimization of fire and fuel loads/biomass in the coupled model would solve this issue (Drüke et al., 2019).

2 Training and testing

The authors trained a DNN model for each GFED region. Training the model for different regions is an unfair approach in comparison to process-based fire models, as these models are truly global models, maybe with a PFT-dependent parametrization. Hence the authors should provide a good reasoning for why they trained the model per GFED region. In addition, it does not make sense at all that a fire model is parametrised per GFED region for an application in an Earth system model. As Earth system models are applied to assess future changes, a parametrisation per region will quickly lead to useless results. For example, if climate and vegetation conditions change in future, which regional model should be applied in a certain region? Fire should only be simulated as a response to climate, vegetation and socioeconomic conditions. If regional parametrisation is necessary, the parameters should be based on vegetation or socioeconomic conditions.

The monthly burned area data from all grid cells in each region were split randomly into 80% for training and 20% for testing. This is one of the simplest tests, as the underlying conditions and statistical distributions of both samples are the same. However, in the context of an Earth system model, we expect non-stationary conditions and hence the model should be tested for how well it can predict into 1) different regions, 2) different time periods (this was done, but the conditions in the two time periods are very similar), and 3) different environmental conditions (Klemeš, 1986).
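
To make the first two of these tests concrete, the sketch below shows one possible way to hold out a whole time period or a whole region instead of a random 20% of samples. It is a minimal sketch, not the authors' procedure: the record layout, the function names `split_by_period` and `leave_one_region_out`, and the choice of two region labels are assumptions made purely for illustration.

```python
def split_by_period(samples, test_years):
    """Hold out every sample from the given years for testing."""
    test_years = set(test_years)
    train = [s for s in samples if s["year"] not in test_years]
    test = [s for s in samples if s["year"] in test_years]
    return train, test

def leave_one_region_out(samples):
    """Yield (held_out_region, train, test), holding out one region at a time."""
    for held_out in sorted({s["region"] for s in samples}):
        train = [s for s in samples if s["region"] != held_out]
        test = [s for s in samples if s["region"] == held_out]
        yield held_out, train, test

# Hypothetical monthly records for two regions over 2001-2015 (no real data).
samples = [
    {"region": region, "year": year, "month": month}
    for region in ("BONA", "CEAS")
    for year in range(2001, 2016)
    for month in range(1, 13)
]

# Temporal test: train on 2006-2015, test on 2001-2005.
train, test = split_by_period(samples, range(2001, 2006))

# Regional test: each region is predicted by a model trained on the others.
folds = list(leave_one_region_out(samples))
```

The third test, prediction into different environmental conditions, would need a split on the predictor values themselves (for example, holding out the driest or warmest samples) and is not shown here.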

3 Input data

Most of the input data for the DNN model come from climate, land use or socioeconomic datasets. Information on fuel loads, fuel wetness and temperature, however, was taken from ELMv1 model simulations. I wonder how good these simulated variables are in comparison with independent (e.g. Earth observation) data. For example, any biases in simulated biomass will directly affect the simulated burned area. Please compare the simulated biomass and soil moisture with useful datasets. Alternatively, a residual analysis would also be useful to see if any errors in simulated burned area are related to errors in the simulated input.

Can you please demonstrate that the tree cover from the LUH2 dataset is consistent with the simulated biomass? Are there any areas where the simulated biomass does not correspond to tree cover?
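
One simple form such a residual analysis could take is a correlation between the bias in a simulated input and the error in simulated burned area, computed over grid cells or regions. The sketch below assumes this form; the helper names and all numbers are invented for illustration and do not come from ELM, GFED, or any biomass product.

```python
from math import sqrt

def residuals(simulated, observed):
    """Element-wise simulated minus observed."""
    return [s - o for s, o in zip(simulated, observed)]

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Invented example values for five grid cells (not real data).
biomass_sim = [12.0, 8.0, 3.0, 15.0, 6.0]
biomass_obs = [10.0, 9.0, 2.0, 11.0, 6.5]
burned_sim = [0.30, 0.10, 0.05, 0.40, 0.12]
burned_obs = [0.22, 0.12, 0.03, 0.25, 0.13]

biomass_bias = residuals(biomass_sim, biomass_obs)
burned_error = residuals(burned_sim, burned_obs)

# A correlation far from zero would suggest that burned-area errors
# follow the errors in the simulated input.
r = pearson_r(biomass_bias, burned_error)
```

A correlation near zero would not rule out an influence of input bias, since the relation between fuel and fire can be nonlinear, so a scatter plot of the two residuals is a useful complement.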

Specific comments

L 26-27: From this statement it is not clear if the DNN is implemented as part of the E3SM or if it is independent of the ESM and just returns the same output. Please clarify

L 30-31: It is not clear what the R2 means. Is it the R2 between the observed and predicted global annual total burned area in 2001 and 2015?

L 41: The statement should be updated with newer estimates, e.g. by (Lasslop et al., 2020)

L 78-93: You should clarify the scale of wildfire models. Fire behaviour models aim to model the spread and intensity of individual fires and are widely used in fire management. Fire models as parts of global vegetation or Earth system models have a different purpose. I assume that you are mainly addressing the second group of models, so please clarify it.

L 102-107: Here you should specify that the first group focus mostly on predicting large-scale regional fire dynamics, whereas the second group focus more on predicting fire in individual grid cells.

Chapter 2.2: The text might be easier to understand if you draw the network structure as a figure including all input variables, the hidden layers, neurons and output.

L 163-171: The description of the training of DNN-fire-GFED is not completely clear. From the text it reads that only the weights were readjusted by using observed GFED data. Does that mean that original bias parameters from DNN-Fire-BASE were kept? Is there any reasoning?

L 180: “spunup”

L 197-201: The readability would be improved if each equation is in a new line and not within the text line.

L 244: Should this be Figure 7? (Wu et al., 2021; Drüke et al., 2019)

L 273-275: Yes, but not many process-based fire models have been really calibrated. It would be good to provide examples in the text where this has been done.

L 276-277: The statement is not really valid as you do not calibrate the parameters of the process-based model but of the DNN-based model.

L 332-334: I do not understand this sentence because you previously wrote that you were training models for different regions and not a global model. Please clarify.

Table 1: It would be good to combine the columns data source and reference in one column. Otherwise it seems odd because population density and GDP do not have a data source.

Figure 1: check “burn” area

Figures 3, 5, 6: I recommend to combine these figures in one figure (with 4 columns per region) in order to directly compare the experiments in one plot. In addition, it would be good to also draw in a same way boxplots or violin plots of monthly burned area in order to check if the different experiments capture the statistical distribution of fire.

Figure 4: This figure includes a lot of spatial aggregation. Can you draw a density scatter plot of the original monthly data in the used 1.9 x 2.5° resolution?

Figure 7 b: Is this a global averaged seasonal cycle? How do the seasonal cycles look like in different GFED regions?

References

Drüke, M., Forkel, M., Bloh, W. von, Sakschewski, B., Cardoso, M., Bustamante, M., Kurths, J., and Thonicke, K.: Improving the LPJmL4-SPITFIRE vegetation–fire model for South America using satellite data, Geosci. Model Dev., 12, 5029–5054, https://doi.org/10.5194/gmd-12-5029-2019, 2019.

Klemeš, V.: Operational testing of hydrological simulation models, Hydrol. Sci. J., 31, 13–24, https://doi.org/10.1080/02626668609491024, 1986.

Lasslop, G., Hantson, S., Harrison, S. P., Bachelet, D., Burton, C., Forkel, M., Forrest, M., Li, F., Melton, J. R., Yue, C., Archibald, S., Scheiter, S., Arneth, A., Hickler, T., and Sitch, S.: Global ecosystems and fire: Multi-model assessment of fire-induced tree-cover and carbon storage reduction, Glob. Change Biol., https://doi.org/10.1111/gcb.15160, 2020.

Wu, C., Venevsky, S., Sitch, S., Mercado, L. M., Huntingford, C., and Staver, A. C.: Historical and future global burned area with changing climate and human demography, One Earth, https://doi.org/10.1016/j.oneear.2021.03.002, 2021.

Citation: https://doi.org/10.5194/gmd-2021-83-RC2
- AC2: 'Reply on RC2', Qing Zhu, 07 Sep 2021
  
  The comment was uploaded in the form of a supplement: https://gmd.copernicus.org/preprints/gmd-2021-83/gmd-2021-83-AC2-supplement.pdf
  
  Citation: https://doi.org/10.5194/gmd-2021-83-AC2

Peer review completion

AR: Author's response | RR: Referee report | ED: Editor decision | EF: Editorial file upload

AR by Qing Zhu on behalf of the Authors (07 Sep 2021) Author's response Author's tracked changes Manuscript

ED: Referee Nomination & Report Request started (20 Sep 2021) by Gerd A. Folberth

RR by Joe Melton (03 Oct 2021)

RR by Matthias Forkel (15 Oct 2021)

Suggestions for revision or reasons for rejection

Dear authors,

many thanks for addressing my comments and for revising the manuscript. However, some of your answers are not really to the point or are not reflected in the revised text.

I previously asked how well the simulated biomass captures observed biomass (e.g. remote sensing estimates), as any biases in modelled biomass will cause errors in the simulation of fire. The authors responded to this question by comparing a global total estimate of biomass (Figure S6). This comparison is not meaningful because a global total biomass can originate from various regional patterns of biomass. I request a proper evaluation of the simulated biomass by, e.g., comparing a map of simulated biomass with a map of observed biomass (e.g. the Biomass CCI dataset) and making a difference map. This will allow assessing if regional biases in simulated biomass (and hence other fuel properties) might cause biases in the simulation of burned area. Ultimately, it is completely unclear how such differences would affect the simulations in a coupled model.

In the response it is written that "we tuned the DNN-Fire surrogate model towards ensemble mean with standard deviation across 14 GFED regions", which implies that the standard deviation was considered in the tuning. However, the standard deviation of burned area is not included in the cost function used in equation 8. This needs to be clarified.
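
For readers unfamiliar with the distinction, the sketch below contrasts a plain squared-error cost with one hypothetical way a standard deviation could enter a cost function, namely by dividing each regional error by the ensemble spread. Equation 8 of the manuscript is not reproduced on this page, so neither function should be read as the authors' cost function; the function names and numbers are assumptions for illustration only.

```python
def mse_cost(predicted, target):
    """Plain mean squared error: every region counts equally."""
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(target)

def std_weighted_cost(predicted, ens_mean, ens_std, eps=1e-12):
    """Mean squared error with each regional error scaled by the ensemble spread.

    Regions where the burned area products disagree (large std) are
    penalised less than regions where they agree (small std).
    """
    terms = [
        ((p - m) / (s + eps)) ** 2
        for p, m, s in zip(predicted, ens_mean, ens_std)
    ]
    return sum(terms) / len(terms)

# Invented regional totals for three regions (not real data).
predicted = [1.0, 5.0, 0.4]
ens_mean = [1.2, 4.0, 0.5]
ens_std = [0.2, 2.0, 0.1]

plain = mse_cost(predicted, ens_mean)
weighted = std_weighted_cost(predicted, ens_mean, ens_std)
```

With the plain cost, the second region dominates because its absolute error is largest; with the weighted cost, its large ensemble spread reduces its influence relative to the other two regions.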

Figures 4 and S7: Some of the symbols/colours are used for two regions and hence cannot be distinguished (e.g. BONA and CEAS). The colours, symbols and legends need to be revised.

The analysis of the sensitivity of the results to different forcing data is interesting as it reflects actually a strong influence of the forcing data on the simulation result. In order to identify the most reliable forcing data a comparison with the observed burned area and biomass would be insightful. Which forcing dataset results in model simulation closest to the observations?
Figure S5: To make the plot clearer, it would be useful to include the 1:1 line in each panel.
Lines 360-370 are not clearly written and some sentences are repetitive. Please revise.

Figure 7 looks fuzzy and the colours are difficult to see.

Lines 101-102: "Although explicit processes are simulated, the accuracy of process-based wildfire models are highly dependent on parameterization, which is computationally expensive" - This statement is misleading, as most process-based wildfire models, like the models within FireMIP, were actually never parametrised using computational approaches. Parameters were rather taken from literature sources during model development. A proper calibration of process-oriented models, i.e. by using a cost function and an optimization algorithm, was rarely done and, where it was done, the computation time is comparable to the time needed for the building and training of neural network models.

Finally, I'm not really convinced by the study. Although you can nicely demonstrate that the DNN can 1) reproduce the simulated burned area of the Earth system model and 2) also capture regional total burned area from observations, the study does not contribute to an improved "understanding of human, climate, and ecosystem controls on fire number, fire size, and burned area" (as motivated in the abstract). As the DNN models are only trained against burned area, no statements about the number and size of fires can be made. Furthermore, the ecosystem controls are mostly represented by the input variables tree cover and biomass. As tree cover is always prescribed from an input dataset and biomass is taken from the base simulation, the DNN models are actually inconsistent because they have been trained against different sources and magnitudes of burned area (but always using the same magnitude of tree cover and biomass!). As fire has a clear reducing effect on tree cover and biomass (Lasslop et al. 2020), it is obvious that the relation between fire, biomass and tree cover is actually inconsistent in some DNN models because the spatial patterns of biomass and tree cover do not correspond to the patterns of fire. Hence the approach cannot give meaningful insights into how ecosystem controls affect fire and I am not convinced that this can be transferred to a coupled model.
The abstract and discussion need to be revised in order to make this limitation clear.


RR by D. I. Kelley (21 Jan 2022)

ED: Publish subject to minor revisions (review by editor) (23 Jan 2022) by Gerd A. Folberth

AR by Qing Zhu on behalf of the Authors (02 Feb 2022) Author's response Author's tracked changes Manuscript

ED: Publish as is (03 Feb 2022) by Gerd A. Folberth

AR by Qing Zhu on behalf of the Authors (04 Feb 2022) Manuscript

Short summary

Wildfire is a devastating Earth system process that burns about 500 million hectares of land each year. It wipes out vegetation including trees, shrubs, and grasses and causes large losses of economic assets. However, modeling the spatial distribution and temporal changes of wildfire activities at a global scale is challenging. This study built a machine-learning-based wildfire surrogate model within an existing Earth system model and achieved high accuracy.