This work is distributed under the Creative Commons Attribution 4.0 License.
ML4Fire-XGBv1.0: Improving North American wildfire prediction by integrating a machine-learning fire model in a land surface model
Abstract. Wildfires have shown increasing trends in both frequency and severity across the Contiguous United States (CONUS). However, process-based fire models have difficulty accurately simulating burned area over the CONUS because they simplify the physical processes and cannot capture the interplay among fire, ignition, climate, and human activities. These deficiencies in burned area simulation degrade the representation of fire impacts on the energy balance, water budget, and carbon fluxes in Earth System Models (ESMs). Alternatively, machine learning (ML) based fire models, which capture statistical relationships between burned area and environmental factors, have shown promising burned area predictions and corresponding fire impact simulations. We develop a hybrid framework (ML4Fire-XGB) that integrates a pretrained eXtreme Gradient Boosting (XGBoost) wildfire model with the Energy Exascale Earth System Model (E3SM) land model (ELM) version 2.1. A Fortran-C-Python deep learning bridge is adapted to support online communication between ELM and the ML fire model. Specifically, the burned area predicted by the ML-based wildfire model is passed directly to ELM to adjust the carbon pools and vegetation dynamics after disturbance, which in turn serve as predictors for the ML-based fire model at the next time step. Evaluated against the historical burned area from the Global Fire Emissions Database 5 (GFED5) over 2001–2020, ML4Fire-XGB outperforms process-based fire models in terms of spatial distribution and seasonal variation. Sensitivity analysis confirms that ML4Fire-XGB captures the response of burned area to rising temperature well. ML4Fire-XGB thus provides a new tool for studying vegetation-fire interactions and, more importantly, enables seamless exploration of climate-fire feedbacks, working as an active component of E3SM.
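As a rough illustration of the online exchange described in the abstract, the sketch below shows what the ML side of one coupling step could look like in Python (all file, function, and variable names are hypothetical; the actual ML4Fire-XGB bridge is the Fortran-C-Python layer inside E3SM and is not reproduced here).

```python
# A rough illustration (not the authors' implementation) of the per-time-step
# exchange: ELM exports predictor fields, the pretrained XGBoost model returns
# a burned-area fraction, and ELM uses it to update carbon pools and
# vegetation. All names here are hypothetical.
import numpy as np
import xgboost as xgb

booster = xgb.Booster()
booster.load_model("ml4fire_xgb.json")  # pretrained wildfire model (hypothetical path)

def fire_step(elm_state: dict) -> np.ndarray:
    """One coupling step: ELM predictor fields in, burned-area fraction out."""
    # Stack per-grid-cell predictor arrays (climate forcing, LAI, carbon
    # pools, crop fraction, ...) in a fixed column order.
    features = np.column_stack([elm_state[name] for name in sorted(elm_state)])
    baf = booster.predict(xgb.DMatrix(features))
    return np.clip(baf, 0.0, 1.0)  # burned-area fraction bounded in [0, 1]
```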
Status: final response (author comments only)
RC1: 'Comment on gmd-2024-151', Anonymous Referee #1, 28 Sep 2024
This manuscript builds on previous work that used climate forcing observations and vegetation model–derived vegetation outputs to build a fire model over the continental U.S. (CONUS) using the XGBoost machine learning algorithm. Here, the authors couple that fire model back into the ELM land and vegetation model, resulting in marked improvements relative to the built-in, process-based ELM fire model in terms of total burned area, its seasonal timing, and its interannual variability. There is (as expected) some decrease in performance relative to the uncoupled ML fire model, but not much. The authors also compare their ELM simulations with other process-based fire models in the FireMIP experiments. The manuscript is mostly well-structured, the figures are easy to understand, and the writing is for the most part clean and clear.
Process-based fire models are notoriously complicated and uncertain, so I am quite interested in the potential of machine learning to supplement, complement, or even replace them. However, I have serious concerns about the usefulness of the particular model system described here. I also have various less-severe but still-important concerns related to methodological and analytical issues.
To some extent these can be addressed by expanding the Discussion and adding subsections for organization. The authors should reduce the amount of space in the Discussion dedicated to reiterating already-stated results, instead only re-presenting results as needed to support new assertions. However, my fundamental concern about the usefulness of the model system presented here will require a fair amount of additional work. I thus recommend this paper be reconsidered after major revisions.
See attached PDF for my detailed remarks.
AC1: 'Reply on RC1', Ye Liu, 11 Nov 2024
The comment was uploaded in the form of a supplement: https://gmd.copernicus.org/preprints/gmd-2024-151/gmd-2024-151-AC1-supplement.pdf
RC2: 'Comment on gmd-2024-151', Matthew Kasoar, 29 Oct 2024
The authors present a new two-way coupling of an XGBoost machine learning fire model over the contiguous US, with the ELM land surface model, a derivative of the widely-used CLM land-surface and dynamic vegetation model, which can be run as an alternative to the process-based Li et al. scheme currently used within ELM (and CLM). The XGB fire model performs very well at reproducing the observation-based training dataset (GFED5 burned area) over the CONUS. The authors also compare against BA simulations from several process-based models, and note that agreement is better over regions of the CONUS where burned area is mainly driven by climate, and poorer over regions where there is a strong human influence on the burned area signature, e.g. due to crop and pasture fires, thereby highlighting another potential use of ML models: to help identify key processes that should be better represented in their process-based counterparts.
ML methods show a lot of promise when it comes to more accurately parameterising sub-grid phenomena, including wildfire prediction which is notoriously uncertain among current process-based and simple empirical models, and so I really welcome initiatives like this to interactively couple data-driven fire models with a dynamic vegetation model to provide an alternative to the existing process-based scheme, depending on the desired application. I have some high-level concerns detailed below about the current presentation of the model description and validation; hopefully most of these can be resolved through additional discussion and clarifications - and very happy to be corrected if I'm mistaken or have misunderstood anything! I also have some recommendations for additional analysis and validation that I think could be beneficial. I then finally list a few very minor technical comments/clarifications.
In terms of the big picture motivation of the paper, the main new development is the interactive coupling of the XGB fire model with the ELM land surface and dynamic vegetation model. Therefore, it seems strange that no results or analysis are presented showing the feedbacks that are possible because of this coupling. All the model outputs presented are focused purely on burned area validation - where in fact, the coupled ELM-ML4Fire-XGB model performs slightly worse than just using offline XGB, presumably due to the coupling feedbacks which influence the vegetation distribution. So as it stands, the paper doesn't really motivate why coupling these models is desirable; if you just want burned area accuracy, it's better to use the XGB model offline. The key benefit is presumably the feedbacks on vegetation distribution, carbon fluxes, etc. One would imagine that the interactive vegetation distribution in ELM is improved when it's impacted by a more realistic fire distribution, or that the feedbacks on vegetation due to changing fire regimes over time are better simulated. So, it would be nice to see some results showing how vegetation-related variables are impacted by the coupling, as this is presumably the main advantage of having such a coupled model.
Regarding the model description (Section 2): Though I appreciate that the underlying land-surface model (ELM) and XGBoost wildfire model have been documented previously (though the current XGB implementation appears slightly updated re. the datasets used, time period etc.), given that the coupling between these models is the central development of this manuscript I felt that the details of the models (particularly XGB) and the coupling were a bit brief, and it was hard to figure out the answer to certain relevant details. In particular:
- What are the respective model timesteps, and what is the coupling timestep? The XGB model was trained (I think?) to predict GFED5 monthly BA, so does this mean it runs on a monthly timestep? But, on P6, L62-63, the authors say "All the datasets are resampled to 0.25×0.25 spatial and annual temporal resolution" - so does this mean that it actually runs on an annual timestep? But then, in the coupled model, it's described that the output of XGB is passed to ELM to affect land surface properties at the next timestep, and vice versa - I don't fully understand how this works if XGB is being trained with annual inputs to predict monthly or annual GFED5 BA. ELM (I would assume?) has a much shorter timestep than annual/monthly, at least for properties like surface fluxes, soil moisture, LAI etc., as well as for the meteorological driving data used as input to the coupled model?
- What was the spatial resolution of ELM - does it match the 0.25 degree GFED5 grid that the XGB model (presumably?) provides output on?
- The information about the XGB training process is insufficiently clear - the details are spread out across different parts of Section 2, and it's hard to work out exactly what the inputs were (including the time resolution, any dimension reduction that was applied, etc.), what the target output was, and what data was reserved for training vs validation.
- "To reduce overfitting, we build a separate ML model for each year from 2001 to 2020 using the remaining 19 years’ data" - I would stress that I'm not an ML expert, so maybe this is a simple lack of subject knowledge on my part. But it's unclear to me what is meant here - how is a separate model trained for each year, using data from other years? If the model is trained to predict the BA in one year based on the meteorological data of other years, it's not clear how it would learn the correct relationships. Or do the authors mean, that it is trained to predict BA relationships for all the other years, and then the trained model is applied to the one year that was left out, as the validation data? I'd appreciate if this could just be clarified a bit. Additionally: does this mean that the model(s) can only be used for years between 2001-2020? If so, that would seem to greatly limit it's usefulness for exploring future scenarios.
- As with the points above - the nice schematic (Fig 1) shows the same meteorological and fire-specific input datasets being passed to ELM and the process-based fire model as to the XGB model, but it's unclear whether these inputs are provided at the same temporal and spatial resolution to the respective models, or whether there are intermediate pre-processing steps. I'm not sure how easy this is to depict in the schematic, but as mentioned it's also a little unclear from the text and table of inputs as well.
- As I understand it, the XGB model is initially trained using PFT distributions diagnosed from a prior run of the ELM model using its processed-based fire scheme. However, the process-based scheme predicts a different fire distribution to the XGB model. Does this therefore introduce an inconsistency, i.e. that the XGB model is trained on PFT distributions that are predicated on the wrong spatial distribution of fires? Could this be improved by e.g. repeated iterations of running the ML4Fire-XGB coupled model to update the PFT distributions, and then re-training the XGB model? It would be good if the authors could comment on this.
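To make my reading of the leave-one-year-out point above concrete, here is a minimal sketch of what I assume the scheme looks like (all names, the data layout, and the hyperparameters are hypothetical guesses on my part, not the authors' code): one model per target year, trained on the other 19 years and validated on the held-out year.

```python
# Hypothetical leave-one-year-out training: one XGBoost model per target year,
# fitted on the remaining 19 years, validated on the held-out year.
import pandas as pd
import xgboost as xgb

# Assumed layout: one row per grid cell per year, with GFED5 burned area as
# the target and the climate/vegetation predictors as columns.
data = pd.read_csv("conus_fire_training.csv")  # hypothetical file
predictors = [c for c in data.columns if c not in ("year", "burned_area")]

models = {}
for held_out in range(2001, 2021):
    train = data[data["year"] != held_out]     # the remaining 19 years
    valid = data[data["year"] == held_out]     # the held-out validation year
    dtrain = xgb.DMatrix(train[predictors], label=train["burned_area"])
    dvalid = xgb.DMatrix(valid[predictors], label=valid["burned_area"])
    models[held_out] = xgb.train(
        {"objective": "reg:squarederror"},     # regress burned-area fraction
        dtrain,
        num_boost_round=500,
        evals=[(dvalid, "valid")],
        early_stopping_rounds=20,              # guard against overfitting
    )
```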
Regarding the comparison of burnt area results against four FireMIP models:
- Why those particular 4 models? E.g., the authors note that none of the models they compare against included a crop fire scheme, which is potentially one reason for poor performance over the central US. However a couple of the FireMIP models that are not included here did have crop schemes - so it seems odd to omit these. To be clear, I fully expect the ML4Fire-XGB model to outperform all the FireMIP models, it just seems a bit arbitrary why the comparison is made only against these four, out of the nine FireMIP models that were included in the Rabin et al. (2017) paper. If, for instance, these were the four best performing models over the CONUS, then it could make sense to compare against just these rather than against all of them. But if that is the rationale, I couldn't see it mentioned anywhere (happy to be corrected though).
- All the figures comparing burnt area are described as an average over 2001-2020. However, the FireMIP experiments that are cited in Rabin et al. (2017) only went up to 2013. Even the most recent round of FireMIP (aka ISIMIP3a) I think only goes up to 2019. So as far as I can understand, the FireMIP data can't be for the same time period.
- The authors don't mention or discuss (as far as I could see) some very important caveats which really need to be attached to the comparison with FireMIP models. In particular, it should be noted that the process-based FireMIP models were run with different reanalysis driving data, at a much coarser spatial resolution. I appreciate that being able to run much higher resolution is a potential advantage of using an ML model. But, it needs to be acknowledged that it's not a like-for-like comparison of pure model process accuracy. The different driving data (from a different, global reanalysis product, provided at a much lower resolution than the North America-specific reanalysis data that the XGB model is driven by) is potentially a very important factor - a fairer comparison of performance would be to run the XGB model driven by the FireMIP driving data.
Regarding the discussion and model validation:
- As mentioned above, it would be good to have some more quantitative discussion on the advantages of having the coupling, e.g. for vegetation distribution.
- All the comparison of BA performance is performed against GFED5, which is the same data that the model was trained on. Arguably, it's quite unsurprising that an ML model trained to predict GFED5 over CONUS from 2001-2020, would do better at predicting GFED5 over CONUS from 2001-2020, compared with global process-based models that were not specifically optimised to do this. Ideally, performance would be evaluated with out-of-sample tests - for example, by running the ML4Fire-XGB model with the FireMIP inputs as mentioned previously, or by comparing against alternative datasets and/or over different time periods to the training period.
- On that note: the authors assume GFED5 is the ground truth in evaluating that the XGB model outperforms process-based models, but it should be acknowledged that there is a large uncertainty in the observation-based BA data. This observational uncertainty should also be addressed, for example by including comparisons not just against GFED5, but against alternative BA datasets that are available, for example the USGS Landsat Burned Area product for the CONUS, or one of the FireCCI BA products.
- P16, L51-53: "However, the ML-fired process exhibits high accuracy, as demonstrated by the Offline-XGB model, making it a reliable tool for evaluating the fired area under different warming scenarios" - the authors show that the XGB model captures well the trend in GFED5 BA due to warming during the 2001-2020 period. However, this is the same period that the model was trained to perform well on. How reliable an indication is this that the model will still be accurate under high-end future warming scenarios, where the degree of climate change over the US will substantially exceed that observed over 2001-2020? This should be discussed.
Minor/technical clarifications:
P2, L27-29: "Over the globe, climate change has contributed to a 16% increase in the global burned area over the past two decades, while human influences, including ignition and suppression, have reduced by 27% (Burton et al., 2023; Jones et al., 2022)" - as it reads, I don't think this is an accurate paraphrasing of the studies being referenced. Currently it reads (to me) like: there has been a 16% increase in BA over the last two decades, to which climate has been a main driver, while the influence of humans has reduced by 27%. This isn't what either of these studies showed. Burton et al. find (based on FireMIP model data) that climate change since 1901 has made average BA (median over the 2003-2019 period) 16% higher than it would have been if climate had stayed fixed at circa 1901. However, they also find that human influences have made median BA 19% lower in the present compared with the early 20th Century, suggesting the net effect over the 20th Century is a small decline in BA. Jones et al. show that, in MODIS BA data, total global BA has declined by 27% over the last two decades. This is similar to previously reported results from GFED4 and GFED5, which both show ~24% declines in total global BA over the last two decades. The reason for this decline has been attributed mainly to human influences (cf. also Andela et al. 2017).
(As an aside: since the present manuscript was submitted, the Burton et al. (2023) pre-print that is referenced has now been published as a final article, and so the citation should be updated accordingly: https://www.nature.com/articles/s41558-024-02140-w)
Section 2.2.3: From the looks of it, the authors use existing Level 1 EPA ecoregions for their analysis regions 1, 2, and 3, but then for their regions 4 and 5 they split the Eastern Temperate Forest Level 1 EPA region into two. What was the rationale for subdividing this region but not the others? Also, this section seems oddly located in the middle of the model description, in between the description of the individual model components (2.1) and the description of the coupling (2.3), even though it relates only to the analysis of the final results and is not part of the model description. It would be better to have this section on analysis methods after all the description of the models and model coupling, I think.
P8, L04-05: "According to the GFED5, the CONUS experiences an averaged burned area fraction (BAF) of 0.6–0.9% yr-1 (4.8–7.1 Mha yr-1), which is consistent with Chen et al., (2023)" - Not quite sure what the authors intended here. Chen et al. (2023) is itself the GFED5 burned area description paper, so trivially GFED5 burnt area is consistent with itself.
P8, L05-06: "High-burned areas are predominantly observed in the WUS" - this seems a confusing statement, since the authors then go on to list other areas which have higher BA than the WUS, and indeed Figure 3 seems to show other areas of the US where BA is higher and more widespread.
P10, L33-34: "indicating that the ML model effectively describes crop fire thereby utilizing data on crop fraction and LAI" - Is this referring to the crop PFT fraction in the ELM model? (Rather than agricultural land use fraction, which isn't listed as an input)?
P10, L52-53: "Notably, none of the process-based models has activated the explicit cropland fire model. That says all vegetation models treat pastures as natural grasslands." - This statement is slightly confusing and conflates two things. Pasture is not the same as cropland, and they are usually represented as different land cover types in DGVMs. Similarly crop residue burning is a very different fire regime to pasture fires.
Citation: https://doi.org/10.5194/gmd-2024-151-RC2
AC2: 'Reply on RC2', Ye Liu, 11 Nov 2024
The comment was uploaded in the form of a supplement: https://gmd.copernicus.org/preprints/gmd-2024-151/gmd-2024-151-AC2-supplement.pdf
AC3: 'Reply on RC2', Ye Liu, 11 Nov 2024
Publisher’s note: this comment is a copy of AC2 and its content was therefore removed on 12 November 2024.
Citation: https://doi.org/10.5194/gmd-2024-151-AC3