ELM2.1-XGBfire1.0: improving wildfire prediction by integrating a machine learning fire model in a land surface model

Liu, Ye; Huang, Huilin; Wang, Sing-Chun; Zhang, Tao; Xu, Donghui; Chen, Yang

doi:https://doi.org/10.5194/gmd-18-4103-2025

Articles | Volume 18, issue 13


Model description paper

04 Jul 2025


Abstract

Wildfires have shown increasing trends in both frequency and severity across the contiguous United States (CONUS). However, process-based fire models have difficulties in accurately simulating the burned area over the CONUS due to a simplification of the physical process and cannot capture the interplay among fire, ignition, climate, and human activities. The deficiency of burned area simulation deteriorates the description of fire impact on energy balance, water budget, and carbon fluxes in the Earth system models (ESMs). Alternatively, fire models based on machine learning (ML), which capture statistical relationships between the burned area and environmental factors, have shown promising burned area predictions and corresponding fire impact simulation. We develop a hybrid framework (ELM2.1-XGBFire1.0) that integrates an eXtreme Gradient Boosting (XGBoost) wildfire model with the Energy Exascale Earth System Model (E3SM) land model (ELM) version 2.1. A Fortran–C–Python deep learning bridge is adapted to support online communication between ELM and the ML fire model. Specifically, the burned area predicted by the ML-based wildfire model is directly passed to ELM to adjust the carbon pool and vegetation dynamics after disturbance, which are then used as predictors in the ML-based fire model in the next time step. Evaluated against the historical burned area from Global Fire Emissions Database 5 from 2001–2019, the ELM2.1-XGBFire1.0 outperforms process-based fire models in terms of spatial distribution and seasonal variations. The ELM2.1-XGBFire1.0 has proven to be a new tool for studying vegetation–fire interactions and, more importantly, enables seamless exploration of climate–fire feedback, working as an active component of E3SM.


Received: 04 Aug 2024 – Discussion started: 30 Aug 2024 – Revised: 30 Nov 2024 – Accepted: 31 Mar 2025 – Published: 04 Jul 2025

1 Introduction

Recent wildfire outbreaks worldwide have raised alarms due to wildfires burning longer and more intensely in many regions, posing significant threats to human livelihoods and biodiversity. In the past 2 decades, satellite-derived data suggest that the global total burned area has declined by over 20 %, which is primarily attributed to human influences (Jones et al., 2022; Andela et al., 2017). However, the contiguous United States (CONUS) has emerged as a hotspot for wildfires, where both climate change and human activities have fueled a 42 % increase in burned area (Jones et al., 2022). Such expansive burned areas release an average of 162×10⁶ t of CO₂ and 0.9×10⁶ t of PM_2.5 annually into the atmosphere, resulting in over USD 200 billion health costs due to exposure to wildfire smoke (Global Wildfire Information System, 2024; JEC, 2023). Accurate prediction of wildfire risks has become an urgent need.

Traditional fire models, predominantly process-based models, simulate the behavior of individual wildfires using theoretical equations for ignitions and fire spread (Hantson et al., 2016). These models explicitly simulate the number and size of individual fires by incorporating parameterizations and parameters derived from laboratory or field experiments and typically estimate the burned area by scaling up to the grid-cell level (Lasslop et al., 2014; Pfeiffer et al., 2013; Yue et al., 2014; Li et al., 2012; Thonicke et al., 2010; Huang et al., 2020, 2021; Arora and Boer, 2005; Burton et al., 2019). While process-based wildfire models are effective in simulating global burned area distribution (Hantson et al., 2020), they often fall short when accurately predicting the extent and temporal changes of wildfires over the CONUS (Forkel et al., 2019; Teckentrup et al., 2019). The climate and vegetation controls on the CONUS burned area and their relative importance are incorrectly represented, leading to failures in burned area predictions regarding both spatial distribution and temporal variations (Forkel et al., 2019). Human ignition and suppression are assumed to be linearly or log-linearly related to population density and the gross domestic product (GDP), respectively (Jones et al., 2023; Li et al., 2013). This assumption overlooks a more nuanced picture of human activities, such as road density, cultural differences, agricultural activities, and forest management policy (Jones et al., 2022; Villarreal et al., 2022; Hanan et al., 2021; Miller et al., 2009; Turco et al., 2023; Haas et al., 2022). Process-based fire models are often integrated with biogeochemical process-enabled land models (hereafter referred to as BGC models) within Earth system models (ESMs) to predict fire disturbances to carbon allocation, which is then used to update energy balance, water budget, and carbon fluxes in the land model. 
Incorrect simulation of burned areas over the CONUS induces large uncertainties in the assessment of fire impacts using ESMs.

Recent advances have explored the application of machine learning (ML) techniques in wildfire prediction (e.g., Buch et al., 2023; Li et al., 2023; Wang et al., 2021; Zhu et al., 2022). ML models offer the advantage of capturing nonlinear dependencies and complex interactions between driving factors and fire dynamics without the need for the explicit understanding of physical processes (Rodrigues and de la Riva, 2014). Zhu et al. (2022) presented a deep neural network (DNN) scheme that surrogated the process-based wildfire model with the Energy Exascale Earth System Model (E3SM) interface, demonstrating over 90 % higher accuracy in simulating the global burned area. Wang et al. (2021) combined the local predictors, large-scale meteorological patterns, and the eXtreme Gradient Boosting (XGBoost) algorithm to build an ML wildfire model, which improves the temporal correlations of burned areas in several regions over the CONUS by 14 %–44 %. Buch et al. (2023) developed a novel stochastic machine learning (SML) framework, SMLFire1.0, with a high spatial resolution of 12 km over the western US (WUS).

The newly developed ML fire models often focus on wildfire properties such as burned area, fire count, and fire emissions (Wang et al., 2021; Buch et al., 2023). Despite the improved fire predictions, fire impacts on the ecosystem, climate, and human community cannot be evaluated without integrating the wildfire process into the Earth system. In addition, climate change impacts on the burned area, either directly through fire weather conditions or indirectly through ecosystem productivity, vegetation type, fuel loads, and fuel moisture, cannot be fully understood without explicitly representing the complex interplay between climate, ecosystems, and fire. For instance, a warmer and drier climate has been shown to cause an 8-fold rise in the high-severity burned area from 1985 to 2017 over the WUS (Parks and Abatzoglou, 2020). The corresponding changes in fire dynamics may shift the vegetation species distribution from species with low resistance to wildfire to species that are highly resistant to, or even benefit from, regular fire occurrence (Rogers et al., 2015; Huang et al., 2024). The fire-adapted vegetation species, in turn, facilitate the frequent occurrence of wildfires. Taking this into consideration, a full coupling of fire, ecosystem, and climate is required to better predict fire changes and the corresponding impacts in a future climate.

Leveraging the accuracy of ML-based wildfire models and the representation of ecosystem–climate interactions in ESMs, in this study, we develop a novel hybrid framework to integrate a pretrained ML wildfire model with the E3SM land model (ELM) to study the full atmosphere–vegetation–wildfire feedback. This integration facilitates a dynamic feedback loop where outputs from the ML model (i.e., predicted burned areas) inform the land surface processes in ELM, which in turn update the inputs for the ML model for subsequent predictions. This approach leverages the detailed physical understanding of surface biogeophysical and biogeochemical processes provided by ELM and the predictive power of ML-based wildfire models to create a more accurate and robust framework for wildfire prediction and impact assessment. The remaining sections are arranged as follows: Sect. 2 introduces the ELM and ML wildfire model training method, coupling strategy, and datasets used in this study; Sect. 3 presents the simulated burned area compared with observations and several process-based fire models; and Sect. 4 contains the discussion and conclusion.

2 Materials and methods

2.1 Model description

2.1.1 Default wildfire model in ELM

The ELM is part of the E3SM project, which started from a version of the Community Earth System Model (CESM1). The ELM default wildfire module originated from the Community Land Model (CLM4.5) (Li et al., 2012). This wildfire model calculates the burned area in each grid cell by multiplying the number of wildfires by the burned area per fire. The number of wildfires (fire count) is derived using anthropogenic and natural ignition sources, fuel load and combustibility, surface meteorology, and anthropogenic suppression. The natural ignition source is derived from the number of cloud-to-ground lightning flashes multiplied by a constant ignition efficiency (Prentice and Mackerras, 1977). Anthropogenic ignitions are parameterized simply, using a fixed number of potential anthropogenic ignitions per person and the population density (Venevsky et al., 2002). Humans also suppress wildfires. The capability of fire suppression is assumed to be a function of GDP and population density. The ignition efficiency is also altered by fuel conditions, including the fuel load (aboveground biomass) and fuel combustibility (approximated using relative humidity, temperature, and top or root zone soil moisture). The spread of each fire is approximated using an ellipse shape, with its length-to-breadth ratio determined by wind speed and fuel moisture (Rothermel, 1972). This simple concept captures the major constraints for predicting the global wildfire distribution and seasonal variations well (Rabin et al., 2017; Li et al., 2014; Huang et al., 2020).
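The count-times-size structure described above can be illustrated with a minimal sketch. It is not the ELM implementation: the function names and inputs are hypothetical, and the fire count and ellipse length are taken as given here, whereas the actual model derives them from ignitions, suppression, fuel conditions, and spread rates.

```python
import math


def ellipse_area_km2(length_km: float, length_to_breadth: float) -> float:
    """Area of an elliptical fire with major axis `length_km` and a
    length-to-breadth ratio `length_to_breadth` (assumed >= 1)."""
    breadth_km = length_km / length_to_breadth
    # Ellipse area = pi * semi-major axis * semi-minor axis
    return math.pi * (length_km / 2.0) * (breadth_km / 2.0)


def grid_cell_burned_area_km2(fire_count: float,
                              length_km: float,
                              length_to_breadth: float) -> float:
    """Burned area in a grid cell = number of fires * area per fire."""
    return fire_count * ellipse_area_km2(length_km, length_to_breadth)
```

For a fixed fire length, a larger length-to-breadth ratio gives a narrower ellipse and therefore a smaller burned area per fire.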

Like many other process-based wildfire models, the default fire model in ELM benefits from the full ecosystem interactions from its hosting land model as well as the potential to be coupled with atmospheric models. With the BGC processes being turned on, ELM-BGC reallocates carbon and nitrogen in leaf, wood, root, litter, and soil pools after fire based on carbon combustion and mortality rate dependent on plant functional type (PFT). The biogeochemical changes subsequently influence biogeophysical properties such as leaf area index (LAI), vegetation canopy height, and albedo, disturbing the land–atmosphere exchanges of energy and water fluxes. The post-fire vegetation recovery in ELM-BGC depends on the plant photosynthesis processes and PFT competition strategy for soil resources. The interactions between wildfire and vegetation under historical climate have been thoroughly assessed in CLM long-term simulations (Li and Lawrence, 2017). The model framework is illustrated in Fig. 1. Hereafter the ELM coupled with the process-based fire model is referred to as ELM-BGC.

Figure 1. Schematic diagram of the hybrid model framework.

2.1.2 Machine learning wildfire model

The XGBoost-based wildfire model has been shown to outperform process-based models in predicting burned areas over the CONUS (Wang et al., 2021). XGBoost is a highly efficient and scalable implementation of gradient boosting, designed for performance and speed (Chen and Guestrin, 2016). It builds decision trees sequentially, each correcting the errors of the previous ones, using techniques like regularization to prevent overfitting and parallel processing for faster computation. In this study, we adapt the XGBoost algorithm used in Wang et al. (2021) to develop an offline ML fire model using variables directly provided by ELM at each grid cell. Wang et al. (2021) integrated large-scale meteorological patterns alongside local weather, land surface properties, and socioeconomic data to enhance the prediction of burned areas. The large-scale patterns were identified using singular value decomposition (SVD) to capture influential atmospheric conditions that develop over days to weeks and cumulatively impact the monthly burned area. The feature importance analysis in their study showed that, while large-scale patterns improved prediction, they played a secondary role. Therefore, we exclude the large-scale patterns from the predictors, which does not significantly affect the model accuracy. Hereafter the uncoupled XGBoost fire model is referred to as offline-XGB.
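The idea of building trees sequentially to correct earlier errors can be sketched with a deliberately simplified stand-in. This is plain gradient boosting with one-split regression stumps on a single predictor under squared-error loss, written with the standard library only. It is not XGBoost (no regularization, no parallelism, no multi-feature trees), and all names are illustrative.

```python
def fit_stump(x, residuals):
    """Least-squares regression stump on one feature.
    Assumes x contains at least two distinct values."""
    best = None
    for threshold in sorted(set(x))[:-1]:
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]  # (threshold, left_mean, right_mean)


def predict_stump(stump, xi):
    threshold, left_mean, right_mean = stump
    return left_mean if xi <= threshold else right_mean


def boost(x, y, n_rounds=50, learning_rate=0.3):
    """Each round fits a stump to the current residuals and adds a
    shrunken copy of it to the running prediction."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - p for yi, p in zip(y, pred)]
        stump = fit_stump(x, residuals)
        stumps.append(stump)
        pred = [p + learning_rate * predict_stump(stump, xi)
                for p, xi in zip(pred, x)]
    return base, learning_rate, stumps


def predict(model, xi):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(predict_stump(s, xi) for s in stumps)
```

On a small nonlinear toy relationship (for example, a burned fraction that rises sharply above some dryness threshold), the boosted prediction has a lower training error than the constant mean it starts from, which is the behavior the boosting rounds are designed to produce.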

2.1.3 Hybrid modeling framework

The offline-XGB model is integrated with the ELM using the ML4ESM coupling framework. The ML4ESM framework offers a robust and flexible solution for integrating ML parameterizations into ESMs through a Fortran–Python interface (Zhang et al., 2024). It supports popular ML libraries such as PyTorch, TensorFlow, and Scikit-learn, enabling the seamless incorporation of ML algorithms to represent complex climate processes like convection and wildfire dynamics. The interface uses the C language as an intermediary for efficient data transfer: both sides access the same memory reference rather than making an extra data copy or exchanging files, which minimizes memory overhead and computational inefficiencies. A C hub is then used to communicate variables between the Fortran-written ELM and the Python-written ML fire model. In our application, all surface meteorology, lightning, and socioeconomic data alongside the ELM simulated fuel conditions are passed to the ML-based fire model to predict the burned area. The burned area is returned to ELM to calculate fire impacts and update surface properties.

2.2 Datasets and processing

2.2.1 Burned area datasets

The primary dataset for training and validating the ML-based model is the Global Fire Emissions Database version 5 (GFED5) (Chen et al., 2023). GFED5 is the successor to GFED4s (van der Werf et al., 2017), which we also use as an additional reference dataset. GFED5 is generated by fusing multiple streams of remote sensing data to create a 24-year (1997–2020) dataset of the monthly burned area at 0.25° spatial resolution. During 2001–2020, GFED5 is derived from the Moderate Resolution Imaging Spectroradiometer (MODIS) MCD64A1 burned area product (Hall et al., 2016; Giglio et al., 2016; Giglio et al., 2018) with adjustments for the errors of commission and omission. Adjustment factors are estimated based on region, land cover, and tree cover fraction using spatiotemporally aligned burned areas from Landsat or Sentinel-2 (Claverie et al., 2018). Because of a new fire detection method that significantly boosts the area of small fires, the CONUS annual burned area increases from 2.36 Mha in GFED4s to 6.04 Mha in GFED5, primarily owing to the increase in crop fires from 0.83 to 3.09 Mha.

The FireCCI5.1 is obtained as another reference dataset (Chuvieco et al., 2019). FireCCI5.1 maps fires at 250 m resolution using the spectral information from MODIS in combination with the thermal anomalies. FireCCI5.1 has been reported to heavily underestimate the total burned area mainly due to the underrepresentation of small fires (Lizundia-Loiola et al., 2020).

Besides observations, we also obtain burned area from seven state-of-the-art process-based wildfire models participating in the Inter-Sectoral Impact Model Intercomparison Project (ISIMIP3a) (Burton et al., 2024b), including the Canadian Land Surface Scheme Including Biogeochemical Cycles (CLASSIC) (Melton et al., 2020), the Simplified Simple Biosphere model coupled with the Top-down Representation of Interactive Foliage and Flora Including Dynamics model (SSiB4-TRIFFID-Fire) (Huang et al., 2020, 2021), the SPread and InTensity of FIRE (SPITFIRE) coupled with the Organizing Carbon and Hydrology In Dynamic Ecosystems (ORCHIDEE) (Yue et al., 2014), the Joint UK Land Environment Simulator (JULES) coupled with the INFERNO fire model (Mathison et al., 2023; Mangeon et al., 2016), the LPJ-GUESS dynamic global vegetation model coupled with the SPITFIRE (LPJ-GUESS-SPITFIRE) and SIMple FIRE model (SIMFIRE) (Knorr et al., 2016) and BLAZe induced biosphere–atmosphere flux Estimator (BLAZE) (LPJ-GUESS-SIMFIRE-BLAZE) (Rabin et al., 2017), and the Vegetation Integrative Simulator for Trace gases (VISIT) (Ito, 2019). Driven by GSWP3-W5E5 historical climate forcing (Cucchi et al., 2020; Lange et al., 2021), these models provide the monthly burned area at 0.5° spatial resolution from 1901–2019. The multi-model output during 2001–2019 is used in this study. We also performed the benchmarking simulation using the built-in process model in ELM-BGC.

The process-based models differ from one another not only in their dynamic global vegetation models (DGVMs) but also in the complexity of their fire models. ELM-BGC and SSiB4-TRIFFID use the same fire model from Li et al. (2012), and LPJ-GUESS-SPITFIRE and ORCHIDEE are both coupled with SPITFIRE. Other models incorporate their own unique fire modules. The representation of fires over croplands and pastures varies across models (Burton et al., 2024a; Teckentrup et al., 2019). All models except JULES classify croplands as non-burnable, whereas JULES treats croplands similarly to natural grasslands. Most models do not include pasture as a PFT and, therefore, do not distinguish pastures from grasslands in terms of either growth or fire behavior. In LPJ-GUESS-SIMFIRE-BLAZE, pastures are harvested, leading to reduced biomass and consequently a smaller burned area. The difference among process-based models is discussed in Sect. 4.

2.2.2 Surface meteorological, lightning, and socioeconomic datasets

Surface meteorological variables, including temperature, humidity, wind speed, downward shortwave radiation, downward longwave radiation, precipitation, and surface pressure, are obtained from the forcing fields of Phase 2 of the North American Land Data Assimilation System (NLDAS-2), which are used both to drive the ELM and to construct the training set for the ML fire model. This dataset combines multiple sources of observations (such as precipitation gauge data, satellite data, and radar precipitation measurements) to produce estimates of climatological properties at or near the Earth's surface at hourly temporal resolution and 1/8° grid spacing. We use the temperature, relative humidity, specific humidity, wind speed, and precipitation directly from NLDAS-2 to train the ML fire model. Additionally, we calculate the standardized precipitation evapotranspiration index (SPEI) following Beguería et al. (2014) and the vapor pressure deficit (VPD) based on the NLDAS-2 dataset as additional inputs for the ML model (Table 1). We coarsen this dataset to 0.25° to align with the burned area datasets.
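As an illustration of how VPD can be derived from temperature and relative humidity, the sketch below uses the widely used Tetens-type saturation vapor pressure formula in its FAO-56 form. The text does not state which formulation was used in this study, so the choice of formula and the function names are assumptions.

```python
import math


def saturation_vapor_pressure_kpa(temp_c: float) -> float:
    """Saturation vapor pressure (kPa) at air temperature `temp_c` (deg C),
    Tetens-type formula as given in FAO-56 (an assumed choice here)."""
    return 0.6108 * math.exp(17.27 * temp_c / (temp_c + 237.3))


def vpd_kpa(temp_c: float, rh_percent: float) -> float:
    """Vapor pressure deficit (kPa): saturation minus actual vapor pressure,
    with actual vapor pressure = RH/100 * saturation vapor pressure."""
    return saturation_vapor_pressure_kpa(temp_c) * (1.0 - rh_percent / 100.0)
```

At 25 °C and 50 % relative humidity this gives a VPD of roughly 1.58 kPa, and the deficit vanishes at saturation (100 % relative humidity).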

In addition to the surface meteorological forcing, we acquire lightning and socioeconomic datasets from multiple sources; these are identical to those used by the ISIMIP3a fire models. The 2 h climatological lightning flash data from NASA LIS/OTD v2.2 at 2.5° resolution are used to calculate the number of natural ignitions. Lightning data are aggregated by summing the 2 h data to derive monthly climatological means, and these monthly climatologies are repeated across all years, disregarding interannual variations. The annual gridded population density data are acquired from Klein Goldewijk et al. (2017), while the GDP per capita is from the World Bank (https://data.worldbank.org/, last access: 20 June 2024), and both are assigned constant values for all months within each corresponding year. All datasets are spatially resampled to a 0.25° × 0.25° grid using bilinear interpolation. To train the ML model, additional inputs, including top-layer soil moisture, LAI, and the spatial fraction of each plant functional type (PFT), are simulated by ELM (explained further in Sect. 2.3).
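The bilinear resampling step can be sketched for a single target point as follows. This is a generic textbook implementation on a regular latitude–longitude grid, not the tool used in this study; the function names are hypothetical, and points outside the source grid are linearly extrapolated from the nearest cell rather than masked.

```python
from bisect import bisect_right


def _bracket(coords, value):
    """Index i such that coords[i] and coords[i + 1] bracket `value`
    (clamped to the first or last interval). `coords` must be ascending."""
    i = bisect_right(coords, value) - 1
    return min(max(i, 0), len(coords) - 2)


def bilinear(grid, lats, lons, lat, lon):
    """Interpolate grid[i][j], defined at (lats[i], lons[j]), to (lat, lon)."""
    i = _bracket(lats, lat)
    j = _bracket(lons, lon)
    ty = (lat - lats[i]) / (lats[i + 1] - lats[i])
    tx = (lon - lons[j]) / (lons[j + 1] - lons[j])
    # Interpolate along longitude on the two bracketing latitude rows,
    # then along latitude between the two results.
    south = grid[i][j] * (1 - tx) + grid[i][j + 1] * tx
    north = grid[i + 1][j] * (1 - tx) + grid[i + 1][j + 1] * tx
    return south * (1 - ty) + north * ty
```

Applying `bilinear` at the centre of every 0.25° target cell would resample a coarse field, such as the 2.5° lightning climatology, onto the model grid.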

Table 1. Meteorological forcing, land surface properties, and fire-specific inputs for driving the ELM-BGC and training the offline-XGB fire model.

2.3 Model configuration and offline-XGB training and coupling processes

In ELM-BGC, vegetation properties, including canopy height and LAI, vary with carbon allocation and distribution driven by climate variability and disturbances such as wildfires. To bring the model's carbon and nitrogen pools into equilibrium, we first conduct long-term spin-up simulations as suggested by Lawrence et al. (2011). We adopt a two-step approach consisting of a 400-year accelerated decomposition (AD) spin-up followed by a 400-year regular spin-up, driven by cycling NLDAS-2 meteorological forcing from 1981 through 2000. In the AD spin-up, acceleration factors are applied to accelerate decomposition in soil organic matter pools and for plant dead stem and coarse root mortality. The terrestrial carbon pools and vegetation distribution reach quasi-equilibrium states after the 800-year spin-up simulations.

With the quasi-equilibrium state from the spin-up simulation, we conduct transient simulations with the process-based fire model in the ELM-BGC driven by hourly NLDAS-2 meteorological forcings at a 0.25° resolution from 2001 to 2020. The process-based fire model operates on an hourly basis, matching the frequency of the meteorological inputs, while the ML fire model is trained and applied at a monthly interval consistent with GFED5 data intervals. For training the offline-XGB model, the ELM-BGC outputs, including LAI, surface soil moisture, and PFT fractions, are averaged to monthly intervals and combined with monthly mean meteorological conditions, socioeconomic variables (GDP, population density), and lightning (as detailed in Table 1) to learn the relationship between predictors and burned area. To reduce overfitting, the 20-year dataset is split, with 80 % used for training and 20 % for validation. During training, grid cells with fewer than 30 months of non-zero burned area (∼ two-thirds of the total number of grid cells) are masked. This step is important to avoid feeding the ML model distinct predictor combinations that all correspond to zero burned areas, which could skew the model's learning process. Model performance is evaluated based on its accuracy in predicting the spatial distribution and temporal variation of burned areas. Validation metrics include the root mean square error (RMSE) and the coefficient of determination (R²).
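The masking rule, the data split, and the two validation metrics can be sketched as below. The text does not specify how the 80 %/20 % split is drawn, so the seeded random split shown here is an assumption, as are all function names.

```python
import math
import random


def keep_cell(monthly_burned_area, min_nonzero_months=30):
    """True if a grid cell has at least `min_nonzero_months` months with
    non-zero burned area; cells with fewer are masked during training."""
    nonzero = sum(1 for ba in monthly_burned_area if ba > 0)
    return nonzero >= min_nonzero_months


def split_indices(n_samples, train_fraction=0.8, seed=0):
    """Seeded random split into training and validation indices
    (an assumed splitting strategy)."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_train = int(round(train_fraction * n_samples))
    return indices[:n_train], indices[n_train:]


def rmse(obs, pred):
    """Root mean square error."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def r_squared(obs, pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot
```

With a 20-year monthly record (240 months), a cell that burned in only 29 months would be excluded from training, while one that burned in 30 months would be kept.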

We then integrate the offline-XGB into ELM-BGC, forming the coupled model ELM2.1-XGBfire1.0. The coupled model runs at 0.25° spatial and hourly temporal resolution, and the hourly model predictions are accumulated to calculate monthly means. At the end of each month, the ML fire model is called to predict the monthly burned area, which is used to update the land surface properties (e.g., LAI and vegetation height), carbon cycling (biotic carbon in each pool), and ecohydrology processes (photosynthesis and soil moisture) in ELM-BGC.
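The control flow of this monthly coupling can be sketched in a few lines. In the actual model the hourly integration runs in Fortran and the call crosses the Fortran–C–Python bridge; the sketch below only mimics the order of operations, and `predict_burned_fraction` and `apply_fire` are hypothetical stand-ins supplied by the caller.

```python
def run_coupled(hourly_forcing_by_month, land_state,
                predict_burned_fraction, apply_fire):
    """hourly_forcing_by_month: list of months, each a list of dicts that
    map a variable name to its hourly value.
    land_state: dict of land-model variables used as predictors."""
    burned = []
    for month in hourly_forcing_by_month:
        # 1. Accumulate hourly values and form monthly means.
        sums = {}
        for hour in month:
            for name, value in hour.items():
                sums[name] = sums.get(name, 0.0) + value
        monthly_mean = {name: total / len(month)
                        for name, total in sums.items()}
        # 2. At month end, call the fire model with forcing + land state.
        predictors = {**monthly_mean, **land_state}
        fraction = predict_burned_fraction(predictors)
        # 3. Feed the burned fraction back into the land state, which
        #    becomes a predictor for the next month.
        land_state = apply_fire(land_state, fraction)
        burned.append(fraction)
    return burned, land_state
```

Because the land state is updated before the next call, any difference in the predicted burned area carries forward into later predictions, which is the feedback that distinguishes the coupled run from the offline model.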

2.4 Ecoregion

We evaluate the model-simulated burned area for each ecoregion adopted from the US Environmental Protection Agency (EPA). Ecoregions are areas where ecosystems (and the type, quality, and quantity of environmental resources) are generally similar (Omernik and Griffith, 2014), and wildfire properties within each ecoregion are also generally similar. A combination of level I and level II ecoregions is used, and some types have been combined to focus on the broad vegetation distribution. As shown in Fig. 2, the Western Forested Mountains include NW Forested Mountains, Marine West Coast Forests, and Mediterranean California from level I ecoregions. The North American (NA) Deserts include NA Deserts and small portions of Temperate Sierras and Southern Semi-Arid Highlands. The Northeast (NE) Temperate Forests include Mixed Wood Shield, Mixed Wood Plains, and Atlantic Highlands from level II ecoregions. The Southeast (SE) Temperate Forests include the Southeastern US Plains, Ozark/Ouachita–Appalachian Forests, and Mississippi Alluvial and Southeast US Coastal Plains level II ecoregions.

Figure 2. Ecoregions used in fire model evaluation. 1: Western Forested Mountains, 2: NA Desert, 3: Great Plains, 4: SE Temperate Forests, and 5: NE Temperate Forests.

3 Results

3.1 Evaluation of the burned area spatial distribution

The burned areas across the CONUS exhibit a strong spatial variation (Fig. 3a), which is primarily influenced by climate, vegetation, and human activities. According to the GFED5, the CONUS experiences an average burned area fraction (BAF) of 0.6 % yr⁻¹–0.9 % yr⁻¹ (4.8–7.1 Mha yr⁻¹). The BAF in the WUS (Western Forested Mountains and NA Desert) ranges between 0.4 % yr⁻¹ and 0.9 % yr⁻¹ (1.1–2.3 Mha yr⁻¹). States like California, Oregon, and Nevada, as well as the Rocky Mountain region, including parts of Colorado and Wyoming, experience large wildfires. The wildfires in the Pacific Northwest and northern California are generally lightning-caused and occur in boreal forests (Balch et al., 2017), whereas those in southern California are primarily caused by human ignition in dry forests and shrublands. The southwest, including Arizona and New Mexico, also sees significant burned areas in shrublands and dry forests. In the Great Plains, states such as Kansas and North Dakota also exhibit high burned areas alongside Texas and Oklahoma, with a BAF ranging between 0.7 % yr⁻¹ and 1.3 % yr⁻¹ (1.6–2.9 Mha yr⁻¹). These high burned areas are primarily attributable to agricultural fires, particularly those set to clear crop residues and manage pastures (Donovan et al., 2020). The southeastern US experiences a BAF of 0.9 % yr⁻¹–1.5 % yr⁻¹ (1.5–2.6 Mha yr⁻¹), while the temperate forested areas covering Florida, Georgia, and the Carolinas show lower burned areas compared to the West. The Midwest and northeast exhibit sparse burned areas, with BAF mostly less than 0.16 % yr⁻¹–0.25 % yr⁻¹ (0.2–0.3 Mha yr⁻¹). Burned areas in GFED4s and FireCCI5.1 are much smaller than in GFED5 due to the underrepresentation of small fires. The overall spatial distributions are generally consistent across the three datasets, as shown by the high spatial correlation coefficients (R_p).

Figure 3. Observed burned area fraction (% yr⁻¹). (a) GFED5 (2001–2019), (b) GFED4s (2001–2016), and (c) FireCCI5.1 (2001–2019). The numbers indicate the mean (M) burned area fraction and burned area (in Mha) in brackets for each dataset. The pattern correlation (R) against GFED5 is also shown, with an asterisk (∗) denoting significance at the 0.01 level. Black contours outline the ecoregions.

The offline-XGB wildfire model reproduces the burned area distribution over the CONUS well (Fig. 4a), with an R_p of 0.98 (p<0.01) and a bias of −1.0 Mha yr⁻¹. When the model is integrated with ELM, the performance degrades (R_p=0.59, p<0.01, bias = 1.9 Mha yr⁻¹) (Fig. 4b). This degradation is likely due to the fire–vegetation feedback. The aboveground biomass and fuel moisture from ELM-BGC have been used to train the offline-XGB prior to the coupled run within ELM. In the coupled simulation, ELM2.1-XGBfire1.0 updates the biotic carbon and fuel moisture based on the burned area simulated in the previous time step. Consequently, differences in the simulated burned area compared to the process-based models are reflected in the biotic carbon and fuel moisture, accumulating over the 20-year simulation period and influencing the burned area simulation in subsequent time steps.
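The two summary statistics quoted here can be computed as in the following sketch, which takes the pattern correlation R_p to be the Pearson correlation across grid cells and the bias to be the difference in total burned area. Both readings are assumptions, grid-cell area weighting is omitted for brevity, and the function names are illustrative.

```python
import math


def pattern_correlation(obs, sim):
    """Pearson correlation between two maps flattened to equal-length lists
    (unweighted; assumes neither map is spatially constant)."""
    n = len(obs)
    mean_obs = sum(obs) / n
    mean_sim = sum(sim) / n
    cov = sum((o - mean_obs) * (s - mean_sim) for o, s in zip(obs, sim))
    var_obs = sum((o - mean_obs) ** 2 for o in obs)
    var_sim = sum((s - mean_sim) ** 2 for s in sim)
    return cov / math.sqrt(var_obs * var_sim)


def total_bias(obs, sim):
    """Simulated minus observed total burned area (same units as inputs)."""
    return sum(sim) - sum(obs)
```

A simulation can have a high pattern correlation and still carry a large bias, for instance when it places fires in the right locations but doubles their extent everywhere, which is why the two metrics are reported together.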

Figure 4. Same as Fig. 3 but showing model outputs. The pattern correlation (R) and bias (B) against GFED5 are denoted.

In various ecoregions, the offline-XGB model demonstrates minimal biases, and the ELM2.1-XGBfire1.0 consistently outperforms all process-based fire models in predicting annual mean burned area (Fig. 4a–b). The accurate simulation of burned area over the Western Forested Mountains indicates that the ELM2.1-XGBfire1.0 framework generally captures the complex interplay between climate, vegetation, and human activities, with both climate forcings and predicted vegetation status acquired from ELM-BGC. Meanwhile, the ELM2.1-XGBfire1.0 shows superior performance over the Great Plains, indicating that the ML model effectively describes crop fire by utilizing data on crop fraction and LAI.

The performance of the eight process-based fire models in simulating burned areas over the CONUS shows both strengths and weaknesses (Figs. 4c–j and 5). The models generally capture the high burned areas in key regions such as the WUS and southeast US, except that ORCHIDEE shows a concentrated burned area in the Great Plains and LPJ-GUESS-SIMFIRE-BLAZE misses fires in the SE US. However, all process-based models tend to overestimate burned areas in various regions across the CONUS. ELM-BGC moderately overestimates the burned area over the CONUS, by 3.83 Mha yr⁻¹. The burned areas are doubled in CLASSIC, ORCHIDEE, JULES, and VISIT simulations, with values of up to 20.7 Mha yr⁻¹ (Fig. 4a).

Figure 5. Observed and simulated mean burned area fraction (% yr⁻¹) over the CONUS and ecoregions. The red line in each panel indicates the observed burned area. Modeled burned areas greater than 4 % yr⁻¹ are truncated with the value denoted on the bar.

In the Western Forested Mountains, where fuel is abundant due to dense forest coverage, all process-based models except ORCHIDEE simulate 2 to 5 times the GFED5 burned area. This overestimation can be related to many factors, including overestimation of fuel combustibility and underrepresentation of anthropogenic fire suppression (Balch et al., 2017). In contrast, wildfires in the NA Desert are primarily constrained by the fuel load. ELM-BGC and CLASSIC produce smaller overestimations, while SSiB4-TRIFFID-Fire, VISIT, JULES, and LPJ-GUESS models significantly overestimate the burned area (4–16 times GFED5), likely due to overestimations of fuel load, which might be attributed to insufficient water stress on vegetation growth in the arid region (Liu and Xue, 2020; Zhang et al., 2015). Although none of the process-based models accurately capture the spatial distribution of burned area over the Great Plains (Fig. 4), ELM-BGC, SSiB4-TRIFFID-Fire, and VISIT produce burned areas comparable to observations, while CLASSIC and ORCHIDEE overpredict them (4–7 times GFED5). The inaccurate description of the spatial pattern and the large inter-model spread in the Great Plains may be caused by inaccurate treatments of cropland fires and pasture fires (Donovan et al., 2020). As noted by Teckentrup et al. (2019) and Burton et al. (2024a), none of the process-based models has activated an explicit cropland fire model. While LPJ-GUESS-SIMFIRE-BLAZE incorporates harvesting in pastures, reducing biomass and influencing fire dynamics, the other process-based vegetation models do not distinguish pastures from natural grasslands for either vegetation growth or fire processes. Therefore, information on how fuel properties, including the amount as well as physical (e.g., bulk density) and chemical characteristics, and fire ignitions differ between pastures and natural grasslands could help to improve burned area simulation in the process-based fire models (Rabin et al., 2017). 
Fuel management practices, such as prescribed burning and grazing, can significantly alter fire dynamics but are generally absent in current models. In the eastern US (EUS) forests (Southeast and Northeast Temperate Forests ecoregions), fires are more commonly managed through prescribed burning, leading to fewer uncontrolled extreme wildfires. Although prescribed burning as an additional ignition source is not included in the process-based models, ignition is not a limiting factor in this region due to the abundance of lightning, which provides sufficient natural ignition sources. Consequently, the burned area is primarily controlled by fire spread, which is influenced by natural conditions such as fuel availability and wind, allowing the models to perform well in simulating fire dynamics.

3.2 Evaluation of the burned area temporal variability

We evaluate the model performance in simulating the monthly burned area and depicting fire seasons. A month is counted as part of the fire season when its burned area is greater than $1 / 12$ of the annual total burned area. The CONUS has two fire seasons, i.e., March–April–May and August–September–October, affected by both climate and human activities (Fig. 6a). The WUS fire season spans from early summer to late fall and is primarily determined by the dry conditions and high temperatures during these months (Safford et al., 2022; Schoennagel et al., 2017). Specifically, over the Western Forest Mountains, the fire season spans July to November (Fig. 6b). Most models capture the July to October fire season, except for ORCHIDEE (May–August). However, only offline-XGB, SSiB4-TRIFFID-Fire, and CLASSIC simulate the peak fire month in August, while the others simulate a peak ∼ 1–2 months late. A similar fire season and model performance are observed over the NA Desert (Fig. 5c). In wildfire-dominant regions, the shift in fire peak months might be related to the representation of seasonality in vegetation production and fuel build-up in the BGC model (Hantson et al., 2020).
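The fire season criterion above (monthly burned area greater than 1/12 of the annual total) can be illustrated with a minimal Python sketch. The function names and the 12-value climatology below are hypothetical and chosen only for illustration; they are not taken from the study's scripts or data.

```python
def fire_season_months(monthly_burned_area):
    """Return 1-based month numbers whose burned area exceeds 1/12 of the annual total."""
    if len(monthly_burned_area) != 12:
        raise ValueError("expected 12 monthly values (January to December)")
    annual_total = sum(monthly_burned_area)
    threshold = annual_total / 12.0
    return [m + 1 for m, ba in enumerate(monthly_burned_area) if ba > threshold]


def peak_month(monthly_burned_area):
    """Return the 1-based month with the largest burned area."""
    return max(range(12), key=lambda m: monthly_burned_area[m]) + 1


# Hypothetical monthly climatology in arbitrary units (not values from the paper).
example = [1, 1, 2, 2, 3, 5, 12, 20, 15, 10, 4, 1]

print(fire_season_months(example))  # months above the 1/12 threshold
print(peak_month(example))          # month of maximum burned area
```

With this made-up climatology the threshold is 76/12 ≈ 6.3, so July to October are flagged as fire season months and August is the peak month. A bimodal climatology would simply return two disjoint groups of months.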

https://gmd.copernicus.org/articles/18/4103/2025/gmd-18-4103-2025-f06

Figure 6. Monthly mean burned area fraction (% yr⁻¹) over each ecoregion. Vertical shadings indicate the fire seasons (months with burned area greater than $1 / 12$ of the total burned area), and shadings along x axes indicate 1 standard deviation across the years.

Human activities can also change the timing of fire occurrences (Le Page et al., 2010). Over the Great Plains, pasture fires are conducted during late winter to early spring to control pests, recycle nutrients, and prepare fields for planting (Gates et al., 2017). During the late summer to early fall, crop fires are conducted to clear crop residues. However, these fires may become uncontrolled, leading to larger fires that significantly impact the region. The fire seasons due to pasture fires and crop fires are evident in observations and are captured in offline-XGB and ELM2.1-XGBfire1.0, although ELM2.1-XGBfire1.0 slightly underestimates the peak in March. Except for LPJ-GUESS-SPITFIRE, none of the process-based models is able to simulate these periods, and instead, a summer fire season is predicted. LPJ-GUESS-SPITFIRE produces peaks in both spring and summer. In SE Temperate Forests, routine prescribed burns reduce large fire occurrences across the year (Mitchell et al., 2014). Dry conditions and/or fallen vegetation fuel larger burned areas in February–March and September–November. The ML-based models generally reproduce the fire seasons in March–April and September–November, while none of the process-based models captures the bimodal seasonality. The results for NE Temperate Forests are similar to those for the Great Plains, except that no peak burned area appears in November. The offline-XGB and SSiB4-TRIFFID-Fire models capture the spring peak. To the best of our knowledge, ELM-BGC is one of the few process-based models capable of explicitly simulating crop fires; however, this feature was not enabled in our study. None of the models used here includes explicit representations of pasture burning. Our evaluation suggests that including anthropogenic fires could help to improve model simulations in the central and eastern US. However, this requires a better understanding of how fire is used for land management under different socioeconomic and cultural conditions (Pfeiffer et al., 2013; Li et al., 2013).

Over the CONUS, the observed interannual variability (IAV), measured using standard deviation, is 0.7 Mha yr⁻¹, representing 12 % of the annual total burned area in GFED5 (Fig. 7a). GFED4s and FireCCI5.1 suggest 1.1 Mha yr⁻¹ (45 %) and 0.9 Mha yr⁻¹ (30 %), respectively. Process-based models greatly overestimate the IAV, with values ranging from 2.5 (LPJ-GUESS-SIMFIRE-BLAZE) to 6.6 Mha yr⁻¹ (VISIT). The relative IAV, expressed with respect to the modeled annual mean value, ranges from 12 % (JULES) to 41 % (ELM-BGC) and is generally within the range of observations. The machine learning models, offline-XGB and ELM2.1-XGBfire1.0, produce IAV of 0.6 Mha yr⁻¹ (11 %) and 0.8 Mha yr⁻¹ (10 %), respectively.
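As a minimal sketch of how the absolute and relative IAV quoted above can be computed from a series of annual total burned areas, the following Python example uses only the standard library. The function name and the input values are hypothetical, and the choice of the sample standard deviation (rather than the population form) is an assumption, since the text does not specify which is used.

```python
import statistics


def interannual_variability(annual_totals):
    """Return (IAV, relative IAV in %) for a series of annual totals.

    IAV is the standard deviation across years; the relative IAV is the
    standard deviation divided by the multi-year mean. The sample standard
    deviation is an assumption made for this sketch.
    """
    mean = statistics.fmean(annual_totals)
    if mean == 0:
        raise ValueError("mean burned area is zero; relative IAV is undefined")
    iav = statistics.stdev(annual_totals)
    return iav, 100.0 * iav / mean


# Hypothetical annual totals in Mha per year (not values from the paper).
annual_totals = [4.0, 6.0, 5.0, 7.0, 3.0]
iav, relative_iav = interannual_variability(annual_totals)
print(f"IAV = {iav:.2f} Mha/yr ({relative_iav:.0f} % of the mean)")
```

Because the relative IAV is normalized by each dataset's own mean, a model with a large absolute IAV can still show a relative IAV within the observed range if its mean burned area is also overestimated.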

Although the magnitude of IAV is amplified by process-based models, after subtracting the mean and dividing by the standard deviation, the standardized time series correlate well with the observations (Fig. 7b). Since the modeled IAV is generally influenced by climate variability and the climate-driven fuel variability, both process-based and ML-based models capture the timing of the fluctuations.
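The standardization used for this comparison can be sketched as follows. This is an illustrative Python example with hypothetical function names and made-up numbers; the "model" series is constructed as an amplified and shifted copy of the "observed" series to show that standardization removes differences in mean and amplitude while preserving timing.

```python
import statistics


def standardize(series):
    """Subtract the mean and divide by the standard deviation (z-scores)."""
    mean = statistics.fmean(series)
    std = statistics.stdev(series)
    if std == 0:
        raise ValueError("series has zero variance and cannot be standardized")
    return [(x - mean) / std for x in series]


# Hypothetical annual burned areas (Mha per year), for illustration only.
observed = [4.0, 6.0, 5.0, 7.0, 3.0]
# A made-up model series with twice the amplitude and a positive bias.
modeled = [2.0 * x + 3.0 for x in observed]

z_obs = standardize(observed)
z_mod = standardize(modeled)

# After standardization the two series coincide, despite different magnitudes.
print([round(z, 3) for z in z_obs])
print([round(z, 3) for z in z_mod])
```

In this constructed case the two standardized series are identical; for real model output they would differ, and their agreement is what the correlation in Fig. 7b summarizes.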

https://gmd.copernicus.org/articles/18/4103/2025/gmd-18-4103-2025-f07

Figure 7. Annual total burned area (Mha yr⁻¹). (a) Annual total value and (b) standardized value obtained by subtracting the mean and dividing by the standard deviation.

Monthly temporal variability in burned areas demonstrates significant regional differences across the ecoregions (Fig. 8). Over the entire simulation period, the ML-based models generally capture the timing of wildfires across the CONUS with a temporal correlation coefficient greater than 0.5 (p<0.01), whereas the process-based models exhibit a correlation of only 0.3 (p>0.01). The ML-based models also effectively capture the temporal variability across the ecoregions, although the correlation for ELM2.1-XGBfire1.0 decreases slightly in the Great Plains and EUS. This decrease is likely related to the fire–vegetation feedback, which alters the fuel condition differently from the training set. In contrast, the process-based models show correlations comparable to the ML-based models in the WUS but fail to accurately predict burned area temporal variations in the Great Plains and EUS. Again, climatic factors play a dominant role in shaping the temporal variability of BAF in the WUS, while human activities largely influence the BAF in the Great Plains and EUS (Kupfer et al., 2020; Chen et al., 2023). Process-based models tend to better describe the responses of fuel load and combustibility to climate than the responses of fire ignition and suppression to human activities (Hantson et al., 2016).
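The temporal correlation reported here is a correlation coefficient between simulated and observed monthly burned area series. A minimal Python sketch of a Pearson correlation and its associated t statistic is given below, assuming that Pearson's r is the coefficient intended; the function names and the short example series are hypothetical. Converting the t statistic into a p value requires the Student's t distribution with n − 2 degrees of freedom (available, for example, through `scipy.stats.pearsonr`), which is deliberately left out to keep the sketch dependency-free.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long series."""
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need two series of equal length with at least 3 values")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for testing r against zero, with n - 2 degrees of freedom."""
    if abs(r) >= 1.0:
        return math.copysign(math.inf, r)
    return r * math.sqrt((n - 2) / (1.0 - r * r))


# Hypothetical monthly burned area series (arbitrary units), for illustration only.
observed = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
modeled = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]

r = pearson_r(observed, modeled)
print(f"r = {r:.3f}, t = {t_statistic(r, len(observed)):.2f}")
```

Monthly burned area series are usually autocorrelated, so the effective sample size can be smaller than the number of months; a significance test that ignores this would tend to overstate confidence.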

https://gmd.copernicus.org/articles/18/4103/2025/gmd-18-4103-2025-f08

Figure 8. Monthly correlation coefficient between simulations and GFED5 over each ecoregion.

4 Discussion and conclusion

4.1 Overview of the hybrid framework

This study introduces a hybrid framework integrating an XGBoost wildfire model into an Earth system model (ELM-BGC), resulting in ELM2.1-XGBfire1.0. Both offline and coupled versions of the ML model were evaluated against observations and compared to eight state-of-the-art process-based models. The offline-XGB model significantly reduces burned area biases, particularly in the WUS, while the ELM2.1-XGBfire1.0 model retains the spatial and temporal accuracies with slightly reduced performance. In regions such as the Great Plains and EUS, where human activities are major influences, offline-XGB and ELM2.1-XGBfire1.0 outperform all process-based models.

4.2 Challenges and insights for process-based models

We acknowledge that the simulation biases in process-based models may come from multiple sources. All ISIMIP3a fire models were driven by daily GSWP3-W5E5 forcings at a 0.5° spatial resolution. Differences in forcing data could lead to variations in burned area predictions. However, given that both ELM-BGC and ELM2.1-XGBfire1.0 are driven by the same set of forcings yet produce markedly different burned area predictions, we suggest that limitations in physical understanding may play a dominant role in hindering the performance of the process-based model. By contrast, the ML model incorporates the crop PFT fraction and is trained with data that include agricultural burning, allowing it to capture burning patterns that are often missing or underrepresented in process-based models. Meanwhile, all process-based fire models used in this study have used GFED4s or an earlier version as a reference for calibration. GFED5 captures significantly more small fires than GFED4s, increasing the CONUS annual burned area by 156 % and the crop fire burned area by 240 % (Chen et al., 2023). The inclusion of crop fires is particularly impactful in the CONUS.

The process-based fire models used in this study differ in both fire models and DGVMs. VISIT, JULES, and LPJ-GUESS-SIMFIRE-BLAZE employ semi-empirical fire models (Thonicke et al., 2001; Pechony and Shindell, 2009; Knorr et al., 2014), in which burned area is calculated without an explicit rate-of-spread model (Hantson et al., 2016). The CLM-Li fire model (Li et al., 2012), a fire model of intermediate complexity, is incorporated into both ELM-BGC and SSiB4-TRIFFID-Fire and partially used in CLASSIC (Melton and Arora, 2016). Consequently, similar performance is observed among these models, although CLASSIC tends to exhibit a larger overestimation. The highly complex SPITFIRE model (Thonicke et al., 2010) provides a more comprehensive description of fire behavior (e.g., fire duration and flame height) and is coupled with ORCHIDEE and LPJ-GUESS to describe the fire impact depending on plant traits (bark thickness and crown height). Although SPITFIRE provides a more comprehensive description of fire, it does not outperform other fire models with regard to burned area simulation (Hantson et al., 2020).

With more sophisticated parameterizations and more fire parameters introduced, more observational analyses are required to understand the underlying mechanisms and to constrain the parametric uncertainty. The fire–vegetation feedback further complicates this problem, with more complex dynamic vegetation models being slow to reach equilibrium after disturbances. The choice of prescribed or dynamic vegetation could also play a role; note that among all the process-based models, CLASSIC, VISIT, and ELM used prescribed vegetation, while all others used dynamic vegetation. It is noteworthy that parameters involved in wildfire prediction are calibrated to align with the research interests of the institutes developing and managing these models. Advancing the physical understanding of wildfire processes for the CONUS and fine-tuning model parameters towards the new burned area dataset hold the potential to improve model performance (Huang et al., 2020).

4.3 Impact on carbon dynamics and broader application

Although ELM2.1-XGBfire1.0 significantly improves the simulation of burned areas, its impact on terrestrial carbon fluxes remains limited. Within the CONUS, fires primarily affect the terrestrial carbon cycle at localized scales due to the relatively small burned areas. ELM-BGC, for instance, underestimates gross primary production (GPP) by approximately 30 % (figure not shown). With more accurate fire predictions, ELM2.1-XGBfire1.0 helps to slightly reduce this negative bias (less than 1 %). Additionally, while ELM-BGC using prescribed PFT distributions can suppress the effects of fires on the ecosystem, it does not account for fire-induced shifts in vegetation species, where species with greater resistance or fire-adaptive traits may gradually dominate. Nonetheless, the coupling remains valuable, especially when the model is configured at higher resolutions. It is particularly important for evaluating fire-induced tree mortality, post-fire recovery, and fire emissions and their subsequent impacts on air quality, cloud formation, and surface meteorology, particularly when ELM is run as part of the E3SM.

The development and application of ML4Fire-XGB represent a significant step forward in our ability to model wildfire dynamics in regions with complicated interactions between fires, ecosystems, climate, and human activities, bypassing the explicit understanding of physical processes. By incorporating an ML wildfire model into a land surface model, we address the critical need for enhanced predictive capabilities at subseasonal to seasonal scales. Meanwhile, the predictability can adapt to the evolving nature of fire regimes under climate change. This research not only contributes to the scientific community's understanding of fire–ecosystem–climate interactions but also provides a practical tool for policymakers and resource managers engaged in wildfire preparedness and response.

Code and data availability

Data and scripts used to generate results in this study are publicly available at the Pacific Northwest National Laboratory (PNNL) DataHub (https://doi.org/10.25584/2424127, DataHub, 2020). The Fortran–Python interface (ML4ESM) for developing ML parameterizations is archived at https://doi.org/10.5281/zenodo.11005103 (Zhang, 2024). The E3SM v2.1 (including ELM v2.1) is available at https://doi.org/10.11578/E3SM/dc.20230110.5 (E3SM Project, 2023) and https://github.com/E3SM-Project/E3SM/releases/tag/v2.1.0 (last access: 20 June 2024). The modified ELM v2.1 (including the XGBoost ML fire model) is available at https://doi.org/10.5281/zenodo.13358187 (Liu, 2024).

Author contributions

Research conceptualization, paper preparation, and analysis were performed by YL and HH. ELM configuration and setup were supported by DX. The hybrid model coupling framework was first developed by TZ. The GFED5 data were provided by YC. The ML fire model development was assisted by SSW. YL, HH, DX, TZ, and YC contributed to the paper edits and technical review.

Competing interests

The contact author has declared that none of the authors has any competing interests.

Disclaimer

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors.

Acknowledgements

Donghui Xu was supported by the Earth System Model Development program area of the US Department of Energy, Office of Science, Office of Biological and Environmental Research, as part of the multiprogram collaborative Integrated Coastal Modeling (ICoM) project. PNNL is operated for the DOE by the Battelle Memorial Institute under contract no. DE-A05-76RL0 1830. Research activity at Brookhaven National Laboratory (BNL) was performed under the Brookhaven National Laboratory contract no. DE-SC0012704.

Financial support

This research has been supported by the Earth and Biological Sciences Directorate (EBSD)'s Laboratory Directed Research and Development (LDRD) Program at Pacific Northwest National Laboratory (PNNL).

Review statement

This paper was edited by Fiona O'Connor and reviewed by two anonymous referees.

References

Andela, N., Morton, D. C., Giglio, L., Chen, Y., van der Werf, G. R., Kasibhatla, P. S., DeFries, R. S., Collatz, G. J., Hantson, S., Kloster, S., Bachelet, D., Forrest, M., Lasslop, G., Li, F., Mangeon, S., Melton, J. R., Yue, C., and Randerson, J. T.: A human-driven decline in global burned area, Science, 356, 1356–1362, 2017.

Arora, V. K. and Boer, G. J.: A parameterization of leaf phenology for the terrestrial ecosystem component of climate models, Glob. Change Biol., 11, 39–59, 2005.

Balch, J. K., Bradley, B. A., Abatzoglou, J. T., Nagy, R. C., Fusco, E. J., and Mahood, A. L.: Human-started wildfires expand the fire niche across the United States, P. Natl. Acad. Sci. USA, 114, 2946–2951, 2017.

Beguería, S., Vicente-Serrano, S. M., Reig, F., and Latorre, B.: Standardized precipitation evapotranspiration index (SPEI) revisited: parameter fitting, evapotranspiration models, tools, datasets and drought monitoring, Int. J. Climatol., 34, 3001–3023, 2014.

Buch, J., Williams, A. P., Juang, C. S., Hansen, W. D., and Gentine, P.: SMLFire1.0: a stochastic machine learning (SML) model for wildfire activity in the western United States, Geosci. Model Dev., 16, 3407–3433, https://doi.org/10.5194/gmd-16-3407-2023, 2023.

Burton, C., Betts, R., Cardoso, M., Feldpausch, T. R., Harper, A., Jones, C. D., Kelley, D. I., Robertson, E., and Wiltshire, A.: Representation of fire, land-use change and vegetation dynamics in the Joint UK Land Environment Simulator vn4.9 (JULES), Geosci. Model Dev., 12, 179–193, https://doi.org/10.5194/gmd-12-179-2019, 2019.

Burton, C., Lampe, S., Kelley, D. I., Thiery, W., Hantson, S., Christidis, N., Gudmundsson, L., Forrest, M., Burke, E., Chang, J., Huang, H., Ito, A., Kou-Giesbrecht, S., Lasslop, G., Li, W., Nieradzik, L., Li, F., Chen, Y., Randerson, J., Reyer, C. P. O., and Mengel, M.: Global burned area increasingly explained by climate change, Nat. Clim. Change, 14, 1186–1192, https://doi.org/10.1038/s41558-024-02140-w, 2024a.

Burton, C., Fang, L., Hantson, S., Forrest, M., Bradley, A., Burke, E., Chang, J., Chao, Y., Ciais, P., Huang, H., Ito, A., Kim, J., Kou-Giesbrecht, S., Nieradzik, L., Nishina, K., Zhu, Q., and Reyer, C. P. O.: ISIMIP3a simulation data from the fire sector, ISIMIP Repository, https://doi.org/10.48364/ISIMIP.446106, 2024b.

Chen, T. and Guestrin, C.: XGBoost: A scalable tree boosting system, in: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'16, San Francisco, CA, 13–17 August 2016), 785–794, ACM, ISBN 978-1-4503-4232-2, https://doi.org/10.1145/2939672.2939785, 2016.

Chen, Y., Hall, J., van Wees, D., Andela, N., Hantson, S., Giglio, L., van der Werf, G. R., Morton, D. C., and Randerson, J. T.: Global fire emissions database (GFED5) burned area, Zenodo, https://doi.org/10.5281/ZENODO.7668423, 2023.

Chuvieco, E., Pettinari, M. L., Lizundia-Loiola, J., Storm, T., and Padilla Parellada, M.: ESA Fire Climate Change Initiative (Fire_cci): MODIS Fire_cci Burned Area Grid product, version 5.1, Centre for Environmental Data Analysis (CEDA), https://doi.org/10.5285/3628cb2fdba443588155e15dee8e5352, 2019.

Claverie, M., Ju, J., Masek, J. G., Dungan, J. L., Vermote, E. F., Roger, J.-C., Skakun, S. V., and Justice, C.: The Harmonized Landsat and Sentinel-2 surface reflectance data set, Remote Sens. Environ., 219, 145–161, 2018.

Cucchi, M., Weedon, G. P., Amici, A., Bellouin, N., Lange, S., Müller Schmied, H., Hersbach, H., and Buontempo, C.: WFDE5: bias-adjusted ERA5 reanalysis data for impact studies, Earth Syst. Sci. Data, 12, 2097–2120, https://doi.org/10.5194/essd-12-2097-2020, 2020.

DataHub: Simulated wildfire burned area over the CONUS during 2001–2020, DataHub [data set], https://doi.org/10.25584/2424127, 2020.

Donovan, V. M., Wonkka, C. L., Wedin, D. A., and Twidwell, D.: Land-Use Type as a Driver of Large Wildfire Occurrence in the U.S. Great Plains, Remote Sens., 12, 1869, https://doi.org/10.3390/rs12111869, 2020.

E3SM Project: Energy Exascale Earth System Model v2.1.0, DOE [computer software], https://doi.org/10.11578/E3SM/dc.20230110.5, 2023.

Forkel, M., Andela, N., Harrison, S. P., Lasslop, G., van Marle, M., Chuvieco, E., Dorigo, W., Forrest, M., Hantson, S., Heil, A., Li, F., Melton, J., Sitch, S., Yue, C., and Arneth, A.: Emergent relationships with respect to burned area in global satellite observations and fire-enabled vegetation models, Biogeosciences, 16, 57–76, https://doi.org/10.5194/bg-16-57-2019, 2019.

Gates, E. A., Vermeire, L. T., Marlow, C. B., and Waterman, R. C.: Fire and Season of Postfire Defoliation Effects on Biomass, Composition, and Cover in Mixed-Grass Prairie, Rangeland Ecol. Manage., 70, 430–436, 2017.

Giglio, L., Schroeder, W., and Justice, C. O.: The collection 6 MODIS active fire detection algorithm and fire products, Remote Sens. Environ., 178, 31–41, 2016.

Giglio, L., Boschetti, L., Roy, D. P., Humber, M. L., and Justice, C. O.: The Collection 6 MODIS burned area mapping algorithm and product, Remote Sens. Environ., 217, 72–85, 2018.

Global Wildfire Information System: Global Wildfire Information System (2024) – with minor processing by Our World in Data, Annual number of wildfires, Seasonal wildfire trends [original data], Global Wildfire Information System [data set], https://archive.ourworldindata.org/20250624-125417/grapher/annual-number-of-fires.html (last access: 29 June 2025), 2024.

Haas, H., Reaver, N. G. F., Karki, R., Kalin, L., Srivastava, P., Kaplan, D. A., and Gonzalez-Benecke, C.: Improving the representation of forests in hydrological models, Sci. Total Environ., 812, 151425, https://doi.org/10.1016/j.scitotenv.2021.151425, 2022.

Hall, J. V., Loboda, T. V., Giglio, L., and McCarty, G. W.: A MODIS-based burned area assessment for Russian croplands: Mapping requirements and challenges, Remote Sens. Environ., 184, 506–521, 2016.

Hanan, E. J., Ren, J., Tague, C. L., Kolden, C. A., Abatzoglou, J. T., Bart, R. R., Kennedy, M. C., Liu, M., and Adam, J. C.: How climate change and fire exclusion drive wildfire regimes at actionable scales, Environ. Res. Lett., 16, 024051, https://doi.org/10.1088/1748-9326/abd78e, 2021.

Hantson, S., Arneth, A., Harrison, S. P., Kelley, D. I., Prentice, I. C., Rabin, S. S., Archibald, S., Mouillot, F., Arnold, S. R., Artaxo, P., Bachelet, D., Ciais, P., Forrest, M., Friedlingstein, P., Hickler, T., Kaplan, J. O., Kloster, S., Knorr, W., Lasslop, G., Li, F., Mangeon, S., Melton, J. R., Meyn, A., Sitch, S., Spessa, A., van der Werf, G. R., Voulgarakis, A., and Yue, C.: The status and challenge of global fire modelling, Biogeosciences, 13, 3359–3375, https://doi.org/10.5194/bg-13-3359-2016, 2016.

Hantson, S., Kelley, D. I., Arneth, A., Harrison, S. P., Archibald, S., Bachelet, D., Forrest, M., Hickler, T., Lasslop, G., Li, F., Mangeon, S., Melton, J. R., Nieradzik, L., Rabin, S. S., Prentice, I. C., Sheehan, T., Sitch, S., Teckentrup, L., Voulgarakis, A., and Yue, C.: Quantitative assessment of fire and vegetation properties in simulations with fire-enabled vegetation models from the Fire Model Intercomparison Project, Geosci. Model Dev., 13, 3299–3318, https://doi.org/10.5194/gmd-13-3299-2020, 2020.

Huang, H., Xue, Y., Li, F., and Liu, Y.: Modeling long-term fire impact on ecosystem characteristics and surface energy using a process-based vegetation–fire model SSiB4/TRIFFID-Fire v1.0, Geosci. Model Dev., 13, 6029–6050, https://doi.org/10.5194/gmd-13-6029-2020, 2020.

Huang, H., Xue, Y., Liu, Y., Li, F., and Okin, G. S.: Modeling the short-term fire effects on vegetation dynamics and surface energy in southern Africa using the improved SSiB4/TRIFFID-Fire model, Geosci. Model Dev., 14, 7639–7657, https://doi.org/10.5194/gmd-14-7639-2021, 2021.

Huang, H., Qian, Y., McDowell, N. G., Hao, D., Li, L., Shi, M., Rittger, K., Bisht, G., and Chen, X.: Elevated forest canopy loss after wildfires in moist and cool forests in the Pacific Northwest, Authorea [preprint], 15, https://doi.org/10.22541/au.172901089.98985690/v1, 2024.

Ito, A.: Disequilibrium of terrestrial ecosystem CO₂ budget caused by disturbance-induced emissions and non-CO₂ carbon export flows: a global model assessment, Earth Syst. Dynam., 10, 685–709, https://doi.org/10.5194/esd-10-685-2019, 2019.

JEC: Climate-exacerbated wildfires cost the U.S. between 394 to 893 billion each year in economic costs and damages, Report, https://www.jec.senate.gov/public/_cache/files/9220abde-7b60-4d05-ba0a-8cc20df44c7d/jec-report-on-total-costs-of-wildfires.pdf (last access: 28 June 2025), 2023.

Jones, A. M., Kane, J. M., Engber, E. A., Martorano, C. A., and Gibson, J.: Extreme wildfire supersedes long-term fuel treatment influences on fuel and vegetation in chaparral ecosystems of northern California, USA, Fire Ecol., 19, 1–19, 2023.

Jones, M. W., Abatzoglou, J. T., Veraverbeke, S., Andela, N., Lasslop, G., Forkel, M., Smith, A. J. P., Burton, C., Betts, R. A., van der Werf, G. R., Sitch, S., Canadell, J. G., Santín, C., Kolden, C., Doerr, S. H., and Le Quéré, C.: Global and regional trends and drivers of fire under climate change, Rev. Geophys., 60, e2020RG000726, https://doi.org/10.1029/2020rg000726, 2022.

Klein Goldewijk, K., Beusen, A., Doelman, J., and Stehfest, E.: Anthropogenic land use estimates for the Holocene – HYDE 3.2, Earth Syst. Sci. Data, 9, 927–953, https://doi.org/10.5194/essd-9-927-2017, 2017.

Knorr, W., Kaminski, T., Arneth, A., and Weber, U.: Impact of human population density on fire frequency at the global scale, Biogeosciences, 11, 1085–1102, https://doi.org/10.5194/bg-11-1085-2014, 2014.

Knorr, W., Arneth, A., and Jiang, L.: Demographic controls of future global fire risk, Nat. Clim. Change, 6, 781–785, 2016.

Kupfer, J. A., Terando, A. J., Gao, P., Teske, C., and Kevin Hiers, J.: Climate change projected to reduce prescribed burning opportunities in the south-eastern United States, Int. J. Wildland Fire, 29, 764–778, 2020.

Lange, S., Menz, C., Gleixner, S., Cucchi, M., Weedon, G. P., Amici, A., Bellouin, N., Schmied, H. M., Hersbach, H., Buontempo, C., and Cagnazzo, C.: WFDE5 over land merged with ERA5 over the ocean (W5E5 v2.0), ISIMIP Repository [data set], https://doi.org/10.48364/ISIMIP.342217, 2021.

Lasslop, G., Thonicke, K., and Kloster, S.: SPITFIRE within the MPI Earth system model: Model development and evaluation, J. Adv. Model. Earth Sy., 6, 740–755, 2014.

Lawrence, D. M., Oleson, K. W., Flanner, M. G., Thornton, P. E., Swenson, S. C., Lawrence, P. J., Zeng, X., Yang, Z.-L., Levis, S., Sakaguchi, K., Bonan, G. B., and Slater, A. G.: Parameterization improvements and functional and structural advances in Version 4 of the Community Land Model, J. Adv. Model. Earth Sy., 3, M03001, https://doi.org/10.1029/2011MS00045, 2011.

Le Page, Y., Oom, D., Silva, J. M. N., Jönsson, P., and Pereira, J. M. C.: Seasonality of vegetation fires as modified by human action: observing the deviation from eco-climatic fire regimes, Global Ecol. Biogeogr., 19, 575–588, 2010.

Li, F. and Lawrence, D. M.: Role of Fire in the Global Land Water Budget during the Twentieth Century due to Changing Ecosystems, J. Climate, https://doi.org/10.1175/jcli-d-16-0460.1, 2017.

Li, F., Zeng, X. D., and Levis, S.: A process-based fire parameterization of intermediate complexity in a Dynamic Global Vegetation Model, Biogeosciences, 9, 2761–2780, https://doi.org/10.5194/bg-9-2761-2012, 2012.

Li, F., Levis, S., and Ward, D. S.: Quantifying the role of fire in the Earth system – Part 1: Improved global fire modeling in the Community Earth System Model (CESM1), Biogeosciences, 10, 2293–2314, https://doi.org/10.5194/bg-10-2293-2013, 2013.

Li, F., Bond-Lamberty, B., and Levis, S.: Quantifying the role of fire in the Earth system – Part 2: Impact on the net carbon balance of global terrestrial ecosystems for the 20th century, Biogeosciences, 11, 1345–1360, https://doi.org/10.5194/bg-11-1345-2014, 2014.

Li, F., Zhu, Q., Riley, W. J., Zhao, L., Xu, L., Yuan, K., Chen, M., Wu, H., Gui, Z., Gong, J., and Randerson, J. T.: AttentionFire_v1.0: interpretable machine learning fire model for burned-area predictions over tropics, Geosci. Model Dev., 16, 869–884, https://doi.org/10.5194/gmd-16-869-2023, 2023.

Liu, Y.: Machine learning (XGBoost) fire model for CONUS, Zenodo [data set], https://doi.org/10.5281/zenodo.13358187, 2024.

Liu, Y. and Xue, Y.: Expansion of the Sahara Desert and shrinking of frozen land of the Arctic, Sci. Rep., 10, 4109, https://doi.org/10.1038/s41598-020-61085-0, 2020.

Lizundia-Loiola, J., Otón, G., Ramo, R., and Chuvieco, E.: A spatio-temporal active-fire clustering approach for global burned area mapping at 250 m from MODIS data, Remote Sens. Environ., 236, 111493, https://doi.org/10.1016/j.rse.2019.111493, 2020.

Mangeon, S., Voulgarakis, A., Gilham, R., Harper, A., Sitch, S., and Folberth, G.: INFERNO: a fire and emissions scheme for the UK Met Office's Unified Model, Geosci. Model Dev., 9, 2685–2700, https://doi.org/10.5194/gmd-9-2685-2016, 2016.

Mathison, C., Burke, E., Hartley, A. J., Kelley, D. I., Burton, C., Robertson, E., Gedney, N., Williams, K., Wiltshire, A., Ellis, R. J., Sellar, A. A., and Jones, C. D.: Description and evaluation of the JULES-ES set-up for ISIMIP2b, Geosci. Model Dev., 16, 4249–4264, https://doi.org/10.5194/gmd-16-4249-2023, 2023.

Melton, J. R. and Arora, V. K.: Competition between plant functional types in the Canadian Terrestrial Ecosystem Model (CTEM) v. 2.0, Geosci. Model Dev., 9, 323–361, https://doi.org/10.5194/gmd-9-323-2016, 2016.

Melton, J. R., Arora, V. K., Wisernig-Cojoc, E., Seiler, C., Fortier, M., Chan, E., and Teckentrup, L.: CLASSIC v1.0: the open-source community successor to the Canadian Land Surface Scheme (CLASS) and the Canadian Terrestrial Ecosystem Model (CTEM) – Part 1: Model framework and site-level performance, Geosci. Model Dev., 13, 2825–2850, https://doi.org/10.5194/gmd-13-2825-2020, 2020.

Miller, J. D., Safford, H. D., Crimmins, M., and Thode, A. E.: Quantitative Evidence for Increasing Forest Fire Severity in the Sierra Nevada and Southern Cascade Mountains, California and Nevada, USA, Ecosystems, 12, 16–32, 2009.

Mitchell, R. J., Liu, Y., O'Brien, J. J., Elliott, K. J., Starr, G., Miniat, C. F., and Hiers, J. K.: Future climate and fire interactions in the southeastern region of the United States, Forest Ecol. Manage., 327, 316–326, 2014.

Omernik, J. M. and Griffith, G. E.: Ecoregions of the conterminous United States: evolution of a hierarchical spatial framework, Environ. Manage., 54, 1249–1266, 2014.

Parks, S. A. and Abatzoglou, J. T.: Warmer and drier fire seasons contribute to increases in area burned at high severity in western US forests from 1985 to 2017, Geophys. Res. Lett., 47, e2020GL089858, https://doi.org/10.1029/2020gl089858, 2020.

Pechony, O. and Shindell, D. T.: Fire parameterization on a global scale, J. Geophys. Res.-Atmos., 114, D16115, https://doi.org/10.1029/2009JD011927, 2009.

Pfeiffer, R. M., Park, Y., Kreimer, A. R., Lacey Jr., J. V., Pee, D., Greenlee, R. T., Buys, S. S., Hollenbeck, A., Rosner, B., Gail, M. H., and Hartge, P.: Risk prediction for breast, endometrial, and ovarian cancer in white women aged 50 y or older: derivation and validation from population-based cohort studies, PLoS Med., 10, e1001492, https://doi.org/10.1371/journal.pmed.1001492, 2013.

Prentice, S. A. and Mackerras, D.: The Ratio of Cloud to Cloud-Ground Lightning Flashes in Thunderstorms, J. Appl. Meteorol. Clim., 16, 545–550, 1977.

Rabin, S. S., Melton, J. R., Lasslop, G., Bachelet, D., Forrest, M., Hantson, S., Kaplan, J. O., Li, F., Mangeon, S., Ward, D. S., Yue, C., Arora, V. K., Hickler, T., Kloster, S., Knorr, W., Nieradzik, L., Spessa, A., Folberth, G. A., Sheehan, T., Voulgarakis, A., Kelley, D. I., Prentice, I. C., Sitch, S., Harrison, S., and Arneth, A.: The Fire Modeling Intercomparison Project (FireMIP), phase 1: experimental and analytical protocols with detailed model descriptions, Geosci. Model Dev., 10, 1175–1197, https://doi.org/10.5194/gmd-10-1175-2017, 2017.

Rodrigues, M. and de la Riva, J.: An insight into machine-learning algorithms to model human-caused wildfire occurrence, Environ. Model. Softw., 57, 192–201, 2014.

Rogers, B. M., Soja, A. J., Goulden, M. L., and Randerson, J. T.: Influence of tree species on continental differences in boreal fires and climate feedbacks, Nat. Geosci., 8, 228–234, 2015.

Rothermel, R. C.: A Mathematical Model for Predicting Fire Spread in Wildland Fuels, Intermountain Forest & Range Experiment Station, Forest Service, U.S. Department of Agriculture, 40 pp., https://www.fs.usda.gov/research/treesearch/32533 and https://www.fs.usda.gov/rm/pubs_int/int_rp115.pdf (last access: 28 June 2025), 1972.

Safford, H. D., Paulson, A. K., Steel, Z. L., Young, D. J. N., Wayman, R. B., and Varner, M.: The 2020 California fire season: A year like no other, a return to the past or a harbinger of the future?, Global Ecol. Biogeogr., 31, 2005–2025, 2022.

Schoennagel, T., Balch, J. K., Brenkert-Smith, H., Dennison, P. E., Harvey, B. J., Krawchuk, M. A., Mietkiewicz, N., Morgan, P., Moritz, M. A., Rasker, R., Turner, M. G., and Whitlock, C.: Adapt to more wildfire in western North American forests as climate changes, P. Natl. Acad. Sci. USA, 114, 4582–4590, 2017.

Teckentrup, L., Harrison, S. P., Hantson, S., Heil, A., Melton, J. R., Forrest, M., Li, F., Yue, C., Arneth, A., Hickler, T., Sitch, S., and Lasslop, G.: Response of simulated burned area to historical changes in environmental and anthropogenic factors: a comparison of seven fire models, Biogeosciences, 16, 3883–3910, https://doi.org/10.5194/bg-16-3883-2019, 2019.

Thonicke, K., Venevsky, S., Sitch, S., and Cramer, W.: The role of fire disturbance for global vegetation dynamics: Coupling fire into a dynamic global vegetation model, Global Ecol. Biogeogr., 10, 661–677, 2001.

Thonicke, K., Spessa, A., Prentice, I. C., Harrison, S. P., Dong, L., and Carmona-Moreno, C.: The influence of vegetation, fire spread and fire behaviour on biomass burning and trace gas emissions: results from a process-based model, Biogeosciences, 7, 1991–2011, https://doi.org/10.5194/bg-7-1991-2010, 2010.

Turco, M., Abatzoglou, J. T., Herrera, S., Zhuang, Y., Jerez, S., Lucas, D. D., AghaKouchak, A., and Cvijanovic, I.: Anthropogenic climate change impacts exacerbate summer forest fires in California, P. Natl. Acad. Sci. USA, 120, e2213815120, https://doi.org/10.1073/pnas.2213815120, 2023.

van der Werf, G. R., Randerson, J. T., Giglio, L., van Leeuwen, T. T., Chen, Y., Rogers, B. M., Mu, M., van Marle, M. J. E., Morton, D. C., Collatz, G. J., Yokelson, R. J., and Kasibhatla, P. S.: Global fire emissions estimates during 1997–2016, Earth Syst. Sci. Data, 9, 697–720, https://doi.org/10.5194/essd-9-697-2017, 2017.

Venevsky, S., Thonicke, K., Sitch, S., and Cramer, W.: Simulating fire regimes in human-dominated ecosystems: Iberian Peninsula case study, Glob. Change Biol., 8, 984–998, 2002.

Villarreal, M. L., Norman, L. M., Yao, E. H., and Conrad, C. R.: Wildfire probability models calibrated using past human and lightning ignition patterns can inform mitigation of post-fire hydrologic hazards, Geomat. Nat. Haz. Risk, 13, 568–590, 2022.

Wang, S. S., Qian, Y., Leung, L. R., and Zhang, Y.: Identifying Key Drivers of Wildfires in the Contiguous US Using Machine Learning and Game Theory Interpretation, Earth's Future, 9, e2020EF001910, https://doi.org/10.1029/2020EF001910, 2021.

Yue, C., Ciais, P., Cadule, P., Thonicke, K., Archibald, S., Poulter, B., Hao, W. M., Hantson, S., Mouillot, F., Friedlingstein, P., Maignan, F., and Viovy, N.: Modelling the role of fires in the terrestrial carbon balance by incorporating SPITFIRE into the global vegetation model ORCHIDEE – Part 1: simulating historical global burned area and fire regimes, Geosci. Model Dev., 7, 2747–2767, https://doi.org/10.5194/gmd-7-2747-2014, 2014.

Zhang, T., Morcrette, C. J., Zhang, M., Lin, W., Xie, S., Liu, Y., Van Weverberg, K., and Rodrigues, J.: A FORTRAN-Python interface for integrating machine learning parameterization into Earth System Models, ESS Open Archive, https://doi.org/10.22541/essoar.171322761.17960693/v1, 2024.

Zhang, T.: tzhang-ccs/ML4ESM: ML4ESM_v1 (Version v1), Zenodo [data set], https://doi.org/10.5281/zenodo.11005103, 2024.

Zhang, Z., Xue, Y., MacDonald, G., Cox, P. M., and Collatz, G. J.: Investigation of North American vegetation variability under recent climate: A study using the SSiB4/TRIFFID biophysical/dynamic vegetation model, J. Geophys. Res.-Atmos., 120, 1300–1321, 2015.

Zhu, Q., Li, F., Riley, W. J., Xu, L., Zhao, L., Yuan, K., Wu, H., Gong, J., and Randerson, J.: Building a machine learning surrogate model for wildfire activities within a global Earth system model, Geosci. Model Dev., 15, 1899–1911, https://doi.org/10.5194/gmd-15-1899-2022, 2022.

Short summary

This study integrates machine learning with a land surface model to improve wildfire predictions over the contiguous United States. Process-based fire models struggle to simulate burned area accurately because they simplify the underlying physical processes. By combining the predictive power of machine learning with a land model, our hybrid framework better captures fire dynamics. This approach enhances our understanding of wildfire behavior and aids in developing more effective climate and fire management strategies.