Evaluation of CMIP6 model performances in simulating fire weather spatiotemporal variability on global and regional scales
- 1Centre for Agroecology, Water and Resilience, Coventry University, Coventry, CV8 3LG, UK
- 2Southern Swedish Forest Research Centre, Swedish University of Agricultural Sciences, Alnarp, 230 53, Sweden
- 3Institut de recherche sur les forêts, Université du Québec en Abitibi-Témiscamingue (UQAT), QC J9X 5E4, Canada
- 4Forest Research Institute of the Karelian Research Centre of the Russian Academy of Sciences, Petrozavodsk, 185910, Russia
- 5School of Forestry, Northern Arizona University, Flagstaff, Arizona, 86011, USA
- 6Disaster Risk Management Unit, Directorate for Space, Security and Mitigation, Joint Research Centre (JRC), European Commission, Ispra, 21027, Italy
- 7School of Energy, Construction and Environment, Coventry University, Coventry, CV1 5FB, UK
Abstract. Weather and climate play an important role in shaping global wildfire regimes and geographical distributions of burnable area. As projected by the Sixth Assessment Report of the Intergovernmental Panel on Climate Change (IPCC-AR6), in the near future, fire danger is likely to increase in many regions due to warmer temperatures and drier conditions. General Circulation Models (GCMs) are an important resource in understanding how fire danger will evolve in a changing climate but, to date, the development of fire risk scenarios has not fully accounted for systematic GCM errors and biases. This study presents a comprehensive global evaluation of the spatiotemporal representation of fire weather indicators from the Canadian Forest Fire Weather Index System simulated by 16 GCMs from the 6th Coupled Model Intercomparison Project (CMIP6). While at the global scale, the ensemble mean is able to represent variability, magnitude and spatial extent of different fire weather indicators reasonably well when compared to the latest global fire reanalysis, there is considerable regional and seasonal dependence in the performance of each GCM. To support the GCM selection and application for impact studies, the evaluation results are combined to generate global and regional rankings of individual GCM performance. The findings highlight the value of GCM evaluation and selection in developing more reliable projections of future climate-driven fire danger, thereby enabling decision makers and forest managers to take targeted action and respond to future fire events.
Carolina Gallo et al.
Status: final response (author comments only)
RC1: 'Comment on gmd-2022-223', Anonymous Referee #1, 22 Nov 2022
Synopsis
This paper evaluates the skill of 16 CMIP6 models in reproducing historical observed fire weather conditions as viewed through the CFDRS. The paper is well-written, well-structured and the methodology is robust. I think the paper has the potential to be published but I have a few comments below that need to be addressed before publication.
Major comments
Section 2.3 – Are the authors aware of the "hot model" problem in CMIP6? See the paper below. This should be discussed and acknowledged.
https://www.nature.com/articles/d41586-022-01192-2
Line 166-170 – Defining a unique fire season for each GFED region is questionable given their spatial extent. The timing of the fire season has already been reported in a number of previous studies and has been shown to be highly variable in space. The present coarse-scale analysis is thus likely to mix a number of different seasonalities within each GFED region. I would suggest defining the fire season locally (e.g. at the pixel scale), as done in most previous studies. Moreover, the model ranking (currently based on GFED regions) would be much more relevant if the authors considered the spatial extent (number of pixels) over which a model falls within a specific tercile. The current ranking is dependent on the size of each GFED region (a small region contributes as much as a large one).
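A minimal sketch of the pixel-scale fire-season definition suggested here, assuming a GFED4 monthly burned-area climatology held in an xarray DataArray (the file name, dimension names and threshold handling are illustrative assumptions, not the authors' actual workflow):

```python
import xarray as xr

# Hypothetical GFED4 monthly burned-area climatology (dims: month, lat, lon),
# i.e. burned area averaged for each calendar month over the GFED record.
ba_clim = xr.open_dataarray("gfed4_burned_area_climatology.nc")

# Local fire season: months whose climatological burned area is at least
# 50% of that pixel's maximum monthly burned area.
fire_season = ba_clim >= 0.5 * ba_clim.max(dim="month")

# Example use: restrict a model-minus-reanalysis bias field (dims: month, lat, lon)
# to the locally defined fire season before any regional averaging, so that each
# pixel contributes according to its own seasonality rather than a regional one.
# bias_fs = bias.where(fire_season).mean(dim="month")
```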
Minor comments
Line 65-68 - GCMs have also been used in attribution studies of FWI to quantify the current risk:
Barbero R, Abatzoglou J T, Pimont F, Ruffault J and Curt T. 2020 Attributing increases in fire weather to anthropogenic climate change over France. Frontiers in Earth Science
https://doi.org/10.3389/feart.2020.00104
Line 130-131 – I am not sure I understand how links between fire events and fire weather trends relate to the performance of ERA5 in reproducing observed fire weather conditions. Two independent variables can follow similar trends!
Line 150-155 – As stated by authors, the record length of GFED is a bit short to draw such conclusions. What about using a land cover map to define what is burnable or not?
Line 157 – why this specific time period? This needs clarification
Line 158 – Are you referring to the 90th percentile of daily values?
Figure 2 – the title indicates (mon_mean) while the caption reads annual mean. Please use the same terminology for consistency.
Figure 2 and 3 – I wonder if the (minor) differences between average and extreme statistics need to be reported. Please consider moving Figure 3 to the supplementary information and reduce the text accordingly.
Line 197-198 – Given the (rather expected) similarities in the results, I suggest moving DSR results to the supplementary information.
Line 217-219 – this belongs to the methods
Figure 4 – please increase label size (x and y labels) as well as the colorbar. They are very hard to read.
Figures 5-6 – Would it make sense to add the MMM for comparison with each individual GCM?
Figures 5-6 – Please indicate the RMSE on the figure for clarity
Figure 6 – please consider moving this figure to SI given the similarity with figure 5
Line 349-350 – this is a very interesting result
Line 355-358 – Does it mean that a model performing well in reproducing historical variability is more credible in simulating future changes to fire weather conditions? I am just asking.
Line 402-405 – Yes, other regionalizations such as the pyroregions presented in Galizia et al. (2021) over Europe would be more relevant in terms of fire activity and management.
Galizia, L. F. C., Curt, T., Barbero, R., Rodrigues, M. (2021). Understanding fire regimes in Europe, International Journal of Wildland Fire, 31, 56-66. https://doi.org/10.1071/WF21081
A future interesting research question would be to examine what meteorological variables (temp, prcp, RH and wind) are responsible for the difference between observed and simulated CFDRS components. This could be discussed somewhere in the paper.
RC2: 'Comment on gmd-2022-223', Anonymous Referee #2, 03 Jan 2023
This work evaluates the performance of 16 CMIP6 models in reproducing historical fire weather indicators represented by the Canadian Fire Weather Index System by comparing the results to those produced by GEFF-ERA5. The paper is written in a concise manner, is well structured, and provides a robust and insightful analysis that allows a better understanding of the performance of CMIP6 models in capturing fire-related weather. In my opinion, this work provides a useful contribution to the scientific community, aiding model selection for related studies focused on future climate-driven fire projections. However, before this is considered for publication, there are several clarifications which should be provided by the authors.
Lines 125 – 129 – The work by Vitolo et al. (2020) describes GEFF-ERA5, the reanalysis dataset of FWI fire behaviour indices based on the ERA5 reanalysis, including the impact of resolution by comparing to GEFF-ERAI. Please consider including a reference to this work.
(Vitolo, C., Di Giuseppe, F., Barnard, C. et al. ERA5-based global meteorological wildfire danger maps. Sci Data 7, 216 (2020). https://doi.org/10.1038/s41597-020-0554-z)
Lines 150 – 155 – My understanding is that this paragraph is highlighting that this work will be focused on the fire-prone areas of the world, as defined in GFED4. It’s not clear to me how the last sentence of this paragraph is relevant to this work, as no further mention of GFED (other than the regions) is made throughout the paper.
Section 3 – Throughout this section the authors analyse the differences in the fire indices between the different CMIP6 models and GEFF-ERA5. While this provides useful insight into the biases in the FWI components, further expanding the analysis to the fire weather components (e.g., temperature, wind, precipitation, etc.) would provide a better understanding of the drivers of said bias, strengthening the evaluation provided by this work. This is especially relevant at the regional level and may help with model selection, as well as inform future model development.
Figures 2 and 3 – The captions refer to these figures as annual means; however, the subtitles on each top tile refer to mon_mean. This should be reviewed and made consistent.
Figure 4 – The figure labels are not legible; please consider increasing the font size.
Lines 344 – 346 – Although it is stated that it is difficult to identify systematic reasons for inter-model differences, an analysis of the meteorological drivers (e.g., temperature and precipitation) across models may help to better understand the inter-model bias.
Lines 346 – 350 – It is mentioned that there is little evidence of an impact of spatial resolution, yet MPI-ESM1-2-HR is reported to consistently perform better than MPI-ESM1-2-LR and MPI-ESM1-2-HAM. Comparing the impact of resolution using different models may not provide a robust framework for drawing conclusions, as the effect of resolution may be confounded by differences in model formulation (e.g., different dynamics, physics, inputs, etc.). Furthermore, Vitolo et al. (2020) shows the benefits of resolution in FWI between GEFF-ERAI and GEFF-ERA5.
Line 390 – Resolution should be considered a caveat, especially given the statement in lines 402 – 405 that fire regimes vary substantially at the intra-regional scale.
RC3: 'Review of gmd-2022-223', Matthew Kasoar, 06 Jan 2023
The authors present an evaluation of historical fire weather performance by 16 CMIP6 models, comparing how well they reproduce ERA5 reanalysis estimates of the various components of the Canadian Forest Fire Weather Index System (the most widely-used set of fire weather danger metrics).
Such an evaluation is valuable and timely given the obvious application of using latest-generation climate models to study changing fire weather severity and frequency. The authors provide an indication of the current CMIP6 ensemble performance at capturing the mean, 90th percentile, and seasonal cycles of fire weather both globally and across various fire-prone regions, and also give an indication of which models (of the 16 studied) tend to provide the best (and worst) performances – a useful guide for many studies that often rely on single-model results. While it’s a shame that some fairly prominent climate models (e.g. CESM2, HadGEM3/UKESM1, GISS-E2, EC-Earth3, NorESM, MIROC) are not included in the current analysis (due to the required variables not being available at the time the analysis was done), nonetheless it also provides a methodology that can be extended to validate FWI performance of further models given a small set of fairly standard output variables.
The manuscript is well written with a concise and very readable prose style, and is a well-suited study for this journal. I have a number of mostly minor comments which I have listed below; mainly these are just requesting clarification on certain details. I also feel that the manuscript could be even more useful if the discussion touched a little more upon the drivers of model bias rather than being purely descriptive, as detailed in my second comment below. But the analysis seems sound (subject to my first comment below being addressed), and if the authors can provide the additional clarifications as detailed in the subsequent comments then I would certainly endorse its publication in GMD.
Slightly more major comments:
- My main worry surrounding the methodology is that the FWI indices for the CMIP6 models are calculated slightly differently from how they are derived in the GEFF-ERA5 reanalysis product. As the authors note, the FWI indices are ideally supposed to be calculated with local noon values of temperature, RH, wind speed, and accumulated precip, and these are what GEFF-ERA5 uses. However, local noon snapshots are not typically archived in CMIP6, and so the authors instead use daily maximum temperature, daily minimum RH, and daily mean wind speed and total precip, as proxies for local noon conditions. While these are necessary and reasonable approximations which previous studies have similarly used when working with climate model data, nonetheless it means it’s not quite a like-for-like comparison when comparing the estimated CMIP6 FWI indices with the exact GEFF-ERA5 values. It’s therefore important to verify first that this difference in calculation method doesn’t make any difference to the resulting bias patterns. A priori, it seems plausible for instance that daily maximum temperatures would often be slightly higher than local noon values, which could result in the CMIP6 indices being positively biased on average simply due to the differing calculation methods. The standard ERA5 atmospheric reanalysis should have all the same daily max/min/mean variables that are used from the CMIP6 models, and therefore the authors should be able to calculate the FWI indices from scratch for ERA5 using the same method and approximations as for the CMIP6 models. This could then be differenced with the exact GEFF-ERA5 values to check what (if any) difference the calculation method makes. Provided the difference due to calculation method is negligible compared to the bias patterns, there’s no problem with keeping the rest of the analysis as is, but this should be verified first. (A sketch of this consistency check follows after these major comments.)
- In Sections 4 and 5 (Synthesis/Discussion and Conclusions), it would add further value if the authors could comment a bit on what are the main meteorological drivers of good and bad model performance and inter-model spread. At the moment the paper principally illustrates the models’ biases without any substantive discussion of what drives them. This is certainly still invaluable for end users of these models ‘off the shelf’ to study fire weather, while also providing a methodology to validate model FWI performance which can be extended to other models, and the authors are clear that this is the primary aim of the paper. But from a model development perspective, it would be very useful to also have some pointers towards what are the critical things that models need to get right to be able to simulate fire weather well.
There is some brief discussion of structural differences between models, and the authors note for instance that model resolution doesn’t seem to correlate with performance. But the models don’t simulate FWI directly; they simulate meteorology, and it feels like it should be possible to say something about which meteorological factors (and/or regions) are the ones that model developers should be concentrating on to try and improve the representation of fire weather. A thorough exploration of this is no doubt beyond the scope of the current paper, as it could be a whole separate analysis in itself. But it would be great if the authors could comment a bit on some of the broader patterns. For instance, from Figure 4 we see that the MMM consistently does badly in certain tropical regions like NHSA, SEAS, and EQAS. Is this because all the models consistently struggle to represent a certain driving variable in these regions, e.g. maybe they all tend to underestimate tropical precipitation? Or are the models all bad for different reasons in these locations?
In terms of inter-model spread, I also found it very intriguing that (L376-377) “strong model performance for one indicator does not necessarily mean strong performance for another”, and some of the fire weather indices (FFMC, ISI, FWI, and DSR) are more consistently well-simulated than others, even though those other indices are largely calculated from the same meteorological variables as the ones that are well-simulated. I assume this can only be because the relative influence of the various meteorological variables is different for different indices. But therefore, it again seems like it should be possible to say something broadly about which meteorological variables are more responsible for driving inter-model spread in performance; e.g. which are the driving variables that are relatively more important in those indices which tend to be poorly simulated, which can explain why those variables are often less well simulated than others? N.B. it’s not quite the same thing, but as a tangential example the authors could look at this paper: Grillakis et al. ERL 2022, https://doi.org/10.1088/1748-9326/ac5fa1. In Figure 4 of that paper, we ranked which FWI input components were the most important for driving burnt area in each of the different GFED regions (RH and temperature tended to be the most important, but it varied by region which one was dominant, and occasionally it was something else like wind speed that mattered most). This was looking at the drivers of burnt area, but it should be possible to say something similar about which input variables are most important for influencing the different CFFWIS indices, and therefore make some general statement about which meteorological variables tend to be more/less consistently well-simulated, and therefore result in certain indices being more/less consistently well-simulated.
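Regarding the first of these major comments (the like-for-like concern), a minimal sketch of the suggested consistency check, assuming an xarray-based workflow; `compute_fwi` is a hypothetical placeholder for whichever CFFWIS implementation the authors used (e.g. following Van Wagner, 1987), and the file names are illustrative:

```python
import xarray as xr

# Hypothetical ERA5 daily aggregates: the same proxies used for the CMIP6 models.
tasmax  = xr.open_dataarray("era5_tasmax_daily.nc")    # daily maximum 2 m temperature
hursmin = xr.open_dataarray("era5_hursmin_daily.nc")   # daily minimum relative humidity
sfcwind = xr.open_dataarray("era5_sfcwind_daily.nc")   # daily mean 10 m wind speed
pr      = xr.open_dataarray("era5_pr_daily.nc")        # daily total precipitation

# compute_fwi stands in for the authors' CFFWIS implementation; it is not a
# real library call here.
fwi_proxy = compute_fwi(tasmax, hursmin, sfcwind, pr)

# Exact GEFF-ERA5 FWI, computed from local-noon fields.
fwi_geff = xr.open_dataarray("geff_era5_fwi_daily.nc")

# Bias attributable purely to the proxy calculation method: if this is small
# relative to the CMIP6-minus-GEFF-ERA5 biases, the comparison stands as is.
method_bias = (fwi_proxy - fwi_geff).mean(dim="time")
```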
Minor comments:
- L28: “Wildfires burn hundreds of millions of hectares of forest each year around the world (Giglio et al., 2013…)". Hundreds of millions of hectares is actually the total burnt area of all land cover types (~ 350 Mha in Giglio et al., although this may be an underestimate). The vast majority of this is savannah fires; the amount of forest burnt is only a small fraction (~ 5%) of the total (c.f. Figure 4 in Giglio et al.).
- Section 2.4: More precise details of the data processing and metrics used are needed here to fully understand the comparison being made. E.g.:
- L156-157: “simulations from each GCM are then compared to corresponding GEFF-ERA5 fields between 1980 and 2014”. I assume that by ‘simulations’ the authors mean the “historical” experiment from CMIP6, but please specify this. Similarly I assume the analysis period goes from 1980 to 2014 because the ‘historical’ experiment in CMIP6 only goes up to 2014, but again for the benefit of readers who aren’t familiar with the details of the CMIP6 suite of scenarios, it would be useful to clarify this.
- L158: “monthly mean and 90th percentile statistics”. I’m assuming that the monthly mean analysis is done for a monthly climatology, i.e. where the daily FWI indices are averaged for each month across all years between 1980-2014, rather than being compared year-by-year. However, please clarify this. Similarly, please clarify what the 90th percentile is the 90th percentile of. E.g., is it the 90th percentile of all the individual monthly means? Is it the 90th percentile of the daily FWI values for each month, which are then averaged into a monthly climatology? Is it the 90th percentile of all the daily FWI values across the whole year? Or something else? (The alternative readings are sketched in the code snippet at the end of these comments.)
- L163: “ratio of observed standard deviation to assess the representation of variance”. Is this the spatial variance (i.e. s.d. between different gridbox values), or temporal variance (i.e. s.d. of the year-to-year timeseries)? I’m assuming it’s spatial, given that it’s later plotted on a Taylor diagram against the spatial correlation and RMSE, but please clarify. Assuming that it is a spatial variance, it may also be worth clarifying that these three metrics are not entirely independent measures of performance (which is of course why they can be plotted together on a Taylor diagram in 2 dimensions) – if you know any two of these metrics then it uniquely determines the third. (This is relevant for interpreting Section 4, where model skill is ranked based on how often a model scores well for all three metrics together). The underlying relation is written out at the end of these comments.
- L169-170: “those months for which the total burned area is greater than 50% of the maximum burned area across all months”. Is this the maximum burned across all 12 months of a monthly climatology (i.e. averaged for each month over 1996-2016), or is it the maximum month from any point in the raw 252-month time series? If the latter, this strikes me as a potentially restrictive definition of fire season, since it could be very sensitive to one extreme year where there was much higher burned area than usual.
- L174-175: “For all CFWIS components, global patterns are generally similar for both the annually-averaged monthly mean (Fig. 2; centre column) and 90th percentile statistics (Fig. 3; centre column)”. Is this talking about the CMIP6 models, or is it still describing the GEFF-ERA5 patterns? The previous sentence only talks about GEFF-ERA5, but the centre columns of Figs 2 and 3 relate to the CMIP6 models.
- Figures 2 and 3: The caption says ‘Annual means’, but the labels at the top of each column say ‘mon_mean’ and ‘mon_p90’ respectively, which is a little confusing. (Especially for Figure 3, c.f. my previous comment that it’s confusing what the 90th percentile is of – e.g. is it the 90th percentile of monthly mean values, or is it a monthly climatology of the 90th percentile of daily FWI values?)
- Figures 2, 3, 4, 9: Axes label text is much too small to read. Figure 4 is the worst offender; I printed it out in A4 and it’s impossible to read any of the colourbar, row, or column labels. Colourbar labels on Figs 2, 3, and 9 also need to be bigger.
- L249: Title line of Figure 4 caption describes it only as “Bias in monthly means….” however the figure shows the bias in 90th percentile as well (with equal prominence), which should therefore also be reflected in the title description.
- L256: “monthly burned area for each region”. Presumably this is from GFED4; it could be helpful to specify this in the caption.
- L265-267: “At the global scale, the representation of DMC, DC and BUI is similar among models, which all present similar patterns, with greater inter-model variability and thus greater uncertainty, for both monthly mean (Fig. 5b, c, e) and 90th percentile annual values (Fig. 6b, c, e)”. As currently worded, this is quite a confusing sentence – it initially says that all models show similar patterns of DMC, DC and BUI, but then says there’s large inter-model variability, which seems contradictory? Also unclear: what is the ‘greater inter-model variability’ greater than?
- L270: “one indicator to the other” -> “one indicator to another” (because there’s more than one other indicator)
- L270-271: “model performance varies greatly from one indicator to the other. For instance, the GFDL-CM4 model performs well for all CFWIS components”. Another slightly confusing wording, as ‘GFDL-CM4 performs well for all CFWIS components’ appears to be a counterexample to the preceding statement that model performance varies greatly from one indicator to the other, rather than an instance of it.
- Figure 8: Could a row also be added for the global rankings?
- Section 4: “models were ranked according to… the count of the number of times for which each model falls into the upper tercile in terms of all three spatiotemporal skill metrics (i.e., correlation, normalised RMSE and the ratio of standard deviation)”. Is there a danger of double counting by ranking the models in this way? As far as I understand, these three metrics are not independent – any model which performs well on two metrics will automatically perform well on the third (I think?), since they are related via the Taylor diagram.
- L324: “all three spatiotemporal skill metrics” – what is the temporal element of these skill metrics? As far as I’ve understood, all three are purely spatial metrics calculated across the gridbox values of time-averaged FWI indices (though I may well have misunderstood; if so perhaps Section 2.4 could be clarified to give more detail on how the metrics are defined).
- L362: “a comprehensive evaluation of CMIP6 performance” – while an excellent evaluation, I’m not certain it can be described as ‘comprehensive... of CMIP6’ when only 16 out of ~50 CMIP6 models are included.
- L365: “for the period 1979-2014” – Earlier in the text (L157) the analysis period was given as 1980-2014; which range is correct?
- L381-384: “the large differences in model performances highlight the importance of a comprehensive model selection. This could significantly affect the conclusion provided in previous assessments... using a multi-model mean” – it would be interesting to check how much the MMM bias improves in an ensemble where only the best-performing models are included. Or do the errors in different models cancel each other out such that the MMM performance is actually similar either way?
- On a related note, and just as a quick aside, good (bad) performance at simulating historical FWI isn’t necessarily a guarantee that models will project future changes in FWI well (badly). This is probably beyond the scope of the current paper, but if the authors have any plans to extend this work, it could be interesting to take the best performing models and see whether or not they project the same changes in FWI for a given future SSP scenario, or whether they diverge in their future projections…
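To make the ambiguity raised in the L158 comment above concrete, a minimal sketch of three possible readings of the statistics, assuming daily FWI values in an xarray DataArray (the file name and dimension names are illustrative assumptions, not the authors' actual processing):

```python
import xarray as xr

# Hypothetical daily FWI field for 1980-2014 (dims: time, lat, lon).
fwi_daily = xr.open_dataarray("fwi_daily_1980-2014.nc")

# Reading 1: monthly climatology of daily means
# (daily values averaged for each calendar month across all years).
mon_mean = fwi_daily.groupby("time.month").mean(dim="time")

# Reading 2a: 90th percentile of all daily values falling in each calendar month.
mon_p90_daily = fwi_daily.groupby("time.month").quantile(0.9, dim="time")

# Reading 2b: 90th percentile of the individual monthly means.
monthly_means = fwi_daily.resample(time="1MS").mean()
mon_p90_of_means = monthly_means.groupby("time.month").quantile(0.9, dim="time")
```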
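And for the comments on the three skill metrics (L163 and Section 4 above), the relation alluded to is the law-of-cosines identity underlying the Taylor diagram (Taylor, 2001); assuming the normalised RMSE is the centred (bias-removed) pattern RMSE, any two of the three metrics determine the third:

```latex
% Relation underlying the Taylor diagram (Taylor, 2001): E' is the centred
% pattern RMSE, \sigma_f and \sigma_r the model and reference standard
% deviations, and R the pattern correlation.
E'^{2} = \sigma_f^{2} + \sigma_r^{2} - 2\,\sigma_f\,\sigma_r R
% Dividing by \sigma_r^{2} gives the normalised form used for the ranking metrics:
\hat{E}'^{2} = \hat{\sigma}_f^{2} + 1 - 2\,\hat{\sigma}_f R ,\qquad \hat{\sigma}_f = \sigma_f/\sigma_r
```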