The Seasonal-to-Multiyear Large Ensemble (SMYLE) Prediction System using the Community Earth System Model Version 2
- 1National Center for Atmospheric Research, Boulder, Colorado, USA
- 2Department of Atmospheric and Oceanic Sciences, University of Colorado, Boulder, Colorado, USA
Abstract. The potential for multiyear prediction of impactful Earth system change remains relatively underexplored compared to shorter (subseasonal to seasonal) and longer (decadal) timescales. In this study, we introduce a new initialized prediction system using the Community Earth System Model Version 2 (CESM2) that is specifically designed to probe potential and actual prediction skill at lead times ranging from 1 month out to 2 years. The Seasonal-to-Multiyear Large Ensemble (SMYLE) consists of 2-year long hindcast simulations that cover the period from 1970 to 2019, with 4 initializations per year and an ensemble size of 20. A full suite of output is available for exploring near-term predictability of all Earth system components represented in CESM2. We show that SMYLE skill for El Niño-Southern Oscillation is competitive with other prominent seasonal prediction systems, with correlations exceeding 0.5 beyond a lead time of 12 months. A broad overview of prediction skill reveals varying degrees of potential for useful multiyear predictions of seasonal anomalies in the atmosphere, ocean, land, and sea ice. The SMYLE dataset, experimental design, model, initial conditions, and associated analysis tools are all publicly available, providing a foundation for research on multiyear prediction of environmental change by the wider community.
Stephen Gerald Yeager et al.
Status: final response (author comments only)
RC1: 'Comment on gmd-2022-60', Anonymous Referee #1, 11 Apr 2022
General comments
This manuscript describes and evaluates a new ensemble prediction system for lead times up to 2 years. The system is based on the previously documented CESM2 Earth system model (Danabasoglu et al. 2020), initialized from JRA-55 in the atmosphere, from a forced integration following the OMIP2 protocol for the ocean and sea ice (FOSI), and from a forced simulation of the CLM land surface component. 20-member ensembles are initialized for four initial months per year over the 1970-2019 period, making this dataset a substantial contribution to the climate prediction community.
The paper is well organized, pleasant to read, and instructive. I acknowledge the thorough effort made by the authors to evaluate a more diverse range of variables and indices beyond sea surface temperature, circulation indices, and precipitation, thereby illustrating the value of such a system based on an Earth system model and promoting further research with this database. The goals of the paper are clearly stated at the end of the introduction and, in my view, are adequately fulfilled in the subsequent sections. The number of figures remains quite reasonable with respect to the completeness of the analysis.
Given the quality of this submission, I recommend accepting it for publication in GMD, subject to minor revisions. I have two main points to raise; more specific comments and minor suggestions follow.
1) Although the initialization strategy is described in detail, the authors focus the evaluation on the seasonal-to-multiyear forecast skill. For some variables for which observational data is scarce, or does not cover the entire hindcast period, reconstructions used for initialization are also used as a reference for skill assessments. I understand the reason for this choice but would then have expected more details on the estimated quality of these reconstructions when these haven’t been documented elsewhere (which is the case at least for FOSI). For instance, the authors mention some shortcomings of the CESM2 contribution to OMIP2 that were corrected by tuning parameters and restoring strength for FOSI, but no further details or evaluation of the improvement with respect to independent estimates are provided (it could be included as supplementary information).
With respect to these reconstructions, were any long-term drifts found? I acknowledge several cycles of the forced model have been run but is it enough to avoid spurious effects in the hindcasts?
2) Furthermore, although some figures provide an indication of the ensemble spread, skill is evaluated solely using deterministic scores (anomaly correlation coefficients of the ensemble mean, root mean square error). Having a 20-member ensemble allows for the assessment of other aspects of forecast quality, including reliability and resolution, or other probabilistic metrics of hindcast skill.
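To make this suggestion concrete, a minimal sketch of one such probabilistic score, the Brier score for tercile categories derived from the 20 ensemble members, could look like the following Python (array names, shapes, and the placeholder data are hypothetical; this illustrates the kind of metric meant, not the authors' verification code):

```python
import numpy as np

def tercile_brier_scores(ens, obs):
    """Brier scores for lower/middle/upper tercile events from an ensemble hindcast.

    ens: hindcast anomalies with shape (n_starts, n_members)
    obs: verifying observed anomalies with shape (n_starts,)
    Both are assumed to be anomalies relative to the same climatology.
    """
    lower, upper = np.percentile(obs, [100 / 3, 200 / 3])    # tercile thresholds from the observations
    categories = [(-np.inf, lower), (lower, upper), (upper, np.inf)]
    scores = []
    for lo, hi in categories:
        p_fcst = np.mean((ens > lo) & (ens <= hi), axis=1)   # forecast probability of the category
        o_event = ((obs > lo) & (obs <= hi)).astype(float)   # binary observed outcome
        scores.append(np.mean((p_fcst - o_event) ** 2))      # Brier score; lower is better
    return scores

# Hypothetical usage: 50 start dates, 20 members, random placeholder anomalies.
rng = np.random.default_rng(0)
print(tercile_brier_scores(rng.standard_normal((50, 20)), rng.standard_normal(50)))
```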
Specific comments
1) The authors compare some skill assessments with other reference systems such as the NMME multimodel (for seasonal time scales) and CESM1 DPLE (for November initializations). However these comparisons are only shown in a selection of figures. I’m not necessarily asking for comparisons to be included in each figure, but more discussion on similarities / discrepancies in skill with these two benchmarks could be of interest to the reader.
2) I was confused by differences in lead time values in Figure 5 and in the text (lines 298-312). The shorter lead months don't seem to appear on the plots although they are mentioned in the text (i.e., line 300 refers to an ACC of 0.65 that I cannot find on the plot). Furthermore, using the color and symbol code, lead month 2 for SMYLE-FEB reads as JJA, which is not at all consistent with the definitions provided in the paragraph starting at line 164 and doesn't make sense. Could you please revise the figure?
3) Correlation / ACC values are often referred to as significant / non-significant, but I found no mention of the significance test and underlying hypotheses (sorry if I missed it!).
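For reference, a common parametric test of whether an ensemble-mean ACC is significantly positive is a one-sided t-test on the correlation coefficient; the sketch below is purely illustrative (it ignores serial correlation, which would call for an effective sample size or a block bootstrap) and makes no claim about which test the authors actually used:

```python
import numpy as np
from scipy import stats

def acc_significance(fcst, obs, alpha=0.05):
    """One-sided t-test of H0: rho = 0 against rho > 0 for an anomaly correlation.

    fcst, obs: 1-D arrays of ensemble-mean and observed anomalies (one value per start).
    Autocorrelation of the verification time series is ignored in this sketch.
    """
    n = len(obs)
    r = np.corrcoef(fcst, obs)[0, 1]
    t = r * np.sqrt((n - 2) / (1.0 - r ** 2))   # t statistic with n - 2 degrees of freedom
    p = 1.0 - stats.t.cdf(t, df=n - 2)          # one-sided p-value for a positive correlation
    return r, p, bool(p < alpha)

# Hypothetical usage with synthetic anomalies:
rng = np.random.default_rng(1)
obs = rng.standard_normal(50)
fcst = 0.6 * obs + 0.8 * rng.standard_normal(50)
print(acc_significance(fcst, obs))
```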
Minor suggestions
l. 46: “seasonal protocols call for ensemble simulations lasting 12 months” → not all operational systems go up to 12 months; by WMO standards, seasonal prediction information is provided up to ~ 6 months. I would recommend saying “lasting up to 12 months”.
l. 65-69: Some of the potential sources of predictability are associated with a reference, whereas others are not; I would recommend harmonizing this. For snow cover: consider Orsolini et al. (2016) or, more recently, Ruggieri et al. (2022). For the QBO: consider Butler et al. (2016) (QJRMS). For greenhouse gas forcing: Doblas-Reyes et al. (2006).
l. 181: JRA55 (reanalysis) data for precipitation is not an obvious choice; is this due to the hindcast period? Couldn’t you use merged precipitation datasets such as GPCP which probably have a higher fidelity to actual observations?
l. 310: Not unrelated to my earlier comment on assessing probabilistic skill and using the 20-member ensemble, did you evaluate the ensemble spread of SMYLE according to target season and forecast time for these ocean indices?
l. 348-364: Was this low (no) NAO skill already found with DPLE-NOV? Another aspect, beyond the horizontal and vertical resolution of the atmosphere, is the sensitivity of NAO correlation skill to ensemble size, to the length of the re-forecast period (see, e.g., Shi et al., 2015), and to low-frequency variability of NAO skill during the last century (Weisheimer et al., 2019). These studies suggest that RMS-based scores are less sensitive estimates of NAO skill.
l. 389: I’m not at all a biogeochemistry expert; there appears to be some variability in skill according to the target season, with summer and fall Zoo C, NPP, and carbon export being more predictable than in winter or spring. Why is this the case? What are the drivers behind what appears to be a return of potential predictability in SMYLE? Some discussion (or references) on this would be helpful!
Figure 9 is a bit blurry: could you increase its resolution?
In figures 10 and 11, correlation and RMSE for CESM2-LE are plotted at lead month 19. I find this choice confusing since CESM2-LE is not initialized; maybe you could use a dotted or dashed line as done for the persistence forecasts?
l. 465-480: Summer (JAS) SIE trends in SMYLE seem different from FOSI, with the ensemble mean generally below FOSI values in the 1970s-1980s, and above after the mid-2000s. Do you have an explanation for what appears to be conditional drift? Did you compare the sea ice thickness fields in SMYLE with FOSI?
Section 3.7: Results are interesting, however some comparison with other recent evaluations would be nice. Although not focusing on the same period, and using IBTrACS as a reference, Befort et al. (2022) (their Fig. 5) would be a nice comparison for your lead time 1 month results in table 1.
Fig. 15: This figure is quite difficult to read and interpret as it superimposes many time series. I would suggest either presenting a subset of information, or including it in the supplement to the article.
l. 560: missing word (“as”)? “as well an experimental system”
l. 574: Out of curiosity: are there any plans to update the system in near real-time? How frequently is JRA55-do updated?
References mentioned:
Befort et al. (2022) doi: 10.1175/JCLI-D-21-0041.1
Butler et al. (2016) QJRMS 142(696):1413–1427
Doblas-Reyes et al. (2006) doi: 10.1029/2005GL025061
Orsolini et al. (2016) Climate Dynamics 47(3–4):1325–1334
Ruggieri et al. (2022) doi: 10.1007/s00382-021-06016-z
Shi et al. (2015) doi: 10.1002/2014GL062829
Weisheimer et al. (2019) doi: 10.1002/qj.3446
RC2: 'Comment on gmd-2022-60', Anonymous Referee #2, 26 Apr 2022
This submission describes an extensive set of hindcasts from the CESM2 model that enable the performance of initialized predictions in the relatively unexamined multi-year range (out to 24 months in this case) to be extensively explored. Notably, performance over a broad range of Earth system components (atmosphere, sea ice, ocean and land including biogeochemistry) is addressed. The paper is very well organized and written, and criticisms are limited mainly to relatively minor details of description and presentation. Exceptions are items 7 and 17 below, which will require modest additional computation if the authors concur that acting on these recommendations will improve the paper. Overall however the authors are to be congratulated for this interesting and compelling documentation of SMYLE.
Main comments
1) At line 47, suggest replacing “at least 10-years duration” with “up to 10-years duration” because some operational “decadal” systems have a 5-year range, but none that I’m aware of run for >10 years.
2) Suggest additionally referencing Boer et al. https://doi.org/10.1007/s00382-013-1705-0 and Chikamoto et al. https://doi.org/10.1038/s41598-017-06869-7 in the sentence starting on line 66, possibly as follows: “…volcanic activity (Hermanson et al. 2020), greenhouse gas forcing (Boer et al. 2013), or some combination thereof (Chikamoto et al. 2017).”
3) At line 72 should also reference Ilyina et al. https://doi.org/10.1029/2020GL090695
4) Line 128 states that forcing is applied cyclically from 1901-1920 to equilibrate the land state. Please go into a bit more detail about the total length of time this cyclic forcing was applied, in relation to expected equilibration times of land variables such as vegetation and soil carbon.
5) It’s stated that the hindcasts cover 1970-2019. Presumably this is the period covered by the initialization times, and not the simulations themselves which would extend into 2021? If so please be explicit that 1970-2019 spans the initialization times.
6) Regarding “Potentially useful prediction skill (ACC>0.5) is seen for land precipitation over the southwestern US in DJF and MAM (lead month 1)” at lines 208-209, this really should say “southwestern North America” considering that the only DJF grid boxes >0.5 are in Mexico.
7) Regarding “A more rigorous analysis is needed to definitively demonstrate that SMYLE skill differences from DPLE are statistically significant and not likely explainable by chance” (lines 226-227), this could be done relatively easily by applying the random walk methodology of DelSole and Tippett, https://doi.org/10.1175/MWR-D-15-0218.1, where differences either in the anomaly pattern correlation or the RMSE between the 50 pairs of November-initialized hindcasts could be used as the basis for comparison.
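As an illustration of the suggested approach, a minimal sketch of the random-walk comparison could look like this (hypothetical per-initialization error values; see DelSole and Tippett for the exact formulation of the significance bounds):

```python
import numpy as np

def skill_difference_random_walk(err_a, err_b):
    """Random-walk comparison of two hindcast systems, in the spirit of DelSole and Tippett.

    err_a, err_b: per-initialization error measures for the two systems, e.g. the spatial
    RMSE (or 1 minus the anomaly pattern correlation) of each of the 50 November starts.
    Returns the cumulative tally of 'wins' for system A and approximate 95% bounds
    expected under the null hypothesis of equal skill.
    """
    steps = np.where(err_a < err_b, 1, -1)   # +1 where system A beats system B, -1 otherwise
    walk = np.cumsum(steps)
    k = np.arange(1, len(steps) + 1)
    bound = 1.96 * np.sqrt(k)                # excursions beyond +/- bound suggest a real skill difference
    return walk, bound

# Hypothetical usage with placeholder per-start RMSE values for the two systems:
rng = np.random.default_rng(2)
walk, bound = skill_difference_random_walk(rng.random(50), rng.random(50) + 0.05)
print(walk[-1], bound[-1])
```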
8) At line 178, please provide a rationale for regridding to a 5x5 or 3x3 degree grid. (Also a small point, but I’m not sure that regridding to a coarser grid qualifies as “interpolation”.)
9) Below line 360 it would be appropriate to reference Butler et al. https://doi.org/10.1002/qj.2743 in relation to the influence of lid height on skill in forecasting the NAO. (For example could append as “…relative to these baseline SMYLE results, although a robust connection between atmospheric vertical resolution and NAO skill has not been demonstrated (Butler et al. 2015).”)
10) Please replot Figs. 8d-i using the tick mark values in Fig. 9 which are better aligned with the experiment.
11) Are there any evident explanations or hypotheses for the strong seasonal dependence of skill in Figs. 8d-i, e.g. high SE US shelf NPP ACC in JJA and SON, and low CA current NPP ACC in DJF?
12) Should mention in the captions for Figs. 8-9 that shading and filled symbols indicate statistical significance.
13) The OceanSODA-ETHZ aragonite saturation dataset covers 1985 to 2018 according to Gregor and Gruber (2021), so presumably the skill results in Fig. 9 are specific to this period? Or does the verification period leave out the years before 1990 which are much more uncertain according to those authors? Please be explicit about this and any other deviations of verification periods from the 1970-2019 period covered by SMYLE.
14) In the captions to Figs. 10 and 11 suggest removing “(see text for details)” since the text doesn’t provide any significant additional detail.
15) In Fig. 13 the lead times in the legends of the plots disagree with the lead times indicated in the caption.
16) Relating to Figs. 12 and 13, it would be interesting to have a sense of how the correlation and nRMSE values shown compare to values based on comparing OBS to FOSI.
17) Figure 14 shows JJASON (NH) and DJFMAM (SH) cyclone track densities regressed against annual mean Nino3.4 index. However, because ENSO typically peaks around December and frequently changes phase between about April and August, annual mean Nino3.4 is not a very good indicator of ENSO activity. In addition, this procedure introduces a seasonal disconnect in that January Nino3.4 is presumed to influence TC activity in the following November (for example). Suggest instead regressing JJASON track densities against JJASON Nino3.4, and DJFMAM track densities against DJFMAM Nino3.4, or else better justifying the original choice made. (Also please be explicit in the caption to Fig. 14 what is the timing of the Nino3.4 index.)
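A minimal sketch of the suggested same-season regression, in illustrative Python/xarray with hypothetical variable names and synthetic data, might look like this:

```python
import numpy as np
import pandas as pd
import xarray as xr

def regress_on_seasonal_nino34(track_density, nino34_monthly, months=(6, 7, 8, 9, 10, 11)):
    """Regress seasonal TC track density on a same-season Nino3.4 index.

    track_density: DataArray (year, lat, lon) of seasonal-mean track density anomalies.
    nino34_monthly: DataArray (time,) of monthly Nino3.4 SST anomalies.
    months: the season used for both fields (JJASON by default), so that the index and the
    predictand cover the same window rather than an annual mean.
    """
    in_season = nino34_monthly['time.month'].isin(list(months))
    nino34_seasonal = nino34_monthly.where(in_season, drop=True).groupby('time.year').mean('time')
    x = nino34_seasonal.sel(year=track_density['year'])
    # Regression slope at each grid point: cov(y, x) / var(x).
    cov = ((track_density - track_density.mean('year')) * (x - x.mean('year'))).mean('year')
    return cov / x.var('year')

# Hypothetical usage with synthetic data (50 years of monthly index, 10x20 grid):
time = pd.date_range('1970-01-01', '2019-12-01', freq='MS')
rng = np.random.default_rng(3)
nino34 = xr.DataArray(rng.standard_normal(len(time)), coords={'time': time}, dims='time')
td = xr.DataArray(rng.standard_normal((50, 10, 20)),
                  coords={'year': np.arange(1970, 2020), 'lat': np.arange(10), 'lon': np.arange(20)},
                  dims=('year', 'lat', 'lon'))
print(regress_on_seasonal_nino34(td, nino34).shape)
```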
18) Tables 1 and S1 along with Fig. 15 imply that JJASON and DJFMAM TC predictions are made at 19-month lead time. However, for JJASON this implies initialization on 1 Nov, which in turn implies prediction of JJASO (not JJASON) at 19-month lead by the 24-month hindcast, and similarly for DJFMAM. Although this is a small point it should briefly be acknowledged somewhere (similarly to the 22-mon lead Nino3.4 forecasts in the caption to Fig. 4).
19) Regarding the RMSE scores, “RMSE” isn’t defined anywhere, and when introducing RMSE in the text should briefly comment on the use of normalized RMSE and introduce the nRMSE notation. Also, is nRMSE defined such that predictions of climatology (zero anomaly) will yield values of 1? If so then briefly mentioning this will help the reader appreciate that nRMSE values <1 indicate that the prediction is more skillful than a climatological prediction.
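For illustration, one definition consistent with this reading would normalize the RMSE of the anomaly forecast by the observed anomaly standard deviation, so that a constant climatological (zero-anomaly) forecast scores approximately 1; this is a hypothetical sketch, not necessarily the convention used in the paper:

```python
import numpy as np

def nrmse(fcst, obs):
    """RMSE of the ensemble-mean anomaly forecast normalized by the observed anomaly
    standard deviation.  Because observed anomalies have near-zero mean, a forecast of
    climatology (all-zero anomalies) scores approximately 1, so nRMSE < 1 indicates an
    improvement over a climatological prediction."""
    return np.sqrt(np.mean((np.asarray(fcst) - np.asarray(obs)) ** 2)) / np.std(obs)
```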
Technical corrections:
line 203: central America -> Central America
line 207: SAT hasn’t been defined
lines 270 and 275: “1998” -> “1997” (as year of a strong El Nino)
line 361: Quasi-biennial -> Quasi-Biennial
line 510: Figs. S1, S2 -> Figs. S4, S5
line 520 vs 185 vs 591: is it “best track”, “Best Track” or “BestTrack”?
line 528: should the 2 in kt2 be a superscript?
line 556: suggest “multi-year skill” -> “multi-year skill or potential skill”
line 561: suggest to remove “obviously”
line 923: ACC map gross -> ACC map for gross