The Euro-Mediterranean Center on Climate Change (CMCC) decadal prediction system

. Decadal climate predictions, obtained by constraining the initial condition of a dynamical model through a truthful estimate of the observed climate state, provide an accurate assessment of near-term climate change and are a useful tool to inform decision-makers on future climate-related risks. Here we present results from the CMIP6 (Coupled Model Intercomparison Project Phase 6) Decadal Climate Prediction Project (DCPP) decadal hindcasts produced with the operational CMCC (Euro-Mediterranean Center on Climate Change) decadal prediction system (DPS), based on the fully coupled CMCC-CM2-SR5 dynamical model. A 20-member suite of 10-year retrospective forecasts,

Abstract.Decadal climate predictions, obtained by constraining the initial condition of a dynamical model through a truthful estimate of the observed climate state, provide an accurate assessment of near-term climate change and are a useful tool to inform decision-makers on future climate-related risks.
Here we present results from the CMIP6 (Coupled Model Intercomparison Project Phase 6) Decadal Climate Prediction Project (DCPP) decadal hindcasts produced with the operational CMCC (Euro-Mediterranean Center on Climate Change) decadal prediction system (DPS), based on the fully coupled CMCC-CM2-SR5 dynamical model.A 20-member suite of 10-year retrospective forecasts, initialized every year from 1960 to 2020, is performed using a full-field initialization strategy.
The predictive skill for key variables is assessed and compared with the skill of an ensemble of non-initialized historical simulations so as to quantify the added value of the initialization.In particular, the CMCC DPS is able to skillfully reproduce past climate surface and subsurface temperature fluctuations over large parts of the globe.The North Atlantic Ocean is the region that benefits the most from initialization, with the largest skill enhancement occurring over the subpolar region compared to historical simulations.On the other hand, the predictive skill over the Pacific Ocean rapidly decays with forecast time, especially over the North Pacific.In terms of precipitation, the skill of the CMCC DPS is significantly higher than that of the historical simulations over a few specific regions, including the Sahel, northern Eurasia, and over western and central Europe.
The Atlantic multidecadal variability is also skillfully predicted, and this likely contributes to the skill found over remote areas through downstream influence, circulation changes, and teleconnections.Considering the relatively small ensemble size, a remarkable prediction skill is also found for the North Atlantic Oscillation, with maximum correlations obtained in the 1-9 lead year range.
Systematic errors also affect the forecast quality of the CMCC DPS, featuring a prominent cold bias over the Northern Hemisphere, which is not found in the historical runs, suggesting that, in some areas, the adopted full-field initialization strategy likely perturbs the equilibrium state of the model climate quite significantly.
The encouraging results obtained in this study indicate that climate variability over land can be predictable over a multiyear range, and they demonstrate that the CMCC DPS is a valuable addition to the current generation of DPSs.This stresses the need to further explore the potential of the nearterm predictions, further improving future decadal systems and initialization methods, with the aim to provide a reliable tool to inform decision-makers on how regional climate will evolve in the next decade.

Introduction
Climate fluctuations are the end result of a number of processes acting on a multitude of timescales.Prior to the year 2000, century-scale climate change projections, initialized with a physical state of the climate system obtained from a long simulation of the preindustrial period and subject to prescribed anthropogenic and natural forcings, have been the only available product to inform decision-makers on future climate-related risks.A major limitation of non-initialized climate projections is their lack of information about the ongoing natural variability that may affect climate changes in the near future, which is, at least in part, linked to the current state of the Earth's climate system.Decadal predictions, obtained by constraining the initial condition of a dynamical model (coupled global circulation model/Earth system model) through a realistic estimate of the observed climate state, provide a more accurate assessment of climate change in the near-term (decadal) range, where both external and internal drivers contribute to the climate evolution (Smith et al., 2007;Kushnir et al., 2019).
Starting from the 2000s, initialized decadal predictions have been assessed in multiple projects, from the first pioneering efforts up to the fifth Coupled Model Intercomparison Project (CMIP5; Smith et al., 2007;Keenlyside et al., 2008;Pohlmann et al., 2009;Meehl et al., 2009;Doblas-Reyes et al., 2011), in which coordinated experiments allowed a multi-system comparison to reduce single-model uncertainties (Taylor et al., 2012;Bellucci et al., 2015b), contributing to the Intergovernmental Panel on Climate Change Fifth Assessment Report (AR5, chap.11; Kirtman et al., 2013).
Years of coordinated research and development have led to an established experiment protocol that has overcome some of the limitations (e.g., limited ensemble size and initialization every 5 years) of the decadal prediction simulations produced in the CMIP5 framework.This protocol is extensively described in the CMIP6 (Coupled Model Intercomparison Project Phase 6) Decadal Climate Prediction Project (DCPP), a coordinated multimodel effort within the World Climate Research Programme (WCRP), which aims to investigate climate predictions, predictability, and variability from annual to decadal timescales (Boer et al., 2016).The DCPP is intended to make skillful forecasts and predictions on these timescales using state-of-the-art climate models and statistical approaches.The core of the DCPP is component A, which includes a set of retrospective forecasts (hindcasts).This framework has laid the groundwork for a number of singlemodel (Bethke et al., 2021;Bilbao et al., 2021;Kataoka et al., 2020;Robson et al., 2018;Sospedra-Alfonso et al., 2021;Xin et al., 2019;Yang et al., 2021;Yeager et al., 2018) and multimodel studies (Borchert et al., 2021a, b;Delgado-Torres et al., 2022).
Climate anomalies on annual-to-multidecadal timescales are determined from both the internal and the externally forced variability (Boer et al., 2016).External contributions derive from solar irradiance variations, volcanic aerosols, and anthropogenic activities, including land use, aerosols, and greenhouse gas emissions, accounting for the global warming trend.On the other hand, the oceans, and to a lesser degree the land surface, sea ice, and stratosphere (Bellucci et al., 2015a) and their interaction with the atmosphere, are the primary source of internal variability within the climate system on decadal timescales.Low-frequency fluctuations in the North Atlantic sea surface temperature (SST), known as the Atlantic multidecadal variability (AMV), affect the global climate through local impacts and remote teleconnections (e.g., Sutton and Hodson, 2005;Zhang and Delworth, 2005;Knight et al., 2006;Sun et al., 2015;Nicolì et al., 2020;Ehsan et al., 2020;Ruprich-Robert et al., 2021).The longterm AMV evolution is well captured in most state-of-the-art forecast systems and represents one of the primary sources of predictability and, possibly, of skill at a decadal timescale that is generally attributed to the initialization of the Atlantic meridional overturning circulation (AMOC; Zhang et al., 2019).Other predictability sources arise from the Pacific Ocean.Interestingly, decadal El-Niño-Southern Oscillation (ENSO) impacts may be modulated by the Interdecadal Pacific Oscillation (IPO), which features both the ENSO SST region and extends to other extratropical areas (Henley et al., 2015).In addition, the initial state of some land surface characteristics, stratosphere, snow cover, and sea ice may also impact the predictability of the climate system (e.g., Bellucci et al., 2015b;Meehl et al., 2021).The aforementioned initialized components provide additional predictability, for the atmospheric circulation and, in particular, for the North Atlantic Oscillation (NAO) affecting boreal winter climate over Europe (Smith et al., 2019;Athanasiadis et al., 2020;Dunstone et al., 2022).
In this paper, we present the decadal prediction system (DPS) developed at the Euro-Mediterranean Center on Climate Change (CMCC) using the CMCC-CM2-SR5 state-ofthe-art climate model (Cherchi et al., 2019) and contributing to the CMIP6 DCPP project.In particular, the study aims to assess the skill in predicting the observed anomalies in key meteorological variables, thereby testing the ability of the DPS to simulate the main climate variations from annual to the decadal timescale.
The article is structured as follows: Sect. 2 provides details on the model configuration, the experimental protocol, the evaluation metrics, and the data used to check the benefits of the initialization.Section 3 presents results on the predictions' skill for key quantities and their evolution in time, using deterministic and probabilistic approaches, and assesses the evolution of some relevant model biases.Section 4 focuses on the skill for selected regional climate variability indices.Finally, Sect. 5 summarizes and discusses the main findings of the study and also draws some conclusions.
2 Data and methodology

Description of the CMCC DPS model
The CMCC decadal prediction system is based on the CMCC-CM2-SR5 coupled model, shortly described below (see Cherchi et al., 2019, for additional details).The atmospheric component is the Community Atmosphere Model version 5 (CAM5) with a regular grid of 0.9-1.25 • and 30 hybrid levels, including 17 levels below 200 hPa and extending up to 2 hPa.The finite-volume configuration has been chosen for the dynamical core.The ocean model is the Nucleus for European Modelling of the Ocean version 3.6 (NEMO v3.6), using a tripolar ORCA grid with a horizontal resolution of about 1 • (with a varying latitudinal resolution ranging from 1/3 • near the Equator up to 1 • at high latitudes) and 50 levels in the vertical.The sea ice component is the Community Ice CodE in its fourth version (CICE4).The DPS configuration of CICE4 uses a single category to characterize the sea ice thickness, for consistency with the respective reanalysis used for the initialization.The Community Land Model version 4.5 (CLM4.5) is used for the simulation of the land surface at the same horizontal grid used by the atmospheric component.Finally, the River Transport Model (RTM; Branstetter, 2001) routes liquid and ice runoff from the land surface model to the active ocean to simulate a closed hydrological cycle.
A suite of retrospective forecasts (hindcasts) consisting of 20-member ensembles of 10-year-long hindcasts, initialized every year from 1960 to 2020, has been completed, following the CMIP6 DCPP-A protocol (Boer et al., 2016).As summarized in Table 1, all the members are initialized on 1 November, starting from direct, full-field estimates of the observed state of the ocean, sea ice, land surface, and atmosphere, without any coupled assimilation runs.For each start date, two initial conditions for the atmosphere are obtained from the ERA-40 (1960ERA-40 ( -1978;;Uppala et al., 2005), ERA-Interim (1979-2018;Berrisford et al., 2011), andERA5 (2019 onwards;Hersbach et al., 2020) reanalyses, taking the atmospheric states of 1 and 2 November.The ocean, sea ice, and land surface states are initialized with an ensemble of global data assimilation products (ocean and sea ice) and analyses constrained with observed fluxes (land surface).Specifically, land surface is initialized using two different analyses obtained from a land-only configuration of the CLM4.5 land model integrated offline with two different atmospheric forcing datasets, namely CRUNCEP version 7 (Viovy, 2016) and GSWP3 (Kim, 2017).These datasets provide the land model with instantaneous 2 m air temperature and humidity, 10 m winds and surface pressure every 6 h, and accumulated radiation and precipitation every 3 h.Ocean initial states are derived from CHOR (for the period 1960-2010; Yang et al., 2016) and CGLORSv7 reanalysis (for the period 2011-2020; Storto and Masina, 2016) and are performed with a threedimensional variational data assimilation system, a surface nudging, and a bias correction scheme.It is worth noting that the reanalysis is performed with the same ocean model used in the CMCC DPS (i.e.,NEMO v3.6).An ensemble of five ocean initial states is used to initialize the ocean and sea ice components, where three initial estimates originate from global ocean reanalysis characterized by different assimilation strategies of SST and in situ profiles of temperature and salinity in a 0.5 • configuration of the NEMO ocean model, while the remaining two initial states are derived through linear combinations of the former three initial states.The ocean initial states provide three-dimensional fields of temperature, salinity, and horizontal currents.The sea ice model has been initialized, starting from sea ice temperature, sea ice volume, sea ice area, and snow volume.Our full-field initialization approach is similar to the ones adopted by other DPSs (e.g., Bilbao et al., 2021;Sospedra-Alfonso et al., 2021) in which the a posteriori correction is applied, since the simulations deviate to their own model attractors.Other DPSs (e.g., Bethke et al., 2021;Brune and Baehr, 2020;Kataoka et al., 2020) are initialized using the observed anomalous variability superimposed on the model climatology to avoid the initial shock.Both the techniques are deemed valid in CMIP6 DCPP protocols (Boer et al., 2016) and present some drawbacks.For example, bias correction in the former may remove part of the variability signal, while the latter has the assumption that the model variability is independent from the model mean state.Nevertheless, several studies have proven that differences in skill are small and localized (Smith et al., 2013;Bellucci et al., 2015b;Volpi et al., 2017).
The time-evolving radiative forcings (including solar radiation, greenhouse gas concentrations, and anthropogenic and volcanic aerosols) are prescribed during the historical period  and follow the Shared Socioeconomic Pathways (SSPs) scenario of SSP2-4.5 (O'Neil et al., 2016) from 2015 onwards, which is in agreement with the CMIP6 DCPP protocol.

Verification data
Uninitialized historical simulations covering the 1850-2014 period are used to assess the added value of realistic model initialization in decadal predictions.We use a 10-member ensemble of historical simulations initialized with different states of a multi-century preindustrial climate simulation.Each member of the historical ensemble is extended until 2030 under the SSP2-4.5 scenario, thus allowing a fair comparison with the decadal forecast ensemble initialized in the year 2020.Since only 10 historical members are available, the 20-member hindcast ensemble has been scaled down using a random sub-sampling with 100 combinations in order to allow a fair comparison to the historical ensemble in the skill assessments.

Verification metrics
Initializing decadal predictions from estimates of the observed states of the Earth system may generate spurious responses, since the climate model used to produce the simulations, after initialization, tends to drift towards its own attractor (mean climate), deviating from the observed climatology, which is a consequence of the model's systematic error (bias).This issue is particularly pronounced in the predic-tion systems adopting a full-field initialization strategy, as in the present case.The spurious drift can be removed a posteriori by subtracting a lead-time-dependent climatology at each grid point, assuming a constant drift throughout the time record (Goddard et al., 2013;Boer et al., 2016).
To evaluate the skill of the prediction system, both deterministic and probabilistic metrics are used.The anomaly correlation coefficient (ACC) and the mean square skill score (MSSS) are deterministic metrics, measuring the accuracy of the ensemble mean prediction in reproducing the observed variability over the 1961-2020 period targeted by the decadal reforecasts.More specifically, the ACC is a dimensionless measure evaluating the phase agreement between predicted and observed anomalies, ranging from −1 to 1 (Wilks, 2011).The MSSS, additionally, quantifies the magnitudes between the predicted and observed anomalies (Goddard et al., 2013).This metric evaluates the skill of the ensemble mean prediction with respect to a reference prediction.Specifically, the MSSS is defined as follows: where MSE HO (MSE PO ) is the mean square error evaluated for the initialized (uninitialized) ensemble mean against observations.The MSSS takes a maximum value of one (1.0), while it does not have a lower limit.Positive MSSS values mean more accurate predictions in the initialized runs, and one may speculate that the opposite is also true.However, since the MSSS is not symmetric around zero, the positive and negative MSSS absolute values do not have the same meaning in terms of variance.Probabilistic skill scores provide a useful complement to deterministic metrics in assessing the quality of a prediction system.In this study, relative operating characteristic (hereafter ROC) score maps have been assessed for the hindcasts (Kharin and Zwiers, 2003;Wilks, 2011).Each grid point in these maps shows the area under the ROC curve, equal to the probability of a certain anomaly to exceed a specific threshold.When the ROC score approaches the perfect forecast (i.e., equal to one), then the DPS is able to discriminate the occurrence of predetermined events.On the other hand, no skill emerges when the score is close to 0.5.Here, we have considered three equiprobable categories, i.e., upper tercile, lower tercile, and between lower and upper terciles (neutral).Note that the ROC score outcome is not dependent on forecast biases (i.e., calibration; Kharin and Zwiers, 2003).
The DPS's ability to reproduce the dominant climate variability patterns is also tested, focusing in particular on the North Atlantic and North Pacific sectors.Decadal variability in the Atlantic region is well described by the AMV, estimated as the detrended anomalies of SSTs areaweighted over the North Atlantic basin, following the definition adopted in Trenberth and Shea (2006).The skill in predicting the NAO index is also tested using the definition in Li and Wang (2003;Fig. S1 in the Supplement).
We characterize the low-frequency variability in the Pacific basin through the IPO, which is, in turn, expressed in terms of the IPO tripolar index (TPI), accounting for the difference between the averaged SST anomalies over the equatorial zone and over the extratropical lobes of the IPO (Henley et al., 2015).At shorter timescales, the ENSO prediction is evaluated through the Niño 3.4 index, representing the spatially averaged SST anomaly over the respective region of the equatorial Pacific.
The statistical significance is assessed with a one-tailed Student's t test (Wilks, 2011), accounting for autocorrelation in the time series (Eq.30 from Bretherton et al., 1999).The anomalies of the observations and the historical simulations are computed with respect to their climatologies (reference period 1981-2010).In the initialized runs, the climatology for each forecast year is computed considering the highest possible number of initialization years.This approach allows us to maximize the statistical robustness, even if the skill may depend on the targeted verification years.3 Skill evaluation

Near-surface air temperature
Skill in predicting the global mean surface temperature (GMST; based on 2 m air temperature over land and SST over the ocean) is assessed against observed anomalies combining CRU TS v4.05 (Harris et al., 2020) over land and HadISST 1.1 for SSTs (Rayner et al., 2003).Figure 1 shows GMST for initialized hindcasts (in red; hereafter Init), noninitialized historical simulations (in blue; hereafter NoInit) and observations (in black).
The ensemble spread envelope of predicted GMST (denoting maximum and minimum range of the members variability; shown in orange) encompasses the observations, especially at lead years 1 and 1-5, successfully capturing the multiyear variability, including the cooling effect of major volcanic eruptions, such as the El Chichón and Pinatubo eruptions that occurred in 1982 and 1991, respectively.The initialization contributes to the reduction in the Init ensemble spread, which is about half the envelope of the NoInit for lead year 1, due to the beneficial impact of synchronizing observed and model internal climate variability.

Deterministic metrics
Predictive skill at the regional scale is assessed through ACC maps. Figure 2 shows ACC for annual surface temperatures evaluated at different lead year intervals and the corresponding differences with respect to NoInit.In order to remove the skill impact on different ensemble sizes, we compare the skill of the historical (10 available members) with the skill of random subsampling of the initialized run with 10 members out of the 20 available members.The MSSS maps provide further details on the skill improvement determined by initialization (Fig. 3), assessing the consistency between the magnitudes of the predicted and the observed anomalies (see Sect. 2.3).
At lead year 1, significant predictive skill is found over most of the globe, reaching the highest values (ACC = 0.80) over the tropical Indian Ocean, northern and equatorial Africa, northeastern part of South America, subpolar North Atlantic, and western tropical Pacific.Lack of skill, instead, characterizes the western subtropical North Atlantic, eastern Europe, central part of South America, and part of the western North Pacific and Southern Ocean.The added value of initialization (Fig. 2b) is particularly prominent over the tropical and the eastern subpolar North Atlantic and over the tropical and the extratropical North Pacific.In addition, Init exhibits higher skill (up to 0.5) over the North American continent, central Africa, and the Indian subcontinent.The corresponding MSSS pattern (Fig. 3a) clearly indicates that Init outperforms NoInit in reproducing the magnitude and the sign of the observed anomalies over approximately the same areas, showing improved ACC with respect to NoInit (Fig. 2b).
In the 1-5 lead year range, the skill is generally higher than for lead year 1, likely due to the effect of averaging over a longer interval (5 years) and to the emerging warming trend.In contrast, the skill undergoes a clear deterioration over the tropical and northern part of the Pacific Ocean when a multiyear range of prediction skill is considered (Fig. 2c).Significant skill is found over the continental areas of North America, Eurasia, Africa, and over the Maritime Continent.A large fraction of the skill seems to derive from the warming trend that increases predictability, at this lead year range, over land and over the Indian Ocean (Van Oldenborg et al., 2012).Over the North Atlantic Ocean, the emerging AMV footprint is recognizable, with high predictive skill associated with the typical horseshoe pattern emerging from the Init vs. NoInit comparison (Fig. 2d).This pattern is also noticeable in the relative MSSS map (Fig. 3c), suggesting improved predictability for the AMV tropical lobe, while the extratropical lobe may be affected by strong biases as it is characterized by high ACC values and neutral MSSS.Nearterm prediction skill is improved especially over the eastern Mediterranean and the Arabian Peninsula (ACC = 0.3), reaching high correlation values (ACC = 0.90 in Fig. 2c) also reflected in the MSSS (Fig. 3c).
The pattern exhibited in the lead year range 6-10 is very similar to that shown in lead years 1-5, although some regional changes, such as those in eastern Europe and Siberian region, may be easily spotted.Areas with non-statistically significant skill cover part of the eastern Pacific Ocean (Fig. 2e).The generally higher skill attributable to initialization (Fig. 2f) is substantially consistent with the pattern obtained for the lead year range 1-5, even if it is not reproduced in the MSSS analysis (Fig. 3e), suggesting that surface temperature variations are not well captured.
To corroborate the skill analysis of surface temperature at decadal timescales, we assess the skill for the ocean heat content integrated over the top 300 m of the water column (hereafter OHC300).The ACC pattern computed for the OHC300 anomalies (Fig. 4) is similar to, and thus consistent with, the results obtained for the SST (Fig. 2).At lead year 1, significant ACC covers most parts of the oceans, except for the eastern Atlantic and Southern Ocean.The anomalies' values are also well captured north of 30 • N. The OHC300 area exhibiting significant skill is reduced when higher lead time ranges are considered.At lead years 1-5 and 6-10, the ACC is significant over the tropical Pacific, excluding the equatorial band due to the poor long-term predictability of ENSO, as also found in other DPSs (e.g., Bilbao et al., 2021).Positive ACC values also cover part of the Indian Ocean and South and North Atlantic regions.The MSSS shows positive values mainly localized over the midlatitudes in the North Atlantic (Fig. S6) and is found to be quite consistent with SST MSSS (Fig. 3).The lack of skill over the subpolar gyre may be partly due to the erroneous representation of the AMOC in the DPS, altering the local ocean circulation and heat content.A complementary analysis reveals that The stippling denotes points where 95 % statistical significance is not reached, according to a one-tailed t test.Effective degrees of freedom have been computed following Eq.( 30) in Bretherton et al. (1999).
the mean AMOC cell in the DPS is quite well reproduced in terms of structure although its maximum is located too far south (below 20 • N) at lead year 1 (Fig. S3), as compared to other AMOC reconstructions based on different oceanic reanalyses (e.g., Karspeck et al., 2017).At lead years 1-5 and 6-10 the maximum moves northwards, due to the model adjustment towards its own climatology, resembling the structure reported in other studies (e.g., Tsujino et al., 2020).The initialization shock may lead to the AMOC slowdown up to lead year 2 (Figs.S4 and S5), underestimating the maximum by about 2 Sv at 26.5 • N in the period covered by RAPID array (Moat et al., 2022).The slightly negative trend of the observed AMOC occurring during the last few decades is reproduced just at lead year 1 in the hindcasts (Figs.S4 and   S5), while the simulated low-frequency variability is consistent with the observed one also at longer lead years.

Mean bias assessment
The full-field strategy is used to initialize the forecasts, providing the best estimates of the observed state to each model component.Following the recommendation of the International CLI-VAR Project Office (ICPO, 2011), the mean bias is defined as the lead-time-dependent ensemble mean deviation from the observed mean state defined throughout the whole time record .Assessing the mean bias is an important part of evaluating decadal predictions.The time-dependent SST bias for decadal hindcasts (lead years 1, 2-5, and 10) and the bias in the historical simulations are shown in Fig. 6.
The SST bias in Init is very rapidly established during year 1, followed by a slower adjustment occurring in the following years, since the Init curves of zonal mean bias for lead years 1, 2-5, and 10 remain relatively close to each other (Fig. 6e).Bias patterns featured by Init and NoInit differ substantially over the Northern Hemisphere, with the former presenting a prominent cold bias, which is not found at all longitudes in NoInit.In the Southern Hemisphere, Init and NoInit are much more similar.This lack of agreement between Init and NoInit suggests that initialization likely perturbs the equilibrium state of the model climate quite significantly.Interestingly, the same kind of departures from the observed state have been found also in several other decadal prediction systems, including some contributing to the CMIP5 decadal prediction ensemble that adopted a fullfield initialization strategy (Bellucci et al., 2015b).This fact indicates that the time adjustment period following the initialization shock typically exceeds 10 years over the subpolar North Atlantic.
The Init bias for precipitation is comparable to the NoInit time mean bias (Fig. 7).Major departures occur in the tropical Pacific (20 • S-20 • N).Here, precipitation is overly strong in both Init and NoInit, especially south of the Equator, where the spurious occurrence of a southern ITCZ (Intertropical Convergence Zone) is a common bias in coupled models (Tian and Dong, 2020;Bellucci et al., 2010).The double ITCZ is enhanced by initialization, with a rainfall overestimation at lead year 1 (Fig. 7e).This may lead to an artificial increase in precipitation over the southwestern U.S. and drier conditions over the Mediterranean Sea (Dong et al., 2021), as seems to be the case for both the Init and NoInit.Finally, the Init bias for precipitation shows that no strong drift occurs outside the tropical zone, as the bias remains rather stable in lead time.

ROC score
The relative operating characteristic (ROC) score is used to assess the probabilistic properties of the ensemble Init (see Sect. 2.3).In this paper, the ROC score analysis focuses on near-surface temperature, because of its high predictability, and considers the occurrence of tercile categories.
The Init reproduces below-tercile and above-tercile anomalies well (Fig. 8), featuring patterns similar to the ACC ones throughout the lead times (Fig. 2), which might represent an upper boundary of the ROC score (Wilks, 2011).Specifically, ROC scores are close to one over land, the western Pacific, and the North Atlantic for multiyear predictions https://doi.org/10.5194/gmd-16-179-2023 Geosci.Model Dev., 16, 179-197, 2023 (lead years 1-5 and 6-10), confirming the anomaly direction within the ensemble spread.Predictions of anomalies falling in the middle tercile category exhibit less skill compared to the lower and upper tercile cases.Nevertheless some skill emerges over Africa, the northern and eastern part of South America, the North Atlantic, and Indian Ocean.Comparing the ROC score for NoInit (Fig. S7), most of the difference occurs over the oceans (e.g., over the North Atlantic), the realm which is most sensible to the initialization at a decadal timescale.

Assessing the prediction skill for the main climate indices
Predictability of selected regional climate indices is investigated in this section, since they influence climate variability on global and regional scales through the action of teleconnections.The Atlantic Multidecadal Variability (AMV) represents the dominant climate variability pattern of the multidecadal SST fluctuations in the North Atlantic basin (Knight et al., 2005;Smith et al., 2012;O'Reilly et al., 2019).
Figure 9a shows the ACC for the AMV index for different lead time ranges, thus helping to identify the lead year range that exhibits the maximum skill (see Sect. 2.3).The largest ACC values are found for lead year ranges longer than 4 years (ACC > 0.80), in agreement with previous works (Van Oldenborgh et al., 2012;García-Serrano and Doblas-Reyes, 2012), reaching a peak for the lead year range 4-10 (ACC = 0.91).At this lead year range, the AMV index in Init reproduces the observed low-frequency variability in the North Atlantic SST well, including the 1980s negative phase, the sharp increase during the 1990s, the peak occurring in the 2000s, and the subsequent decline (Fig. 9b), which  is in opposition to NoInit.The AMV spatial pattern is also well captured by the DPS, as depicted in ACC of linearly detrended near-surface temperature (Fig. S2).ACC patterns in the North Atlantic reveal the AMV footprint, with correlations ranging from 0.50 (in the subtropics) to 0.91 (at high latitudes).It is worth noting that large ACC values of surface temperature are also found over regions linked to the AMV through remote teleconnections, i.e., the eastern Mediterranean region (Mariotti and Dell'Aquila, 2012;Bellucci et al., 2017), Arabian Peninsula (Van Oldenborgh et al., 2012;Ehsan et al., 2020), southern Eurasia (Li et al., 2021), and the western tropical Pacific (Kucharski et al., 2016;Sun et al., 2017), suggesting that the skillful AMV prediction has also non-local impacts in regions affected by the AMV teleconnection pattern.
The predictive skill for the NAO is also analyzed by focusing on the boreal winter season (see Sect. 2.3).Significant skill is primarily found for lead year ranges longer than 5 years (Fig. 9c), although lead year 1 (coinciding with the first winter season after initialization, which is eshttps://doi.org/10.5194/gmd-16-179-2023 Geosci.Model Dev., 16, 179-197, 2023 sentially a seasonal forecast) also features significant skill (ACC = 0.42).The NAO predictive skill is maximum for the lead year range 1-9 (ACC = 0.58).At this lead year range, the observed NAO phases are well reproduced in Init (Fig. 9d), and in particular, starting from the satellite era, decadal hindcasts realistically simulate the growing trend of the 1980s and the following decline that occurred after 1995.The Init captures the NAO variability well, despite the limited ensemble size (ACC = 0.80, applying an 8-year running mean to the model index), while NoInit does not reproduce the right amplitude.The ratio of predictable component shows a rather low value (RPC = 2.15; estimated following Scaife and Smith, 2018), comparing NAO predictions with other state-of-the-art decadal prediction systems (Smith et al., 2020;Athanasiadis et al., 2020), suggesting that, in the CMCC DPS, the signal-to-noise ratio problem is somehow less impactful (Scaife and Smith, 2018).
In the Pacific sector, the TPI is considered to be a proxy of the decadal variability (Power et al., 1999;Henley et al., 2015), accounting for both the equatorial and extratropical SST fluctuations (see Sect. 2.3). Figure 10a displays the Init ACC for TPI, with significant skill peaking at lead years 4-10 (ACC = 0.56).As one may expect, Init is not able to reproduce most of the variance in the Pacific Ocean due to the lack of skill over that domain, which is also inferable by the ACC and MSSS maps (Figs. 2 and 3, respectively).In fact, the TPI evolution for lead years 4-10 shows that the respective predicted and observed anomalies are broadly consistent until the late 1980s but significantly diverge afterwards (Fig. 10b), similar to the NoInit one.The emerging skill at higher lead years is probably due to the inclusion of off-equatorial SST, linked to the predictability of the Pacific decadal variability (Henley et al., 2015).
Decadal variability in the equatorial Pacific still remains widely unpredictable (Doblas-Reyes et al., 2013), with very limited predictability beyond lead year 2. We quantify the DJF (December-February) Niño 3.4 index prediction skill in Fig. 10c and d.As mentioned, the respective maximum occurs during the first winter season after initialization (ACC = 0.95; Fig. 10d), although there is some predictability up to lead year 3 (Fig. 10c).

Summary and conclusions
In this paper, we analyzed the predictive capabilities of the CMCC DPS, using a set of 20-member hindcasts, initialized every year from 1960 to 2020, performed with the CMCC-CM2-SR5 coupled model and a full-field initialization strategy.
The study has highlighted the following main findings: -The DPS skillfully reproduces the observed variability in surface temperature (T2M over land and SST over the oceans) and upper-ocean heat content (Figs. 2  and 4, respectively), with a large fraction of the total skill stemming from long-term trends associated with changes in the external forcings (Van Oldenborgh et al., 2012).The North Atlantic Ocean is the region that benefits the most from initialization (ACC difference up to 0.80 in Fig. 2b, d, and f and MSSS close to 0.6 in Fig. 3), with the largest skill enhancement (compared to historical simulations) over the subpolar gyre region.As is still typical in decadal predictions, a lack of skill characterizes the whole Pacific Ocean over which significant ACC values are bound to lead year 1.The DPS correctly discriminates the occurrences of belowtercile and above-tercile surface temperature anomalies throughout different lead year intervals, with ROC scores close to one over land (Fig. 8).
-Some climate variability patterns in the North Atlantic sector feature significant predictability.In particular, the observed AMV signal is skillfully predicted (ACC = 0.91, Fig. 9a), and this likely contributes to also obtaining significant skill in remote areas through downstream influence, circulation changes, and teleconnections.Over the tropical Pacific Ocean, ENSO vari-ability exhibits high values of ACC bound to the first winter after initialization (ACC = 0.95).Moreover, the TPI shows higher predictive skill on longer timescales, despite the low skill that the DPS exhibits over most of the Pacific Ocean.
-Considering its relatively small ensemble size, the CMCC DPS exhibits an exceptionally high skill for the winter NAO on a multiyear-to-decadal scale (ACC_LY1-9 = 0.58 with 20 members).This is accompanied by a rather low ratio of predictable component (RPC = 2.15; estimated following Scaife and Smith, 2018), comparing NAO predictions with other state-ofthe-art decadal prediction systems (Smith et al., 2020;Athanasiadis et al., 2020).This result suggests that, in the CMCC DPS, despite certain obstinate systematic biases, the signal-to-noise ratio problem (Scaife and Smith, 2018) is somehow less severe.
-  specific areas, a feature shared with other state-of-theart decadal prediction systems.Indeed, significant skill is only found over Sahelian Africa, northern Eurasia, and over the western and central Europe, with ACC values up to 0.50 (Fig. 5).This spatially confined skill may derive from the AMV, which is known to be a source of predictability for these regions, influencing rainfall variability on annual-to-decadal timescales (Doblas-Reyes et al., 2013;Ehsan et al., 2020;Ruggieri et al., 2021).On the other hand, no significant skill for precipitation is found over the rest of the globe, probably also due to the relatively small size of our ensemble (Yeager et al., 2018) and to the very high spatial variability (Goddard et al., 2013) and the absence of a strong trend in precipitation (Gaetani and Mohino, 2013;Bellucci et al., 2015b).Improvements compared to the historical simulations are bound to some regional features, suggesting no substantial benefits from initialization in terms of precipitation skill.
Systematic errors also affect the forecast quality.The strong cold bias occurring over the Northern Hemisphere tends to produce an initialization shock and a subsequent drift, typi-cal of the full-field initialization (He et al., 2017) approach used in the CMCC DPS.Admittedly, AMOC is particularly affected by the initialization strategy (Fig. S3), with the fullfield approach inducing a long-term adjustment due to the bias in the representation of the large-scale ocean circulation (Polkova et al., 2014).In this context, another sensitive region is the equatorial Pacific in which full-field initialization seems to give the strongest benefit in skill compared to anomalous initialization (Bellucci et al., 2015b).
From another perspective, model errors may be mitigated by enhancing spatial resolution (both horizontal and vertical) in the oceanic and atmospheric model components, since a coarse resolution limits a realistic representation of key physical processes (e.g., realistic SST front in the Gulf Stream region), impacting the atmospheric circulation downstream (Athanasiadis et al., 2022;Famooss Paolini et al., 2022).For instance, an eddy-permitting ocean model (i.e., 0.25 • horizontal resolution) in a fully coupled system led to improved decadal predictions over the whole equatorial zone (Shaffrey et al., 2017).Moreover, increasing the ensemble size is expected to further increase the skill by allowing the predictable signal to emerge more clearly from the chaotic variability (Athanasiadis et al., 2020).Interestingly, the results obtained from the CMCC DPS are broadly consistent with similar assessments from other CMIP6 decadal prediction systems (Bethke et al., 2021;Bilbao et al., 2021;Kataoka et al., 2020;Robson et al., 2018;Sospedra-Alfonso et al., 2021;Xin et al., 2019;Yang et al., 2021;Yeager et al., 2018) and multimodel studies (Borchert et al., 2021a, b;Delgado-Torres et al., 2022).In particular, most of the DPSs feature high predictive skill over the Atlantic Ocean, the Indian Ocean, and continental areas, where a large fraction of predictability stems from the external forcings.The added value of initialization is most noted over the subpolar gyre and the subtropical Atlantic (in most of the DPSs), confirming these areas as being those in which decadal predictions benefit the most from realistic initialization.The non-significant skill found over the Southern Ocean is considered to be, at least in part, due to the lack of oceanic observations in that region that prevents the accurate estimate of the initial state of the ocean.The predictive skill of the Pacific Ocean rapidly decays with forecast time, especially over the North Pacific, arguably due to the limited predictability of ENSO beyond the first forecast year and the consequent strong influence over the North Pacific (unpredictable ENSO-driven variability).
We remind readers that the CMCC DPS provides decadal forecasts (operationally since 2021) for the annual release of the WMO (World Meteorological Organization) Global Annual to Decadal Climate Update, a multimodel assessment of the near-term climate for societal applications (Hermanson et al., 2022).At the same time, the CMCC DPS makes a significant contribution to the grand ensemble CMIP6 DCPP-A hindcasts.
In conclusion, this study has corroborated the idea that the climate system has a significant degree of multiyear predictability over land, and in this context, the CMCC DPS represents a valid element in the fleet of the state-of-the-art DPSs.The potential of near-term predictions deserves to be further investigated, in addition to the development of future decadal systems and improved initialization methods, with the aim of providing a useful tool able to inform decisionmakers on the evolution of the next decade's climate.

Figure 4 .
Figure 4. Same as Fig. 2 but for the ocean heat content integrated over the top 300 m depth.

Figure 5 .
Figure 5. Same as Fig. 2 but for the precipitation field.

Figure 9 .
Figure 9. (a) ACC for AMV index.The color bar ranges from 0.5 to 1.The cyan markers indicate statistically insignificant correlations.The white cross denotes the maximum value (ACC = 0.91).(b) Observed and predicted AMV index for lead years 4-10 (in which ACC is maximum).Panel (c) is the same as panel (a) but for the December-March (DJFM) NAO index.The color bar ranges from 0 to 0.6 (maximum ACC = 0.58).(d) Observed and predicted NAO index for lead years 1-9 (in which ACC is maximum).

Figure 10 .
Figure 10.(a) ACC for TPI index.The color bar ranges from 0 to 0.6.The cyan markers indicate statistically non-significant correlations.The white cross denotes the maximum value (ACC = 0.56).(b) Observed and predicted TPI index for lead years 4-10 (in which ACC is maximum).Panel (c) is the same as panel (a) but for the December-February (DJF) Niño 3.4 index.The color bar ranges from 0 to 1 (maximum ACC = 0.95).(d) Observed and predicted Niño 3.4 index for lead year 1 (in which ACC is maximum).

Table 1 .
List of initial conditions used for the generation of the 20-member hindcast ensemble.