Decadal climate predictions with the Canadian Earth System Model version 5 (CanESM5)

1 The Canadian Earth System Model version 5 (CanESM5) developed at Environment and Climate Change Canada’s Canadian 2 Centre for Climate Modelling and Analysis (CCCma) is participating in phase 6 of the Coupled Model Intercomparison Project 3 (CMIP6). A 40-member ensemble of CanESM5 historical decadal forecasts (or hindcasts) is integrated for ten years from 4 realistic initial states during 1961 to present using prescribed external forcing. The results are part of CCCma’s contribution to 5 the Decadal Climate Prediction Project (DCPP) component of CMIP6. This paper evaluates CanESM5 large ensemble decadal 6 hindcasts against observational benchmarks and against historical climate simulations initialized from pre-industrial control 7 run states. The focus is on the evaluation of the potential predictability and actual skill of annual and multi-year averages of 8 key oceanic and atmospheric fields at regional and global scales. The impact of initialization on prediction skill is quantified 9 from the hindcasts decomposition into uninitialized and initialized components. The dependence of potential and actual skill 10 on ensemble size is examined. CanESM5 decadal hindcasts skilfully predict upper-ocean states and surface climate with a 11 significant impact from initialization that depend on climate variable, forecast range, and geographic location. Deficiencies 12 in the skill of North Atlantic surface climate are identified and potential causes discussed. The inclusion of biogeochemical 13 modules in CanESM5 enables the prediction of carbon cycle variables which are shown to be skilful on decadal time scales, 14 with a strong long-lasting impact from initialization on skill in the ocean and a moderate short-lived impact on land. 15

For each ensemble member, the coupled model is initialized from a separate assimilation run that ingests observation-based 25 data from the ocean, atmosphere and sea ice as detailed below. These data-constrained assimilation runs span 1958 to present 26 and are started from consecutive years following a 80-year spinup run that assimilates repeating 1958-1967 data. This leads 27 to a spread of initial states that represent observational uncertainties. Full-field initialization is employed, and a standard lead 28 time-dependent bias correction is applied in calculating forecast anomalies (Boer et al., 2016). 29 For the global ocean, 3D potential temperature and salinity are nudged toward values interpolated from monthly Ocean where m Y and m U represent the ensemble size of forecasts and simulations, respectively. The anomalies are computed relative 23 to climatological averages over a specified time period that is common to model output and observations. For the forecast and 24 simulation ensembles, the anomalies are represented as consisting of predictable or "signal" components (Ψ, Φ) and unpredictable or "noise" components (y k , u k ). The predictable 28 components are, in turn, comprised of externally forced (ψ f , φ f ) and internally generated ψ variability. Even though the fore- while the unpredictable components (y k , u k ) differ and average to zero over a large enough ensemble. The assumption is 1 that all variables average to zero over the time period considered, forced components are independent of internally generated 2 components, and all are independent of the noise components.

3
Ensemble averaging across forecasts and simulations in Eq. (1), denoted here by dropping the sub-index k, leads to the 4 following representation of the ensemble mean forecast and ensemble mean simulation 7 where here and elsewhere the arrows indicate the large ensemble limit. The variances of the ensembles of forecast and sim- 8 ulations in Eq. (1) are, respectively, σ 2 Ye = σ 2 Ψ + σ 2 ye and σ 2 Ue = σ 2 Φ + σ 2 ue , while that of the ensemble means in Eq.

11
SB20 decompose Ψ into mutually independent uninitialized Y u and initialized Y i components, with 13 where the component Y u = αφ f is ascribed to uninitialized external forcing (here α is a measure of the projection of ψ f on  19 where q depends on ensemble size and both q and q e are less than one. Here, γ Y = σ 2 ye /σ 2 Ψ is the noise-to-predictable vari- 20 ance ratio of the forecast ensemble, Eq. (A12). The ppvf of the simulations is defined in like manner. The ppvf's q e and q 21 represent, respectively, the fractions of total and ensemble mean forecast variances that are potentially predictable. "Poten-22 tial predictability" refers here to predictability within the "model world", i.e., to predictability of a signal that is expected to 23 represent variations of the observed climate system, but which may be unrealistic due to model and/or initialization errors. A 24 potentially predictable signal is necessary but not sufficient for actual skill. The uninitialized and initialized contributions to q, 25 denoted q u and q i , respectively, are computed explicitly according to Eqs. (A14)-(A15) in appendix A. 26 Following SB20, the correlation skill (or anomaly correlation coefficient) r XY of the ensemble mean forecast can be de-27 composed as 28 r XY = r XYu σ Yu σ Y + r XYi σ Yi σ Y = r u + r i (6) 29 where r XYu and r XYi are the correlation skills of the uninitialized and initialized components Y u and Y i themselves, while 30 r u and r i are the contributions to the overall correlation skill obtained by scaling with the fractions √ q u and √ q i represent- 31 ing the variances involved. This decomposition allows the assessment of the impact of initialization on correlation skill and 7 where β = σ 2 Y /σ 2 Ye < 1. The squared potential skill gives the fraction of the ensemble total variance that is represented or 8 "explained" by the ensemble mean forecast, which in the large ensemble limit is the ppvf q e . The connection between the  12 A similar quantity can be defined for the simulations. Assuming r XY > 0, if r XY / √ β > 1 then r XY > ρ since q < 1 and the 13 actual correlation skill exceeds potential skill (Boer et al., 2019b), i.e., the model is more skilful at predicting the observations 14 than its own behaviour. In the large ensemble limit, this is possible only if the noise-to-predictable variance ratio γ Y is large 15 enough to offset the correlation skill of the ensemble mean prediction, and can occur when the forecast predictable variance 16 is much smaller than the observed variance. Such a behaviour is referred to as signal-to-noise paradox by Scaife and Smith 17 (2018).

18
Evaluations of hindcasts actual skill also include computations of the mean square skill score where MSE(Y, X) and MSE(R, X) are the mean square errors of, respectively, forecasts and reference predictions relative  predictability and small effect of initialization in equatorial regions (Fig. 1a,d) may be partly associated with the 3D ocean 16 initialization procedure that excludes the 1 • S-1 • N band (section 3), and to the fast wave processes on the equator. By contrast, 17 initialization has a strong impact on q e in vast extratropical regions (Fig. 1d). For multi-year averages, regions of lower q e 18 extend to the subtropics and to higher latitudes in the North Pacific, particularly for larger lead times (Fig. 1b, Some of the predictable variance contributes to skill, but some may reflect model biases and/or initialization errors. The  The negative skill in the WSPNA region is attributed to initialization (Fig. 2d)  from initialization. These locations are characterized by unrealistic strong negative trends consistent with those from ORAS5 7 reanalysis (Fig. 4). On inter-annual time scales, q e and q ei are generally smaller for SST than OHC300 (Figs. 1a-c), as SST is 8 more directly affected by atmospheric conditions. On multi-year time scales, q e is generally larger for SST than OHC300 ( Fig.   9 1d-f), as the simulated forced component becomes dominant, which strongly impacts SST trends (Fig. 4c).

10
SST forecasts show a reasonably widespread correlation skill ( Fig. 5a-c). A large fraction of SST correlation skill is at-11 tributable to the uninitialized external forcing, but significant contributions r i from initialization are seen in all ocean basins  is dominant (Fig. 10). The impact of initialization is also reduced, with typically q ei < 0.1 (Fig. 9e,f).  The noise-to-signal variance ratio of forecast and simulation ensembles have the form γ Y = σ 2 ye /σ 2 Ψ , Eq. (A12), and γ U = 26 σ 2 ue /σ 2 Φ , Eq. (A13), respectively. Figure 15 plots γ Y and γ U of annual mean and multi-year terrestrial precipitation for the 27 40-member ensembles. The global land averages of γ Y and γ U as a function of ensemble size stabilize for 10 members (not 28 shown), suggesting that the patterns of Fig. 15 are largely robust under changes in ensemble samples and size. In terms of q = 29 σ 2 Ψ /σ 2 Y , the ensemble size required for q > α is, according to Eq.
Therefore, all regions in Fig. 15a,b with, say, γ Y > 5 require m Y > 45 members to satisfy q > 0.9, i.e., over 45 members are 31 needed for the variance of the ensemble mean forecast to be at least 90% predictable. Most regions in Fig. 15a,b have γ Y > 5. 32 15b), which are both characterized by relatively strong precipitation signals. A similar analysis can be made for the simulations. the potentially predictable precipitation signal (Fig. 15a,b) and associated correlation skill due to initialization (Fig. 13a,d). initialization impacts the ppvf q = q i + q u in Eq. (5), and that large ensembles are required to extract the initialized predictable 17 variance from the ensemble mean forecast. The variance contribution to correlation skill will increase further, albeit minimally 18 and slowly, for ensemble sizes larger than 40, so there is a limit to the cost-effective increase of ensemble size to improve skill. 19 For Year 2-5 the behaviour is somewhat similar (Fig. 17a,b) although the variance contribution of the initialized (uninitialized) 20 component tends to be lower (higher).

21
Besides their variance contribution to skill, Y u and/or Y i must have realistic variations in phase for a skilful ensemble mean are more skilful than persistence for all ensemble sizes (Fig. 16c). By contrast, the median correlation skill over CSW Asia 28 surpasses persistence for m Y 20, but may require more than 40 members to do so confidently (Fig. 16d). It should be noted, 29 however, that forecast correlation skill is higher when averaged over winter and spring (DJFMAM), and surpasses that of from initialization (Fig. 18c,d). 12 The small impact of initialization on Sahelian rainfall forecasts is at odds with previous findings (Gaetani and Mohino, larger ensembles r XY > ρ, i.e., the ensemble mean hindcasts is more skilful at predicting the observed climate system than the 29 hindcasts themselves. This is a consequence of the noise-to-predictable variance fraction γ Y being too high, suggesting that  (Fig. 19b,e), and is much reduced for Year 6-9 (Fig. 19c,f)  particularly from CO 2 fertilization, with a moderate but significant contribution from initialization (Fig. 20d,e). The effects 9 of initialization become negligible for longer forecast ranges, except for a small sector of the Amazon rainforest (Fig. 20f).
Preliminary comparisons against observation-based products show realistic global mean GPP anomaly trends (not shown) and 11 interannual variations for the assimilation runs and Year 1 hindcasts (Fig. 21b). This is consistent with multi-model comparisons further analysis as it may impact climate predictability elsewhere.

10
The strong warming response of CanESM5 drives the potential predictability of near-surface air temperature over land, and 11 is largely responsible for the forecast correlation skill as examined in SB20. Initialization, however, reduces the strength of 12 the model response to external forcing leading to a lower forecast MSE than that of the simulations and persistence at all where we have used σ 2 y = σ 2 ye /m Y under the assumption of the y k 's independence, and similarly for u k . The noise variances 9 are estimated from Eqs. (A1)-(A4) as whereas the predictable variances are estimated from Eqs. (A3)-(A6) as The predictable and noise variances can be readily computed from the data by means of the total variance σ 2 Ye or σ 2 Ue , and the 16 variance of the ensemble mean σ 2 Y or σ 2 U . If we write explicitly the dependence of the anomaly forecast Y jk (τ ) and ensemble 17 mean Y j (τ ) on the forecast range τ , ensemble member k = 1 . . . m Y , and initial year j = 1 . . . n, then 20 and similarly for the simulations, where the over line indicates the average over the initial years.

A2 Correlation skill decomposition 22
Following SB20, the correlation skill of the ensemble mean forecast can be decomposed as where r XYu and r XYi are the correlation skills of the uninitialized and initialized components Y u and Y i themselves, while 1 r u and r i are the components contribution when scaled by the fractions of the variances involved. In terms of the noise-to-2 predictable variance ratios of forecasts and simulation, and available correlations and variances, these quantities can be computed explicitly as Year 2 OHC300 OHC300 OHC300 OHC300 OHC300 OHC300 Year 2 Year 2-5 Year 2-5 Year 6-9 Year 6-9  Year 2-5 Year 2-5 Year 6-9 Year 6-9 Year 2 Year 2 Year 2 Year 2-5 Year 2-5 Year 2-5 Year 6-9 Year 6-9 Year Year 2 Year 2 Year 2 Year 2-5 Year 2-5 Year 2-5 Year 6-9 Year 6-9 Year 6-9