Reply on RC1

Comment 1: I take the point that the oceanic DA with an elaborated EnKF scheme carries a lot of the interest to the authors. However, it seems that the combination NorESM+CMIP6 external forcing carries most of the skill in decadal predictions. As we know, this can be seen in other ESMs, too. And of course this usually is a good sign: if the combination ESM + external forcing delivers good results, so that assimilation does not have to repair too many "shortcomings". To provoke the authors, albeit in a friendly manner: should the prediction community rather invest in better models than in sophisticated assimilation?

The question raised is a topical one, especially since recent findings suggest that external forcings have a stronger influence on extratropical atmospheric circulation variability than previously thought (e.g., Athanasiadis et al., 2020; Liguori et al., 2020; Drews et al., 2021; Klavans et al., 2021). We will add a short discussion of this to the revised manuscript, based on the text below.
Improving the climate models utilized in the prediction systems is potentially very important for achieving more skilful near-term climate predictions (e.g., Athanasiadis et al., 2020). More sophisticated data assimilation can only go so far in mitigating shortcomings in the dynamical behaviour of the models and errors in their response to external forcings. However, one can augment the state estimation with model parameters or external forcings and use advanced data assimilation to tune model parameters (Annan, 2005) and mitigate model bias, an approach that is currently being tested with our system. While skill gains from improving the models (Athanasiadis et al., 2020) may outweigh benefits from refining assimilation schemes, the combination is required to maximize the skill. Hence there is a need to improve both the models and the methods used to assimilate observations. We have seen from weather and seasonal forecasting how improvements in both (as well as in observations and computing power) have continued to enhance prediction skill (Bauer et al., 2015). Data assimilation methods have mainly been developed for numerical weather prediction and need to be adapted to the climate system with its multiscale interactions, an effort that will take some time. For NorCPM, the benefits from model and assimilation development are not necessarily independent, because improving the model will also improve our initialization capability as the ensemble covariance would improve (Counillon et al., 2021).
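The state augmentation mentioned above can be illustrated with a minimal sketch. It assumes a toy scalar model and a perturbed-observation ensemble update; the function name `augmented_update` and all numbers are hypothetical and do not represent the NorCPM implementation. The unobserved parameter is appended to the state vector and corrected through its ensemble covariance with the observed state.

```python
import numpy as np

def augmented_update(states, params, obs, obs_var, rng):
    """Jointly update an observed scalar state and an unobserved parameter.

    states, params: arrays of shape (n_members,); obs: scalar observation
    of the state. Hypothetical toy example, not the NorCPM scheme.
    """
    n_members = states.size
    augmented = np.vstack([states, params])            # shape (2, n_members)
    anomalies = augmented - augmented.mean(axis=1, keepdims=True)
    # Ensemble covariance of each augmented component with the observed state
    cov_with_obs = anomalies @ anomalies[0] / (n_members - 1)   # shape (2,)
    gain = cov_with_obs / (cov_with_obs[0] + obs_var)
    # Perturbed observations (stochastic EnKF variant, assumed here)
    perturbed_obs = obs + rng.normal(0.0, np.sqrt(obs_var), n_members)
    innovation = perturbed_obs - states
    analysis = augmented + np.outer(gain, innovation)
    return analysis[0], analysis[1]

# Toy usage: the state responds linearly to the parameter, so observing the
# state pulls the biased parameter ensemble towards the assumed true value.
rng = np.random.default_rng(1)
prior_params = rng.normal(1.0, 0.5, 100)               # biased prior (truth assumed 2.0)
prior_states = prior_params * 1.0 + rng.normal(0.0, 0.05, 100)
post_states, post_params = augmented_update(prior_states, prior_params, 2.0, 0.01, rng)
```

The parameter is only corrected to the extent that the ensemble carries a covariance between parameter and observed state, which is why the quality of the model ensemble and of the assimilation are linked.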
We argue that there is merit in the use of sophisticated assimilation and in its further development. While the forced ocean-sea ice (FOSI) initialisation approach (which solely constrains surface fluxes) has proven rather successful for multi-year climate prediction (Yeager et al., 2018), Polkova et al. (2019) found significant skill improvements from EnKF ocean assimilation in addition to constraining the atmosphere via nudging. The drift in prediction systems is further clear evidence that both model and assimilation improvements are needed. Progress has been made on reducing model biases, but it is uncertain whether and when they can be completely eliminated (including conditional biases related to the forced trend). To date, the benefits from reducing initialisation shock and forecast drift, or from having optimal spread in initial conditions, are not well explored, and elaborate data assimilation schemes are needed to deal appropriately with observational uncertainties and sparseness. How best to handle errors in the forced model trend during initialisation (e.g., Chikamoto et al., 2019) is a particular challenge that warrants further investigation.
The reviewer highlights the predictive potential due to external forcings. Using the historical all-forcing experiment of the Multi-Model Large Ensemble Archive (Deser et al., 2020; Klavans et al., 2021; Liguori et al., 2020) as an additional benchmark could help assess how an improved response to external forcings may impact prediction skill. There is growing evidence, however, that current-generation climate models systematically underestimate the influence of SST variations and external forcing variability on extratropical atmospheric variability, particularly related to the North Atlantic Oscillation (e.g., Scaife and Smith, 2018). As a consequence, the amplitude of the forced climate signal (either from surface boundary conditions or external forcings) is underestimated relative to the intrinsic climate variability. While post-processing methods relying on large ensembles have been proposed to mitigate this shortcoming (Smith et al., 2020), improving this aspect in the next model generation should be a key priority for the prediction community. To this end, we are investigating key processes in NorCPM (such as atmospheric wave breaking and weather regime shifts in relation to boundary conditions, and also ocean dynamics) that have been identified as essential for skilful near-term climate prediction (Athanasiadis et al., 2020). Rather than moving resources from data assimilation to ESM development, we plan to use data assimilation increasingly to inform the ESM development, in order to improve processes and dynamics key to seasonal-to-decadal climate prediction. In this manuscript we focused on data assimilation, but in future work we will assess improvements related to the model being developed by the NorESM team.
We recently upgraded NorCPM to use the latest NorESM2-MM (Seland et al., 2020), which has contributed to a range of CMIP6 MIPs (but not DCPP). Relative to NorESM1-ME, it has a higher atmospheric resolution, notably reduced overall biases and an improved marine biogeochemistry representation (albeit at a tenfold computational cost).
We thank the reviewer for the comment. In the revised manuscript, we will make clearer from the beginning (l.94 and other places) that the EnKF updates are not applied to the atmosphere and land states. We will include a short discussion of the implications of currently not having atmospheric assimilation and of our plans for adding it.
Utilizing atmospheric observations and better constraining the atmospheric circulation variability has the potential to improve the ocean and sea ice initialisation by producing surface fluxes that are more consistent with the SST and SIC anomalies during the assimilation phase. Constraining the atmospheric circulation will also improve atmosphere and land initialisation, which would be beneficial for seasonal prediction. Because we do not utilize atmospheric observations, the ensemble spread of our prediction system is likely larger than the theoretical uncertainty of the ocean-sea ice initial state given all available observations. This is particularly true during the propagation phases (i.e., between the monthly assimilation updates), when the unsynchronised atmospheric forcing variability amplifies growth in the ensemble spread. While the Bjerknes feedback at least partly synchronises tropical atmospheric variability, the extratropical atmospheric variability of the individual simulation members remains largely unconstrained in NorCPM. As a result, the EnKF assimilation of the ocean and sea ice state has to work against the simulated intrinsic atmospheric variability that is not in phase with the observations. The success of the FOSI approach (Yeager et al., 2018) demonstrates the potential of solely constraining surface fluxes over ocean and sea ice for initializing multi-year climate predictions, indicating that synchronising the ocean circulation through surface fluxes of heat, freshwater and momentum can largely compensate for not utilizing subsurface observations. We expect that the combination of constraining atmospheric variability and performing ocean-sea ice assimilation will provide the best result for climate prediction, as demonstrated by Polkova et al. (2019).
Assimilation of atmospheric observations into NorCPM is work in progress. One challenge is to avoid a collapse of the spread in the surface ocean, which is needed for determining the ensemble covariance in EnKF assimilation. Another challenge is that atmospheric updates have to be performed at daily or higher frequency. NorCPM's monthly EnKF assimilation updates are currently performed offline (the model integrates for one month, writes restart conditions that are updated by the EnKF, reads the updated conditions and integrates the next month). An offline approach with high-frequency updates (e.g., Karspeck et al., 2018) would result in a computational overhead that we consider unacceptable; to perform EnKF-based atmospheric data assimilation, we would therefore need to move the assimilation step online (Zang et al., 2007; Nerger et al., 2020). As a readily available alternative, we are exploring atmospheric nudging in combination with EnKF-based ocean-sea ice assimilation, a strategy that has been successfully tested in the MPI MiKlip system (Polkova et al., 2019). We will take advantage of the availability of multiple simulation members of reanalysis products like ERA5 (Hersbach et al., 2020) and CERA (Laloyaux et al., 2018) and nudge the members of the NorCPM analysis to individual members of the reanalysis products. This will provide a representation of atmospheric observational uncertainties and help generate ensemble spread in the ocean state. We aim to complement this approach with the leading average cross covariance technique (Lu et al., 2015), which can perform a one-way strongly coupled data assimilation (from atmosphere to ocean) and has been shown to improve ocean initialization by taking advantage of the abundant atmospheric observation data.
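The offline monthly cycle described above (integrate one month, write restart conditions, update them with the EnKF, read them back and continue) can be sketched as follows. This is an illustrative toy only: the model `integrate_one_month` is a hypothetical damped random process standing in for a one-month model integration, and the update uses a generic perturbed-observation EnKF, which is an assumption and not necessarily the EnKF variant used in NorCPM.

```python
import numpy as np

def enkf_update(ensemble, obs, obs_var, H, rng):
    """Perturbed-observation EnKF analysis step (generic textbook form).

    ensemble: (n_state, n_members); obs: (n_obs,); H: (n_obs, n_state).
    """
    n_members = ensemble.shape[1]
    anomalies = ensemble - ensemble.mean(axis=1, keepdims=True)
    HA = H @ anomalies                                  # anomalies in observation space
    P_HT = anomalies @ HA.T / (n_members - 1)           # state-observation covariance
    HPHT = HA @ HA.T / (n_members - 1)                  # observation-space covariance
    R = obs_var * np.eye(len(obs))
    K = P_HT @ np.linalg.inv(HPHT + R)                  # Kalman gain
    perturbed = obs[:, None] + rng.normal(0.0, np.sqrt(obs_var), (len(obs), n_members))
    return ensemble + K @ (perturbed - H @ ensemble)

def integrate_one_month(ensemble, rng):
    """Hypothetical stand-in for a one-month model integration."""
    return 0.9 * ensemble + rng.normal(0.0, 0.3, ensemble.shape)

def offline_cycle(ensemble, monthly_obs, obs_var, H, rng):
    """Alternate model propagation and offline analysis, once per month."""
    for obs in monthly_obs:
        restart = integrate_one_month(ensemble, rng)            # integrate, "write restart"
        ensemble = enkf_update(restart, obs, obs_var, H, rng)   # update, "read restart"
    return ensemble

# Toy usage: three state variables, only the first one is observed.
rng = np.random.default_rng(0)
prior = rng.normal(0.0, 1.0, (3, 50))
H = np.array([[1.0, 0.0, 0.0]])
analysis = enkf_update(prior, np.array([0.5]), 0.01, H, rng)
final = offline_cycle(prior, [np.array([0.5]), np.array([0.2])], 0.01, H, rng)
```

In such an offline design every analysis step requires stopping the model and exchanging restart files, which is why the cost grows quickly when the update frequency is raised from monthly to daily.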

Comment 3: The authors include a lot of figures, and I like this very much.
However, the quality of the figure annotations (labels) is sometimes rather poor. I would like to ask the authors to re-assess the annotations, this would greatly help the reader to quickly connect with the figures.
Thanks for the suggestion. We will improve the quality of the figure annotations in the revised manuscript (see examples in the Supplement to the reply). We will also move Appendix C (baseline evaluation) and possibly Appendix D to a separate Supplementary Information document. This will reduce the number of figures in the main manuscript and allow us to reduce the compression of the figures.
Comment 3a: figures with maps are at the limit in terms of crowded information, but that is still okay.
If the amount of information is still acceptable, we prefer to keep the panel layout of the figures as it is. The multi-panel layout allows the reader to visually compare the results for different lead years, prediction benchmarks and fields of interest.
Where possible, we tried to use a similar style for the figures with maps, mostly adopted from Yeager et al. (2018). The reader may need to spend some time to understand the first of such figures, but it should require less time to understand successive figures of the same style.
Comment 3b: although the maps themselves are in hires, their annotations sometimes look very lowres, e.g. as in Fig. 3
The poor label quality was a result of an unfortunate choice of font type, file format conversion and compression. As mentioned in the reply to Comment 3, we will address the quality issue in the revised manuscript version (see example for new Figure 3 in the Supplement to the reply).