The Norwegian Earth System Model, NorESM2 - Evaluation of the CMIP6 DECK and historical simulations

The second version of the fully coupled Norwegian Earth System Model (NorESM2) is presented and evaluated. NorESM2 is based on the second version of the Community Earth System Model (CESM2), but has entirely different ocean and ocean biogeochemistry models; a new module for aerosols in the atmosphere model along with aerosol-radiation-cloud interactions and changes related to the moist energy formulation, deep convection scheme and angular momentum conservation; modified albedo and air-sea turbulent flux calculations; and minor changes to land and sea ice models. We show results from low (∼2°) and medium (∼1°) atmosphere-land resolution versions of NorESM2 that have both been used to carry out simulations for the sixth phase of the Coupled Model Intercomparison Project (CMIP6). The stability of the pre-industrial climate and the sensitivity of the model to abrupt and gradual quadrupling of CO2 are assessed, along with the ability of the model to simulate the historical climate under the CMIP6 forcings. Compared to observations and reanalyses, NorESM2 represents an improvement over previous versions of NorESM in most aspects. NorESM2 is less sensitive to greenhouse gas forcing than its predecessors, with an equilibrium climate sensitivity of 2.5 K in both resolutions when estimated over a 150-year time frame. We also consider the model response to future scenarios as defined by selected shared socioeconomic pathways (SSPs) from the Scenario Model Intercomparison Project defined under CMIP6. Under the four scenarios SSP1-2.6, SSP2-4.5, SSP3-7.0, and SSP5-8.5, the warming in the period 2090–2099 compared to 1850–1879 reaches 1.3, 2.2, 3.0, and 3.9 K in NorESM2-LM, and 1.3, 2.1, 3.1, and 3.9 K in NorESM2-MM, robustly similar in both resolutions. NorESM2-LM shows a rather satisfactory evolution of recent sea ice area. In NorESM2-LM an ice-free Arctic Ocean is only avoided in the SSP1-2.6 scenario.


Introduction
The Norwegian Earth System Model version 2 (NorESM2) is the second generation of the coupled Earth System Model developed by the Norwegian Climate Center (NCC), and is the successor of NorESM1 (Iversen et al., 2013; Kirkevåg et al., 2013; Tjiputra et al., 2013), which has been used in the 5th phase of the Coupled Model Intercomparison Project (CMIP5; Taylor et al., 2012), and for evaluation of the difference between a 1.5 and a 2 °C warmer world than pre-industrial (Graff et al., 2019). NorESM2 is based on the Community Earth System Model CESM2.1. Although large parts of NorESM are similar to CESM, there are several important differences: NorESM uses the isopycnic coordinate Bergen Layered Ocean Model (BLOM; Bentsen et al., in prep.), uses a different aerosol module OsloAero6 (Kirkevåg et al., 2018; Olivié et al., in prep.), contains specific modifications and tunings of the atmosphere component (Toniazzo et al., 2019; Toniazzo et al., in prep.), and contains the iHAMOCC model to describe ocean biogeochemistry (Tjiputra et al., 2019).
Many changes have contributed to the development of NorESM1 into NorESM2. The model has benefited from the evolution of the parent model CCSM4.0 into CESM2.1, comprising the change of the atmosphere component from CAM4 to CAM6, the land component from CLM4 to CLM5, and the sea ice component from CICE4 to CICE5. Also, specific developments have been implemented in the description of aerosols and their coupling to clouds and radiation (Kirkevåg et al., 2018), in addition to aerosols is proportional to that of fine sea-salt aerosols (Kirkevåg et al., 2018), this specific change also has an impact on the natural oceanic organic matter emissions.
The aerosol nucleation formulation described by Kirkevåg et al. (2018) has been updated by allowing all pre-existing particles to act as coagulation sinks for freshly nucleated particles (Sporre et al., 2019) to give a more realistic rate of survival for these 2 nm nucleation particles into the smallest explicitly modeled mode/mixture of co-nucleated sulfate and secondary organic aerosols. This reduces the number concentrations of fine-mode particles, while increasing their size, which in effect yields increased cloud condensation nuclei and cloud droplet concentrations.
In NorESM2, oceanic dimethyl sulfide (DMS) emission is prognostically simulated by the ocean biogeochemistry component (Sect. 2.4), hence allowing for a direct biogeochemical climate feedback in coupled simulations. The DMS air-sea flux is simulated as a function of upper-ocean biological production following the formulation of Six and Maier-Reimer (1996) and was first tested in the NorESM model framework by Schwinger et al. (2017).
While hygroscopic swelling of aerosols in earlier versions always used the grid-averaged relative humidity as input to look-up tables which take into account the effects of hygroscopic growth on water uptake and optical properties, in CAM6-Nor we instead use the mean cloud-free relative humidity, as in the host model CAM6 and a number of other atmospheric models (Textor et al., 2006; Kirkevåg et al., 2018; Gliss et al., in prep.).

The other differences of CAM6-Nor relative to CAM6 are summarised as follows. A correction to the zonal wind increments due to the Lin and Rood (1997) dynamical core is introduced in order to achieve global conservation of atmospheric angular momentum along the Earth's axis of rotation, as described and discussed in Toniazzo et al. (2019). The local energy update of the model is also modified by including a missing term (the hydrostatic pressure work) related to changes in atmospheric water vapour, thereby achieving better local energy conservation. Finally, a set of modifications to the deep convection scheme is introduced, which eliminates most of the resolution dependence of the scheme and mitigates the cold tropospheric bias of CAM6. The energy and convection changes (which are not available in the CAM6 code repository) are described in Toniazzo et al. (in prep.).

https://doi.org/10.5194/gmd-2019-378 Preprint. Discussion started: 10 February 2020. © Author(s) 2020. CC BY 4.0 License.

Ocean model
The ocean component BLOM is based on the version of MICOM used in NorESM1 and shares the use of near-isopycnic interior layers and variable density layers in the surface well-mixed boundary layer. The dynamical core is also very similar but with notable differences in physical parameterisations and coupling. For vertical shear-induced mixing a second-order turbulence closure (Umlauf and Burchard, 2005; Ilicak et al., 2008) using a one-equation closure within the family of k−ε models has replaced a parameterisation using the local gradient Richardson number according to Large et al. (1994). Parameterised eddy-induced transport is modified to more closely follow the Gent and McWilliams (1990) parameterisation with the main impact of increased upper ocean stratification and reduced mixed layer depths. As for NorESM1-MICOM, the estimation of diffusivity for eddy-induced transport and isopycnic eddy diffusion of tracers is based on the Eden et al. (2009) implementation of Eden and Greatbatch (2008) with their diagnostic equation for the eddy length scale, but modified to give a spatially smoother and generally reduced diffusivity. Hourly exchange of state and flux variables with other components is now used compared to daily ocean coupling in NorESM1. The sub-diurnal coupling allows for the parameterisation of additional upper ocean mixing processes. Representation of mixed layer processes is modified to work well with the higher frequency coupling and in general to mitigate a deep mixed layer bias found in NorESM1 simulations. The penetration profile of shortwave radiation is modified, leading to a shallower absorption in NorESM2 compared to NorESM1. With respect to coupling to the sea ice model, BLOM and CICE now use a consistent salinity-dependent seawater freezing temperature (Assur, 1958).
Selective damping of external inertia-gravity waves in shallow regions is enabled to mitigate an issue with unphysical oceanic variability in high latitude shelf regions, causing excessive sea ice formation due to breakup and ridging in CMIP5 versions of NorESM1.
For the CMIP6 contribution, BLOM uses identical parameters and configuration in coupled ocean-sea ice OMIP (Ocean Model Intercomparison Project; Griffies et al., 2016) experiments and fully coupled NorESM2-LM and NorESM2-MM experiments, except for sea surface salinity restoring in OMIP experiments. As for NorESM1, 53 model layers are used with two non-isopycnic surface layers and the same layer reference potential densities for the layers below. A tripolar grid is used instead of the bipolar grid in CMIP5 versions of NorESM1, allowing for approximately a doubling of the model time step. At the equator the grid resolution is 1° zonally and 1/4° meridionally, gradually approaching more isotropic grid cells at higher latitudes. The model bathymetry is found by averaging the S2004 (Marks and Smith, 2006) data points contained in each model grid cell with additional editing of sills and passages to their actual depths. The metric scale factors are edited to the realistic width of the Strait of Gibraltar so that strong velocity shears can be formed, enabling realistic mixing of Mediterranean water entering the Atlantic Ocean. OMIP provides protocols for two different forcing datasets, OMIP1 (Large and Yeager, 2009) and OMIP2 (Tsujino et al., 2018). Tsujino et al. (2019) present a model intercomparison evaluating OMIP1 and OMIP2 experiments, including BLOM/CICE of NorESM2. Further details on the BLOM model and its performance in OMIP coupled ocean-sea ice simulations can be found in Bentsen et al. (in prep.).

Ocean biogeochemistry
The ocean biogeochemistry component iHAMOCC (isopycnic coordinate HAMburg Ocean Carbon Cycle model) is an updated version of the ocean biogeochemistry module used in NorESM1. The model includes prognostic inorganic carbon chemistry following Dickson et al. (2007). A Nutrient Phytoplankton Zooplankton Detritus (NPZD) type ecosystem model (Six and Maier-Reimer, 1996) represents the lower trophic biological productivity in the upper ocean. The updated version includes riverine inputs of biogeochemical constituents to the coastal ocean. Atmospheric nitrogen deposition is prescribed according to the data provided by CMIP6. The parameterisations of the particulate organic carbon sinking scheme, dissolved iron sources and sinks, nitrogen fixation, and other nutrient cycling have been updated as well. NorESM2 also simulates preformed and natural inorganic carbon tracers, which can be used to facilitate a more detailed diagnostic of interior ocean biogeochemical dynamics. Details on the updates and improvements of the ocean biogeochemical component of NorESM2 are provided in

Sea ice
The sea ice model component is based upon version 5.1.2 of the CICE sea ice model of Hunke et al. (2015). A NorESM2-specific change is the inclusion of the effect of wind drift of snow into the ocean following Lecomte et al. (2013), as described in Bentsen et al. (in prep.).

The CICE model uses a prognostic ice thickness distribution (ITD) with five thickness categories. The standard CICE elastic-viscous-plastic (EVP) rheology is used for ice dynamics. The model uses mushy-layer thermodynamics with prognostic sea ice salinity from Turner and Hunke (2015). Radiation is calculated using the Delta-Eddington scheme of Briegleb and Light (2007), with melt ponds modeled on level, undeformed ice, as in Hunke et al. (2013). CICE uses the same horizontal grid as the ocean model (Sect. 2.3), and is configured with 8 layers of ice and 3 of snow.

Land
The NorESM2 land model is CLM5 (Lawrence et al., 2019) with one minor modification described below. A general description of the model will therefore not be presented here. It should however be noted that CLM5 has a new treatment of nitrogen-carbon limitation, which is very important for the carbon cycle in NorESM2 and has increased the land carbon uptake substantially relative to NorESM1 (Arora et al., 2019).

Coupler
The state and flux exchanges between model components and the software infrastructure for configuring, building and executing model experiments are handled by the CESM2 coupler Common Infrastructure for Modeling the Earth (CIME). The coupler computes the turbulent air-sea fluxes of heat and momentum, and in NorESM2 this is implemented as a version of the COARE-3 (Fairall et al., 2003) scheme, replacing the calculation based on Large and Yeager (2004) in CESM2.
State and flux exchanges via the coupler between atmosphere, land and sea ice components occur half-hourly, aligned with the atmosphere time step, while the ocean exchanges with the coupler every hour. CIME also provides common utility functions, among them the estimation of the solar zenith angle. In NorESM2, this utility function is modified with associated changes in atmosphere, land and sea ice components, ensuring that all albedo calculations use the zenith angle averaged over the component's time step instead of instantaneous angles.
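The difference between an instantaneous and a time-step-averaged zenith angle can be illustrated with a minimal sketch. This is not the CIME utility: the function names, the use of the standard formula cos(z) = sin(lat) sin(decl) + cos(lat) cos(decl) cos(h), the midpoint sub-sampling, and the clipping of night-time values to zero are all assumptions made for illustration only.

```python
import numpy as np

def cos_zenith(lat_rad, decl_rad, hour_angle_rad):
    """Instantaneous cosine of the solar zenith angle (standard formula)."""
    return (np.sin(lat_rad) * np.sin(decl_rad)
            + np.cos(lat_rad) * np.cos(decl_rad) * np.cos(hour_angle_rad))

def step_mean_cos_zenith(lat_rad, decl_rad, h_start, h_end, n_sub=64):
    """Mean cosine of the zenith angle over one time step.

    The step is given as an hour-angle interval [h_start, h_end] in radians
    and is sampled at n_sub midpoints (hypothetical choice). Values below
    the horizon are clipped to zero before averaging.
    """
    edges = np.linspace(h_start, h_end, n_sub + 1)
    midpoints = 0.5 * (edges[:-1] + edges[1:])
    mu = cos_zenith(lat_rad, decl_rad, midpoints)
    return float(np.mean(np.clip(mu, 0.0, None)))

# A half-hourly step spans 7.5 degrees of hour angle (360 degrees / 48 steps).
half_step = np.deg2rad(3.75)
mean_mu_noon = step_mean_cos_zenith(0.0, 0.0, -half_step, half_step)
```

For a half-hour step centred on local noon the averaged value is slightly smaller than the instantaneous noon value; the difference becomes more relevant near sunrise and sunset, where an instantaneous angle can misrepresent the insolation received during the step.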

NorESM2 initialisation and tuning
Most of the general development of the model as described in Sect. 2 was tested in stand-alone versions of the different model components, CAM6-Nor in present-day AMIP mode under year 2000 conditions and BLOM and iHAMOCC forced by a data atmosphere. The main targets of these separate experiments were to test improved representations of the physical processes in the simulations, to mitigate model systematic biases when compared to the observed climate, and to reduce the residual radiative imbalance at the top of the model atmosphere (hereafter RESTOM) given prescribed SSTs from observations. The first coupled version of NorESM2 included all changes described in Sect. 2. This version was heavily tested in a pre-industrial setting (as defined in Sect. 4).
This initial version of the coupled model was initialised using a hybrid of observational estimates and earlier model simulations. The ocean model was initialised with zero velocities and temperature and salinity fields from the Polar science center Hydrographic Climatology (PHC) 3.0 (updated from Steele et al., 2001). Following the OMIP protocol (Orr et al., 2017), the nutrients (phosphate, nitrate, and silicate) and oxygen fields in NorESM2 were initialised with the gridded climatological fields of the World Ocean Atlas database (Garcia et al., 2014a, b). For dissolved inorganic carbon and total alkalinity, we used the pre-industrial and climatological values from the Global Ocean Data Analysis Project (GLODAPv2) database (Lauvset et al., 2016). Other biogeochemical tracers are initialised using values close to zero. CAM and CLM were initialised using the files included in the CESM2 release. Aerosols and aerosol precursors were initialised to near-zero values. As no low-resolution pre-industrial initial files were available for the land model, an interpolation of the 1° initial file from CESM2 was used instead. At a later stage in the coupled spin-up, the land surface fields were re-initialised from a long (approximately 1400 years) stand-alone CLM spin-up simulation driven by repeated 50-year climatological fields from the earlier coupled run.

While preparing the coupled model for the spin-up, it was found that the sensitivities of important climatological variables, including RESTOM, to changes in parameterisations were often different in the coupled configuration compared to stand-alone simulations with the individual components using prescribed boundary conditions. The coupled response could be either amplified or damped with respect to single-component simulations. As a result, tuning test simulations had to be performed in coupled mode and the model had to be restarted from the initial state several times. Similar to CESM, NorESM2 adjusted towards its climatology with an initial phase of strong cooling in the high latitudes of the northern hemisphere, after which an intensification of ocean heat advection stabilised the simulation. After that point, the climatology tended to settle to a steady drift. In order to save computer resources, minor tuning, especially toward balanced RESTOM, was performed during this second stage of the spin-up phase of the model. Alongside the final tuning, the CESM components were updated to the versions found in CESM2.1.

The main goal of the tuning process was to create a reasonably stable pre-industrial control simulation. The simulation can produce a steady climatology only if the time-average radiative imbalance on the top of the model (RESTOM) vanishes.
In practice, a commonly used target is for RESTOM to be within ±0.1 W m⁻². Secondary tuning targets are to obtain and maintain values of mean atmospheric and ocean temperatures close to observations. As the ocean heat again reflects the top of the atmosphere imbalance, the two requirements are strongly connected. One additional constraint was that the tuning should not significantly degrade other important climatological variables such as temperature, precipitation, cloud, and the main mode of coupled variability, i.e. the El Niño-Southern Oscillation (ENSO). Each tuning step was performed in isolation, and an effort was made to ensure the greatest possible similarities in the two model configurations LM and MM. No tuning was performed that attempted to target other modes of variability beside ENSO, or a particular climate response to external forcings, e.g. from changes in greenhouse gas concentration, anthropogenic aerosol emissions, or volcanic or solar forcing.

As found in CESM2, NorESM2 also developed excessive sea ice cover in the Labrador Sea (LS) region, although the temporal development in NorESM2 differed from CESM2. For any tested combination of parameter choices, NorESM2 developed excessive LS sea ice cover starting around year 60 after model initialisation. This was however only a temporary model state, and in all experiments the sea ice returned close to the observed state in the LS region after an additional 60-80 model years of simulation.

One of the most common methods to tune RESTOM is to change the amount and thickness of low clouds. The main parameter used for tuning the low clouds in the CLUBB scheme is the "gamma" parameter, which controls the skewness of the assumed Gaussian PDF for subgrid velocities. A low gamma implies weaker entrainment at the top of the clouds, in particular for marine stratocumulus. This increases the amount of low clouds and results in a higher short-wave cloud forcing.
Given the same gamma values, the RESTOM was higher in the low-resolution version of the model. In addition, the sensitivity to the change of the gamma parameter was different in the two model resolutions, so a different choice of gamma was needed for the two resolutions. The final parameter values are well within the gamma range of 0.1-0.5 tested by Zhang et al. (2018).
The resulting bias in short-wave cloud forcing (SWCF) was somewhat off-set by regulating the parameter dcs (autoconversion size threshold for cloud ice to snow) in NorESM2-LM but this had only a small impact on the tropospheric temperature bias.
Changing dcs in NorESM2-MM did not improve the overall skill of this model version compared to the initial value, so it was not used for this version. While the amount of change in SWCF could be estimated by running the atmosphere and land model in a stand-alone configuration, the change in RESTOM in the coupled set-up was small compared to the change in cloud forcing. Further attempts at reducing positive RESTOM by tuning the boundary layer stability were neutralised by SST adjustment, while worsening the tropospheric cold bias. A more effective tuning of low cloud radiative effects was achieved by modifying air-sea fluxes of sea salt and DMS. As described in Sect. 2.2, however, the disadvantage of increasing the sea-salt flux is that it resulted in too dominant marine aerosols with respect to optical thickness and surface mass concentrations.

RESTOM was decisively reduced by increasing outgoing long-wave radiation. This was achieved in three ways. First, alterations were made to the Zhang and McFarlane (1995) convection scheme, as described in Toniazzo et al. (in prep.), aimed at increasing mid- and high-altitude latent heating of the atmosphere. Second, higher sea-surface temperatures were achieved by reverting to the NorESM1 level of ocean background vertical mixing after having used up to 50% higher diffusivity for the purpose of reducing upper ocean biases. Third, positive cloud radiative forcing in the terrestrial radiation spectrum was reduced by intervening on the parameterisation of ice cloud fraction.
Several versions of the ice cloud fraction parameterisation are provided (as namelist options) in CESM. Initial tuning of the parameters of the CESM2 default option appeared promising, but coupled adjustment again tended to neutralise the effect on model radiative imbalance. An effective reduction in the high- and mid-level cloud cover could only be achieved by switching parameterisation in NorESM2-LM, such that there is no direct functional dependence of ice cloud fraction on environmental relative humidity (this is option number 4 in CESM). By contrast, the CESM default scheme (option number 5, with explicit RH dependence) could be tuned sufficiently in NorESM2-MM, by including a minor modification that narrows the range of cloud sensitivity to environmental RH (and thus provides a continuous switch between the two parameterisations). This purely empirical part of the cloud parameterisation of CESM2 is very poorly constrained by observations, and its future development might be better rooted in physical processes.
Compared to Schwinger et al. (2017), NorESM2 has doubled the diatom-mediated DMS production parameter in order to maintain the observed high DMS concentration at high latitudes. This tuning is necessary due to the lower biological production simulated in NorESM2 (relative to NorESM1) during the spring bloom in both hemispheres, which is a better representation of the observations (Tjiputra et al., 2019).

Control simulations and model response to forcing
This section presents a basic description of the climatology simulated in CMIP6 experiments with the two versions of the model, NorESM2-LM and NorESM2-MM (Sect. 2.1). We consider the time evolution of temperature in historical and enhanced greenhouse gas climate scenarios, along with aspects of the ocean circulation and sea ice. We validate the historical coupled simulations against observational estimates and reanalyses, and compare them with results from simulations with previous versions of NorESM (Sect. 5).
We consider three sets of experiments that are important for documentation and application of CMIP6 models: the DECK experiments, the CMIP6 Historical experiment, and the Tier 1 experiments of the ScenarioMIP. A brief description of the set-up of these experiments is given in Sect. 4.1.

The analysis is divided into three parts. Section 4.2 focuses on the stability of the pre-industrial control simulation. In Sect. 4.3, we consider the simulated climate sensitivity to abrupt and gradual quadrupling of CO2. A brief analysis of the warming, sea ice, AMOC, and the transport through the Drake Passage in the historical simulations and the scenarios is given in Sect. 4.4.

...instantaneously quadrupled at the start of the run (abrupt-4xCO2); (4) the experiment corresponding to the piControl, but where the CO2 concentrations are gradually increased by 1% per year (1pctCO2). Both abrupt-4xCO2 and 1pctCO2 were started from year 1 of the control.

Experiment set-up
The DECK was run with both versions of the model (NorESM2-LM and NorESM2-MM) and we here consider results from the pre-industrial control and the abrupt-4xCO2 and 1pctCO2 experiments (Sect. 4.2-4.3). As this paper focuses on the coupled aspect of NorESM2, the AMIP runs are not included here, but are described in Olivié et al. (in prep.).
Another experiment required for CMIP6 and important for model evaluation is the historical experiment, which is run with forcings from the so-called historical period, defined as 1850-2014. For the low-resolution version of the model (NorESM2-LM), we have carried out a small ensemble consisting of 3 members. The first ensemble member was initialised using initial conditions from the first year of the control experiment, while members 2 and 3 are initialised from years 32 and 62, respectively. For NorESM2-MM, only a single ensemble member had been carried out when this paper was written. Consistent with historical member 1 from NorESM2-LM, the NorESM2-MM historical experiment was started from identical initial conditions to the NorESM2-MM control simulation.
One of the most important applications for Earth system models is to provide estimates for future climate development. This is typically done using scenarios where critical input for climate models through description and quantification of both land-

Files for stratospheric aerosols and emissions of aerosols and aerosol precursors were created based on the input found at the input4mips website: https://esgf-node.llnl.gov/projects/input4mips/. In addition, sulphur from tropospheric volcanoes was included similarly to Kirkevåg et al. (2018), see Sect. 2.2.

Stability of the control climate
After the tuning period and the spin-up, both NorESM2-LM and NorESM2-MM were integrated for 500 years with steady

As can be seen in the figure, the drift is generally small and comparable for the two model versions. The top-of-the-atmosphere radiative imbalance is -0.057 W m⁻² for NorESM2-LM and -0.065 W m⁻² for NorESM2-MM. The ocean volume temperature change of 0.03 K over 500 years is much smaller than the rate of warming observed during the last 50 years. Similarly, there are positive trends in global mean ocean salinity of 2.6 × 10⁻⁵ g kg⁻¹ and 4.7 × 10⁻⁵ g kg⁻¹ over 500 years for NorESM2-LM and NorESM2-MM, respectively, that we consider small since for NorESM2-MM this is equivalent to an average surface freshwater loss of 2.9 × 10⁻⁵ mm day⁻¹. The remaining trends are not significantly different from 0 at the 5% level (t-test). We found however a slight decrease in DMS sea-to-air flux of 2% over the 500-year control period, reflecting a residual drift in ocean biogeochemistry. AMOC variations are reasonably small and show no significant trend.
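The link between a global-mean salinity trend and an equivalent surface freshwater loss can be sketched as follows. This is a back-of-envelope illustration, not the authors' calculation: it assumes total salt is conserved, and the mean salinity (34.7 g kg⁻¹), mean ocean depth (3700 m) and 365-day year are assumed round values, not numbers taken from the model.

```python
def freshwater_loss_mm_per_day(delta_salinity, years,
                               mean_salinity=34.7,    # g/kg, assumed
                               mean_depth_m=3700.0,   # m, assumed
                               days_per_year=365.0):  # assumed calendar
    """Equivalent ocean-mean freshwater loss for a given salinity increase.

    With salt conserved, S * h is constant, so a relative salinity increase
    dS / S corresponds to a relative loss of water column height dh / h.
    """
    height_loss_mm = (delta_salinity / mean_salinity) * mean_depth_m * 1000.0
    return height_loss_mm / (years * days_per_year)

# Salinity trends quoted in the text (g/kg over 500 years).
loss_mm = freshwater_loss_mm_per_day(4.7e-5, 500.0)  # NorESM2-MM
loss_lm = freshwater_loss_mm_per_day(2.6e-5, 500.0)  # NorESM2-LM
```

With these assumed values the NorESM2-MM trend corresponds to roughly 2.7 × 10⁻⁵ mm day⁻¹, the same order as the 2.9 × 10⁻⁵ mm day⁻¹ quoted above; the remaining difference depends on the ocean depth and salinity actually used.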

Equilibrium climate sensitivity and transient response
The two enhanced greenhouse gas experiments of the CMIP-DECK aim to facilitate a comparison of climate change in response to a standardized specified forcing across different models. The corresponding NorESM2 simulations were started at the same nominal model year and with the same initial conditions as piControl. They are referred to as abrupt-4xCO2 and 1pctCO2. Figure 3 shows the time evolution of near-surface temperature for abrupt-4xCO2, 1pctCO2 and piControl for both model configurations. Three commonly used metrics for the response to CO2 forcing, based on the evolution of the simulated global-mean temperature, are the Equilibrium Climate Sensitivity (ECS), the Transient Climate Response (TCR), and the Transient Climate Response to cumulative CO2 Emissions (TCRE). Their values are given in Table 1 for the NorESM2 experiments, and compared to those for NorESM1. The ECS is defined as the change in global near-surface temperature when a new climate equilibrium is obtained with an atmospheric CO2 concentration that is doubled compared to the pre-industrial amount. In order to reach a new equilibrium, a model simulation of several thousand years is required (Boer and Yu, 2003). There are some examples in the literature of models for which this has been done, but in general ECS is more commonly estimated from the relationship between surface temperature and RESTOM from the abrupt-4xCO2 experiment using the so-called Gregory method (Gregory et al., 2004). The numbers in Table 1 are calculated using years 1-150 from the simulations shown in Fig. 3, and are divided by 2 to get the number for CO2 doubling instead of quadrupling. The ECS is 2.54 K for NorESM2-LM, which is slightly lower than the equivalent value for NorESM1 of 2.8 K. Both are significantly lower than the CMIP5 mean value of 3.2 K but well inside the bounds of the likely range of 1.5-4.5 K (Stocker et al., 2013).
On the other hand, the ECS in NorESM2 is markedly smaller than the ECS found in CESM2 of 5.3 K by Gettelman et al. (2019a), despite sharing many of the same component models. The ECS in NorESM2 is discussed in more detail in Gjermundsen et al. (in prep.). There are indications that the different behaviour of the BLOM ocean model (compared to the POP ocean model used in CESM2) contributes to a delayed warming in the first 150 years of abrupt-4xCO2 in NorESM2. Using the Gregory et al. (2004) method on that period leads to an ECS estimate which is considerably lower than for CESM2. However, after the initial slow warming in the abrupt-4xCO2 experiment, NorESM2 shows a sustained warming similar to CESM2 when the abrupt-4xCO2 experiment is extended to 500 years or longer. This suggests that the actual ECS (the value one finds when the model is run for thousands of years until equilibrium) in NorESM2 and CESM2 is not very different, but that the Gregory et al. (2004) method based on the first 150 years only does not give a good estimate of ECS for models.
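The Gregory-type estimate described above can be sketched as a linear regression of the RESTOM anomaly on the temperature anomaly, extrapolated to zero imbalance and divided by the number of CO2 doublings. The sketch below is a minimal illustration and not the analysis code used for Table 1; the function name, the argument names and the assumption that both inputs are annual, global-mean anomalies relative to piControl are hypothetical.

```python
import numpy as np

def gregory_ecs(delta_t, delta_n, co2_factor=4.0):
    """Gregory-style ECS estimate from an abrupt CO2 experiment.

    delta_t : annual global-mean near-surface temperature anomalies (K)
    delta_n : annual global-mean RESTOM anomalies (W m-2)
    Returns (ecs, forcing_per_doubling, feedback_parameter).
    """
    delta_t = np.asarray(delta_t, dtype=float)
    delta_n = np.asarray(delta_n, dtype=float)
    # Fit N = intercept + slope * T; polyfit returns the highest degree first.
    slope, intercept = np.polyfit(delta_t, delta_n, 1)
    t_equilibrium = -intercept / slope   # temperature at which N reaches zero
    doublings = np.log2(co2_factor)      # 2 for quadrupled CO2
    return t_equilibrium / doublings, intercept / doublings, -slope
```

Applied to years 1-150 only, such a fit assumes a constant feedback over the regression period; if warming continues at a different rate afterwards, as discussed above for NorESM2, the extrapolated value underestimates the long-term equilibrium response.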
The TCR is defined as the global-mean surface temperature change at the time of CO2 doubling, and accordingly it was calculated from the temperature difference between the 1pctCO2 experiment averaged over years 60-80 after initialisation and piControl. The TCR is 1.48 K and 1.33 K for NorESM2-LM and NorESM2-MM, respectively. As for ECS, these values fall in the lower part of the distribution obtained from the CMIP5 ensemble (Forster et al., 2013), similar to those obtained for NorESM1. A recent observational estimate for the 90% likelihood range of TCR is 1.2-2.4 K (Schurer et al., 2018).
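The choice of years 60-80 follows from the 1% per year increase, which doubles CO2 after about 70 years. A minimal sketch of the calculation is given below, assuming annual global-mean temperature series that both start at model year 1; the function and variable names are hypothetical, and differencing against the same years of piControl is an assumption about how the control reference is formed.

```python
import numpy as np

def transient_climate_response(t_1pct, t_control, first_year=60, last_year=80):
    """Mean 1pctCO2 minus piControl temperature over the given model years.

    Both inputs are annual global-mean temperatures, with index 0 holding
    model year 1. The averaging window includes both end years.
    """
    t_1pct = np.asarray(t_1pct, dtype=float)
    t_control = np.asarray(t_control, dtype=float)
    window = slice(first_year - 1, last_year)
    return float(np.mean(t_1pct[window] - t_control[window]))

# Years needed to double CO2 at a compound increase of 1% per year.
doubling_time_years = np.log(2.0) / np.log(1.01)
```

The doubling time evaluates to just under 70 years, which is the centre of the 21-year averaging window.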
We also give an estimate of the transient climate response to cumulative carbon emissions (TCRE) calculated from TCR and the corresponding diagnosed carbon emissions. Following Gillett et al. (2013), TCRE is defined as the ratio of TCR to accumulated CO2 emissions in units of K EgC⁻¹. As CO2 fluxes were not calculated in NorESM1-M and NorESM1-Happi, the NorESM1 values are obtained from the carbon cycle version of NorESM1 (NorESM1-ME; Tjiputra et al., 2013). TCRE is reduced from 1.93 K EgC⁻¹ in NorESM1-ME to 1.36 K EgC⁻¹ and 1.21 K EgC⁻¹ in NorESM2-LM and MM, respectively. Since TCR is comparable, the main difference is due to changes in carbon uptake. NorESM1, with CLM4 as the land component, had a very strong nitrogen limitation on land carbon uptake. This limitation is weaker in CLM5 (Arora et al., 2019) used in NorESM2.
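The definition can be written out and inverted to see which cumulative emissions the quoted values imply. The symbol E_cum is introduced here for illustration only, and the implied emissions are a back-of-envelope consequence of the rounded TCR and TCRE values above, not figures reported by the authors.

```latex
\begin{align*}
\mathrm{TCRE} &= \frac{\mathrm{TCR}}{E_{\mathrm{cum}}}
  \quad\Longrightarrow\quad
  E_{\mathrm{cum}} = \frac{\mathrm{TCR}}{\mathrm{TCRE}} \\
E_{\mathrm{cum}}^{\mathrm{LM}} &\approx \frac{1.48\ \mathrm{K}}{1.36\ \mathrm{K\,EgC^{-1}}} \approx 1.09\ \mathrm{EgC} \\
E_{\mathrm{cum}}^{\mathrm{MM}} &\approx \frac{1.33\ \mathrm{K}}{1.21\ \mathrm{K\,EgC^{-1}}} \approx 1.10\ \mathrm{EgC}
\end{align*}
```

Under this reading the two resolutions diagnose nearly the same cumulative emissions at the time of CO2 doubling, so the difference in TCRE between them follows mainly from the difference in TCR.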

Climate evolution in historical and scenario experiments
In this section we provide a very brief analysis of the response of the model to historical forcings in the three historical members carried out with NorESM2-LM and the one realisation carried out with NorESM2-MM. We also consider the model response for the Tier 1 experiments from ScenarioMIP (SSP1-2.6, SSP2-4.5, SSP3-7.0, and SSP5-8.5). The focus here will be on the response in global-mean near-surface temperature, the Atlantic Meridional Overturning Circulation (AMOC), the volume transport through the Drake Passage, and on sea ice area.

The warming at the end of the 21st century is rather similar in both versions of NorESM2. For SSP1-2.6, the temperature stabilizes in the second half of the 21st century. In NorESM1, under the scenarios RCP2.6, RCP4.5, and RCP8.5, the surface air temperature in the period 2071-2100 was 0.94, 1.65, and 3.07 K higher than in 1976-2005. For the same periods and looking at SSP1-2.6, SSP2-4.5, and SSP5-8.5, we find rather similar (but slightly stronger) warmings of 1.06, 1.81, and 3.22 K in NorESM2-LM, and 1.11, 1.83, and 3.26 K in NorESM2-MM.

The simulated Atlantic Meridional Overturning Circulation (AMOC) at 26.5° N shows a multi-centennial variability that is 15% of the mean in the control simulation (Fig. 2). In the historical simulations the AMOC peaks for both MM and LM in the 1990s at around 24 Sv before starting a rapid decline at around year 2000 (Fig. 6). In both versions the AMOC reaches a quasi-equilibrium by the end of the century at around 10-15 Sv, depending on the scenario. Since we only have a few ensemble members, it remains unclear how fast the AMOC declines in response to the greenhouse gas forcing and how much of, for example, the initial decline is due to multi-decadal variability. In any case, it is noteworthy that the initial AMOC decline begins already during the historical period in both versions, which is also consistent with the NorESM2 and multimodel mean response to the OMIP2 forcing (from 1958 onwards; Tsujino et al., 2019).

In addition to the AMOC, the Antarctic Circumpolar Current (ACC) strength, as measured in the Drake Passage, also shows multi-centennial variability that is about 3% of the mean (Fig. 7). Similar variability in the ACC has been linked to convection within the Weddell and Ross seas in the CMIP5 ensemble (Behrens et al., 2016). In our simulations, too, the Weddell Sea convection has long-term variability similar to that of the ACC. Unlike the AMOC, there is no clear trend emerging from the scenario simulations; rather, the multidecadal variability continues throughout the 21st century. Again, a larger number of ensemble members could help identify the forced signal.

Both models have a reasonable March sea ice area compared to observations. However, the negative trends in winter sea ice area are small compared to observed trends.
During the scenario period both models show a strong reduction in summer sea ice area. The Arctic Ocean is often considered ice free when the total sea ice area drops below 1 million square km. This threshold is denoted by dotted gray lines in Fig. 8.
NorESM2-LM loses summer ice shortly after year 2050. This occurs first in the SSP5-8.5 scenario, but the SSP2-4.5 ensemble also shows values close to this threshold even before 2050. The SSP3-7.0 scenario becomes ice free at around 2070. Any prediction of the year in which the Arctic Ocean first becomes ice free must therefore be considered rather uncertain, owing to uncertainty in the forcing evolution and to internal variability. This is consistent with the overall assessment of sea ice evolution in CMIP6 by the SIMIP Community (Notz et al., 2020). In NorESM2-LM an ice free Arctic Ocean is only avoided in the SSP1-2.6 scenario. NorESM2-MM loses ice more slowly and shows the first ice free summer around 2070. In that model, the SSP2-4.5 scenario also keeps the ice area above 1 million square km in all years before 2100. Moreover, the SSP1-2.6 scenario stabilizes at a sea ice area comparable with present day observations, despite the warming simulated under SSP1-2.6. Therefore, the sea ice area simulated by NorESM2-MM for the future Arctic seems to be unrealistically high.
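The ice-free criterion used above (total sea ice area below 1 million square km) can be applied to any time series of summer sea ice area. The following is a minimal sketch; the function name and the synthetic series are illustrative assumptions and do not represent NorESM2 output.

```python
import numpy as np

ICE_FREE_THRESHOLD = 1.0  # million km^2, the threshold quoted in the text


def first_ice_free_year(years, summer_area, threshold=ICE_FREE_THRESHOLD):
    """Return the first year whose sea ice area is below the threshold, or None."""
    years = np.asarray(years)
    area = np.asarray(summer_area, dtype=float)
    below = np.flatnonzero(area < threshold)
    return int(years[below[0]]) if below.size else None


# Synthetic, purely illustrative series (not model output)
years = np.arange(2015, 2101)
trend = 4.5 - 0.06 * (years - 2015)          # linear decline, million km^2
wiggle = 0.5 * np.sin(0.9 * (years - 2015))  # stand-in for internal variability

print("smooth decline:", first_ice_free_year(years, trend))
print("with variability:", first_ice_free_year(years, trend + wiggle))
```

Comparing the two calls illustrates why the first ice-free year is sensitive to internal variability: superimposed fluctuations can push the area below the threshold well before the underlying trend does.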
5 Climatological mean state and circulation patterns compared to observations and NorESM1

Ocean state
In the surface ocean, the large-scale climatological biases are similar in the two NorESM2 versions (Fig. 9), but overall the MM version is closer to the observations (smaller global-mean root-mean-square error, RMSE; denoted √(A²) in Fig. 9). In general the Southern Ocean is too warm (Fig. 9b-c), the Atlantic (and the Arctic) are too saline, and the Pacific is too fresh (Fig. 9e-f). The sea level is lower than observed in the Atlantic basin, but higher in the Indo-Pacific basin, and thus the gradient between the two basins is larger than in the observations (Fig. 9h-i). If we remove the global-mean biases, the two versions produce even more similar mean errors, suggesting that some of the regional biases are largely independent of the atmosphere and land resolution.

Indeed, the regional patterns are common to many other models with coarse resolution ocean components. The fresh bias in the Southern Pacific is linked to the co-located positive net precipitation bias (Fig. 9) and extends throughout the surface mixed layer (Fig. 10). The salinity bias also causes a negative density bias (not shown) as it is not fully compensated by temperature, supporting an atmospheric origin. A comparison with the OMIP1 and OMIP2 simulations shows that the net precipitation bias in the LM simulation, 250 mm year⁻¹ in the mean over the region where the salinity bias is larger than 1 g kg⁻¹, would be large enough to cause the simulated salinity bias (assuming a mixed layer depth of 100 m and a residence time of 10 years). Therefore, we suggest that the net precipitation bias leads to accumulation of excess freshwater that is spread throughout the subtropical gyre by the ocean circulation.
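The order of magnitude of this argument can be checked with a simple slab dilution estimate, in which the excess freshwater accumulated over the residence time dilutes a mixed layer of fixed depth. This is a sketch, not necessarily the calculation behind the statement above; the reference salinity of 35 g kg⁻¹ and the function name are assumptions introduced here.

```python
def mixed_layer_salinity_change(net_precip_m_per_yr, residence_time_yr,
                                mixed_layer_depth_m, reference_salinity=35.0):
    """Salinity change (g/kg) from diluting a mixed layer with excess freshwater.

    Linearised dilution: dS = -S0 * (P * tau) / H, where
    S0 is an assumed reference salinity (not a value from the paper).
    """
    freshwater_m = net_precip_m_per_yr * residence_time_yr
    return -reference_salinity * freshwater_m / mixed_layer_depth_m


# Values quoted in the text: 250 mm/year, 10 years, 100 m
delta_s = mixed_layer_salinity_change(0.25, 10.0, 100.0)
print(f"Estimated salinity change: {delta_s:.3f} g/kg")
```

With these inputs the estimated freshening is a little under 1 g kg⁻¹, which is comparable to the salinity bias threshold used to define the region.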
Most of the large-scale surface biases are also visible in the subsurface (Figs. 10-11). The upper ocean is too warm [...]. Overall, many of these sub-surface ocean biases are similar in the ocean-only simulations and may be linked to coarse ocean resolution and shortcomings in parameterised processes. In some regions, air-sea coupling tends to reinforce biases that may be generated in either the atmosphere or the ocean model component separately. The biases over the upwelling systems, for example, generally have a complex cause rooted in both local (including mesoscale) and remote (including equatorial) biases in both atmosphere and ocean model components (Toniazzo and Woolnough, 2014; Zuidema et al., 2016; Stammer et al., 2019). For NorESM2 the biases in the coupled simulations have a similar pattern to, but approximately twice the magnitude of, the biases in the OMIP simulations (not shown). The cold bias in the northern subtropical Pacific has a contribution from weak oceanic mixing, as there is a large warm bias just below the surface (Fig. 10), but may be amplified by increased atmospheric stability and correspondingly enhanced boundary-layer clouds. Excessively negative short-wave cloud forcing is seen in that region, in contrast to AMIP simulations, which show no such regional bias. In the central and eastern equatorial Pacific NorESM2 displays a characteristic "cold tongue" bias with cold SSTs and an easterly wind stress bias. An equatorial easterly bias is present in the NorESM2 AMIP simulations. Shonk et al. (2017) show that off-equatorial net precipitation biases alone can initiate a feedback leading to an equatorial Pacific cold tongue in coupled simulations, and CAM6-Nor tends to develop such a bias. Finally, the near-surface ocean temperature bias pattern in OMIP1 simulations is cold along the equator and warm on each side, which may further enhance off-equatorial precipitation. It should be noted that OMIP2 simulations with BLOM/CICE have a warm bias along the equator (Tsujino et al., 2019). The cold equatorial bias can affect ENSO variability and teleconnections. These are discussed further below.
https://doi.org/10.5194/gmd-2019-378 Preprint. Discussion started: 10 February 2020. © Author(s) 2020. CC BY 4.0 License.

Sea ice
The geographic distribution of sea ice in March and September, compared with observations, is shown in Fig. 12 for NorESM2-LM (Fig. 12e-h) and NorESM2-MM (Fig. 12i-l). Common to both models in the Northern Hemisphere (Fig. 12e,f,i,j) are sea ice extents that are too large in the Barents Sea and Greenland Sea and too small in the Labrador Sea, Bering Sea, and Sea of Okhotsk during winter. The total areas are quite close to the observations, as shown in Fig. 8. These regional biases are most likely due to persistent biases in the oceanic and atmospheric circulation.
During summer, the distribution of sea ice in NorESM2-LM (Fig. 12f) seems to be more variable. Apart from the persistent, positive bias in the East Greenland Current, the regional biases within the Arctic Ocean are more likely due to inter-annual variability and to the fact that the observations show a larger downward trend than the model. NorESM2-MM (Fig. 12j) shows too much sea ice in the central Arctic in September. In general, NorESM2-MM is colder in the Arctic than NorESM2-LM (Fig. 14), and it has thicker sea ice in the Arctic Ocean. The Northern Hemisphere sea ice volume in NorESM2-MM is 19-23% (38-60%) larger in March (September) than in NorESM2-LM (not shown). The smaller seasonal cycle in ice area (Fig. 13) and volume is consistent with a thicker sea ice cover in NorESM2-MM, both due to less winter growth because of increased insulation, and less summer melt due to higher albedo. The situation encountered in NorESM2-MM is similar to the results from NorESM1-M and NorESM1-Happi (Graff et al., 2019).
These models simulate ice cover that is too thick, with the reduction in the Northern Hemisphere summer ice area being too slow.
The winter sea ice area and extent are too low in the Southern Ocean in NorESM2, as seen in Fig. 13 and Fig. 12(g-h,k-l). The winter (September) area is around 4 million square km too small. The largest bias is found in the Atlantic-Indian sector. This bias seems to be associated with the warm bias in the ocean model and the too warm Antarctic Intermediate Water (AAIW). The exact reason for this problem is not known, but the warm bias in AAIW is also evident in the OMIP simulations (not shown). However, these uncoupled simulations have a reasonable representation of the upper ocean temperature and the winter sea ice extent, which is most likely due to the inherent relaxation towards observed atmospheric temperatures in those experiments. With the interactive atmosphere these problems increase.

Atmospheric temperature and winds
NorESM2 is a warmer model than its preceding versions. The global-mean near-surface temperature (Fig. 14) in NorESM1-M and NorESM1-Happi is generally too low with global-mean biases of -0.62 K and -0.94 K (see legends above panels in Fig. 14).
NorESM2-MM is closer to the reanalysis with a global-mean bias of -0.05 K. Regionally, cold biases are mostly found in the polar regions and over the sub-tropical oceans. Warm biases are found over the Southern Ocean, the North Atlantic and central Eurasia. NorESM2-LM (panel a) is warmer still, and overestimates the near-surface temperatures in the Arctic and in the global mean, with a bias of 0.58 K. NorESM2-MM also has the best overall performance in terms of the global-mean RMSE, with 1.33 K compared to 1.76 K for NorESM2-LM, 1.71 K for NorESM1-Happi, and 1.79 K for NorESM1-M (cf. Fig. 14).
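Global-mean bias and RMSE of a gridded field are commonly computed with area weights, so that the small grid cells near the poles do not dominate. The sketch below assumes a regular latitude-longitude grid and cosine-of-latitude weights; the exact weighting used for the numbers above is not stated in the text, and the function name is hypothetical.

```python
import numpy as np


def area_weighted_stats(model, reference, lat_deg):
    """Global-mean bias and RMSE of a (lat, lon) field, weighted by cos(latitude)."""
    diff = np.asarray(model, dtype=float) - np.asarray(reference, dtype=float)
    lat = np.asarray(lat_deg, dtype=float)
    weights = np.cos(np.deg2rad(lat))[:, np.newaxis] * np.ones_like(diff)
    bias = np.average(diff, weights=weights)
    rmse = np.sqrt(np.average(diff ** 2, weights=weights))
    return bias, rmse


# Synthetic example: +1 K error in the north, -1 K in the south
lat = np.linspace(-89.0, 89.0, 90)
reference = np.zeros((lat.size, 180))
model = np.where(lat[:, np.newaxis] > 0.0, 1.0, -1.0) * np.ones((lat.size, 180))

bias, rmse = area_weighted_stats(model, reference, lat)
print(f"bias = {bias:.3f} K, RMSE = {rmse:.3f} K")
```

The synthetic case shows why both metrics are reported: regional errors of opposite sign cancel in the global-mean bias but not in the RMSE.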
Temperature biases are mitigated in NorESM2 compared to NorESM1, not only near the surface, but also and especially in the mid and upper troposphere (Fig. 16). In particular, NorESM2 has a reduced cold bias compared to NorESM1 in the tropics and sub-tropics. This is mostly a consequence of the changes made to the cumulus convection scheme (Toniazzo et al., in prep.). Because NorESM2-LM is generally warmer in the tropics than NorESM2-MM, its cold biases there are smaller; however, persistent cold mid- and high-latitude biases imply an excessive meridional temperature gradient. By contrast, NorESM2-MM shows improvements at all latitudes.
All four of NorESM2-MM, NorESM2-LM, NorESM1-Happi and NorESM1-M tend to produce westerly biases in zonal-mean zonal winds (Fig. 17). At tropical and sub-tropical latitudes, these are more widespread in NorESM2 than in NorESM1-M and NorESM1-Happi, and at the same time the easterly surface biases are mitigated. At higher latitudes, all models tend to have westerly biases on the poleward side of the sub-polar surface jet (between 50° and 60°) in both hemispheres. The overestimation on the poleward flank is generally more pronounced in NorESM2 than in NorESM1. Comparing NorESM1-M to NorESM1-Happi and NorESM2-LM to NorESM2-MM, the biases in the zonal wind tend to be ameliorated with increased resolution. The differences in the tropics between NorESM2 and its predecessors are in part attributable to the enforcement of conservation of atmospheric global angular or rotational momentum in NorESM2 (Toniazzo et al., 2019). In all versions, in common with CAM6/CESM2, there is accumulation of westerly momentum near the model lid, where it is insufficiently damped.

Extratropical storm tracks
Extratropical storm tracks can be defined as regions of storminess associated with cyclogenesis, cyclone development, and cyclolysis, which take place in the baroclinic zones between the sub-tropics and polar regions. They are important features at mid- and high latitudes as they are responsible for eddy transport of heat and momentum between low and high latitudes, and are associated with potentially high-impact weather such as heavy precipitation and strong winds. Here, we diagnose storm-track activity by applying a bandpass filter to retain fluctuations in the geopotential height field at 500 hPa with periodicity corresponding to that of baroclinic waves, that is, between 2.5 and 6 days (Blackmon, 1976; Blackmon et al., 1977). The variability of the bandpass-filtered field is dominated by propagating low-pressure and high-pressure systems, and the storm tracks can be defined as geographically localized maxima in bandpass-filtered variability (Blackmon, 1976; Blackmon et al., 1977; Chang et al., 2002; Graff and LaCasce, 2012).
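The idea of bandpass-filtered variability can be illustrated at a single grid point. The sketch below uses a simple FFT "brick-wall" filter that keeps periods between 2.5 and 6 days; this is a stand-in chosen for brevity and is not the digital filter of Blackmon (1976). The function name and the synthetic series are assumptions.

```python
import numpy as np


def bandpass_std(series, dt_days=1.0, min_period=2.5, max_period=6.0):
    """Standard deviation of a series after keeping periods in [min_period, max_period] days."""
    series = np.asarray(series, dtype=float)
    anomalies = series - series.mean()
    spectrum = np.fft.rfft(anomalies)
    freq = np.fft.rfftfreq(anomalies.size, d=dt_days)  # cycles per day
    keep = (freq >= 1.0 / max_period) & (freq <= 1.0 / min_period)
    filtered = np.fft.irfft(np.where(keep, spectrum, 0.0), n=anomalies.size)
    return filtered.std()


# Synthetic daily 500 hPa height anomalies (metres): a 4-day "baroclinic" wave
# plus a slower 20-day fluctuation that the filter should remove.
t = np.arange(360)
z500 = 30.0 * np.sin(2 * np.pi * t / 4.0) + 50.0 * np.sin(2 * np.pi * t / 20.0)

print(f"bandpass-filtered standard deviation: {bandpass_std(z500):.2f} m")
```

Applying such a function at every grid point and mapping the result would give a storm-track activity field of the kind described above, with the storm tracks appearing as its local maxima.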
The climatological winter storm tracks are shown as the solid black contours in Fig. 18. There are two maxima in the Northern Hemisphere, one over the North Atlantic and one over the North Pacific. The colors show the bias with respect to ERA-Interim (Dee et al., 2011). In NorESM1-M, storm-track activity is underestimated in both storm-track regions. In particular, the North-Atlantic storm track is overly zonal with too little activity on the equatorward side of the climatological maximum as well as over the Norwegian and Barents Sea (Graff et al., 2019). The magnitude of the bias is reduced in NorESM1-Happi compared to NorESM1-M in both storm-track regions. This is likely associated with the increased resolution in the atmosphere and land components (1° in NorESM1-Happi versus 2° in NorESM1-M).
Similar improvements are seen when comparing NorESM2-LM and NorESM2-MM. Both versions of NorESM2 are, furthermore, better able to simulate the North-Atlantic storm track with the size of the negative bias on its equatorward side being reduced. Overall, NorESM2-MM displays the smallest biases in Northern Hemisphere storm-track activity out of the four models. There remains, however, too little activity over the Norwegian Sea with extension into the Barents Sea.

In the Southern Hemisphere, the climatological winter storm track surrounds Antarctica with the largest variability occurring over the Indian Ocean (Fig. 18). Storm-track activity is generally too weak on the equatorward side, with the largest biases being located over the Indian Ocean, close to the storm-track maximum. As in the Northern Hemisphere, the largest biases are found in NorESM1-M and the smallest biases in NorESM2-MM.
While the bandpass-filter approach yields a measure of storm-track activity, it cannot be used to isolate the individual cyclone centers. To further assess the robustness of the improvements between NorESM2-LM and NorESM2-MM, we therefore also consider results from the cyclone detection algorithm described in Wernli and Schwierz (2006). The method detects cyclones as minima in the sea-level pressure fields and sets the perimeter as the outermost closed sea-level pressure contour. The storm tracks are then seen as maxima in the local frequency of occurrence of surface cyclones, i.e. the fraction of time when cyclones are present at a given point (Fig. 19a-b).
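The first step of such a method, locating sea-level pressure minima, can be sketched as follows. This is a simplified illustration only: it flags strict local minima among the eight neighbouring grid points and counts how often each point is a minimum. It does not implement the closed-contour perimeter of Wernli and Schwierz (2006), ignores longitude periodicity, and all function names are hypothetical.

```python
import numpy as np


def local_minima_mask(slp):
    """Boolean mask of interior grid points lower than all 8 neighbours."""
    slp = np.asarray(slp, dtype=float)
    nlat, nlon = slp.shape
    core = slp[1:-1, 1:-1]
    is_min = np.ones(core.shape, dtype=bool)
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            if di == 0 and dj == 0:
                continue
            neighbour = slp[1 + di:nlat - 1 + di, 1 + dj:nlon - 1 + dj]
            is_min &= core < neighbour
    mask = np.zeros(slp.shape, dtype=bool)
    mask[1:-1, 1:-1] = is_min  # boundary points are never flagged
    return mask


def minima_frequency(slp_series):
    """Fraction of time steps at which each grid point is a local SLP minimum."""
    return np.mean([local_minima_mask(field) for field in slp_series], axis=0)


# Two synthetic time steps (hPa): one with a low at index (2, 3), one flat
with_low = np.full((5, 5), 1010.0)
with_low[2, 3] = 990.0
flat = np.full((5, 5), 1010.0)

frequency = minima_frequency([with_low, flat])
print(frequency[2, 3])
```

In the full method the frequency field would count all grid points inside the outermost closed contour rather than only the pressure minimum itself.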

As for the bandpass-filter approach, the cyclone detection shows a clear reduction in the bias between NorESM2-LM and NorESM2-MM, which is likely to be associated with the higher horizontal resolution in the atmosphere and land components.
The cyclone occurrence is underestimated on the equatorward side of the North Pacific and Southern Hemisphere storm tracks and overestimated on the poleward side. Over the North Atlantic, the cyclone occurrence is underestimated on the equatorward side of the storm track and over the Norwegian Sea extending into the Barents Sea, and overestimated between the British Isles and Greenland. The magnitude of the bias is clearly reduced in all regions in NorESM2-MM, with the improvement being particularly evident in the regions where the cyclone occurrence is overestimated.
Note that both the climatology and the biases should be expected to differ somewhat between the two approaches considered here because they capture different aspects of the storm tracks. The bandpass-filter approach does not distinguish between cyclones and anti-cyclones, and is dominated by growing and propagating baroclinic waves (Blackmon et al., 1977). The cyclone occurrence reflects the regions where cyclone centers are identified most frequently, and is for instance more sensitive to systems that are slow moving or too long lived.

Clouds and forcing
Table 2 gives an overview of major forcing fluxes in NorESM2 compared to NorESM1 and observational estimates. Despite the large differences in physics and tuning, the overall numbers for top of the atmosphere fluxes and forcings are very similar to the numbers found in NorESM1-Happi and are generally within the observational range. There is, however, a slightly stronger negative bias in clear-sky LW flux and long wave cloud forcing. The latter is an unfortunate consequence of the tuning of high clouds in the model, implemented in order to increase the outgoing long wave radiation. As seen from the upward LW flux estimate, the outgoing long wave radiation is still within the estimate from satellite retrievals. SWCF values are very similar to the values of NorESM1-Happi and within the observational range. This number hides, however, a major weakness in the NorESM1 stratiform cloud parameterisation, which underestimated the cloud cover and compensated for this by overestimating the cloud liquid water.
The major updates in cloud physics from CAM4 to CAM6 (Bogenschutz et al., 2018) improved the cloud cover, and the cloud liquid water path is now quite close to the observational estimate. The global cloud cover is still slightly lower than observed. This is partly connected to the tuning in NorESM2. Prior to the tuning the modelled cloud cover was higher than 70 %. As seen from Fig. 15, the cloud cover underestimate is most pronounced in the tropics and subtropics in both hemispheres, while there is good agreement around the extra-tropical storm-track regions and an overestimate in the high Arctic. Before the tuning (not shown) there was no bias at low latitudes.
The modelled liquid water path seems to have a systematic bias towards low values at low latitudes and high values in the extra-tropics. Possible connections between cloud cover biases and the hydrological cycle are discussed in the next section.

Precipitation and hydrological cycle
The bias in the annual-mean total precipitation rate is shown in Fig. 20 for the two versions of NorESM1 and NorESM2 relative to the ERA-Interim re-analysis, along with climatology from the Global Precipitation Climatology Project (GPCP; Adler et al., 2003). While the global-mean bias is not systematically reduced between NorESM1 and NorESM2, there is a reduced RMSE, indicating that there is less cancellation between positive and negative biases in the global mean.

The reduction of the RMSE is also seen when considering the four seasons separately in Fig. 21 along with climatology from the GPCP. The evaluation of the mean bias, RMSE, and correlation included in the bottom left corner of each panel shows that RMSE and correlation have improved in NorESM2 compared to NorESM1 for all seasons. While the overall wet bias has increased slightly, mostly due to strong biases over the Pacific Ocean, there are regions with a large reduction in mean bias. This is especially pronounced over Africa and the equatorial Atlantic Ocean. The largest improvement compared to NorESM1-M is seen for NorESM2-MM during Northern Hemisphere winter, when all three metrics (bias, RMSE, and correlation) consistently indicate higher skill.
As a measure of interannual variability, the standard deviation of monthly means for each season was calculated. The differences compared to GPCP are presented in Fig. 22. While NorESM1 slightly underestimates the precipitation variability, it is somewhat too high in NorESM2, with the magnitude of the bias being larger in all seasons except DJF and SON in NorESM2-MM. As seen for the mean climatology in Fig. 21, the correlation has improved for all seasons in both NorESM2-LM and MM.
The hydrological cycle (or cycling of fresh water) is of major importance for the climate system. Global means of precipitation and evaporation can serve as integrated measures of the properties of many processes in an earth system model. Results presented in Table 3 indicate that the intensity of the hydrological cycle, as measured by evaporation, in NorESM2 is about 1.1% [...] Table 3 for NorESM2 are higher than for GPCP, they are closer to results from ERA-Interim calculated by Trenberth et al. (2011a). Although NorESM1-Happi has the highest precipitation globally, NorESM2 has the highest precipitation over ocean, suggesting a larger re-cycling of oceanic water vapor and a lower fraction transported from oceans to continents (measured by E-P over oceans). The overestimated evaporation over oceans is likely linked to the underestimated cloudiness in the tropics and subtropics (see the discussion of Fig. 15 above). Solar radiation over subtropical ocean regions is an important driver of evaporation. The net moisture transport from oceans to continents is nevertheless smaller in NorESM2 than in NorESM1, consistent with more clouds in the extra-tropics and more marine precipitation in NorESM2. This analysis is only preliminary, however, and needs more in-depth studies, which are beyond the scope of the present paper.
In the NorESM2 earth system model a closed hydrological cycle is present, with the difference between evaporation and precipitation being close to zero in the long-term average at equilibrium. In NorESM2-MM the discrepancy is only 0.001 km³/year, whereas it is 0.027 km³/year in NorESM1-M and 0.016 km³/year in NorESM2-LM (means from members 1-3).

Northern Hemisphere blocking
While storm tracks are closely tied to precipitation, atmospheric blocking is associated with persistent anti-cyclones that inhibit precipitation for time scales up to several weeks. To diagnose blocking, we apply the variational Tibaldi and Molteni (vTM) index (Tibaldi and Molteni, 1990; Pelly and Hoskins, 2003; Iversen et al., 2013; Graff et al., 2019). Blocks are identified when there is a persistent reversal of the 500 hPa geopotential height field around a central latitude that lasts for at least five days and covers at least 7.5 consecutive degrees of longitude. The central latitude varies with the position of the maximum in the Northern Hemisphere climatological storm track.
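The two ingredients of such an index, a reversal of the meridional height gradient and a persistence requirement, can be sketched for a single longitude. The sketch uses the fixed 20° latitude spacing and the -10 m per degree threshold of the classic Tibaldi and Molteni (1990) index as assumed defaults; it is not the vTM configuration used here, and it omits the longitudinal extent criterion. Function names are hypothetical.

```python
import numpy as np


def tm_reversal(z_south, z_central, z_north, dlat=20.0, ghgn_max=-10.0):
    """Daily flag for a Tibaldi-Molteni-type reversal of the 500 hPa height gradient.

    Inputs are daily heights (m) at latitudes dlat degrees south of, at, and
    dlat degrees north of the central latitude (assumed spacing and threshold).
    """
    ghgs = (np.asarray(z_central, dtype=float) - np.asarray(z_south, dtype=float)) / dlat
    ghgn = (np.asarray(z_north, dtype=float) - np.asarray(z_central, dtype=float)) / dlat
    return (ghgs > 0.0) & (ghgn < ghgn_max)


def persistent_runs(flags, min_days=5):
    """Keep only flagged days that belong to runs of at least min_days."""
    flags = np.asarray(flags, dtype=bool)
    out = np.zeros_like(flags)
    start = None
    for t, flag in enumerate(np.append(flags, False)):  # sentinel closes the last run
        if flag and start is None:
            start = t
        elif not flag and start is not None:
            if t - start >= min_days:
                out[start:t] = True
            start = None
    return out


# Synthetic example: 6 reversed days, 2 zonal days, 3 reversed days
z_south = np.array([5400.0] * 6 + [5700.0] * 2 + [5400.0] * 3)
z_central = np.full(11, 5600.0)
z_north = np.full(11, 5300.0)

blocked = persistent_runs(tm_reversal(z_south, z_central, z_north))
print(blocked.astype(int))
```

Only the first run is long enough to count as a block; the blocking frequency at this longitude would then be the fraction of blocked days.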
The seasonal blocking frequency is mostly underestimated over the North Atlantic and in Europe in the four versions of NorESM (Fig. 23), particularly during winter (DJF). During spring (MAM), NorESM2-MM is closest to the reanalysis, while during summer (JJA) and autumn (SON), NorESM1-Happi performs best in these regions. While NorESM1 tends to overestimate the blocking frequency over the Pacific, NorESM2 generally lies closer to the reanalysis in that sector. Consider, for instance, the region between 120° E and 180° E during summer, or the region between 130° W and 90° W during winter. In summary, although the use of 30 years from ERA-Interim for verification may not be fully representative of the blocking climatology, the representation of NH blocking continues to be a challenge in NorESM, in particular over the Atlantic-European sector in winter.

Madden-Julian Oscillation
In the tropical atmosphere, the Madden-Julian Oscillation (MJO) is the dominant mode of variability on timescales between 30 and 90 days (Madden and Julian, 1971; Zhang, 2005). The MJO is characterized by large-scale regions of enhanced and suppressed convection that slowly propagate eastwards along the equator, and interacts with a number of other circulation features such as El Niño events (Hendon et al., 2007), the Indian summer monsoon (Annamalai and Slingo, 2001), tropical cyclones (Liebmann et al., 1994), and even the North Atlantic Oscillation and extratropical variability (Cassou, 2009).
We diagnose the MJO in two ways. One is in terms of temporal correlations between subseasonally filtered anomalies of precipitation and winds along the equatorial Indian Ocean. The second is in terms of the wavenumber-frequency spectrum for the 850 hPa zonal wind (U850) and for outgoing longwave radiation (Fig. 24). These diagnostics have been proposed and described in detail in Waliser et al. (2009).
Positive wavenumbers and frequencies indicate eastward propagation, while negative frequencies (or wavenumbers) indicate westward propagation. The spectrum for U850 from ERA-Interim (Fig. 24a) shows that the energy in the reanalysis is associated with wavenumbers 1-3, with a maximum at wavenumber 1, and is more or less contained within timescales of 30 to 80 days. NorESM2-LM and NorESM2-MM also show maximum energy for the same wavenumbers, with the maximum occurring at wavenumber 1, as in ERA-Interim. The maximum is, however, somewhat too strong in both models, and the energy is spread out over a wider range of timescales. Both NorESM2-MM and NorESM2-LM peak at longer timescales (lower frequency) than ERA-Interim, and NorESM2-LM has an additional peak at shorter timescales (higher frequency). Similar results are found when comparing the wavenumber-frequency spectra for outgoing longwave radiation from NorESM2-LM and NorESM2-MM with that from NOAA; here, however, the peak energy is underestimated.
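The sign convention can be made concrete with a two-dimensional Fourier transform of a time-longitude field. The sketch below is a bare-bones illustration with hypothetical function names; it omits the seasonal windowing, tapering and filtering of the Waliser et al. (2009) diagnostics, and the test field is synthetic.

```python
import numpy as np


def wavenumber_frequency_power(field):
    """Power of a (time, lon) field, arranged so that wavenumber and frequency
    of the same sign correspond to eastward propagation."""
    field = np.asarray(field, dtype=float)
    nt, nx = field.shape
    coeffs = np.fft.fft2(field - field.mean()) / (nt * nx)
    power = np.abs(coeffs) ** 2
    # numpy's forward transform uses exp(-i ...) in both dimensions, so an
    # eastward wave cos(k*x - w*t) appears at (frequency, wavenumber) = (-w, +k)
    # and (+w, -k). Flipping the sign of the frequency axis gives the
    # convention described in the text.
    freq = -np.fft.fftfreq(nt)            # cycles per time step
    wavenumber = np.fft.fftfreq(nx) * nx  # integer zonal wavenumber
    return power, freq, wavenumber


def eastward_westward_power(field):
    """Total power in the eastward and westward quadrants of the spectrum."""
    power, freq, wavenumber = wavenumber_frequency_power(field)
    sign = np.sign(freq)[:, np.newaxis] * np.sign(wavenumber)[np.newaxis, :]
    return power[sign > 0].sum(), power[sign < 0].sum()


# Synthetic eastward-propagating wave: zonal wavenumber 2, period 40 days
nt, nx = 200, 72
t = np.arange(nt)[:, np.newaxis]
x = np.arange(nx)[np.newaxis, :]
eastward_wave = np.cos(2 * np.pi * (2 * x / nx - 5 * t / nt))

east, west = eastward_westward_power(eastward_wave)
print(f"eastward power = {east:.3f}, westward power = {west:.3f}")
```

For the synthetic wave essentially all of the power falls in the eastward quadrants; an MJO-like signal in model output would appear in the same quadrants at low wavenumbers and periods of a few tens of days.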
Lead-lag correlations with respect to precipitation anomalies during extended winter in the equatorial Indian Ocean around 90° E are characterised in observations (Figure 25a) by a marked, slow eastward propagation and some poleward propagation. There is a strong relationship with zonal winds in quadrature, such that westerly wind anomaly maxima precede the precipitation maxima. These characteristics are simulated fairly clearly in NorESM2-LM (Figure 25) [...]

[...] ENSO SST anomalies are very large compared to observations (with a NINO3.4 anomaly greater than 2.5 °C in the average El-Niño event, compared with 1.5 °C in observations), and they tend to peak early in the season, i.e. between November and December instead of between December and January as observed (Figure 28a). The early peak and termination may be partly attributable to weak zonal wind-stress anomalies over the equatorial region, which also peak early, notwithstanding a robust response in equatorial precipitation (Figure 28b) [...] during El-Niño events, which is seen also in OMIP1 and OMIP2 simulations forced with prescribed wind-stress (Figure 28d).
Given the weak coupled wind-stress and thermocline activity, the large SST anomalies may be partly the result of insufficient surface damping by the action of anomalous surface heat fluxes.

Correlation analysis shows that, over the eastern equatorial Pacific, the model indeed tends to generate positive downward net short-wave radiative flux anomalies when SST anomalies are positive, in contrast to observations. This might also explain the growth of positive SST anomalies in the NINO3.4 region early during El-Niño events, even before positive 20 °C isotherm depth anomalies have fully reached the area, and the long persistence of both SST and precipitation anomalies in the later stages of El-Niño events. The model climatological bias of a pronounced double ITCZ, with strong ITCZ precipitation away from the equator and a dry, cold equatorial region dominated by marine stratocumulus rather than trade-cumulus cloud in the eastern Pacific, probably contributes to this behaviour. Toniazzo et al. (in prep.) show that changes in the convection scheme that were made in order to mitigate the tropospheric cold bias and the positive TOA net residual have contributed to this error. Off-equatorial precipitation tends to couple with westward-propagating equatorial modes and can lead to a tendency for westward propagation of convective activity (cf. Figure 24). Westward propagation is also evident in the model's ENSO during the phase [...] into both hemispheres during and after ENSO peaks, with a PNA pattern that extends into the storm-track entry region of the western Atlantic, as observed. In this respect NorESM2-MM validates better than NorESM2-LM, in spite of its equivalent or slightly worse equatorial ENSO biases, probably due to a better overall sub-tropical and high-latitude atmospheric circulation.

[...] In particular, NorESM2-LM shows a satisfactory evolution of recent sea ice area. In NorESM2-LM an ice free Arctic Ocean is only avoided in the SSP1-2.6 scenario. NorESM2-MM simulates higher sea ice area both at present and in future scenarios.
The patterns of some biases seen in the fully coupled simulations considered here are similar to those in coupled ocean-sea ice simulations carried out for OMIP and can thus be linked to the coarse resolution of the ocean model and to shortcomings in parameterised processes. NorESM2-LM and MM largely share the same biases in the surface ocean, although the MM version is somewhat closer to the observations. Most of the large-scale biases in the surface ocean are also seen in the subsurface.

Like CESM2, NorESM2 is generally a "cold" model, with an initial deficit in atmospheric long-wave cooling that causes a positive RESTOM and leads to heat gain by the ocean and positive SST biases, particularly in the tropics. NorESM2 represents an improvement in this respect compared to NorESM1. This is particularly evident in the tropical and sub-tropical troposphere (Fig. 16). In addition, the medium-resolution version of the model has more realistic upper tropospheric meridional temperature gradients, and reduced near-surface temperature biases.

The extratropical storm tracks are generally better simulated in NorESM2 than in NorESM1, particularly over the North Atlantic. The storm tracks additionally improve with higher resolution, both in the Northern and Southern Hemisphere.
Several aspects of the modeled cycling of fresh water are improved in NorESM2 compared to NorESM1, including the RMSE and spatial correlation of the bias in the total precipitation rate. The intensity of the hydrological cycle as compared to the observationally based findings of Trenberth et al. (2011a) is slightly exaggerated in NorESM2, as it was in NorESM1, consistent with the underestimated cloudiness and thus overestimated solar radiation in the tropics and sub-tropics. The transport of oceanic water vapor over the continents is smaller in NorESM2 than in NorESM1, indicating a slightly too efficient re-cycling of oceanic water vapor associated with over-estimated oceanic precipitation and higher cloudiness in the extratropics.
The seasonal blocking frequency in the Northern Hemisphere is in particular underestimated over the Atlantic-European sector during winter (DJF) by NorESM2. During spring (MAM), NorESM2-MM is closest to the reanalysis, while during summer (JJA) and autumn (SON), NorESM1-Happi performs best in these regions. While NorESM1 tends to overestimate the blocking frequency over the Pacific, NorESM2 generally lies closer to the reanalysis in that sector. Although the use of 30 years from ERA-Interim for verification may not be fully representative of the blocking climatology, the simulation of NH blocking continues to be a challenge for NorESM.
The coupled model internally generates a self-sustained ENSO mode with spatial and temporal characteristics similar to observations. ENSO SST anomalies are very large compared to observations (with a NINO3.4 anomaly greater than 2.5 °C in the average El-Niño event, compared with 1.5 °C in observations), and they tend to peak early in the season, i.e. between November and December instead of between December and January as observed. Nevertheless, many properties of the ENSO are similar to those observed, and El-Niño teleconnections are quite realistic both in the tropics and at mid- and high latitudes. Less satisfactory is the performance of the coupled model in terms of the Madden-Julian Oscillation. Here the low resolution version appears to produce more intense and more realistic sub-seasonal tropical variability than the medium-resolution version.