The Norwegian Earth System Model , NorESM 1M – Part 1 : Description and basic evaluation of the physical climate

The core version of the Norwegian Climate Center’s Earth System Model, named NorESM1-M, is presented. The NorESM family of models are based on the Community Climate System Model version 4 (CCSM4) of the University Corporation for Atmospheric Research, but differs from the latter by, in particular, an isopycnic coordinate ocean model and advanced chemistry–aerosol–cloud–radiation interaction schemes. NorESM1-M has a horizontal resolution of approximately 2 for the atmosphere and land components and 1 ◦ for the ocean and ice components. NorESM is also available in a lower resolution version (NorESM1-L) and a version that includes prognostic biogeochemical cycling (NorESM1ME). The latter two model configurations are not part of this paper. Here, a first-order assessment of the model stability, the mean model state and the internal variability based on the model experiments made available to CMIP5 are presented. Further analysis of the model performance is provided in an accompanying paper (Iversen et al., 2013), presenting the corresponding climate response and scenario projections made with NorESM1-M.

Despite the nationally coordinated effort, Norway has insufficient expertise and manpower to develop, test, verify and maintain a complete earth system model.For this reason, NorESM is based on the Community Climate System Model version 4 (CCSM4; Gent et al., 2011;Vertenstein et al., 2010) operated at the National Center for Atmospheric Research on behalf of the Community Climate System Model (CCSM)/Community Earth System Model (CESM) project of the University Corporation for Atmospheric Research.NorESM is, however, more than a model "dialect" of CCSM4.Notably, NorESM differs from CCSM4 in the following aspects: NorESM utilises an isopycnic coordinate ocean general circulation model developed in Published by Copernicus Publications on behalf of the European Geosciences Union.
Bergen during the last decade (e.g.Bentsen et al., 2004;Drange et al., 2005b;Lohman et al., 2009;Orre et al., 2010), originating from the Miami Isopycnic Coordinate Ocean Model (MICOM) (Bleck et al., 1992).The atmospheric module is modified with chemistry-aerosol-cloudradiation interaction schemes developed for the Oslo version of the Community Atmosphere Model (CAM4-Oslo; Kirkevåg et al., 2013).Finally, the HAMburg Ocean Carbon Cycle (HAMOCC) model developed at the Max Planck Institute for Meteorology, Hamburg (Maier-Reimer, 1993;Maier-Reimer et al., 2005), adapted to an isopycnic ocean model framework, constitutes the core of the biogeochemical ocean module in NorESM (Tjiputra et al., 2010).In this way NorESM adds to the much desired climate model diversity, and thus to the hierarchy of models participating in phase 5 of the Climate Model Intercomparison Project (CMIP5; Moss et al., 2010;Taylor et al., 2012).In this and in an accompanying paper (Iversen et al., 2013), NorESM without biogeochemical cycling is presented.The reader is referred to Assmann et al. (2010) and Tjiputra et al. (2013) for a description of the biogeochemical ocean component and carbon cycle version of NorESM, respectively.
There are several overarching objectives underlying the development of NorESM.Western Scandinavia and the surrounding seas are located in the midst of the largest surface temperature anomaly on earth (Drange et al., 2005a), governed by anomalously large oceanic and atmospheric heat transports (Seager et al., 2002;Shaffrey and Sutton, 2006;Jungclaus and Koenigk, 2010).Small changes to these transports may result in large and abrupt changes in the local climate.To better understand the variability and stability of the climate system, detailed studies of the formation, propagation and decay of thermal and (oceanic) fresh water anomalies are required.Only a community effort bridging observations, theory and modelling can significantly advance our understanding on these issues.
There are also many unresolved questions related to the unprecedented warming and sea ice loss in the Arctic observed during the last decades (Boé et al., 2009;Stroeve et al., 2012;Årthun et al., 2012), and how these changes may influence the generated modes of variability and long-term changes in the region.A state-of-the-art model system will contribute to address these changes.
Central to the NorESM activity is therefore improvement, implementation and verification of climate processes that are of particular importance at high (northern) latitudes, and consequently for polar climate.As the tropics are of key importance for global heat and moisture budgets, as well as for generating and influencing major climate variability modes, analysis of climate feedbacks, responses and sensitivities of low-latitude climate are an inherent part of the activity.
Several studies show that the optimal, or "best", climate model is not an individual model but the ensemble mean of all available models (e.g.Reichler and Kim, 2008), possibly excluding apparent model outliers.An individual model, like NorESM, may or may not belong to the preferred set of models for studying specific climate phenomena, like climate variability and changes at high latitudes.Deep insight into one or several models is nevertheless a prerequisite to fully acknowledge both opportunities and limitations when analysing the available suite of model output.In this way, NorESM is an advanced tool for earth system researchers.
The present paper provides a general description and basic evaluation of the atmosphere-sea-ice-ocean part of NorESM.Particular focus is put on the simulated climatology, stability and internal variability deduced from the model's control and historical simulations.An accompanying paper (Iversen et al., 2013) presents the climate response and scenario projections, mainly based on analysis of the various CMIP5 scenario integrations made with NorESM.
Currently, NorESM exists in three versions.The model version presented here is the first version of the model with intermediate resolution, labelled NorESM1-M.Intermediate resolution is in this context a horizontal resolution of approximately 2 • for atmosphere and land components and 1 • for ocean and ice components.For brevity, NorESM is used throughout this paper.NorESM is also available in a lower resolution version, labelled NorESM1-L.The latter version is primarily tailored for millennium-scale simulations of past climate (Zhang et al., 2012).Finally, the above-mentioned version of NorESM that includes biogeochemical cycling, in particular the cycling of carbon, is labelled NorESM1-ME.
The paper is organised as follows.In Sect.2, a general overview of NorESM is provided, elaborating on similarities and, in particular, differences between NorESM and the parent CCSM4.In Sect.3, the design of the various model experiments is presented.The following two sections focus on the long-term model stability and model mean state.Key modes of simulated internal variability are discussed in Sect.6. Section 7 is devoted to the simulated 21st century climate, and the paper is summarised in Sect.8.

Model description
NorESM is, as mentioned above, largely based on CCSM4.The main differences are the isopycnic coordinate ocean module in NorESM and that CAM4-Oslo substitutes CAM4 as the atmosphere module.The sea ice and land models in NorESM are basically the same as in CCSM4 and the Community Earth System Model version 1 (CESM1), except that deposited soot and mineral dust aerosols on snow and sea ice are based on the aerosol calculations in CAM4-Oslo.

Atmospheric component
CAM4-Oslo is a version of CAM4 (Neale et al., 2010(Neale et al., , 2013) ) with parameterizations of aerosols, aerosol-radiation and aerosol-cloud interactions originally developed for use in Community Climate Model-Oslo (CCM-Oslo).With respect to physics, CAM4-Oslo applies the standard configuration of CAM4, e.g. the Rasch and Kristjánsson (1998) scheme for stratiform cloud processes and the CAM-RT radiation scheme, which were both also used in CAM3 (Collins et al., 2006).As in CAM4, deep convective clouds are parameterized following Zhang and McFarlane (1995) extended with the plume dilution and convective momentum transport also used in CCSM4 (Richter and Rasch, 2008;Neale et al., 2008).We use the finite volume dynamical core for transport calculations (Rasch et al., 2006) with horizontal resolution of 1.9 • latitude by 2.5 • longitude (in short referred to as 2 • resolution) and with 26 levels in the vertical with a hybrid sigma-pressure co-ordinate and model top at 2.917 hPa.The horizontal grid mesh size is double that of the standard version used in CCSM4 (Gent et al., 2011).
The modelling of aerosol processes in CAM4-Oslo (Kirkevåg et al., 2013) is extended from versions of CAM-Oslo described by Seland et al. (2008), Kirkevåg et al. (2008b), Storelvmo et al. (2006), Hoose et al. (2009), and Struthers et al. (2011).Apart from a few modifications of the parameter tuning for cloud micro-and macrophysics, described and discussed in Sect.3, the changes we have introduced to arrive at CAM4-Oslo are all related to aerosols and their interactions with radiation and cloud microphysics.The most important changes with respect to anthropogenic impacts on climate are the inclusion of biogenic primary organics and methane sulfonic acid from oceans, as well as a nearly doubled production of land-based biogenic secondary organic aerosols compared to Kirkevåg et al. (2008b).This increased abundance of natural organic matter has contributed to a considerable decrease of the indirect radiative forcing by anthropogenic aerosols in the model.Compared to year 1850, Kirkevåg et al. (2013) estimated a change in indirect radiative forcing of −0.9 W m −2 at year 2000 and −1.2 W m −2 at year 2006.These values are closer to the estimate by the IPCC fourth assessment report (AR4) of −0.7 [−1.1, +0.4] W m −2 (only cloud albedo effect; Forster et al., 2007) than the previous estimate in CAM-Oslo of −1.9 W m −2 by Hoose et al. (2009).Due to the increased natural organic matter levels in the model, this has been obtained without imposing unrealistic artificial lower bounds on cloud droplet number concentrations, which are still used to constrain the radiative forcing by aerosols in many climate models (see, e.g.Hoose et al., 2009).The change in direct radiative forcing in CAM4-Oslo from year 1850 to 2000 amounts to −0.08 W m −2 (Kirkevåg et al., 2013).
CAM4-Oslo calculates mass concentrations of aerosol species that are tagged according to production mechanisms in clear and cloudy air in four size classes: nucleation, Aitken, accumulation, and coarse mode particles.In addition to transport and removal of aerosols, microphysical processes that are treated are gaseous and aqueous chemistry, nucleation, condensation (by sulphuric acid gas or by water vapour, i.e. hygroscopic swelling), and coagulation.Included aerosol components are sulphate, black carbon, organic matter, sea salt, and mineral dust.Transported aerosol precursor gases are dimethyl sulfide and SO 2 , while oxidant concentrations for the sulphate chemistry are prescribed.Calculation of particle numbers and sizes are based on assumed size distributions for emitted or produced primary particles, followed by subsequent growth either by condensation, coagulation, or wet phase chemistry.
To limit the computational cost during the integration of the model, physical properties of the aerosols, including the optical properties, are estimated by interpolating between pre-calculated values in look-up tables, using process-tagged aerosol mass concentrations and ambient relative humidity as input.The look-up tables provide spectrally resolved optical parameters which are used to estimate the direct effect of aerosols in the model, as well as aerosol modal size parameters which are used as input in the calculation of activation of cloud condensation nuclei (CCN-activation) and aerosol indirect effects.
The CAM4-Oslo cloud scheme includes a prognostic treatment of cloud droplet number concentration in order to represent the cloud albedo and cloud lifetime effects in liquid clouds.We use the parameterization scheme of Abdul-Razzak and Ghan (2000) for activation of aerosols into cloud droplets, taking into account the sizes (Hoose et al., 2009) and hygroscopicities of the aerosols (Storelvmo et al., 2006), as well as the competition for available vapour between the different particles.Internal mixing between aerosols of different hygroscopicities is treated by assumptions on coating (Hoose et al., 2009).With the horizontal grid resolution of a climate model such as NorESM, updraft velocities forming clouds are not resolved, and therefore need to be parameterized.As explained in Hoose et al. (2009), the current formulation of the subgrid-scale updraft velocity depends on the turbulent eddy exchange coefficient and a fixed turbulent mixing length, following Morrison and Gettelman (2008).Aerosol indirect effects on mixed-phase and ice clouds (e.g.Hoose et al., 2010) are not included in the current version of CAM4-Oslo.
Further aspects of the treatment of aerosols, aerosolradiation and aerosol-cloud interactions in CAM4-Oslo, in particular the updates compared to earlier versions of the model, are thoroughly described and discussed by Kirkevåg et al. (2013).

Land component
The land model in NorESM is the original version 4 of the Community Land Model (CLM4) (Oleson et al., 2010;Lawrence et al., 2011) of CCSM4.Incorporated in CLM4 is the SNow, ICe, and Aerosol Radiative model (SNICAR; Flanner and Zender, 2006), which enable calculations of radiative effects of snow darkening caused by deposited absorbing aerosols.The surface albedo and the vertical absorption profile depend on solar zenith angle, albedo of the underlying snow, mass concentrations of atmospheric-deposited aerosols, and ice effective grain size, which is simulated with a separate snow aging routine.Atmospheric-deposited aerosol components that may be treated in SNICAR are black carbon, mineral dust, and organic carbon.As in the standard setup of CLM4 in CCSM4, absorption by organic carbon is not taken into account in NorESM.In the NorESM experiments discussed in this study, the carbon-nitrogen (CN) cycle option of CLM4 is enabled (Thornton et al., 2007;Gent et al., 2011).Within the land component the carbon and nitrogen are prognostic variables, while carbon and nitrogen fluxes are diagnostically determined and do not influence other model components.
The land component shares the same horizontal grid as the atmospheric component, except for the river transport model that is embedded in the land component but configured on its own grid with a horizontal resolution of 0.5 • .

Sea ice component
The sea ice model in NorESM is the original CICE4 version used in CCSM4 (Gent et al., 2011;Holland et al., 2012).The code is based on version 4 of the Los Alamos National Laboratory sea ice model (CICE4) as described by Hunke and Lipscomb (2008).Important extensions of this model that are utilised in NorESM and CCSM4 are the delta-Eddington short-wave radiation transfer (Briegleb and Light, 2007), melt pond and aerosol parameterizations, all detailed by Holland et al. (2012).In NorESM, deposited aerosols on snow and ice (hydrophobic and hydrophilic black carbon, and dust) are calculated prognostically in the atmospheric component CAM4-Oslo.The sea ice component is configured on the same grid as the ocean component detailed below.

Ocean component
The ocean component of NorESM uses potential density as the vertical coordinate.The main motivation is to exploit the fact that isopycnic surfaces are a good approximation to neutral surfaces in regions of the ocean.Thus, there is a potential to formulate a numerical model with accurate transport and mixing along isopycnals and complete control of the diapycnal mixing applied.To maximize the neutrality of the isopycnal surfaces, the potential density is referenced to 2000 dbar (McDougall and Jackett, 2005).As mentioned in Sect. 1, the model is based on MICOM (Bleck and Smith, 1990;Bleck et al., 1992) and key aspects retained from MICOM are a mass conserving formulation (non-Boussinesq), Arakawa C-grid discretization, leap-frog and forward-backward time stepping for the baroclinic and barotropic mode, respectively, and a potential vorticity/enstrophy conserving scheme (Sadourny, 1975) for the momentum equation.
For the NorESM experiments presented here, a grid with 1.125 • resolution along the equator is used with the Northern Hemisphere grid singularity located over Greenland.The grid is one of the standard grids (gx1v6) provided by CCSM4, and we adopt their ocean mask.The bathymetry is created by averaging the depths of a high resolution data set (S2004; Marks and Smith, 2006) belonging to each ocean grid cell, and editing of the bathymetry is limited to setting key sills and channels to their actual depths.A total of 53 model layers are used with layer reference potential densities in the range 28.202-37.800kg m −3 .
The incremental remapping algorithm (Dukowicz and Baumgardner, 2000) is used for the advection of layer thickness and tracers (including potential temperature and salinity).The second order accurate algorithm is expressed in flux form and thus by construction conserves mass and tracers.Furthermore, it guarantees monotonicity of layer thickness for a divergence free velocity field and monotonicity of tracers for any velocity field.For a single tracer this method is not particularly computationally efficient compared to other methods of comparable accuracy, but adding additional tracers comes at a modest computational cost.The ability to handle numerous biogeochemical tracers in an accurate, robust and efficient manner was an important motivation for selecting the incremental remapping algorithm.
The pressure gradient force (PGF) is estimated by evaluating the gradient of the geopotential on pressure surfaces and the geopotential is obtained by accurate vertical integration of in situ density.This PGF estimation mitigates a longstanding issue in isopycnal models with inaccurate dynamics in regions where the pressure differs substantially from the reference pressure (Sun et al., 1999) and shares similarities with the finite volume discretization of the PGF proposed by Adcroft et al. (2008).
The new PGF formulation required modifications to the original MICOM barotropic/baroclinic mode splitting and in this process the mass and tracer conservation of the model was greatly improved.The application of time filtering in MICOM, needed for controlling the computational mode of the leap-frog time stepping, has been modified to sample variables at different time levels more consistently in terms of operators applied.This reduced the non-conservation of the time filtering.Following Morel et al. (2008), the application of column averaged variables from the baroclinic equations in the barotropic equations was modified, leading to a doubling of the allowable baroclinic time step.
To be able to handle vigorous diapycnal mixing in regions of the ocean, an implicit time integration of the diapycnal diffusion based on Hallberg (2000) has been implemented.This is particularly important in the modelling of gravity currents with strong vertical mixing in combination with sharp vertical density gradients.The potential density of interior layers might deviate from their prescribed reference potential density and this is mainly due to cabbeling, absorption of penetrating short-wave radiation, and the mixed layer detrainment process.The fluxes obtained from solving the diapycnal diffusion equation are adjusted in order to reduce the deviation from prescribed potential densities.
MICOM has a single bulk surface mixed layer while in NorESM the mixed layer is divided into two model layers with freely evolving density and equal thicknesses when the mixed layer is shallower than 20 m.The uppermost layer is limited to 10 m when the mixed layer is deeper than 20 m.The main reason for this was to allow for a faster ocean surface response to surface fluxes.The first model layer below the mixed layer is not required to stay close to its prescribed reference potential density.Then there are fewer constraints on particularly the mixed layer detrainment process that now follows closely the approach of Oberhuber (1993).Further, the static stability of the uppermost layers are measured by in situ density jumps across layer interfaces, thus allowing for layers that are unstable with respect to potential density to exist.This improved the representation of water masses in weakly stratified high latitude haloclines.
The parameterization of thickness eddy diffusivity follows the diagnostic version of the eddy closure of Eden and Greatbatch (2008) as implemented by Eden et al. (2009).The isopycnal eddy diffusivity is set equal to the thickness diffusivity.The lateral eddy diffusivity in the mixed layer uses the mean thickness diffusivity of the upper 100 m of the isopycnic interior of the model.The thickness/isopycnal/lateral eddy diffusion is reduced when the grid resolves the first baroclinic Rossby radius (R. W. Hallberg, personal communication, 2009).It should be noted that thickness diffusion is implemented as layer interface diffusion and will always act to reduce the available potential energy of the ocean.
The mixed layer depth is parameterized by considering a turbulent kinetic energy (TKE) balance equation.MICOM provides two options of TKE models based on Kraus and Turner (1967) and Gaspar (1988).We found that both formulations overestimated the mixed layer depth at high latitudes.A TKE model based on Oberhuber (1993), extended with a parameterization of mixed layer restratification by eddies (Fox-Kemper et al., 2008), is now used, leading to reduced mixed layer depth biases compared to the original MICOM alternatives.An exponential decay curve is used for penetrating short-wave radiation, assuming clear water everywhere (Jerlov water type 1).To reduce sea surface salinity (SSS) and stratification biases at high latitudes, salt released during freezing of sea ice is distributed evenly below the mixed layer down to the depth with a density contrast of 0.4 kg m −3 compared to the surface.The distribution depth of salt is limited by 500 m.
The background diapycnal diffusivity is vertically constant but with a latitude dependence approximately following Gregg et al. (2003).This gives gradually reduced diffusivities towards the equator with a value of 10 −5 m 2 s −1 at 30 • latitude.Shear driven diapycnal mixing is parameterized using the local gradient Richardson number according to Large et al. (1994) but with increased maximum allowable mixing near the ocean bottom to provide sufficient mixing downstream of overflows.Further, a portion of the energy extracted from the mean flow by the bottom drag is used to drive diapycnal mixing (Legg et al., 2006).Tidally driven diapycnal mixing is parameterized according to Simmons et al. (2004) using the estimated tidal energy dissipation by Jayne (2009).
There is no mass exchange with the other components of NorESM.Thus, the freshwater fluxes are converted to a virtual salt flux before they are applied in the ocean.In the experiments of this study, geothermal heating is not used.

Coupler
The CCSM4 coupler CPL7 (Craig et al., 2012) handles the overarching execution control of the coupled system and the exchange of information between model components.Inherent in the coupler is a top-level driver that organises the coupled model into a single executable and issues calls to initialisation, run, and finalization routines for each model component.The components can be configured to run sequentially, concurrently, or as a combination of those two.This allows for flexible execution strategies to optimize the use of available hardware resources.
In the experiments discussed here, the state fields and fluxes are exchanged between the components half-hourly except for the ocean components that are coupled once per day.The land and ice components are responsible for computing the atmosphere/land and atmosphere/ice fluxes, respectively, while the coupler computes the atmosphere/ocean fluxes every half hour, providing the instantaneous fluxes to the atmosphere and daily mean fluxes to the ocean component.

Experimental design
Aspects of the model tuning process prior to conducting the model experiments are described in this section.Further, the model experiments made available to CMIP5, including the pre-industrial spin-up, are described.

Model tuning
In earth system models, such as NorESM, there are numerous parameters associated with physical parameterizations that can be assigned values within bounds set by empirical or physical reasoning.It is beyond the scope of this study to describe all aspects of parameter tuning in NorESM.Emphasis here will be on the approach to minimize the radiative imbalance at the top of atmosphere (TOA), due to its importance for a stable climate state in CMIP5 long-term experiments.
In order to obtain a realistic simulated climate while maintaining a net radiative balance at the TOA, some of the cloud micro-and macrophysical parameters have been adjusted in CAM4-Oslo compared to the values used in CAM4.
Concerning cloud microphysics, the critical mean droplet volume radius for the onset of autoconversion, denoted r 3lc in Rasch and Kristjánsson (1998) has been increased from 10 to 14 µm.For comparison, Collins et al. (2006) and Seland et al. (2008) adopted the value 15 µm.Finally, as in Kristjánsson (2002), the maximum precipitation rate at which the autoconversion of cloud water to rain is suppressed (Rasch and Kristjánsson, 1998), has been increased from 0.5 to 5.0 mm d −1 .This is the same value as used in CAM-Oslo (Kirkevåg et al., 2008b;Seland et al., 2008;Hoose et al., 2009;Struthers et al., 2011).
The introduction of prognostic cloud droplet number concentrations (CDNC) in CAM4-Oslo (following Storelvmo et al., 2006;Hoose et al., 2009) has resulted in less numerous, larger cloud droplets than in CAM4/CCSM4.Since autoconversion of cloud water to rain is more effective for the larger cloud droplets, the new CDNC treatment leads to significantly reduced, and thus improved, liquid water paths (LWP).Whilst global estimates of LWP from satellite retrievals vary by more than a factor of 2 from about 50 g m −2 to more than 100 g m −2 , the value 131 g m −2 in CCSM4 is reduced to 100 g m −2 in CAM4-Oslo when other parameters are the same (Kirkevåg et al., 2013).The above-mentioned tuning of the autoconversion parameters, however, tends to suppress the conversion of cloud water to rain, and the modelled LWP is therefore increased to about 122 g m −2 in the pre-industrial control simulation, and about 125 g m −2 for the period 1976 to 2005 in the historical simulations (Table 1).A recent observational estimate based on NASA A-Train measurements (Jiang et al., 2012) gives a globally averaged LWP of 30-51 g m −2 as a best estimate and with upper and lower uncertainty limits at 102 g m −2 and 15 g m −2 , respectively.Using this study as a guideline, it is clear that NorESM overestimates the liquid water content, in much the same way as CCSM4 does (Jiang et al., 2012).
Another tuning adjustment was made for the lower threshold of the relative humidity when stratiform clouds start to form.The threshold is 0.90 in NorESM while it is 0.91 in CCSM4, hence while the average global cloudiness was 46 % in CCSM4, it is 54 % in NorESM-1.This is an underestimation compared to data from ISCCP (67 %, see Table 1).In effect, what the tuning has accomplished is to increase the optical thickness of the clouds (proportional to the LWP) enough to compensate for the effect of a too low cloud fraction, so that we have achieved a radiation budget very close to balance at the TOA after about 700 yr of spin-up, and without serious climate drift in the following control simulation.Nevertheless, the biases in cloudiness and cloud liquid water (see Fig. 7) are clearly a weakness of NorESM (e.g.Jiang et al., 2012).
and sea ice model and the land model CLM4 (see the cldtunorig test in Table 7 of Kirkevåg et al., 2013), the cloud fractions for low, medium and high level clouds were calculated as 0.341, 0.187, and 0.318, compared to 0.347, 0.191, and 0.318 from the original CAM4 cloud tuning.Similarly, the stratiform and convective precipitation rates in the offline test simulations were estimated at 1.096 and 1.725 mm d −1 , compared to 1.108 and 1.721 mm d −1 with the original tuning of cloud parameters.Impacts on modelled aerosol properties and direct radiative forcing, as well as on cloud droplet numbers, effective droplet radii, liquid water paths and subsequent indirect radiative forcing are discussed by Kirkevåg et al. (2013).
A consequence of the exaggerated LWP in the model, which is particularly pronounced in the Arctic (Jiang et al., 2012; see also Alterskjaer et al., 2010), was that too little snow melted on Arctic sea ice during summer.To mitigate this, the grain size of cold, old snow overlaying sea ice was increased from 250 µm to 500 µm to lower the cold snow albedo.This gave a reduction in cold snow broadband albedo of the order of 0.01 (Briegleb and Light, 2007), which caused more realistic and earlier onset of Arctic summer melt.

Model experiments
Once the above-mentioned model tuning was set, the fully coupled NorESM was spun up for 700 yr, see Fig. 1 for an illustration of the complete spin-up and model experiment procedure.For the spin-up integration, the atmospheric and ice component was initialised from model restart files available in the public release of CCSM4.The land component was initialised from the model state at 400 yr of a preindustrial CCSM4 experiment with the same grid resolutions as the NorESM experiments described here.The ocean component was initialised with zero velocities and temperature and salinity fields from the Polar science center Hydrographic Climatology (PHC) 3.0 (updated from Steele et al., 2001).
The spin-up integration used aerosol emission and concentrations of greenhouse gases (GHG) consistent with preindustrial conditions defined for year 1850 in accordance with CMIP5 (see also http://cmip-pcmdi.llnl.gov/cmip5/forcing.html,and references therein).The 1850 control run has constant forcings based on an incoming solar flux at the model top of 1360.9W m −2 and a constant CO 2 mixing ratio of 284.7 ppm.Emissions of aerosols and aerosol precursors are as in Lamarque et al. (2010) except for sea salt which is calculated according to surface wind speed and sea surface temperature (SST).After 300 yr of integration, black carbon deposition on snow was activated and a parameterization of the oceanic distribution of salt released during freezing of sea ice was adjusted.These changes had only minor influences on the mean and time evolution of the model state during the remainder of the spin-up.The purpose of the multi-century spin-up was to generate a model climate with limited longterm drift, with thorough ventilation of the upper ocean, and with multiple realisations of the internally generated, interannual to multidecadal variability modes.
The obtained climate state by the end of year 699 of the spin-up was then used as the initial value for a 500 yr long control simulation representative of the preindustrial atmosphere (hereafter piControl), using the same forcings as for the spin-up integration.An identical initial condition was used for a historical simulation incorporating observation-based variations in solar irradiance, volcanic activity, concentration of atmospheric GHG and aerosol and other particles for the time period 1850-2012 (hereafter Historical1).Two additional members with identical forcing protocol to Historical1 were run, starting from year 730 and 760 of piControl (named Histor-ical2 and Historical3, respectively).For the three historical experiments, the forcings are based on observationbased data for 1850-2005 for solar radiation (Lean et al., 2005;Wang et al., 2005), stratospheric sulphate aerosol concentrations from explosive volcanoes (Ammann et al., 2003), as well as anthropogenic GHG concentrations, aerosol emissions (Lamarque et al., 2010), and land-cover changes (http://cmip-pcmdi.llnl.gov/cmip5/forcing.html).They are extended from 2006 to 2012 using the Representative Concentration Pathway (RCP) 8.5 forcing protocol (van Vuuren et al., 2011).
In addition, three idealised forcing experiments were initiated from the end of year 699 of the spin-up, covering the time period 1850-2012.The forcings in these experiments follow those of the three historical experiments, namely observation-based data from 1850-2005 and forcing from the RCP8.5 protocol for 2006-2012.These simulations are the so-called GHG-only experiment, with observationbased greenhouse gases as the only varying forcing field (the remaining constituents were kept fixed as in piControl); Aerosol-only experiment, with aerosol forcing only; and Natural-only experiment, with solar and volcanic forcing only.These experiments are further discussed by Iversen et al. (2013).
Two additional sensitivity experiments were initialised from the end of year 699 of the spin-up.These were the one percent per year increase in the atmospheric concentration of CO 2 (Gradual 4 × CO 2 ; run for a total of 140 yr), and an abrupt quadrupling of the atmospheric concentration of CO 2 (Abrupt CO 2 ; run for a total of 150 yr).In Iversen et al. (2013), these experiments are used to estimate equilibrium climate sensitivity and transient climate response.
Finally, four scenario integrations were initialised from the model state by the end of year 2005 of Historical1.These integrations follow the RCP protocols RCP8.5, RCP6.0,RCP4.5 and RCP2.6, representing "business-as-usual" emissions, two intermediate emission scenarios and a scenario with very strong reductions in the emissions, respectively (see van Vuuren et al., 2011, for an overview of the four RCPs).Of these scenarios, all but RCP4.5 were run until year 2100, whereas RCP4.5 was continued until year 2300.Selected aspects of the simulations based on RCP scenarios are discussed in detail by Iversen et al. (2013).

Model stability
In this section the long-term evolution of the climate state of the NorESM piControl is assessed.Most emphasis will be on the evolution of oceanic quantities due to the large heat reservoir and inertia compared to the other climate system components and the relatively weak direct interaction of SSS with other components.Other quantities considered in some detail are the net radiation of the TOA since it controls the energy balance of the climate system, sea ice area because it is sensitive to ocean drift and has a strong impact on shortwave heat fluxes within the climate system, and gross fluxes in the global atmospheric water cycle because of the importance of the water substance for a wide range of processes in the climate system.Linear trends are estimated by linear regression of annual mean data.Statistical significance is tested using the simple t test with the number of degrees of freedom adjusted to account for autocorrelation according to Eq. ( 31) in Bretherton et al. (1999).We consider a trend with a p value less than 0.05 to be statistically significant.Time series of various global mean quantities from NorESM piControl are shown in Fig. 2. The global mean net radiation at the TOA averaged over the whole control integration is 0.086 W m −2 with a small linear trend of −0.019 W m −2 over 500 yr that is not statistically significant.This radiation imbalance at the TOA causes a steady heating of the earth system.The time-mean of the global mean net heat flux into the ocean is 0.122 W m −2 with a small linear trend of −0.020 W m −2 over 500 yr that is not statistically significant, leading to a clearly manifested drift in the global mean ocean temperature time series.During the 500 yr of integration the global mean ocean temperature increases by 0.126 K.With no geothermal heating of the ocean and no surface mass exchange, the net surface heat flux should fully explain the evolution of the global mean ocean temperature of NorESM.This is indeed the case and confirms good conservation properties of the ocean component.The time-mean TOA net radiation multiplied by the ratio of the earth area to the ocean area is 0.121 W m −2 and thus close to the time-mean net heat flux into the ocean.This, in turn, indicates that the terrestrial and cryospheric heat reservoirs are in near thermodynamic balance during the duration of the control integration and that heat is well conserved in the model.
The linear trend of the SST time series is 0.031 K over 500 yr and thus much lower than the trend of global mean, volume-averaged ocean temperature.Further, the linear trend of global near surface air temperature is 0.037 K over 500 yr.Both the SST and near surface air temperature trends are statistically significant.
For the pre-industrial model spin-up there is a reduction in the global mean SSS from about 34.75 g kg −1 to about 34.50 g kg −1 during 700 yr of integration (not shown).This freshening tendency gradually reduces as the spin-up progresses.In the time series of global mean SSS from pi-Control, a remnant freshening tendency can be seen during the first 200 yr of integration (Fig. 2), with a possible remaining drift likely masked by multidecadal variability thereafter.Comparing global mean SSS for the first 50 and the last 300 yr of the NorESM piControl, there is a reduction of about 0.02 g kg −1 , thus we consider the global mean SSS to be fairly stable throughout the control integration.There is a very small, although statistically significant, linear trend in the time series of global mean salinity of −3.14 × 10 −4 g kg −1 over 500 yr.With no mass exchange through the ocean surface and assuming balanced freshwater surface fluxes and fairly constant sea ice volume, the global mean salinity should remain close to constant.
The time series of Drake Passage net volume transport indicate a slight weakening during the control integration that is most apparent in the first half of the time series.In the pre-industrial spin-up a decreasing tendency of the Drake Passage transport is indeed present (not show), but the tendency gradually reduces during the course of the integration.The linear trend in the time series of Drake Passage transport is −6.29 Sv over 500 yr and is statistically significant.In the pre-industrial spin-up the strength of the Atlantic Meridional Overturning Circulation (AMOC) accelerates during the first 50 yr of integration (not shown) before settling around a mean state with a modest long-term drift.The time series of maximum strength at 26.5 • N of the AMOC has a linear trend of −0.6 Sv over 500 yr that is statistically significant.
Latitude-time Hovmöller diagrams of zonal mean SST and SSS from NorESM piControl are provided in Fig. 3a and b, respectively.The small drift of global mean SST is confirmed here with no particular model drift at any latitudes.The before-mentioned freshening of global mean SSS during the first 200 yr of piControl is evident and manifested fairly uniformly south of 70 • N. below.The last 200 yr of piControl the salinity remains almost constant at all depths except for the very deepest water masses that occupy only a small fraction of the total ocean mass.
Time series of Northern and Southern Hemisphere sea ice extent for March and September is shown in Fig. 4. The summer minimum values are stable without significant trends in both hemispheres, whereas the winter maximum extents decrease during the simulation.The winter trends for the Northern and Southern Hemispheres are 0.15 × 10 6 km 2 and 0.81×10 6 km 2 over 500 yr, respectively, and these trends are statistically significant.
Overall, atmospheric variables have small trends throughout the 500 yr of piControl.As mentioned above, the net radiation at the TOA has a small negative trend that is not statistically significant.The individual components, net shortwave (SW) and long-wave (LW) radiation at the TOA, have mean values of 232.43 W m −2 and 232.33 W m −2 , respectively, with corresponding linear trends of 0.033 W m −2 and 0.052 W m −2 over 500 yr.Cloud characteristics are also stable during the control integration, and the mean total cloud cover is 54.1 % with a linear trend of 0.012 % over 500 yr.Total cloud LWP has a mean of 122.3 g m −2 and a trend of 0.043 g m −2 over 500 yr.Further, the long-term means of short-wave (SWCF) and long-wave cloud forcing (LWCF) are −54.83W m −2 and 30.91 W m −2 , respectively, with linear trends of −0.021 W m −2 and 0.028 W m −2 over 500 yr.None of the trends of atmospheric variables discussed here are statistically significant.
In piControl, annual mean values of the difference between global mean evaporation minus precipitation (E-P ) fluctuate between 0.02 and −0.02 mm d −1 with a linear trend of −0.0040 mm d −1 over 500 yr that is not statistically significant.The long-term mean of E-P is 2.3 × 10 −5 mm d −1 , confirming a very well balanced fresh water budget of the atmosphere.This is consistent with the virtually negligible drift of global mean salinity discussed above, indicating that NorESM conserves the fresh water substance to a large extent.Annual mean values of global mean P fluctuate between 2.80 and 2.81 mm d −1 (i.e.520 × 10 3 to 521 × 10 3 km 3 yr −1 globally) and have a linear trend of 0.0029 mm d −1 over 500 yr that is not statistically significant.
Due to the generally small linear trends of global mean variables, we have not subtracted the trend of NorESM pi-Control from other NorESM experiments in any of the subsequent analyses.For analyses that are sensitive to the time evolution of the deep ocean temperature, e.g.studies of sea level change, we do recommend taking into account the longterm trend in ocean temperature.

Mean model state
In the evaluation of the NorESM mean state, the majority of the analysis is from the Historical1 experiment and considering means over years 1976-2005.During this time period the observational coverage of several components of the climate system is good and includes satellite measurements.With a 30 yr averaging period, the influence of internal model variability up to decadal timescales is expected to have a modest influence on the assessment of the model mean state.An exception is the analysis of the gross cycling of fresh water (Table 2) using means for the years 2000-2005 of the Historical1 experiment to be more consistent with corresponding mean values from observational synthesis and atmospheric reanalysis covering the years 2002-2008.Further, the mean ocean meridional overturning circulation (MOC) is from a 30 yr period of the piControl experiment.(Gent et al., 2011;years 1990-2000), ECMWF ERA-Interim reanalysis (years 2002-2008), and observationally based estimates (years 2002-2008).The three latter sets of numbers are provided by Trenberth et al. (2011).Quantities marked with an asterisk (*) are, for NorESM only, estimated using E = P globally integrated.See also Table 5 in Iversen et al. (2013).

Heat budget considerations and surface temperature
P GLOBAL (E-P ) OCEAN E OCEAN E LAND * P LAND * P OCEAN * 10 3 km 3 yr −1 10 3 km 3 yr −1 10 3 km 3 yr −1 10 3 km 3 yr −1 10 3 km 3 yr −1 10 3 km 3 yr clear-sky SW and LW flux at TOA are within and slightly below the observational range, respectively, while the associated TOA SW and LW cloud forcing are slightly below and within the observational range, respectively.The apparently small biases in the TOA cloud forcing are in contrast to the clear underestimation of total cloud cover as mentioned in Sect.3.1, probably because the underestimated cloud cover is compensated by an overestimated LWP (Jiang et al., 2012).In Fig. 5 the annual mean sensible and latent heat fluxes from NorESM Historical1 are compared to the FLUXNET Model Tree Ensembles (MTE) estimates (Jung et al., 2011).FLUXNET-MTE estimates are restricted to vegetated land surface, and this is the reason why no fluxes are estimated for the desert zones.The NorESM simulated annual mean sensible heat flux (Fig. 5a) is in the same range as the FLUXNET-MTE estimations (Fig. 5b).As seen in Fig. 5c, NorESM underestimates sensible heat flux in most of the African continent south of Sahara, on the west coast of India, in Australia, and in the western part of the United States.The model overestimates sensible heat flux in the extreme eastern part of South America.Comparing NorESM and FLUXNET-MTE estimates, the root mean square error (RMSE) normalized by the standard deviation of the FLUXNET-MTE estimate is 1.01 and 0.65 for sensible and latent heat flux, respectively, and the spatial correlations are 0.52 and 0.82 for sensible and latent heat flux, respectively.Thus, from the distribution point of view, the simulation of annual mean latent heat flux (Fig. 5d) compares better with the FLUXNET-MTE estimate (Fig. 5e). Figure 5f show that NorESM generally overestimates latent heat fluxes compared to FLUXNET-MTE, but with clear underestimations in the extreme eastern part of South America.As listed in Table 1, the global mean surface sensible heat flux for the years 1976-2005 of Histor-ical1 is 17.8 W m −2 and within the observational range of 13.2-19.4W m −2 , while the global mean surface latent heat flux of 81.7 W m −2 is slightly below the observational range of 82.4-89.1 W m −2 , which is in contrast to the general overestimation compared to FLUXNET-MTE.
Figure 6 shows the difference in air temperature over land at reference height (2 m) above the ground surface between the NorESM Historical1 experiment and the observational data set TS3.1 from the Climatic Research Unit (CRU) for the years 1976-2005.The model generally underestimates the temperature over the continents with a mean difference of −1.09 K.For the same experiment and time period, the SST bias is only −0.15 K (see Sect. 5.5 below).There are notable exceptions over South America and in western parts of Eurasia (including Europe) where there are overestimates.Thus, NorESM produces a slightly too cold surface climate and is colder compared to the last few decades of 20th century experiments with CCSM4 (Gent et al., 2011).One possible candidate that may account for a considerable part of this difference is the inclusion of the aerosol indirect effect in NorESM.Furthermore, there is a possibility that the model slightly overestimates the cooling by the aerosol direct effect since there appears to be a small but ubiquitous overestimate of aerosol loads in the upper free troposphere (Kirkevåg et al., 2013).As discussed by Iversen et al. (2013), clouds contribute to a small negative gross feedback which thus dampen the simulated 20th century temperature increase.Also, the model overestimates the Arctic cloudiness and the summer-season snowmelt is probably too slow.Combined with slightly too weak winds across the polar basin, this leads to too thick sea ice in the polar oceans adjacent to the Eurasian continent.The summer sea ice extent in the Arctic is too large (Fig. 4), and this contributes to underestimated global temperatures.Note that the global pattern of this underestimate (see Fig. 6) reflects dynamical factors such as changed occurrence of modes of variability or flow regimes (Palmer, 1999;Branstator and Selten, 2009) and geographically determined feedbacks in the climate system associated with strong interactions between the atmosphere and the ground surface (e.g.sea ice and snow cover), as discussed by Boer and Yu (2003).Hence, given that there is a slightly too cold climate, it is natural that the amplitude is larger over continents than the ocean (e.g. the cold-ocean/warmland pattern, Wallace et al., 1996) and at high latitudes.(Mitchell and Jones, 2005) observational data set for the same period interpolated to the same grid using conservative remapping.Global bias error is −1.0868K with a RMSE of 2.347 K.

The water cycle
The cycling of fresh water profoundly moderates and modulates the earth's climate.The water substance is present in all three phases in the atmosphere and is an important vehicle for cycling of energy between the atmosphere and the other compartments of the earth system.Clouds are important for the radiation budget both for visible and terrestrial infrared radiation.Precipitation amounts and types, as well as the absence of precipitation, are weather parameters with profound impacts on nature and society.The release of latent heat when water vapour condensates in the atmosphere is a source of energy that tends to feed back positively on the dynamic processes responsible for triggering precipitation.It is therefore unfortunate that many aspects of the water cycle are difficult to simulate accurately in climate models (e.g.Meehl et al., 2007), mainly due to the small spatial scales involved and the intricate interaction between micro-and macrophysics in clouds.Hydrological processes on the land surface represent another source of complexity.Finally, snow and ice on land  Wentz, 1997;O'Dell et al., 2008).Zonally averaged boreal winter (DJF) estimated annual precipitation of NorESM compared the data from GPCP and Legates (Spencer, 1993;Legates and Willmott, 1990) , and ( and oceans strongly modulate the absorption and reflection of solar radiation in the earth system and represent a major source of feedback. Following Trenberth et al. (2011), E and P integrated over oceans are key quantities to diagnose gross properties of the earth's water cycle, given that the global totals of the same quantities are equal.The fraction (P /E) OCEAN defines the recycling of fresh water over the oceans, and is a (simple) measure of the atmospheric residence time of a bulk part of the water vapour provided to the atmosphere.The remaining difference (E-P ) OCEAN is the net transport of water vapour from oceans to the earth's continents.The global average precipitation in NorESM is slightly more than 2.80 mm d −1 (or about 521 × 10 3 km 3 yr −1 ), which is 0.13-0.14mm d −1 or 4.9 % too large if compared to the Global Precipitation Climatology Project (GPCP) data (Adler et al., 2003), but this value was claimed to be an underestimate by Trenberth et al. (2007).Taking the observational synthesis of Trenberth et al. (2007Trenberth et al. ( , 2011) ) to be the best estimate of the true atmospheric cycling of water and water vapour, NorESM overestimates the oceanic evaporation by about 4 % and the flux of water vapour from ocean to land (and hence also the return flow of water from land to ocean) by about 8 %.The recycling of oceanic fresh water (P /E) OCEAN is only slightly underestimated (0.4 %), indicating that while the intensity of the water cycle is slightly on the high side, the atmospheric residence time of oceanic water vapour is quite correct.In comparison, CCSM4's estimate of the flux of water vapour from ocean to land is very close to the observationally based estimate by Trenberth et al. (2011).However, the oceanic evaporation in CCSM4 is, also according to the estimates of Trenberth et al. (2011), exaggerated by 8 %, while the re-cycling is overestimated by almost 1 % in that model.It is believed that these differences between NorESM and CCSM4 are linked to aerosols and the tuning of cloud properties.In CCSM4, a larger fraction of solar radiation reaches the ground and the precipitation autoconversion is faster.The values from the ERA-Interim reanalysis www.geosci-model-dev.net/6/687/2013/Geosci.Model Dev., 6, 687-720, 2013 are closer to CCSM4, but it should be borne in mind that the global water cycle is not conserved in those data since (E-P ) GLOBAL = 8 × 10 3 km 3 yr −1 , which is almost 2 % off equilibrium.
Figure 7 confirms that NorESM in general underestimates cloudiness.As found in Table 1, the global mean value for the years 1976-2005 of Historical1 gives a bias of −23.88 % with a RMSE of 25.74 % when compared to the CLOUD-SAT data, and a bias of −13.04 % with a RMSE of 17.39 % compared to ISCCP.Exceptions include very high latitudes where there are too much clouds, and this is most pronounced for the winter and summer months in both hemispheres (not shown).At the same time, the liquid water in clouds is generally exaggerated, although Fig. 7 is only valid over oceans due to the nature of the SSMI retrievals.The exaggeration is particularly pronounced in the extratropical storm track regions, a result further corroborated by Jiang et al. (2012).We have no good explanation for these bias errors concerning the clouds, which were already present in the original model version CCSM4 (Gent et al., 2011).Still it is a fair conclusion that NorESM is slightly improved compared to CCSM4.There are more clouds in NorESM and the overestimated LWP is about 5 % lower than for CCSM4.The prognostic cloud droplet scheme is directly linked to CCN-activation from the online aerosols, and permits tuning of the precipitation autoconversion to yield improved gross properties of the cycling of fresh water in the model (Table 2).
To assess the quality of the simulated precipitation climatology, we compare it with the GPCP, the Tropical Rainfall Measuring Mission 3B43 product (TRMM, Huffman et al., 2007) and the merged Legates precipitation data (Legates and Willmott, 1990;Spencer, 1993).
Figure 7e and f show the zonal mean precipitation during the boreal winter (DJF) and summer season (JJA), respectively.The model captures the main latitudinal distribution with a precipitation peak near the equator, secondary peaks associated with the winter midlatitude storm tracks and minima for the drier subtropics.However, the magnitude of tropical precipitation is clearly overestimated compared to GPCP.Although the comparison with the Legates data set gives somewhat conflicting results, the excessive tropical precipitation in NorESM is also evident when compared to TRMM (Fig. 8a and b).
Figure 8 shows that a main feature of the tropical bias is related to the double Intertropical Convergence Zone (ITCZ) problem that has long been a common error among climate models (Lin, 2007).The simulated precipitation is clearly characterized by a double ITCZ structure over the central Pacific and equatorial Atlantic which is not found in the TRMM observations (Fig. 8a and b).This bias is also associated with too much precipitation variability at the same locations (Fig. 8c and d).
In order to compare the simulated precipitation with that of CCSM4, the bias relative to GPCP is shown in Fig. 7b (global bias of 0.198 mm d −1 with a RMSE of 1.23 mm d −1 ) and can be compared to the discussion by Gent et al. (2011) around their Fig. 5. Compared to the 2 • version of CCSM4, the NorESM does not have a significant negative bias along the equator in the Pacific.In that respect its climatology is more similar to the 1 • version of the CCSM4, although the spatial resolution is 2 • in NorESM.Another distinct feature of the NorESM is that the ITCZ-related bias is more confined to the Northern Hemisphere, while the bias in CCSM4 is more symmetric about the equator.
One possible contributor to reducing the bias in NorESM compared to CCSM4 is the aerosol direct and indirect effect in NorESM.Earlier work suggests that aerosol forcing may cause a southward displacement of the ITCZ (Rotstayn and Lohmann, 2002;Kristjansson et al., 2005).This effect is further discussed by Iversen et al. (2013), but as discussed already by Kirkevåg et al. (2008a), this impact on ITCZ by aerosols may cause co-influence on tropical precipitation by the GHG and aerosols separately, which tends to reduce (but not remove) the double ITCZ feature.Some other notable tropical biases are shared with CCSM4.There is a characteristic dipole bias in the Indian Ocean and excess precipitation over the African continent and along the western coast of South America (Fig. 7b).The latter bias is connected to too high orographic precipitation rates in the Andes (Cook et al., 2012).There is also a significant negative precipitation bias in the Atlantic Ocean at around 10 • N-20 • N which is co-located with the cold SST bias shown in Fig. 12b.The midlatitude precipitation biases in the North Atlantic can also be seen to be closely associated with biases in the simulated SSTs.

Zonal mean atmosphere
Figure 9 shows the zonally averaged mean atmospheric temperature for the Historical1 simulation of the NorESM and its bias relative to the ERA40 reanalysis (Uppala et al., 2005).The air temperature in the troposphere is in general too cold for both boreal summer and winter.This general feature is also confirmed with other reanalysis and satellite data (not shown).In the lower polar troposphere during DJF, there is a notable positive bias over the Arctic and a negative one over the Antarctic.Although this may be related to the overestimation of the polar cloud cover, it should be emphasized that the reanalyses themselves have significant biases over the data sparse polar regions (e.g.Bromwich et al., 2007).Near the polar tropopause there is a relatively large cold bias of magnitude 7-8 K (up to 10 K over Antarctica during DJF).A cold bias in this region is also prominent in the CMIP5 ensemble mean (Charlton-Perez et al., 2013).
The simulation of the zonal mean zonal wind and bias with respect to the NCEP reanalysis (Kalnay et al., 1996) are shown in Fig. 10.Both the strength and seasonal migration of the Northern Hemisphere tropospheric jets can be seen to be well captured by the model, although the subtropical and eddy-driven jets are slightly more separated during DJF.In the Southern Hemisphere the subtropical jet is too strong during DJF with an eastward bias of 4-6 m s −1 .There is also a notable eastward bias over the equator in the upper troposphere.In the stratosphere, the polar-night jets in both hemispheres are shifted slightly polewards relative to the reanalysis.

Sea ice mean state
Figure 11 shows mean sea ice thickness for the period 1976-2005 of Historical1, including observation-based estimates of the sea ice extent .The simulated sea ice extent is not shown in the figure but is approximately the area with ice thicker than 0.05 m coloured in the figure.In accordance with Fig. 4, the sea ice extent is underestimated during winter and overestimated during summer in the Northern Hemisphere.In the Southern Hemisphere both winter and summer extent are overestimated, but this is less pronounced than in the Arctic.The underestimation in March in the Northern Hemisphere is mostly due to too little ice in the Labrador Sea, consistent with the warm SST bias found in that area (Fig. 12b).The ice extent during summer is larger than observed in the Northern Hemisphere.The modelled geographic distributions of ice are close to observations, but the main overestimations are in Baffin Bay and the Kara Sea.
The thickest ice found in the simulations is north of Greenland and Ellesmere Island, in agreement with observational climatologies (Rothrock et al., 2008).Also, in the central Arctic, the thickness is comparable with estimates based on submarines from the late 1970s (Rothrock et al., 2008;Kwok and Rothrock, 2009) not realistic (above 5 m).Satellite estimates from 2006-2007 (Kwok and Cunningham, 2008) give values near the North Pole of 2-2.5 m, while thickness near the East Siberian coast is 1-2 m.Clearly, the modelled ice is too thick compared with these estimates.However, as discussed by Kwok and Rothrock (2009), there was a considerable loss of Arctic ice volume after year 2000.The modelled central Arctic sea ice thickness therefore seems to be more similar to that observed during the 1970s than after 2005.The Antarctic sea ice thickness shown in Fig. 11 is comparable with observations (Worby et al., 2008) in large regions, with thin first-year ice with thickness less than 1 m over large regions, and with the thicker ice in the western Weddell Sea, close to the coast.Spring values reported by Worby et al. (2008) indicate mean thickness of 0.89 m and 1.33 m in eastern (sector 45 • W-20 • E) and western (sector 60 • W-45 • W) Weddell Sea, respectively, while modelled mean values are 0.5-1.5 m in the eastern, and from 1.5 m to more than 4 m in the western Weddell Sea, respectively.Thus, modelled ice is too thick in the thickest regions and does not melt during summer, consistent with Fig. 4.   of the Southern Ocean, and in the subpolar North Atlantic.Negative SST biases are mainly found at around 20 • S and 20 • N in the Atlantic Ocean and in a zonal band around 20-30 • N in the Pacific Ocean.Except for the bias in the subpolar North Atlantic, the pattern of differences has many similarities with recent CCSM4 experiments (Gent et al., 2011).They find that particularly the bias in the upwelling regions is reduced with increasing atmospheric resolution since the higher resolution atmosphere allows for more realistic wind patterns near the coast, favouring stronger upwelling, thereby reducing SST directly and indirectly through cloud feedback (Gent et al., 2010).

Ocean mean state
The global mean SSS difference is −0.15 g kg −1 with a RMSE of 0.83 g kg −1 .Negative SSS biases cover most of the low latitudes with small biases in the Southern Ocean and mostly positive biases north of 40 • N. Large positive biases of more than 2 g kg −1 cover most of the Arctic Ocean with even larger differences on the Siberian shelf.
The SSH model climatology, with a RMSE of 0.16 m compared to an observational estimate, indicates that the model captures the general aspects of the large-scale circulation.Positive SSH biases are found in large portions of the Indian Ocean, and negative biases cover most of the Atlantic with large negative biases (< −0.35 m) in the north-western North Atlantic.
A 30 yr mean of the global ocean meridional overturning circulation (MOC) from the NorESM piControl is shown in Fig. 13a.A Deacon cell with a strength of about 25 Sv is present at 60 • S-40 • S.This cell is not present when the MOC is presented as a function of potential density (not shown), consistent with the findings of Döös and Webb (1994).There is an abyssal counterclockwise circulation with maximum strength at 50 • S-30 • S in excess of 10 Sv.The prominent clockwise circulation with maximum strength at 30 • N-40 • N is due to the AMOC that is shown separately in Fig. 13b.The time series of the maximum annual mean AMOC strength at 26.5 • N in the NorESM piControl experiment is shown in Fig. 2 and has a time-mean of 30.8 Sv.This is in the upper range of AMOC strengths found in models contributing to phase 3 of the Climate Model Intercomparison Project (CMIP3) (Medhaug and Furevik, 2011) and well above an estimate of 17.4 Sv of the AMOC strength at 26.5 • N using observations for the years 2004-2011 (Srokosz et al., 2012).It is not clear what causes the vigorous AMOC intensity in this NorESM experiment.The counterclockwise abyssal circulation of Antarctic Bottom Water (AABW) in the Atlantic basin is not very prominent in the simulation.The MOC of the NorESM Historical1 experiment is very similar to that of the pre-industrial control.
Zonal means of potential temperature and salinity from NorESM Historical1 years 1976-2005 is compared to WOA09 observational estimates in Fig. 14.In the global zonal mean potential temperature difference (Fig. 14a), the largest positive biases are found in the upper ocean (< 500 m depth) in latitude bands 60 • S-20 • S and 10 • N-60 • N and around 60 • N covering the whole depth.Below 1000 m there are large regions with temperature biases larger than 0.3 K. Cold biases in the global zonal mean are prominent in the depth range 200-1000 m south of 50 • N. North of 60 • N there is a negative bias above 500 m depth and below 2000 m with a warm bias in between.Compared to the global zonal mean, the Atlantic zonal mean potential temperature biases (Fig. 14b) are much larger and likely explain some of the differences seen in the former.Except from a prominent cold bias in the depth range 200-1000 m and latitude range 40 • S-50 • N, there is an overall large warm bias north of 40 • S in the Atlantic domain, reaching above 2 K in considerable regions.In the Southern Ocean south of 40 • S there is a warm bias below 1500 m and a cold bias above.The overall picture of a fresh and saline bias of the upper and deep ocean, respectively, seen in Fig. 3d     observations (Fig. 14d) is very similar to that of potential temperature.
The strong warm and saline bias of the Atlantic Ocean at depth is likely associated with the vigorous AMOC circulation of NorESM, where relatively warm and saline surface water masses are brought to depth in the northern North Atlantic at an excessive rate and thereafter propagating southwards at depth.We also find in the model that the The cold bias in the depth range 200-1000 m seen in Fig. 14a and b indicates that the thermocline depth in NorESM is shallower than in observations.According to Munk (1966) 2012) compare climate model experiments that only differ in the choice of ocean components, which is either a z coordinate model or a model with interior isopycnic layers, the latter of similar type of that used in NorESM.Both comparisons indicate a shallower than observed thermocline depth with isopycnal models and deeper than observed depth with z coordinate models.This is attributed to less diapycnal mixing in the isopycnic models compared to the z coordinate models.Although NorESM share this thermocline depth bias with other climate models featuring isopycnic ocean components, it is not clear that unrealistic weak diapycnal mixing is causing the shallow thermocline depth in NorESM since in particular the strong AMOC might contribute to excessive deep water formation.

Meridional heat transport
In Fig. 15a the meridional heat transport of NorESM Histori-cal1 for the years 1976-2005 is compared to estimates by Fasullo and Trenberth (2008) (FT08 hereafter).In FT08 there is almost no meridional heat transport across the equator, while in NorESM there is a northward heat transport of 0.6 PW carried by the ocean.In both hemispheres, NorESM slightly underestimates the total transport compared to FT08, with largest differences in the Southern Hemisphere.In Fig. 15b the ocean heat transport is split up in contributions from the Atlantic Ocean and Pacific and Indian Ocean combined.Compared to FT08, NorESM carries more heat northward over the whole extent of the Atlantic Ocean with most pronounced differences between 20 • N and 60 • N that might be attributed to the strong simulated AMOC.The strong ocean heat transport in this latitude band seems to be compensated by a weaker atmospheric transport compared to FT08.The ocean southward transport between 65 • S and 30 • S is also stronger than indicated by FT08.Further, the ocean heat transport deviation from FT08 from the equator to 15 • S is mainly due to weaker southward transport in the Pacific and Indian Ocean.Fasullo and Trenberth (2008), where the CERES data set is used for TOA terms.

Simulated internal variability
In this section aspects of the internal variability of the NorESM piControl and historical ensemble experiments are discussed.

Tropical variability
The El Niño-Southern Oscillation (ENSO) is a coupled ocean-atmosphere phenomenon that has a major impact on the climate variability of the tropical Pacific on seasonal to interannual timescales (Wallace et al., 1998), but also with a strong influence on the global scale (Trenberth et al., 1998).The monitoring of ENSO is commonly summarised by indices involving tropical Pacific SST anomalies or sea level pressure (SLP) differences.To investigate the representation of ENSO in NorESM, the detrended monthly SST anomalies of the NINO3.are SST anomalies from the corresponding years of the NorESM piControl.The standard deviation of NorESM His-torical1 and piControl are 0.92 K and 0.86 K, respectively, and both are larger than the standard deviation of HadISST of 0.75 K.
Figure 17 shows the power spectra of the normalized time series of where the correlation is not significantly different from vanishing correlation at the 95 % confidence level.In the estimation of confidence intervals, the number of degrees of freedom is adjusted due to autocorrelation according to Quenouille (1952).
contribute to this difference in decadal and longer variability, although a more careful analysis is required to establish this.
The correlation of NINO3.4SST anomalies with global SST anomalies for HadISST and NorESM (Fig. 18) reveals a too narrow meridional width of the correlation in the tropical Pacific east of 160 • E. The U-shaped pattern of negative correlation in the Pacific is seen in both HadISST and NorESM but less pronounced in the model.NorESM has positive correlations in the western part of the Indian Ocean although weaker than in HadISST.The positive correlation seen in HadISST in the north-eastern Indian Ocean, extending into Gulf of Thailand and South China Sea, is missing in NorESM.The patterns of correlation in the Atlantic Ocean are quite similar in both HadISST and NorESM.In Gent et al. (2011), similar correlation maps are shown for CCSM3 with T85 atmospheric resolution and CCSM4 with 1 • atmospheric resolution.Overall, CCSM4 compares favourably to observations compared to both CCSM3 and NorESM and many of the discrepancies of NorESM discussed above also hold for CCSM3.The dominant power of the NINO3.4index on 2-4 yr timescale in NorESM falls between CCSM3 and CCSM4, with dominant variability in the range 1.75-3 yr and 3-6 yr, respectively (Deser et al., 2012).
The Madden-Julian oscillation (MJO) is the dominant mode of intraseasonal (30-90 days) variability in the tropical atmosphere (Madden and Julian, 1971;Zhang, 2005).It is characterized by large-scale regions of enhanced and suppressed convection coupled to circulation anomalies that together propagate slowly eastward along the equator.The MJO interacts with several large-scale climate phenomena including El Niño events (Hendon et al., 2007), the onset and break of the Indian summer monsoon (Annemalai and Slingo, 2001), formation of tropical cyclones (Liebmann et al., 1994) and its influence also extends to the NAO and extratropical variability (Cassou, 2008).A realistic simulation of the MJO is therefore important to improve climate prediction of such phenomena.
General circulation models (GCMs) have long struggled to realistically simulate the basic features of the MJO.Lin et al. (2006) found that only 2 out of 14 GCMs participating in the IPCC AR4 had MJO variance comparable to observations.However, Subramanian et al. (2011) have shown that the representation of the MJO in the CCSM4 model, with a new convective parameterization scheme (in CAM4), is significantly improved.To evaluate the MJO in the NorESM and to facilitate comparison with other GCMs, we use the diagnostic tools proposed by Waliser et al. (2009).The complete set of recommended diagnostics is beyond the scope of this paper so only the wavenumber-frequency spectra and coherence figures for boreal winter are presented here.
The wavenumber-frequency of the NCEP 850 hPa zonal wind (Fig. 19a) is characterized by maximum energy for zonal wavenumbers 1-3 over periods of 30-80 days, with a dominant peak for zonal wavenumber 1 and period of approximately 60 days.The NorESM contains significant eastward-propagating energy in the 30-80 day band for wavenumber 1 (Fig. 19b).However, its energy is spread over more frequencies and peaks at a longer period.Similarly, for outgoing long-wave radiation (OLR) the energy in the NorESM is spread over more frequencies and the peak is slightly weaker compared to the observed power spectra (Fig. 19c and d).
Cross-spectral calculations are performed to quantify the coherence and phase lag between equatorial OLR and 850 hPa zonal winds.Figure 20a shows that observations exhibit a high degree of coherence for wavenumbers 1-3 and a phase lag of approximately 90 • in the 30-80 day band.NorESM also exhibits strong coherence for wavenumber 1 in the 30-80 day band, although it peaks at slightly higher frequencies compared to observations.This characteristic was also observed in CCSM4 by Subramanian et al. (2011) and may suggest more convectively coupled linear Kelvin wave activity in the model than observations.Similarly to CCSM4, the NorESM also fails to simulate the strong coherency for wavenumbers 2-3 found in observations.Despite such shortcomings, these diagnostics show that NorESM is able to produce coherent eastward-propagating waves in the intraseasonal zonal winds and OLR over the tropical Indian and Pacific Oceans.NCEP (1979NCEP ( -2008) ) and (b) NorESM (1976NorESM ( -2005)), and daily OLR fields of (c) NOAA satellite OLR (1979OLR ( -2008) ) and (d) NorESM (1976NorESM ( -2005)).Individual spectra were calculated for each year and then averaged over all years of data.Only the climatological seasonal cycle and time mean for each November-April segment were removed before calculation of the spectra.Units for the zonal wind (OLR) are m −2 s −2 (Wm 2 s −1 ) per frequency interval per wavenumber interval.The bandwidth is (180 day −1 ).84 Fig. 19.November-April wavenumber-frequency spectra of 10 • S-10 • N averaged daily zonal 850 hPa winds of (a) NCEP (1979NCEP ( -2008) ) and (b) NorESM (1976NorESM ( -2005)), and daily OLR fields of (c) NOAA satellite OLR (1979OLR ( -2008) ) and (d) NorESM (1976NorESM ( -2005)).Individual spectra were calculated for each year and then averaged over all years of data.Only the climatological seasonal cycle and time-mean for each November-April segment were removed before calculation of the spectra.Units for the zonal wind (OLR) are m −2 s −2 (W m 2 s −1 ) per frequency interval per wavenumber interval.The bandwidth is (180 day −1 ).

Annular modes
The Northern and Southern Annular Modes (NAM and SAM) are the dominant patterns of variability in the extratropical atmosphere on intraseasonal to interdecadal timescales (Thompson and Wallace, 2000).They represent north-south variations in the extratropical zonal wind with centres of action located at approximately 55-60 • and 30-35 • latitude.The NAM and SAM are largely zonally symmetric, although NAM is more regionally confined to the Atlantic sector.
The NAM is defined as the first empirical orthogonal function (EOF) of the Northern Hemisphere (20-90 • N) winter SLP anomalies.Figure 21 shows the leading EOF for Historical1 and NCEP-2 data (Kanamitsu et al., 2002).The leading EOF in the NorESM simulation can be seen to closely resemble the NAM structure in the NCEP-2 data but with a few notable exceptions.The simulated NAM has larger amplitudes over the Arctic and North Pacific and the Atlantic centres of action are shifted to the east.This likely explains why the simulated NAM represents 36 % of the hemispheric variance compared to the observed 23 %.These main biases in the simulated NAM are very similar to those documented by Hurrell et al. (2006) for the CAM3 model.The more symmetric zonal structure of the simulated NAM and stronger teleconnection between the Pacific and Atlantic are biases that have been found earlier in many models (Osborn, 2004).The SAM is defined as the first EOF of Southern Hemisphere (20-90 • S) monthly mean 850 hPa geopotential height data.The spatial structure of the simulated SAM is well simulated with only small differences in amplitude (Fig. 21).The represented variances of the leading EOF of the Southern Hemisphere for NorESM and NCEP-2 are 23 % and 26 %, respectively.

Atlantic variability
The AMOC is a major branch of the global thermohaline circulation and the associated heat transport is important for the climate of the North Atlantic, Nordic seas, and Arctic Ocean and their adjacent continental regions.Observationbased time series of the AMOC strength is only available since year 2004 (Srokosz et al., 2012), and thus the simulation of this circulation in climate models is still important for the understanding of decadal and longer timescale variability.Medhaug and Furevik (2011) found a large spread in both mean value and variability of the AMOC strength in models contributing to CMIP3.
The time series of maximum AMOC strength at 26.5 • N in piControl is shown in Fig. 2. The standard deviation of the detrended time series is 0.81 Sv and the associated power spectrum is displayed in Fig. 22.Compared to a fitted red noise process, there is more power on timescales longer than about 25 yr.The power spectrum of maximum AMOC strength at 45 • N shows a peak with a timescale of approximately 20 yr but with less pronounced multidecadal variability compared to 26.5 • N. The evolution of the AMOC strength in the NorESM historical ensemble members and the RCP scenarios is discussed in Iversen et al. (2013).Delworth and Mann (2000) argue for a multidecadal variability with an approximate timescale of 70 yr in proxybased reconstructions of surface temperature.This variability is most clearly manifested in the North Atlantic region but with some degree of global expression, and is often referred to as the Atlantic Multidecadal Oscillation (AMO).Several indices of AMO exist in the literature.The AMO index as defined by Sutton and Hodson (2005) 23c reveals power around a period of 20 yr that is significantly larger than expected from a red noise process.In contrast to Delworth and Mann (2000), we do not find any variability with timescales longer than 30 yr that significantly stands out compared to red noise.The 20 yr peak was also present in the time series of maximum AMOC strength at 45 • N, discussed above, and is also evident in a subpolar gyre SSH index shown in Fig. 23c.Thus, it seems likely that the variability with an approximate timescale of 20 yr is associated with northern North Atlantic processes possibly involving the subpolar gyre.Variability in the North Atlantic with similar timescales has been documented in several studies of climate proxies, observations, and climate model simulations (e.g.Frankcombe and Dijkstra, 2009;Frankcombe et al., 2010;Chylek et al., 2011).Further, Escudier et al. (2013) attributed a 20 yr cycle found in the IPSL-CM5A-LR model to a coupled oscillatory mode involving propagation of temperature and salinity anomalies in the subpolar gyre, sea ice changes in the Nordic seas, and changes to the strength of the East Greenland Current across the Denmark Strait due to modified regional atmospheric circulation.Interestingly, the peak at 20 yr in NorESM piControl is neither present in any of the NorESM historical ensemble members (Fig. 23d), nor in the HadISST-based index (Fig. 23e).Possible explanations are that variability or trend in the external forcing of the historical experiments either disrupts some feedback mechanism causing the signal in the control simulation or masks the 20 yr variability by increasing the variance of North Atlantic climate indices.A thorough analysis of the power peak at 20 yr in several time series of the NorESM piControl and the apparent lack of it in the historical experiments is beyond the scope of this study.An alternate AMO index defined by Trenberth and Shea (2006), where the SST anomalies between 60 • S and 60 • N are subtracted from the North Atlantic anomalies, did not change the main findings of the above discussion.

Modelled climate evolution of the 20th century
A comparison between observed and simulated 2 m air temperature (T2m) for the globe and poleward of 60 • N is provided in Fig. 24.On a global scale, the three historical members follow observed temperature rather closely.Linear trends for the period 1911-2010 are 0.077 and 0.063 K decade −1 for the observed (HadCRUT4) and the mean of the three historical NorESM members, respectively.The corresponding trend figures for the period 1961-2010 are 0.14 K decade −1 for both observations and model.The simulated magnitude of the recent global warming signal is thus captured by the model.
Poleward of 60 • N, the linear trends for the last century are now 0.11 and 0.14/0.13/0.13K decade −1 based on observations and Historical1/Historical2/Historical3, respectively.For the last 50 yr, the corresponding trends are 0.33 and 0.40/0.16/0.20 K decade −1 .There is thus a tendency that the simulated recent warming at high northern latitudes is on the weak side compared to HadCRUT4, although Histori-cal1 shows a somewhat stronger warming than the observed one (cyan line in Fig. 24).It is therefore hard to conclude based on this comparison whether the model simulates the observed high-latitude warming realistically.An important factor in this respect is the weak signal-to-noise ratio of simulated surface temperature trends at high northern latitudes (Räisänen, 2001;Sorteberg et al., 2005), potentially masking the warming signal for decades.
The weak high-latitude signal-to-noise ratio is also a possible candidate for the failure of the three historical members to simulate the so-called early warming signal of the 1920s to the 1940s (e.g.Delworth and Knutson, 2000; see lower panel of Fig. 24).A growing body of evidence indicates, however, that the early warming signal can, at least partially, be explained as an interaction between internal generated variability and external forcings (e.g.Otterå et al., 2010;Ting et al., 2011;Booth et al., 2012).The last hypothesis may indicate that NorESM's climate state, likely in the Atlantic-Arctic sector, prohibits the onset of feedback processes needed to generate high-amplitude variations at high northern latitudes.Candidates for the latter include ventilation of the subpolar North Atlantic (Hátún et al., 2005;Häkkinen and Rhines, 2004), the poleward transport of heat towards the Arctic    Ocean ( Årthun et al., 2012) or localised atmosphere-ocean interactions in the Atlantic sector of the Arctic (Bengtsson et al., 2004).
Figure 25 shows the time evolution of sea ice extent from the historical ensemble (1850-2005) compared with satellite-based estimates from the Sea Ice Index (NSIDC, Fetterer et al., 2009) in black, which are mainly based on the NASA Team algorithm, and an alternative series in grey (Comiso, 1999) based on the Bootstrap algorithm.Clearly, the strong reduction in Arctic summer ice extent observed in the satellite records is not captured in the historical ensemble.Over the years 1979-2005, the trends in September extent are −0.59 and −0.29 mill km 2 decade −1 for the observed (NSIDC) and NorESM historical ensemble, respectively.This shows a delayed melting of sea ice in NorESM during global warming compared with observations.Given that the trend in temperature poleward of 60 • N is fairly realistic (Fig. 24), the sea ice is not sensitive enough to the temperature increase.As discussed earlier, the ice thickness shown in Fig. 11 is too thick in the east Arctic, north of the Siberian coast, and it is therefore too resistant to summer melt.This thickness maximum is mainly a result of biases in the surface winds.On the other hand, the general thick Arctic sea ice in the model is a result of too little summer melt of snow in the model, giving too little surface melt of the ice.(Fetterer et al., 2009), and in gray data based on the Bootstrap algorithm (Comiso, 1999).(Fetterer et al., 2009), and, in grey data, based on the Bootstrap algorithm (Comiso, 1999).

Summary
The Norwegian Earth System Model in its main grid configuration but without interactive carbon cycle, denoted NorESM, is presented, together with a first-order assessment of the model stability, the mean model state and the internal variability.Further analysis of the model performance is provided in Iversen et al. (2013), presenting CMIP5-type of climate response and scenario projections made with NorESM.Two additional versions of NorESM are available, but not discussed here: a low-resolution version particularly suited for millennium-scale climate studies (Zhang et al., 2012) and the model with interactive carbon cycle (Tjiputra et al., 2013).
NorESM is based on the CCSM4 (Gent et al., 2011;Vertenstein et al., 2010), but differs from the latter by, in particular, an isopycnic coordinate ocean model (see Sect. 2.4) and chemistry-aerosol-cloud-radiation interaction schemes developed for the Oslo version of the Community Atmosphere Model (CAM4-Oslo; see Sect.2.1 and Kirkevåg et al., 2013).
As shown in Fig. 2 and elaborated in Sect.4, the long-term model drift in NorESM is generally small, exemplified with linear trends for the 500 yr long piControl of global mean surface temperature and SST of 0.039 K and 0.031 K over 500 yr, respectively.However, there is a warming tendency in global mean ocean temperature (0.126 K over 500 yr) and this is explained by a small but persistent TOA radiative imbalance with a mean value of about 0.086 W m −2 .For the first few hundred years of the piControl there is a modest freshening tendency of surface SSS with an associated salinification of the deep ocean.The model's fresh water budget diagnosed from global mean evaporation, precipitation, total cloud LWP, sea ice volume and ocean salinity is also in close long-term balance, illustrating the model's ability to (closely) conserve heat and fresh water in their modelled forms.The conservation properties of NorESM are encouraging, fulfilling a highly desirable constraint for climate models aiming for multidecadal, centennial and longer simulations.
Long-term stability of heat and fresh water fluxes between the various model components does, of course, not ensure a model system without biases relative to observation-based estimates.The main model state based on NorESM Histori-cal1, as elaborated in Sect. 5 and Table 1, can be summarised as follows (the comparison is mainly made for the 30 yr period 1976-2005): -The global mean net clear-sky SW and LW flux at TOA, and the associated TOA SW and LW cloud forcing, are close to or within the observational range.This is in contrast to a clear underestimation of global mean cloud cover.With respect to the radiative fluxes, the overestimation of LWP is probably compensating for the bias in cloud cover (Jiang et al., 2012).
-The simulated annual mean sensible and latent heat flux over land are in the same range as observations, although land areas in NorESM are too moist with subsequently too high latent heat fluxes (Fig. 5).The model underestimates sensible heat flux in most of the African continent south of the Sahara, in the west coast of India, in Australia, and in the western part of the United States, but overestimates sensible heat flux in the extreme eastern part of South America.
-The model generally underestimates the global mean near surface air temperatures over the continents with about 1.1 K.The global mean SST bias is smaller, about −0.15 K.For the latter, warm biases are found in the major upwelling regions west of the main continents and in the northern North Atlantic (Fig. 12b).The too cold surface model climate might be linked to the inclusion of the aerosol indirect effect in NorESM, a slight overestimation of the cooling by the aerosol direct effect (Kirkevåg et al., 2013), and too extensive summer sea ice extent (Figs. 3 and 4).
-The simulated global mean SSS is too low (−0.15 g kg −1 ), with negative biases covering most of the low latitudes and large positive biases (> 2 g kg −1 ) in the Arctic Ocean (Fig. 12d).
-The simulated AMOC is on the strong side (30.8 Sv at 26.5 • N, Fig. 13b) compared to observation-based estimates and other model simulations.This bias leads to a too warm (Fig. 14b) and saline (Fig. 14d) Atlantic Ocean at depth.It is not clear which mechanisms are responsible for the strong AMOC in NorESM.
-The global mean SSH is comparable to observations, although the SSH of the Atlantic Ocean is too low (Fig. 12f).
-NorESM simulates a northward oceanic heat transport of 0.6 PW at equator, in contrast to a vanishing heat transport based on observation-based estimates (Fig. 15a).North of the equator, the simulated northward oceanic heat transport is too large in all ocean basins.The bias is particularly evident in the Atlantic Ocean, likely linked to the overly strong AMOC.
-NorESM overestimates the oceanic evaporation by about 4 % and the flux of water vapour from ocean to land by about 8 %.The atmospheric residence time of oceanic water vapour is, however, consistent with observations.
-Cloudiness in NorESM is underestimated by between 13 to 24 % (relative to ISCCP and CLOUDSAT data, respectively).The liquid water in clouds is generally exaggerated, particularly in the extratropical storm track regions (cf.Jiang et al., 2012).
-The magnitude of tropical precipitation is clearly overestimated in the model (Fig. 7e and f), a problem related to the well-known double ITCZ problem also present in NorESM (Fig. 8).
-The zonally averaged mean troposphere temperature is in general too cold in both summer and winter (Fig. 9).The strength and seasonal migration of the Northern Hemisphere tropospheric jets are well captured by the model (Fig. 10), although the Northern Hemisphere subtropical and eddy-driven jets are slightly more separated during DJF, and the Southern Hemisphere subtropical jet is too strong during DJF.
-The simulated geographic distribution of sea ice is fairly realistic, although with too extensive extents in both hemispheres during the respective summer seasons (Figs. 4 and 11).The simulated sea ice thickness is, in general, comparable to or on the thick side compared to available observations.An analysis of the simulated internal variability of the NorESM piControl and historical ensemble experiments, focussing on tropical variability, the annular modes and Atlantic variability, shows the following features (Sect.6): -The main power of the NINO3.4index of Historical1 occurs on timescales of 2-4 yr, whereas observations (HadISST) covering the period 1900-2005 show a peak on 3-7 yr (Fig. 17).The standard deviation of Histori-cal1 (and piControl) is approximately 15 % larger than in the observations.On decadal and longer timescales, piControl has less power compared to both HadISST and Historical1, possibly because of the absence of natural forcings in piControl.The spatial correlations of NINO3.4SST anomalies with global SST anomalies in Historical1 and HadISST shares many features (Fig. 18), although the model produces a too narrow correlation pattern in the tropical Pacific east of 160 • E and a too weak horse-shoe shaped pattern of negative correlation in the sub-tropical Pacific.
-Analysis of the Madden-Julian oscillation shows that NorESM is able to produce coherent eastwardpropagating waves in the intraseasonal zonal winds and OLR over the tropical Indian and Pacific Oceans.This is particularly the case for wavenumber 1 in the 30-80 day band, whereas the model fails to simulate the strong coherency for wavenumbers 2-3 found in observations.
-A comparison between the simulated and observationbased Northern and Southern Annular Modes shows a close resemblance for the SAM, whereas the simulated NAM has larger amplitudes over the Arctic and North Pacific and the Atlantic centres of action are shifted to the east (Fig. 21).
-In piControl, the power spectrum of simulated maximum AMOC strength at 26.5 • N show more power compared to a theoretical red noise spectrum on timescales longer than 25 yr.However, at 45 • N the maximum AMOC strength has a power peak at about 20 yr.The strong variability at 20 yr timescale is also seen in the AMO index and in a subpolar gyre SSH index, and thus is likely associated with northern North Atlantic processes (Fig. 23c).This 20 yr peak is absent in the AMO index of the three historical experiments as well as in HadISST (Fig. 23), indicating that the external forcing of the historical experiments may disrupt some feedback mechanism causing the signal in the control simulation.
For the climate evolution of the 20th century, the following results are presented (see Iversen et al., 2013, for a more comprehensive discussion of the contemporary and future climate simulated with NorESM): -On global scale, the three historical members match the evolution of the observed surface temperature for the last 100 and 50 yr (Fig. 24a), with a warming of +0.14 K decade −1 in both observations and model for the latter period.None of the historical members www.geosci-model-dev.net/6/687/2013/Geosci.Model Dev., 6, 687-720, 2013 simulate the early warming signal of the 1920s to the 1940s poleward of 60 • N (Fig. 24b).
-For the Arctic sea ice, the simulated melting rate during summer is about half of the observed rate since the late 1970s.Also the simulated winter melting lags the observed melting (Fig. 25a).The too slow melting in the model is likely linked to the too thick sea ice in the model, particularly north of the Siberian coast.For the Antarctic (Fig. 25b), both observations and the model show changes since the late 1970s that are either insignificant or on the borderline of being significant, so no conclusions can be made based on trends.
A detailed comparison between the parent model CCSM4 and NorESM is beyond the scope of this paper.However, a few notable differences between the two model systems can be mentioned: -The net TOA radiative imbalance in the pre-industrial control integrations is smaller in NorESM than in CCSM4.
-The global mean near surface air temperature of NorESM is lower compared to CCSM4 during the last few decades of the 20th century experiments, with CCSM4 being closer to observations.However, the temperature increase during the historic experiment seems more realistic in NorESM compared to CCSM4 with CCSM4 warming faster than observations, particularly during the last 30-40 yr (see Sect. 7 and Gent et al., 2011).A considerable part of this difference might be explained by the inclusion of the aerosol indirect effect in NorESM.
-AMOC in both model systems are vigorous compared to observational estimates, but with the AMOC of NorESM clearly stronger than CCSM4.
-NorESM (CCSM4) overestimates the oceanic evaporation by about 4 % (8 %), whereas the water vapour from ocean to land is consistent with observations in CCSM4 but is overestimated by about 8 % in NorESM.These differences are possibly linked to aerosols and the tuning of cloud properties (Sect.5.2).
-For clouds in general, there are indications that NorESM is slightly improved compared to CCSM4 (Sect.5.2).A possible reason for this is the prognostic cloud droplet scheme in NorESM that is directly linked to CCN-activation from the online aerosols, permitting tuning of the precipitation autoconversion to yield improved gross properties of the cycling of fresh water in the model.
-For precipitation, NorESM does not have a significant negative bias along the equator in the Pacific as compared to the 2 • version of CCSM4.Furthermore, NorESM's ITCZ bias is mainly confined to the Northern Hemisphere while the bias in CCSM4 is more symmetric about the equator.This difference may, in part, originate from the aerosol direct and indirect effects in NorESM (Sect.5.2).
The readers are referred to Iversen et al. (2013) for further analysis of NorESM.

sionFig. 1 .
Fig. 1.Schematic of the spinup and integration procedure followed for the various model experiments with NorESM.See text for a description of the experiments.

Fig. 2 .Fig. 2 .
Fig. 2. Annual mean time series between year 700 and 120 Net radiation at TOA (Wm −2 ) with positive values indicating war value is +0.086Wm −2 ); near surface air temperature ( • C; mean into the ocean-sea (Wm −2 , positive value means ocean warming sea surface temperature ( • C, mean value is 17.68 • C); volum ( • C, mean value is 3.81 • C); sea surface salinity (g kg −1 , mean averaged ocean salinity (g kg −1 , mean value is 34.72 g kg −1 ), n Drake Passage (Sv, mean value is 130 Sv), and the strength o value is 30.8Sv).The black dashed lines in the two heat flu whereas the solid black lines in the other panels show the linea 67

Fig. 3 .
Fig. 3. Latitude-time Hovm öller diagrams of (a) annual, zonal mean SST (K) and (c) SSS (g kg −1 ) where the corresponding zonal time means have been subtracted, and depth-time Hovm öller diagrams of (b) global mean ocean potential temperature (K) and (d) salinity (g kg −1 ) presented as anomalies compared to World Ocean Atlas 2009 (WOA09; Locarnini et al., 2010; Antonov et al., 2010) annual mean potential temperature and salinity.All panels are based on year 700-1200 of NorESM piControl, time filtered with a 10 yr running mean.Note the non-linear scaling with depth in panel (b) and (d).

Fig. 4 .
Fig. 4. Time series of (a) northern and (b) southern hemispheric sea-ice extent (10 6 km 2 ) for March and September in piControl.Black lines show simulated, annual mean time series and red lines show observed, annual mean and ±2std for the years 1979-2005 (data from NSIDC,Fetterer et al., 2009).

Fig. 6 .
Fig. 6.Comparison of simulated air temperature at reference height over the ground surface with NorESM for 1976-2005 (Historical1) with the CRU TS3.1(Mitchell and Jones, 2005) observational data set for the same period interpolated to the same grid using conservative remapping.Global bias error is −1.0868K with a RMSE of 2.347 K.

Fig. 8 .
Fig. 8. (a) Simulated and (b) observation-based annual mean prec mean value of year 1976-2005 from Historical1 is shown, whereas TRMM version 3B43 for the period 1998-2011 (Huffman et al., 2007) the monthly mean standard deviation of the fields in (a) and (b), res shown on their native horizontal resolution.

Figure 12 Fig. 11 .
Figure 12 shows mean SST, SSS, and sea level height (SSH) for the period 1976-2005 of Historical1 including comparison with observation-based estimates.The global mean SST difference is −0.15K with a RMSE of 1.22 K. Positive SST biases are seen in the major upwelling regions, large portions 76

Fig. 11 .
Fig. 11.Mean sea ice thickness (m) over years 1976-2005 of the NorESM Historical1 experiment for both hemispheres and for (a, c) March and (b, d) September.For these panels, the solid black line shows the 15 % monthly sea ice concentration from the OSI SAF reprocessed data set (EUMETSAT, 2011).
is confirmed in the global zonal mean salinity difference from observations seen in Fig. 14c.Fresh biases are found in a large portion of the ocean above 1500-2000 m and south of 60 • N, with the largest negative bias (< −0.5 g kg −1 ) in the upper 500 m in the latitude band 40 • S-20 • N. Saline biases are found below 1500-2000 m and south of 60 • N, while north of 60 • N there is a positive bias in most of the ocean column in excess of 0.25 g kg −1 .The pattern of Atlantic zonal mean salinity difference from www.geosci-model-dev.net/6

Fig. 12 .
Fig. 12.(a) Mean simulated SST (K), (c) SSS (kg g-1), and (e) SSH (m) of years 1976-2005 of Historical1.Right panels (b, d, f) show corresponding difference from observational based data sets (model minus observation) where SST and SSS are obtained from WOA09 and SSH uses a 1992-2002 data set described inMaximenko et al. (2009).Areas with missing observations are shaded with dark grey color and the mean for the area with observed SSH of the simulated and observed SSH was subtracted from both data sets.
this would, on the global scale, indicate either too strong upwelling, balanced by excessive deep water formation, or too weak diapycnal mixing.Both Megann et al. (2010) and Dunne et al. ( Fig. 15.(a) Northward heat transports for atmosphere (red line), ocean (blue line) and total (black line) from NorESM Historical1 years 1976-2005.The ocean heat transport are diagnosed directly in the ocean component while the atmospheric transport is found by meridional integration of the difference between zonal integration of net TOA and surface heat fluxes.(b) Corresponding northward heat transports for the global ocean (black line), Atlantic Ocean (red line), and Pacific and Indian Ocean (blue line).The hatched areas in both panels are heat transports with uncertainty estimates fromFasullo and Trenberth (2008) where the CERES dataset is used for TOA terms. 80

Fig. 15 .
Fig. 15.(a) Northward heat transports for atmosphere (red line), ocean (blue line) and total (black line) from NorESM Historical1 years 1976-2005.The ocean heat transports are diagnosed directly in the ocean component while the atmospheric transport is found by meridional integration of the difference between zonal integration of net TOA and surface heat fluxes.(b) Corresponding northward heat transports for the global ocean (black line), Atlantic Ocean (red line), and Pacific and Indian Ocean (blue line).The hatched areas in both panels are heat transports with uncertainty estimates fromFasullo and Trenberth (2008), where the CERES data set is used for TOA terms.
4 region (5 • S-5 • N; 170 • W-120 • W) are used.The NINO3.4 index is obtained by normalizing these SST anomalies by their long-term standard deviation.Figure 16 shows time series of detrended monthly SST anomalies of the NINO3.4region for the HadISST data set and NorESM Historical1 for the years 1900-2005.Also shown

Fig. 16 . 81 Fig. 16 .
Fig. 16.Time series of detrended monthly SST anomalies of the NINO3.4region (5 • S-5 • N; 170 • W-120 • W).The anomalies are found by subtracting the monthly means for the whole time series.Upper panel shows Hadley Centre Sea-Ice and Sea Surface Temperature data set (HadISST; Rayner et al., 2003), while middle and lower panel are Historical1 and piControl, respectively.

Fig. 18 .
Fig. 18.Correlation between local and NINO34 region SST anomalies for (a) HadISST and (b) Historical1.The anomalies are found by subtracting the monthly means for the whole time series that span years 1900-2005 for both data sets.Hatched area indicates regions where the correlation is not significantly different from vanishing correlation at the 95 % confidence level.In the estimation of confidence intervals, the number of degrees of freedom is adjusted due to autocorrelation according to Quenouille (1952).83 Fig. 18.Correlation between local and NINO34 region SST anomalies for (a) HadISST and (b) Historical1.The anomalies are found by subtracting the monthly means for the whole time series that span years 1900-2005 for both data sets.Hatched area indicates regionswhere the correlation is not significantly different from vanishing correlation at the 95 % confidence level.In the estimation of confidence intervals, the number of degrees of freedom is adjusted due to autocorrelation according toQuenouille (1952).

Fig. 20 .Fig. 20 .
Fig. 20.Coherence squared (colors) and phase lag (vectors) between zonal winds at 850 hP and OLR are shown for (a) NCEP winds and NOAA satellite OLR, and (b) NorESM.Only th symmetric spectra are shown here.Cross spectra are calculated using daily data during a seasons on 256 day-long segments, overlapping by 206 days.Vectors represent the phas by which wind anomalies lag OLR anomalies, increasing in the clockwise direction.A phas of 0 • is represented by a vector directed upward.Dispersion curves for the (n = −1) Kelvin n = 1 equatorial Rossby (ER) and n = 0 eastward inertia-gravity (EIG) waves corresponding t three equivalent depths (h = 12, 25 and 50 m) in the shallow water equations are overlaid (blac contours).MJO is defined as the spectral components within zonal wave numbers 1-3 an having periods of 20-80 days.85 Fig. 20.Coherence squared (colours) and phase lag (vectors) between zonal winds at 850 hPa and OLR are shown for (a) NCEP winds and NOAA satellite OLR, and (b) NorESM.Only the symmetric spectra are shown here.Cross-spectra are calculated using daily data during all seasons on 256 day-long segments, overlapping by 206 days.Vectors represent the phase by which wind anomalies lag OLR anomalies, increasing in the clockwise direction.A phase of 0 • is represented by a vector directed upward.Dispersion curves for the (n = −1) Kelvin, n = 1 equatorial Rossby (ER) and n = 0 eastward inertia-gravity (EIG) waves corresponding to three equivalent depths (h = 12, 25 and 50 m) in the shallow water equations are overlaid (black contours).MJO is defined as the spectral components within zonal wavenumbers 1-3 and having periods of 20-80 days.

Fig. 21 .Fig. 21 .
Fig. 21.The leading EOF of winter (DJFM) monthly mean SLP anomalies (hPa) over the No ern Hemisphere (20-90 • N) in Historical1 (top left panel) and NCEP-2 (top right).The lead EOF of monthly mean 850 hPa geopotential height anomalies (m) over the Southern He sphere (90-20 • S) for Historical1 (bottom left panel) and NCEP-2 (bottom right).Prior to EOF analysis the data were weighted by the square root of the cosine of latitude so that eq areas are afforded equal weight.The principal components (PCs) were scaled to unit st dard deviation and projected on the original (not area-weighted) data to obtain correspond EOFs.The leading EOFs shown are associated with a one standard deviation change in corresponding PCs (index time series).86

Fig. 22 . 87 Fig. 22 .
Fig. 22.Power spectrum of time series of annual maximum AMOC at 26.5 • N (black line) and 45 • N (blue line) using the multitaper method of Ghil et al. (2002) with resolution p = 4 and number of tapers t = 7.The thin black line represents a fitted red noise spectrum of the 26.5 • N maximum AMOC spectrum and the dashed line the 95 % confidence limit about the red noise spectrum.

Fig. 23 .
Fig. 23.(a) Time series of AMO index defined as annual detrended SST anomalies for region 75 • W-7 • W, 0 • -60 • N in NorESM piControl and (b) HadISST (black line) together NorESM historical ensemble members (blue, red, and green lines).Lower panels show po spectra of: (c) NorESM piControl AMO index (black line) and subpolar gyre SSH index (ano lous SSH for the region 60 • W-15 • W, 48 • N-65 • N; yellow line), (d) NorESM historical ens ble members (colors as in (b)), (e) HadISST.The multitaper method of Ghil et al. (2002) resolution p = 4 and number of tapers t = 7 is applied to time series normalized by their stand deviation.Thin lines represents fitted red noise spectra of the power spectra with same c The dashed lines are the 95 % confidence limits about the red noise spectra. 88

Fig. 23 .
Fig. 23.(a) Time series of AMO index defined as annual detrended SST anomalies for the region 75 • W-7 • W, 0 • -60 • N in NorESM piControl and (b) HadISST (black line) together with NorESM historical ensemble members (blue, red, and green lines).Lower panels show power spectra of (c) NorESM piControl AMO index (black line) and subpolar gyre SSH index (anomalous SSH for the region 60 • W-15 • W, 48 • N-65 • N; yellow line), (d) NorESM historical ensemble members (colours as in b), and (e) HadISST.The multitaper method of Ghil et al. (2002) with resolution p = 4 and number of tapers t = 7 is applied to time series normalized by their standard deviation.Thin lines represent fitted red noise spectra of the power spectra with the same colour.The dashed lines are the 95 % confidence limits about the red noise spectra.

Fig. 25 .
Fig. 25.Time series of (a) Northern and (b) Southern Hemisphere winter and summer seaice extent from the historical ensemble (red), and two satellite-based estimates for the sea-ice extent: the Sea-Ice Index in black from NSIDC(Fetterer et al., 2009), and in gray data based on the Bootstrap algorithm(Comiso, 1999). 90

Fig. 25 .
Fig. 25.Time series of (a) Northern and (b) Southern Hemisphere winter and summer sea ice extent from the historical ensemble (red), and two satellite-based estimates for the sea ice extent: the Sea Ice Index in black from NSIDC(Fetterer et al., 2009), and, in grey data, based on the Bootstrap algorithm(Comiso, 1999).

Table 1 .
Global and annual averages of model calculated (for the period 1976-2005 of Historical1) vs. observationally based or reanalyzed atmospheric data.
a CERES2 (Collins et al., 2006)d global mean values for the years 1976-2005 of the Historical1 simulation with NorESM along with observations or reanalysis products from recent decades.The net TOA SW flux of NorESM is 234.9Wm−2 and the observations listed in the table are in the range 234.0-244.7 W m −2 .It should be noted that the NorESM values are adjusted for the fact that the top of the model is slightly below the TOA seen from satellites(Collins et al., 2006).The actual net downward SW flux at the top of the model is 231.8W m −2 , while the net upward LW flux at the top of model is 231.3W m −2 .Hence, the model experiences an approximate radiative imbalance of +0.5 W m −2 at its upper boundary during the years 1976-2005.The adjusted net TOA LW flux of the model is 232.4W m −2 , i.e. slightly below the observational range of 233.9-239.6W m −2 .The net

Table 2 .
Calculated key elements of the gross cycling of fresh water in the earth system valid for the early years after 2000.Values from the NorESM Historical1 experiment(years 2000-2005)are compared to values for CCSM4