Ocean biogeochemistry in the Norwegian Earth System Model version 2 (NorESM2)

. The ocean carbon cycle is a key player in the climate system through its role in regulating the atmospheric carbon dioxide concentration and other processes that alter the Earth’s radiative balance. In the second version of the Norwegian Earth System Model (NorESM2), the oceanic carbon cycle component has gone through numerous updates that include, amongst others, improved process representations, increased interactions with the atmosphere, and additional new tracers. Oceanic dimethyl sulﬁde (DMS) is now prognostically simulated and its ﬂuxes are directly coupled with the atmospheric component, leading to a direct feedback to the climate. Atmospheric nitrogen deposition and additional riverine inputs of other biogeochemical tracers have recently been included in the model. The

reservoir and absorber of heat and of the greenhouse gas CO 2 (carbon dioxide), an expansion of climate models to include ocean components and a representation of the carbon cycle became a necessity. Coupled atmosphere-ocean-land models (now referred to as Earth System Models -ESMs) were eventually developed to account for further reservoirs and feedbacks including biogeochemical cycles (Bretherton, 1985;Cubasch et al., 2013;Heinze et al., 2019). The inclusion of the ocean carbon cycle is a first order requirement, because on timescales beyond a few decades, the ocean becomes the sole major sink for atmospheric 5 excess CO 2 from anthropogenic emissions (Sarmiento and Gruber , 2006). ESM simulations can therefore be used to estimate historical carbon budgets and future partitioning of carbon fluxes into Earth system reservoirs under specified scenarios.
Today, modern ESMs are state-of-the-art tools utilized to study the complex interactions and feedbacks between various components in the Earth system in a comprehensive way (Flato et al., 2013). They are applied regularly to simulate climate evolutions across time scales and to study transient climate change, its drivers, and its impact on the environment (e.g., Bopp 10 et al., 2013;Gehlen et al., 2014). By prescribing plausible scenarios of future emissions and land use, these models provide projections for possible future climate states. ESMs are typically upgraded every few years with new and improved process representations as well as adaptations to technical advancements (e.g., new hardware systems, higher resolution, etc.).
The ocean and its biogeochemistry play a crucial role in controlling the rate of anthropogenic climate change through a substantial negative feedback (Friedlingstein et al., 2006;Arora et al., 2013). However, the absorption rates of heat and carbon 15 by the ocean are non-linear in space and time, due to the complex interactions with the internal climate variability (e.g., Tjiputra et al., 2012;Schwinger et al., 2014). Once passed the air-sea interface, the dissolved carbon dioxide reacts with seawater and is chemically transformed into different carbonate species. The ocean circulation and biological processes sequester parts of these into the deep ocean, where they stay isolated from the atmosphere, for decades to millennia (Sarmiento and Gruber , 2006). In addition to carbon dioxide, the ocean biogeochemistry also influences the Earth's climate through other seawater 20 chemical constituents, such as dimethyl sulfide emissions. This could result in a number of feedback mechanisms within the Earth system that can either amplify or dampen the rate of climate change (Rhein et al., 2013). ESM projections are therefore critical for narrowing uncertainties in the future global carbon budget, and consequently its climate feedback.
ESMs have been increasingly used to estimate the evolution of marine ecosystem stressors, such as ocean acidification and deoxygenation, under anthropogenic climate change (e.g., Gehlen et al., 2014;Tjiputra et al., 2018). ESM applications to 25 support climate policy-making, such as quantifying future compatible anthropogenic carbon emissions under a set of predefined scenarios as well as addressing emerging questions associated with the United Nations sustainable development goals have become more regular. With its growing relevance, a more realistic representation of the observed contemporary ocean biogeochemistry in Earth system models is urgently necessary to increase the fidelity of future projections.
In this manuscript, we present improvements in the ocean biogeochemical component of the Norwegian Earth System Model 30 (NorESM). This component is based on the Hamburg Oceanic Carbon Cycle model (HAMOCC5.1), which was originally developed by Maier-Reimer (1993) and has gone through several development iterations (e.g., Six and Maier-Reimer, 1996;Maier-Reimer et al., 2005). At the Bjerknes Centre for Climate Research in Norway, the model has been branched off, further developed Tjiputra et al., 2010Tjiputra et al., , 2013Schwinger et al., 2016), and coupled to the Bergen Layered (isopycnic coordinate) Ocean Model (BLOM-iHAMOCC). Within the CMIP5 experiments, we have identified several shortcomings 35 in the previous model version of NorESM (NorESM1-ME, hereafter referred to as NorESM1). Here, we discuss several developments, relative to the NorESM1 version (described in Tjiputra et al., 2013), in the current model version (NorESM2, Seland et al., 2020), which contributes to CMIP6. NorESM1 does not include interactively coupled DMS emissions from the ocean, which is the largest natural source of atmospheric sulfur, and contributes to aerosol and cloud formation. Instead, the atmospheric chemistry module of NorESM1 5 prescribed mean fixed climatology oceanic fluxes of DMS, and there is no feedback associated with climate-induced change in marine biological production. However, a model study has suggested that the inclusion of interactive DMS in future climate projection could potentially result in spatially-varying changes in warming rate and a non-negligible feedback on the climate system (Schwinger et al., 2017). This has prompted us to include a fully-interactive DMS cycle (production, emissions, and radiative impact) into NorESM2. 10 In the past years, the concept of emergent constraints was employed to identify key biogeochemical processes that lead to uncertanties in ESMs projections (e.g., Wenzel et al., 2014;Kwiatkowski et al., 2017). For instance, biases in the seasonal cycles of biological production and surface ocean pCO 2 have been identified as one of the factors contributing to the uncertainty in the projected carbon sinks and storage in the ESMs participating in CMIP5 (Kessler and Tjiputra, 2016;Goris et al., 2018).
These studies motivate further improvement in the representation of biological processes, particularly focusing on the high 15 latitude regions, such as the Southern Ocean. The biological improvements in NorESM2 were mainly achieved through tuning of the poorly constrained parameters in the ecosystem module.
In NorESM1, as well as other ESMs participating in CMIP5, there is a large uncertainty in the simulated interior oxygen concentration when compared to observations. This is partly attributed to the short model spin-up and bias in the interior remineralization processes (Cabré et al., 2015;Séférian et al., 2016). To alleviate this, we have tested several parameterizations 20 of particulate organic carbon (POC) sinking schemes in our model (Schwinger et al., 2016). Currently, there are three different sinking schemes that can be selected in NorESM2. Based on our earlier assessments, we have used the scheme with sinking velocity that increases linearly with depth as the default parameterization in NorESM2.
Increasing atmospheric CO 2 and associated climate change will alter both the natural and anthropogenic components of ocean inorganic carbon chemistry. Disentangling future dynamical responses of these components separately is therefore 25 needed to identify which processes (e.g., biological and physical, among others) regulate the simulated changes in ocean carbon storage today and in future. In NorESM1, only a single dissolved inorganic carbon (i.e., total DIC) tracer was implemented, and there is no suitably accurate method to separate between the driving mechanisms of its changes. In NorESM2, we have introduced a new 'natural DIC' tracer, which simulates changes in the natural component of DIC (i.e., it only interacts with fixed preindustrial CO 2 concentration during the air-sea gas exchange). Other key diagnostic tracers such as preformed source is through atmospheric nitrogen fixation and aerial dust deposition, which is converted into dissolved iron. Implementation of riverine input of nutrients in HAMOCC5.1 was initiated by Bernard et al. (2011) and a study by Gao et al. (in prep.) indicates that riverine nutrient inputs could improve the regional representation of marine biogeochemistry, with consequences for future projections, i.e., they increase coastal carbon sinks and alleviate nutrient limitations due to warming-induced stratification. In NorESM2, we have accounted for riverine inputs of nutrients and other biogeochemical constituents. Other 5 processes representing the external nitrogen sources such as atmospheric nitrogen deposition and nitrogen fixation have also been implemented and updated, respectively.
Since ESMs are used to project future climate change, questions have arisen whether or not these models are able to simulate past climate change. To this end, NorESM1 has been applied to study the sensitivity of climate states to different boundary conditions in the past, going back from the last glacial to as far back as the mid-Pliocene (Zhang et al., 2012;Guo et al., 10 2019). The model was also used to study the sensitivity of the ocean carbon cycle to different background climates, which could provide insight into the ocean's role in regulating past atmospheric CO 2 variability (Kessler et al., 2018;Luo et al., 2018). There has also been a growing interest to determine if such complex models are able to reproduce climate states directly inferred from proxy-records. These paleo proxies, e.g., carbon isotopes (δ 13 C and ∆ 14 C), typically measured from foraminiferal samples collected from seafloor sediment, are natural archives that store watermass properties and are used to infer large scale 15 ocean circulation patterns as well as ventilation time scales during past climate changes (Peterson et al., 2014). In NorESM2, we have now implemented 13 C and 14 C tracers, which will be used as an additional ocean circulation and biogeochemistry constraint when simulating past climate variability.
Beyond the above mentioned biogeochemical processes, other improvements in tracer initialization, iron chemistry, air-sea gas exchange parameterization, code readability, and documentations of the codes have been made. Moreover, there have been 20 several updates within the physical ocean component of NorESM2 and a selection of these updates are discussed in this paper as they have direct impacts on the ocean biogeochemistry. In addition to describing the model improvements, we also evaluate the performance of the ocean biogeochemical component of NorESM2. Whenever possible, we compare and describe the performance of NorESM2 model with the first generation model, NorESM1.
The manuscript is organized as follows: advancements in the physics and more detailed descriptions of all improvements in 25 biogeochemical processes are described in Section 2. Implementations of new tracers are presented in Section 3. The different model simulations performed and used in the paper are described in Section 4. In Section 5, we present results from our coupled ESM simulations as well as that of the ocean-only (preindustrial) simulation to illustrate the performance of the simulated carbon isotope tracers. Summary and discussion are presented in Section 6.
2 Model changes and improvements 30

General configuration changes
In NorESM2, the ocean-ice model adopts a tripolar grid instead of the bipolar grid of NorESM1. The tripolar grid was chosen because it is more isotropic at northern high latitudes and therefore is more efficient with respect to time integration (longer time step) (Guo et al., 2019). Compared to NorESM1, NorESM2 has enabled a higher frequency ocean coupling to resolve the diurnal cycle in the flux-and state-exchange between ocean and atmosphere (i.e., hourly instead of daily atmosphereocean coupling). We also applied a full leap-frog time-stepping in both physical and ocean biogeochemical models to improve conservation of biogeochemical tracers (Schwinger et al., 2016).
There are several configuration options for NorESM2. For CMIP6, two configurations will mostly be used, differing only 5 in the horizontal resolution of the atmospheric and land models with otherwise identical model components. They are named NorESM2-LM and NorESM2-MM and have atmosphere-land resolution of approximately 2 • and 1 • , respectively. In addition, both versions have a different parameter tuning in the atmospheric component, necessary to achieve a top of the atmosphere radiative balance under preindustrial conditions. Both NorESM2-LM and NorESM2-MM share the same ocean physical ocean model (BLOM) and the same biogeochemical component (iHAMOCC) with a horizontal resolution of approximately 1 • . The 10 ocean model adopts 53 isopycnal layers with two non-isopycnic surface layers, which represent (1) the top 10 m depth and (2) the rest of the surface mixed layer. A more detailed description of the ocean physical component is provided in Bentsen et al. (in prep.). The ocean biogeochemistry performs similarly in both model versions. Unless explicitly specified, our analysis refers to the results from NorESM2-MM (hereafter referred to as NorESM2). 15 The simulated Atlantic Meridional Overturning Circulation (AMOC) strength in NorESM1 is on the strong side (30.8 Sv at 26.5 • N; 1 Sv = 10 6 m 3 s −1 ) compared to observation-based estimates and other CMIP5 models (Bentsen et al., 2013). The vigorous AMOC leads to a too warm and saline Atlantic Ocean at depth. Lack of upper ocean stratification at high latitudes is thought to contribute to this AMOC bias. Reformulating the eddy-induced transport (Gent and McWilliams, 1990) in order to make it more efficient in the upper ocean non-isopycnic regime of the model and adjustments to the parameterized re-20 stratification by mesoscale eddies (Fox-Kemper et al., 2008) lead to improved high latitude stratification and a more realistic MLD, particularly at high latitudes.

Physical parameterizations
NorESM1 has a cold bias in the depth range 200-1000 m between 50 • S and 50 • N (Bentsen et al., 2013) that could be related to insufficient upper ocean vertical mixing. Further, the new hourly atmosphere-ocean coupling and associated tuning of MLD parameterizations lead to a cold bias in the Pacific equatorial thermocline that also indicates too little vertical mixing. 25 To improve this, two different approaches have been used. First, we allow surface turbulent kinetic energy (TKE) originating from wind stirring to penetrate below the mixed layer and be added to the TKE reservoir of the k − ε turbulence model handling the shear instability mixing of the isopycnic interior. Secondly, with hourly atmosphere-ocean coupling, high frequency winds are available to parameterize wind work on near-inertial motions. Inspired by the approach of Jochum et al. (2013), a fraction of this energy source is added to the mixed layer TKE reservoir, modifying the MLD estimation, while a fraction of the remaining 30 energy is used to drive upper ocean mixing through assumed excitation and breaking of internal waves. Combining these approaches, we were able to reduce the above-mentioned upper ocean biases significantly. The mid-depth warm bias seen in NorESM1 is also reduced. More details on the physical improvements are described in Bentsen et al. (in prep.). As a result of these improvements, NorESM2 simulates a more reasonable AMOC strength of approximately 21 Sv, which is within the range of the observational estimates of 17.9±3.3 Sv (Srokosz et al., 2012).

Dimethyl sulfide
NorESM2 prognostically simulates DMS concentration according to the formulation of Kloster et al. (2006), as follows: where the first term on the right hand side denotes the physical transport, the second term denotes DMS production, and the third, fourth, and fifth terms denote sinks due to bacterial activity, photolysis, and flux to the atmosphere. The production term is computed as a function of temperature, pH, simulated detritus export production (opal and CaCO 3 , implicitly computed as a function of silicate concentration): (2) 10 The last term in Eq. 2 describes a pH-dependence of DMS production following , which is parameterized based on the deviation of actual pH from the climatological monthly mean surface pH (pH pi ) calculated from the preindustrial control. Due to uncertainties in the pH-control on DMS production, we have omitted the pH-dependency term by setting α to zero in the presented simulations. Parameters γ o and γ c represent scaling factors for sulphur (S:Si and S:C) and are set to 0.025 and 0.125, respectively, such that DMS production is more sensitive to CaCO 3 production in the model. This assumption is due 15 to the fact that haptophyte-class phytoplankton generally has a higher DMSP to cell carbon ratio than diatoms (Keller et al., 1989). Since the opal and CaCO 3 portions of export production are implicitly computed as a function of silicate concentration, e.g., opal (CaCO 3 ) production is high (low) when surface silicate is abundant and vice versa, the DMS production is also sensitive to the long-term trend in the silicate concentration.
The DMS photolysis term is defined as a function of photolysis rate (γ uv =0.0011 m 2 (W day) −1 ), incoming shortwave radiation attenuated by water and phytoplankton, I z , and DMS concentration: 25 Lastly, in NorESM2, the atmospheric model receives the DMS emissions (S gas ) that are prognostically computed by the ocean biogeochemistry model (see Section 2.8).

Riverine input
The influx of carbon and nutrients from over 6000 rivers to the coastal oceans has been implemented based on previous work of Bernard et al. (2011) with modifications. Bernard et al. (2011) have implemented the riverine fluxes based on the levels in 30 year 1995, provided by an early version of the Global-NEWS model (Seitzinger et al., 2005). In addition to the riverine DIC, dissolved organic carbon (DOC), inorganic nitrogen and phosphorus, particulate organic carbon (POC), particulate nitrogen and phosphorus, and dissolved silicate, we have also included riverine alkalinity (ALK) and iron (Fe). Except for DIC, ALK and Fe, all data are provided by the more recent Global-NEWS2 model (Mayorga et al., 2010), which is a hybrid of empirical, statistical and mechanistic model components that simulate steady-state annual riverine fluxes of carbon and nutrients. The

5
DIC and ALK fluxes are taken from the work by Hartmann (2009). Riverine Fe-flux is calculated as a proportion of global gross dissolved Fe input of 1.45 Tg yr −1 (Chester, 1990), weighted by water runoff of each river. Only 1% of the riverine dissolved Fe is added to the oceanic dissolved Fe, since approximately 80 to 99% of the fluvial gross dissolved Fe is removed during estuarine mixing (e.g., Sholkovitz et al., 1981;Shiller and Boyle, 1991;Bruland and Lohan, 2014). Instead of releasing the riverine fluxes to the nearest ocean grid cell (Bernard et al., 2011), we have interpolated all fluxes at river mouths in the and therefore other forms of dissolved organic matters (e.g., DON, and DOP) are not explicitly simulated. Since Global-NEWS provides estimates of dissolved organic matter in carbon, nitrogen, and phosphate forms (DOC riv , DON riv , and DOP riv , 15 respectively) separately, only the minimum of the three riverine dissolved organic constituents is added to the DOC term in the model (i.e., DOC = DOC + min(DOC riv , r C:N · DON riv , r C:P · DOP riv )). Any excess or remaining organic matter of the three constituents is then added to the corresponding inorganic pools (DIC, NO 3 or PO 4 ). The same concept also applies to riverine inputs of particulate organic carbon (POC) (see also Bernard et al., 2011). 20 We apply anthropogenic nitrogen deposition fluxes that potentially affect the simulated ocean carbon uptake (through an increase of the available nitrate for biological production) to the ocean biogeochemical model. The monthly input fields, spanning the years 1850-2014, are simulated by chemistry transport models and provided by CMIP6 (Jones et al., 2016).

Atmospheric nitrogen deposition
Conservative remapping is used to interpolate the input data from a regular 2.5 • x1.9 • latitude-longitude grid to the tripolar ocean model grid of NorESM2. Four species of wet or dry and oxidized or reduced nitrogen deposition rates are included in 25 the input fields. All of them are added to the nitrate pool in the top-most ocean layer.

Particle export
In NorESM1, the export of particulate organic matter is parameterized with a constant sinking speed and a constant remineralization rate at depth, when sufficient oxygen is available. This simplistic formulation has been shown to have difficulty in accurately representing the observed particle fluxes at depths (Kriest and Oschlies, 2008) and leads to biases in the interior 30 distributions of biogeochemical tracers (e.g., as seen from remineralized phosphate concentrations). In NorESM2, we have developed two additional particle flux parameterizations that can be selected. They are referred to as WLIN and AGG schemes.
The WLIN scheme is similar to the standard scheme but the sinking speed is a linear function of depth: w POC = min(w min +az, w max ). Here, we use w min = 1 md −1 , w max = 60 md −1 , and a = 60/2400 md −1 m −1 . When w min and w max are set to zero and infinity, respectively, this scheme is equivalent to the widely used Martin-curve formulation (Martin et al., 1987).
The AGG scheme implements a prognostic sinking speed, calculated according to a size distribution of sinking particles.
The total concentration of particulate matter, formulated as a function of phytoplankton and detritus, forms sinking aggregates with explicitly computed size distribution that follows a power-law formula (Kriest and Evans, 1999;Kriest, 2002): Here, d represents the particle diameter and A and β represent parameters of the size distribution. The minimum particle diameter l is set to 0.002 cm. The sinking speed of aggregates is computed according to the diameter of each aggregate. More details of the implementation of the AGG formulation are described in Schwinger et al. (2016). Based on the performance of the different schemes, we have set WLIN to be the default scheme and it is used in all experiments presented here. More 10 details on the differences in the simulated interior biogeochemistry using the different particle export schemes are available in Schwinger et al. (2016).

Nitrogen fixation
Nitrogen fixation by cyanobacteria is computed implicitly and directly converted into nitrate concentration. In NorESM1, nitrogen fixation is implemented such that it can occur anytime and anywhere in the surface ocean as long as the nitrate 15 concentration is lower than phosphate (multiplied by the stoichiometric nitrate to phosphate ratio R N:P ) following Maier-Reimer et al. (2005). However, Breitbarth et al. (2007) provided some evidence that the large scale distribution of Trichodesmium, a type of cyanobacteria, is well constrained by seawater temperature between 20 and 30 • C. Based on this, a temperature-dependent function f (T ) according to Kriest and Oschlies (2015) has been added to the nitrogen fixation formulation in NorESM2. The rate of change of nitrate owing to nitrogen fixation is formulated as follows: with where µ is the maximum N 2 fixation rate (0.005 day −1 ). In addition, for each mole of N fixed to nitrate, 1.25 moles of dissolved oxygen and 1 mole of alkalinity are consumed and lost in the surface layer. The corrected formula essentially limits 25 the occurrence of N-fixation to warm low latitudes.

Air-sea gas exchange
In NorESM2, the air-sea gas exchange of CO 2 , O 2 , N 2 , N 2 O, DMS, CFC-11, CFC-12, and SF 6 is computed prognostically according to the updated formulation of Wanninkhof (2014). Here, the air-sea flux F of gas X is computed as a function of surface wind speed U, Schmidt number Sc, gas solubility K o , and partial pressure difference of gas X, based on a bulk formulation: The net fluxes are computed for the top layer of the ocean model (i.e., 10 m depth). The updated formulation includes a refitted temperature-dependent Schmidt number that can be applied for a temperature range of -2 to 40 • C. The solubility of all gases 5 is computed as a function of surface temperature and salinity following Weiss (1970) for O 2 and N 2 , Weiss and Price (1980) for CO 2 and N 2 O, Warner and Weiss (1985) for CFC-11 and CFC-12, and Bullister et al. (2002) for SF 6 . The flux of DMS is only unidirectional (outgassing; pDMS air = 0).
In the historical simulation, we use the annual atmospheric concentrations of CFC-11, CFC-12, and SF 6 , divided into northern and southern hemispheric values, according to Bullister (2014). For the atmospheric CO 2 concentration, monthly global-10 mean values from the CMIP6 dataset are used. In the preindustrial simulation, a constant CO 2 concentration of 284.32 ppm is used.

Dissolved iron parameterization
Adjustments in the iron parameterization have been made to ensure that NorESM2 simulates the observed iron-limited primary production in the Southern Ocean, Equatorial and North Pacific. In the model, the consumption and release of dissolved iron 15 associated with biological activities are determined using a fixed stoichiometry ratio (R Fe:P = 6.1e-4 mol Fe mol P −1 , previously 3.7e-4 mol Fe mol P −1 ). In the surface layer, a constant fraction (3.13e-6 kmol Fe kg −1 day −1 , previously 6.27e-6 kmol Fe kg −1 day −1 ) of aerial dust deposition (Mahowald et al., 2005) is instantaneously converted into bio-available iron. The updated global input of bio-available iron from the atmosphere is 2.8 Gmol Fe yr −1 , which is within the range of values used by other models (Tagliabue et al., 2016). Finally, throughout the water column, complexation of iron by organic substances (ligands) is 20 assumed: where the strength of this complexation λ Fe is set to 0.05/365 (previously 0.005/365) day −1 and the threshold Fe o has been reduced from 0.6 to 0.5 nmol Fe m −3 . The latter is motivated by the fact that the observed iron concentration in the deep Southern Ocean is lower than 0.6 nmol Fe m −3 (Boyd and Ellwood, 2010). The higher complexation rate leads to a faster 25 relaxation toward this value.

Ecosystem parameterization updates
The underlying marine ecosystem parameterization in NorESM2 remains the same as in NorESM1, but many of the parameters have been adjusted to reduce biases that are present in NorESM1. The main deficiencies of the spatial annual mean primary productivity pattern in NorESM1 are a too strong primary production (PP) in the Southern Ocean, the Eastern Tropical Pacific, 30 and to a lesser degree the North Atlantic, contrasted by a very low PP in the subtropical gyres (most pronounced in the Pacific).
The high productivity in the high latitudes is accompanied by an exaggerated annual cycle of PP showing a too strong spring bloom and a too fast decline of PP afterwards. Tuning of the ecosystem parameterization during the development of NorESM2 was focused on improving these regional shortcomings. The changes in ecosystem parameters listed in Table 1 mainly serve two purposes: (i) to increase top-down limitation by zooplankton grazing during phytoplankton peak bloom (but not before and after), and (ii) to increase the fraction of nutrients that is routed through dissolved organic matter (DOM).

5
The first point is achieved through a reduction in zooplankton mortality as well as increases in the maximum grazing rate, assimilation efficiency, and half-saturation constant for zooplankton grazing. The latter point is mainly achieved by reducing the production of detritus by zooplankton (fecal pellets, 1 − ε zoo of grazing) and increasing exudation and excretion rates. This allows, in combination with a decrease of the DOM remineralization constant, more nutrients to be laterally transported out of regions where nutrient trapping occurs, i.e., the Tropical Pacific. 10 Additionally, CaCO 3 and silicate to phosphate ratios were tuned to remove the high alkalinity bias of NorESM1 (see Schwinger et al., 2016, for details). Note that some adjustments of parameters given in Table 1 were also necessary because of the new sinking scheme (increasing sinking speed with depth, see Section 2.6), and the different physical circulation fields between NorESM1 and NorESM2. An updated schematic diagram of the marine ecosystem module in NorESM2 is presented in Fig. 1. 15 In NorESM1, biological productivity was computed only for the top 100 m depth, which presumably represents the averaged euphotic layer. In an isopycnic model, there is no static interface separating this depth level. In cases where the bulk MLD extends below 100 m, we virtually split this at 100 m and the biological production is simulated only down to this interface.
In NorESM2, we have omitted this virtual layer splitting and the biological production is allowed to occur below 100 m as long as it is within the bulk mixed layer depth and has sufficient light (attenuated with depth and chlorophyll concentration) 20 for phytoplankton growth.
3 New tracers

Preformed tracers
Four new preformed tracers have been implemented in NorESM2, namely preformed dissolved oxygen (O pre 2 ), phosphate (PO pre 4 ), alkalinity (ALK pre ), and dissolved inorganic carbon (DIC pre ). They are initialized to zero at the beginning of the 25 spin-up. During the model integration, in the bulk mixed layer of the model (i.e., the top 2 levels), the preformed tracers are set to the respective total, non-preformed values at each time step. Below the mixed layer, the preformed tracers are advected as passive tracers by the physical processes and hence are a measure of transport-induced (e.g., circulation, ventilation, etc.) changes. The preformed tracers are included for diagnostic purposes, such as identifying the sources of model-data misfits (Duteil et al., 2012). The preformed oxygen can be used to quantify the total and apparent oxygen utilizations (TOU and AOU) 30 as well as O 2 disequilibrium (∆O 2 ) in the interior ocean (Eqs. 10-11;Ito et al., 2004): and Here, the saturated oxygen (O sat 2 ) is determined as a function of temperature and salinity (Garcia and Gordon, 1992). Due to its closed coupling with ocean circulation, the preformed oxygen can be used to separate biologically-from physically-induced biases in the simulated interior biogeochemical tracers.
Following Bernardello et al. (2014), PO pre 4 is used to quantify the organic (soft tissue, DIC so f t ), whereas both PO pre 4 and ALK pre are used to quantify the inorganic (carbonate, DIC carb ), biologically-mediated carbon pump (Eqs. 12-14). We use a stoichiometry ratio of P : N : C = 1 : 16 : 122 in the model, such that DIC so f t represents remineralized particulate and dissolved organic materials and DIC carb dissolution of particulate calcium carbonate in the water column: 10 and Equation 14 is slightly different from that of Bernardello et al. (2014), i.e., it uses the term r N:P + 1 instead of r N:P because, 15 in our updated code, we do not neglect the contribution of phosphate to alkalinity changes during biological production and remineralization. For instance, both nitrate and phosphate produced during organic remineralization increase the concentration of proton and therefore reduce the alkalinity (see also Section 3.1.3 of Paulmier et al., 2009).
The difference between total DIC and biologically altered DIC (DIC so f t +DIC carb , or the biological carbon pump) therefore represents the physical carbon pump (DIC pre ), which comprises saturated and disequilibrium components (Eq. 15). Here, a 20 tracer of saturated DIC (DIC sat ) has been implemented. It is computed at the surface layer as a function of atmospheric pCO 2 , total alkalinity, SSS, and SST. Below the first layer, DIC sat is treated as a passive tracer advected by the circulation. Since the equilibration time-scale of the air-sea CO 2 fluxes is typically longer than the surface watermasses' residence time (e.g., in the deep water formation regions), a disequilibrium term (DIC diss ) is computed by subtracting the saturated from the preformed DIC components. The DIC disequilibrium component is used to diagnose the importance of ventilation variability on the 25 physical solubility pump, and to the overall DIC storage (Eggleston and Galbraith, 2018):

Natural inorganic carbon tracers
In order to comply with the Ocean Model Intercomparison Project (OMIP) of CMIP6 , we have implemented a natural tracer of DIC (DIC nat ), which is formulated in the same manner as DIC tot , except that the air-sea gas exchange is 30 computed with a fixed preindustrial atmospheric CO 2 concentration of 284.32 ppm. In a transient simulation with increasing atmospheric CO 2 concentrations, the difference between DIC tot and DIC nat represents anthropogenic DIC (DIC ant ): The inclusion of a DIC nat tracer also requires the implementation of respective 'natural' components for both the alkalinity and the particulate inorganic carbon (CaCO 3 ) tracers. This is because the anthropogenic CO 2 uptake alters the carbonate sys-5 tem and therefore the dissolution of CaCO 3 . We note that anthropogenic carbon does not influence the biological production (e.g., nutrient concentrations and phytoplankton growth rate) in our model. Similarly, a 'natural' air-sea CO 2 flux has been added to the model output. These 'natural' tracers are only activated in simulations with atmospheric CO 2 that departs from the preindustrial value. Here, DIC nat , TALK nat , and CaCO nat 3 are initialized in the same manner as DIC, TALK, and CaCO 3 , respectively. In transient historical and future scenario simulations, these tracers provide insight into the natural carbon evolution 10 under anthropogenic climate change.
In addition, DIC nat also provides a more precise estimate of DIC ant entering the ocean since the preindustrial period than simply taking the difference between DIC tot at a specific time and its preindustrial mean value. Alternatively, these 'natural' tracers could be computed with a parallel transient simulation but with fixed preindustrial atmospheric CO 2 as the ocean carbon cycle boundary condition. The inclusion of natural tracers provides substantial saving of computational time, especially 15 for high-resolution simulations. We note that the DIC nat tracer has not been implemented in the sediment module. Hence, in very long time-scale transient simulations where the sediment changes become substantial, the DIC nat tracer may include some uncertainties.
The 14 C is standardized as ∆ 14 C (see Section 3.3.3). Variations in δ 13 C are due to isotopic fractionation processes during air-sea 30 gas exchange, photosynthesis and CaCO 3 formation, but the latter is neglected in our implementation (see also Section 3.3.2).
X represents any 14 C state variable. In the model, all 14 C tracers are normalized to prevent numerical errors from carrying 5 values close to the precision of the model.
The newly implemented marine carbon isotope code parallels the respective total DIC code, and in addition includes fractionation during photosynthesis and air-sea gas exchange (as well as decay for the 14 C tracers). In addition to the dissolved inorganic tracers, the following 13 C and 14 C state variables have also been implemented: dissolved organic carbon, particulate organic carbon, calcium carbonate, phytoplankton, and zooplankton. Therefore, 12 new tracers are added (isotopic DOC, 10 POC, CaCO 3 , zooplankton, and phytoplankton). Due to the long equilibration time, the isotopic tracers are only activated in the computationally efficient configurations, such as the ocean carbon cycle stand-alone configuration (NorESM2-OC) or the low resolution version of the coupled model. Equilibrium times of the carbon isotopes are long due to the slow air-sea gas equilibration (Broecker and Peng, 1974) and long-term transient effects from the balance between sediment burial and weathering (Roth et al., 2014). Realistic equilibration times are therefore currently only obtained when bypassing the sediment module 15 of the model when running the carbon isotopes, as well as omitting carbon isotope input from rivers. When bypassing the sediment, mass balance is maintained by redistributing POC fluxes to the sediment over the entire overlying water column, and by dissolving inorganic carbon fluxes as well as opal fluxes at the bottom of the model immediately. Ongoing work will add the possibility for a fast sediment spin-up for use in future versions of the biogeochemical ocean model. Since ∆ 14 C is governed mainly by radioactive decay and circulation, we focus on δ 13 C in our description of isotopic fractionation effects (Sect. 3.3.1 20 and 3.3.2).

Air-sea gas exchange fractionation
During air-sea gas exchange, the lighter 12 C isotope preferentially escapes to the atmosphere. This fractionation process increases δ 13 C of surface ocean DIC, although the local net effect depends on the interplay between the local thermodynamic air-sea disequilibrium, the air-sea gas exchange rate, and the strength of the fractionation (Schmittner et al., 2013;Morée et al., 25 2018). The air-sea fractionation is a function of temperature (T [ • C]) and CO 2− 3 fraction ( fCO 3 ), such that fractionation increases with decreasing temperatures, resulting in higher δ 13 C in colder than in warmer surface water (Zhang et al., 1995).
The fractionation during air-sea gas exchange, which varies between ∼8 and 10.5 o / oo , is due to the combined effects of (1) the fractionation between CO 2 and the different carbon species, α CT g , (2) kinetic fractionation, α k , and (3) fractionation during gas dissolution, α aq g . The net air-sea 13 CO 2 flux is formulated as follows: Any fractionation α i in Eq. 19 can be expressed as and In Eq. 19, k w represents the gas transfer velocity for CO 2 according to Wanninkhof (2014) and T is in degrees C.

Biological fractionation
Phytoplankton prefers the lighter ( 12 C) isotope during photosynthesis, thereby increasing δ 13 C of DIC in the surface ocean and producing low-δ 13 C POC. In the interior ocean, the low-δ 13 C POC is released back into the water column during rem-10 ineralization/respiration, though without fractionation (Laws et al., 1995;Sonnerup and Quay, 2012). The 'biological isotope pump' thus creates a gradient of higher surface water δ 13 C and lower deep water δ 13 C. The average fractionation during photosynthesis is approximately 19 o / oo (Lynch-Stieglitz et al., 1995;Tagliabue and Bopp, 2008). Even though many relationships for biogenic isotope fractionation have been proposed (e.g., Rau et al., 1996;Keller and Morel, 1999;Popp et al., 1998), the modelled δ 13 C distributions are not very sensitive to the different parameterizations, especially not in the surface 15 ocean (Schmittner et al., 2013;Jahn et al., 2015). In addition, some relationships may be unsuitable for global scale modelling applications due to the dependency on unknown parameters (e.g., specific species, cell size, and cell membrane permeability).
A parameterization by Laws et al. (1997) is chosen in NorESM2, where the biological fractionation ε bio depends on the ratio between phytoplankton growth rate µ [day −1 ] at every model time step and the aqueous CO * 2 concentration [µmolkg −1 ]: (23) 20 The growth rate µ is the brutto growth rate, uncorrected for losses such as mortality and exudation. ε bio increases with increasing CO * 2 and decreasing µ. Fractionation values are kept within a realistic range of 5-26 o / oo to correct for the influence of µ/CO * 2 extremes (similarly as done by Tagliabue and Bopp, 2008). This nonlinear parameterization of Laws et al. (1997) as described in Eq. 23 is preferable over the linear formulation by Laws et al. (1995), because the linear formulation could result in unrealistic fractionation at high growth rates. Isotope equilibrium fractionation during CaCO 3 formation increases 25 δ 13 C of CaCO 3 and therefore depletes seawater of 13 C (thereby lowering seawater δ 13 C). Nevertheless, the fractionation effect during CaCO 3 formation is relatively small (i.e., in the order of -2 to +3 o / oo , depending on species and environmental conditions; Grossman and Ku, 1986;Ziveri et al., 2003;Zeebe and Wolf-Gladrow, 2001) compared to the effects of air-sea gas exchange and photosynthesis. Therefore, fractionation during CaCO 3 formation is commonly omitted in modelling studies (e.g., Lynch-Stieglitz et al., 1995;Schmittner et al., 2013). There are additional uncertainties related to temperature-and 30 species-dependencies (Grossman and Ku, 1986;Zeebe and Wolf-Gladrow, 2001). Taking this into consideration, we have chosen not to implement fractionation during CaCO 3 formation in NorESM2.

Diagnostic and initialization
In order to evaluate the carbon isotopes against observations, we compute the diagnostic variables δ 13 C and δ 14 C according to their standard formulations, as follows: and 10 where ( 13 C/ 12 C) PDB is the established Pee Dee Belemnite standard ratio of 0.0112372, and NBS std is 1.170e −12 . The DI 12 C is computed as the difference between total DIC and DI 13 C.
In the model, the initialization of the carbon isotope tracers is done as follows: first, the model is spun up with preindustrial boundary conditions until the non-isotope carbon chemistry reaches a quasi-equilibrium state. Next, we compute DI 13 C initial values by solving Eq. 24 for 13 C using the simulated AOU and DIC together with the δ 13 C:AOU relationship reported by using Eq. 25. The remaining isotope tracers are initialized as C · R · ζ , with C as the total carbon counterpart of the respective isotope tracer, R as DI 13 C/DI 12 C or DI 14 C/DI 12 C ratio, and ζ = 0.98, a typical value for biological fractionation (applied to 20 organic carbon components only).
Here, the prognostic atmospheric pCO 2 was initialized at 278 ppm from the start of the simulation. At initialization of the carbon isotopes, atmospheric δ 13 C and ∆ 14 C are set to -6.5 o / oo and 0 o / oo , respectively. Lastly, the results are presented as calibrated ∆ 14 C to account for long-term drift. That is, DI 14 C is multiplied with a factor F before standardization to DI 14 C corresponding to an atmospheric ∆ 14 C of zero permil ( 14 C atmzero ).
where 14 C atm in our simulation is found from the pre-calibrated ∆ 14 C atm of approximately 36 o / oo and 14 C atmzero is likewise determined by solving Eqs. 25 and 26 for ∆ 14 C=0 o / oo in both cases using model δ 13 C atm (-7.5 o / oo ) and atmospheric pCO 2 (294 ppm). This leads to F=∼2.5%.
Due to the long time scale of the large-scale ocean thermohaline circulation (flushing time is in the order of 1000-1500 years), a sufficiently long model integration in the order of at least 1000 years is required to achieve a quasi-equilibrium biogeochemical state in the interior ocean (Séférian et al., 2016). Prior to running any transient simulations, we have spun the model up for 1200 model years in a fully coupled configuration so that the ocean biogeochemistry reaches a quasi equilibrium state. In the spin-up 5 simulation, all boundary conditions are fixed to constant preindustrial values, following the CMIP6 protocols (Eyring et al., 2016a). Due to the oceanic DMS fluxes, there is an internal feedback between the ocean biogeochemistry and the atmospheric radiative forcing during the spin-up. The end of the model spin-up is used as a starting point for (1)  Except for the carbon isotopes (see next paragraph), all analyses shown in this paper were extracted from the two transient historical simulations (prescribed CO 2 -historical and prognostic CO 2 -esm-hist). Since both simulations produce nearly identical climatology states, we only present results from the prescribed CO 2 -historical simulation.
For the newly implemented carbon isotopes, a considerably longer spin-up is required. Therefore, the carbon isotope tracers 20 were not activated in our fully coupled simulations. Instead, simulations with carbon isotopes switched on have been performed using the ocean carbon cycle stand-alone configuration of NorESM2. Additionally, a lower resolution ocean grid (using a similar tripolar grid as described above but with a nominally 2 • resolution) has been used. This configuration avoids the high computational cost of running a long spin-up in a fully coupled configuration. The atmospheric forcing of the spin-up is the CORE normal year forcing (Large and Yeager, 2004), which represents a climatological mean year with a smooth transition 25 between end and start of the year. Atmospheric CO 2 , 13 CO 2 , and 14 CO 2 concentrations are kept track of with a box-atmosphere (i.e., assuming 100% instant mixing), which is updated each time-step according to the modelled air-sea CO 2 fluxes using a conversion factor of 2.13 ppm Pg C −1 ). These simplified prognostic atmospheric fields are simulated from the start of the spin-up (and the subsequent start of the carbon isotope simulation). As mentioned above, we have not implemented the carbon isotopes in the sediment compartment of the model. Therefore, the sediment module of HAMOCC was switched off for the 30 carbon isotope simulations presented here. Otherwise, the setup of the ocean carbon cycle stand-alone configuration is as described in Schwinger et al. (2016). The model spin-up was run for 5000 years, of which the first 1000 years were run without carbon isotopes. At year 1000 (4000), after the largest transient changes in biogeochemical tracers have flattened out, we initialised the 13 C ( 14 C) tracers as described above. At the end of year 5000, the model reaches a quasi-equilibrium state, which approximately corresponds to the preindustrial state. Here, we evaluate the simulated carbon isotope simulations against the preindustrial estimates of the respective tracers.

Results
Here, we evaluate the model performance in simulating the mean climatological state of key biogeochemical tracers as well as the evolution of air-sea CO 2 fluxes from the transient historical and esm-hist simulations. The carbon isotope results are from 5 a stand-alone ocean simulation.

Statistical performance summary
We assess, relative to the observational estimates and the earlier version NorESM1, the simulated long-term annual mean of global hydrography and biogeochemical tracer distributions. Where relevant, the mean seasonal cycles are also evaluated for the surface values. The list of parameters, the averaging periods, and the respective observational data used to assess the 10 model performance are given in Table 2. We use spatial correlation and normalized-RMSE (root mean squared error) metrics to measure the model-data difference and determine whether or not the current model version has improved and performed better than the earlier version. The normalization was done by dividing the RMSE with the standard deviation of the respective observations. Figure 2b shows that, except for the surface layer, the normalized-RMSE of most of the NorESM2 variables' mean clima- 15 tology is either noticeably improved or relatively similar to that of NorESM1. At 500 m depth, all variables but salinity have lower RMSE values. For many of the biogeochemical tracers (phosphate, nitrate, oxygen, and DIC), NorESM2 shows better agreement with data throughout the water column. Similarly, the respective spatial correlations with the observations are considerably improved for most variables, especially for nitrate, DIC, and alkalinity (Fig. 2a). For surface seasonality, NorESM2 performs fairly comparable to NorESM1, with improvements in all seasonal metrics (NRMSE and spatial pattern) for surface 20 salinity and net primary production. NorESM2 simulates slightly larger NRMSE in its surface nutrients (phosphate, nitrate, and silicate) in both their annual and seasonal average values relative to NorESM1. This is attributed to the anomalously high concentration in the Arctic Ocean and parts of the Southern Ocean (just north of the Antarctic Circumpolar Current). More details on the performance of each specific variable are discussed in the following subsections.

25
NorESM2 simulates a warm bias for both preindustrial conditions and the contemporary surface ocean (Fig. 3d), where a warm bias as high as 5 • C is simulated in some regions of the Southern Ocean. Cold biases are simulated in parts of the north and equatorial Pacific, North Atlantic subpolar gyre, and the Arctic Ocean. The climatological mean temperatures are broadly similar between both NorESM versions with NorESM2 having less spatial correlations at 1 and 1.5 km depths (Fig. 2a,b). At depths below 2 km, with the exception of North Pacific, NorESM2 simulates less biases (Fig. 3e,f,h,i). This improvement is

Salinity
The RMSE in surface salinity is reduced in NorESM2, mostly owing to improvements in the Arctic, where the previous model version simulates too saline water (Fig. 4d,g). Also at the surface, NorESM2 simulates anomalously too fresh and too saline 10 biases in the Pacific and the Atlantic basins, respectively. In the subtropical south Pacific, the negative bias is as high as -2 psu, and is consistent with a too strong precipitation rate in this region simulated by the atmospheric component (not shown).
Positive biases are simulated in the northern Indian Ocean, and along the Benguela current extension. In the interior, the salinity bias in the Pacific and the Atlantic is in the order of ±0.11 and +0.4 psu, respectively (Fig. 4e,f). In the NADW, NorESM2 displays a positive bias, which is, however, smaller than that of NorESM1 ( Fig. 4h; bias greater than 0.5 psu). Similar to tem-15 perature bias, NorESM2 produces too much saline water around 30 • N at 1 km depth in the Atlantic (Fig. 4e), due to overflow of high salinity Mediterranean watermasses. In the interior Pacific and Southern Ocean, both models' performances are generally comparable (Fig. 4e,f,h,i). Except for the surface salinity, simulation with lower atmospheric resolution (NorESM2-LM) depicts similar salinity pattern throughout the water column (Supplemental Fig. S3).  Fig. S4).

Ocean ventilation
To assess the simulated ventilation, we compare the passive chlorofluorocarbon (CFC-11) tracer distribution in the interior simulated CFC-11 from calendar year 2000 with climatological estimates of Key et al. (2004). Similar to the observations, high values of CFC-11 are generally simulated in the upper 1 km in both the Atlantic and the Pacific, with the exception of the North Atlantic, where up to 0.5 nmol CFC-11 m −3 penetrates down to 5 km depth. In the mid-latitude North Atlantic (i.e., 30 • N), the model is unable to simulate the observed high values at around 1 km depth. This is likely due to the discrepancy in the watermass origin between the model and observations. Here, the watermass in the model is too old, as also seen in the 5 too high Apparent Oxygen Utilization and dissolved inorganic carbon (see below subsections on oxygen and DIC). The model simulates too strong deep water ventilation in both the Atlantic and Pacific sectors of the Southern Ocean as well as in the high latitudes of the North Atlantic. Similar ventilation patterns are also produced by NorESM2-LM (Supplemental Fig. S5), with slightly less ventilated North Atlantic.

10
In general, the climatological nutrient distributions (phosphate, nitrate, and silicate) in NorESM2 are either comparable or improved compared to those of NorESM1, both in spatial correlation and normalized RMSE (Fig. 2a,b). At depth, these improvements are due to the new particulate organic carbon sinking scheme, where the sinking velocity increases linearly with depth. In NorESM1, the sinking speed is constant with depth, leading to more organic material being remineralized in the upper 1.5 km. The sinking scheme in NorESM1 produces anomalously high regenerated phosphate at intermediate depth 15 (approximately between 100 and 1500 m depth; Fig. 7h,i). A consequence of this is a too strong oxygen minimum zone (OMZ; see also next subsection) and denitrification in the low latitudes, where the nitrate concentration becomes overly depleted.
On the other hand, at depths greater than 1500 m, NorESM1 exhibits too low concentrations of all nutrients relative to the observations (panels (h) and (i) of Figs. 7-9).
With the new sinking scheme, more organic material reaches the deep ocean and gets remineralized there. As a result, 20 model data biases in phosphate and nitrate in the interior are noticeable improved (Figs. 2b, 7e,f, 8e,f). In the Pacific OMZ, a reduced export production in NorESM2 also contributes to the reduced nutrient bias. In the interior North Atlantic, despite strong positive spatial correlation with the observations, NorESM2 shows a positive bias at depth below 3000 m and north of 30 • S (Figs. 7e, 8e, 9e). Through analyzing the quasi-conservative 'PO' tracer (Broecker, 1974), we are able to attribute this to the bias in the simulated watermass (Supplemental Fig. S1). In NorESM2, this region is dominated by Antarctic Bottom 25 Water, whilst the observations indicate NADW as the main watermass. As with NorESM1, the current model still simulates too low nutrient concentrations in the North Pacific intermediate watermass (Figs. 7f, 8f, 9f). This is likely associated with the too low surface primary production simulated in this region (see also Section 5.8), leading to limited organic matter available for remineralization at depth. We note that the circulation bias in this region could also contribute to the nutrient bias. In the Southern Ocean, too strong ventilation potentially leads to the negative nutrient biases in both Atlantic and Pacific sectors. 30 In NorESM2, less export production and deeper distribution of POC through the changed sinking scheme lead to a reduced rate of denitrification in the Pacific oxygen minimum zone and a greatly improved nitrate distribution (Fig. 8f). In NorESM1, far too much nitrate was consumed for denitrification (Fig. 8i). The silicate spatial distribution in NorESM2 closely resembles the phosphate distribution (Figs. 9a-c and 7a-c), with similar biases across the vertical sections of Atlantic and Pacific (Figs. 9e,f and 7e,f). At high-latitudes, NorESM2 overestimates the surface silicate concentration (Fig. 9d), which could be attributed to the reduced opal export sinking speed relative to our earlier model version (30 instead of previously 60 m day −1 ).
In NorESM, the bulk phytoplankton growth rate is limited by multiple nutrients (i.e., phosphate, nitrate, and dissolved iron), in addition to temperature and light (see also Section 2.3.2 of Schwinger et al., 2016). The model does not explicitly simulate diatom and calcifier classes.The inclusion of iron is critical to simulate the observed HNLC (High Nutrients Low Chlorophyll) 5 regions in the world ocean where year-long elevated concentrations of both nitrate and phosphate are not exhausted by phytoplankton (de Baar et al., 1995;Martin and Fitzwater, 1988). The three major HNLC regions are the subarctic North Pacific, the eastern and central equatorial Pacific, and the Southern Ocean. Here, low levels of bio-available iron concentrations limit the phytoplankton growths. In NorESM1, only the subarctic North Pacific is simulated to be iron-limited, with the other two HNLC regions being mostly limited by nitrate, as shown in Fig. 10. Iron limitation is also incorrectly simulated in parts of the 10 subtropical North Pacific, South Pacific, and most parts of the western Pacific. Following improvements in the iron parameterization, the three HNLC regions are now shown to be iron-limited in NorESM2. Nevertheless, the iron limitation bias in the western equatorial Pacific remains.

Dissolved oxygen
At surface level, dissolved oxygen is mostly constrained by solubility and hence sea surface temperature. Both NorESM1 and 15 NorESM2 perform similarly for dissolved oxygen at the surface with subtle improvements in the seasonal spatial correlation coefficient of NorESM2 (Fig. 2a,c). As with nutrients, the climatological distributions of interior oxygen improved considerably when compared to NorESM1 (Fig. 11), which is one of the CMIP5 ESMs with too strong OMZ (Cabré et al., 2015). In NorESM2, the OMZ volume is reduced considerably in both the equatorial Atlantic and Pacific, and the absolute dissolved oxygen values are also much improved (Fig. 11e,f). Here the remaining bias is related to the nutrient trapping issue (Six and 20 Maier-Reimer, 1996), as can also be seen from too high apparent oxygen utilization (AOU; Fig. 12d). In the interior Pacific and Atlantic roughly at 2 km and deeper, NorESM2 simulates considerably higher oxygen concentration than the observations by as much as 50% (North Pacific deep water; Fig. 11f). In the North Pacific deep water, this is primarily due to the still too little organic matter flux reaching below the mesopelagic zone, leading to too low apparent oxygen utilization (Fig. 12d). The uncertainty in the dissolved organic carbon (not shown), which is currently not well constrained due to lack of observational 25 data, could also play a role in this. In addition to biological constraints, the remaining interior oxygen bias can be attributed to the too strong Antarctic Bottom Water (AABW) ventilation, which carries a colder (Fig. 3e,f) and higher oxygen saturated watermass into both the Atlantic and Pacific bottom waters (Fig. 11e,f).

Biological production
In NorESM1, the primary production is generally too strong in the equatorial Pacific and at high latitudes during summer 30 months. In the subtropical oceans, the simulated production is too low relative to the observational estimates. With the newly tuned ecosystem parameters of NorESM2, the spatial productivity patterns are improved considerably with an average absolute bias in the open ocean of less than 5 mol C m −2 yr −1 (Fig. 13a). In NorESM1 these biases reach more than 10 mol C m −2 yr −1 , especially in the Equatorial Pacific, the South Atlantic and the Southern Ocean. Seasonally, the spatial patterns are also improved when compared to the observations with a correlation of approximately r =0.5 (Figs. 2c, 13c). The RMSE has also reduced considerably, particularly in the October-November-December months, where NorESM1 simulates a much too strong spring bloom in parts of the Southern Ocean which is overestimated by as much as a factor of eight (e.g., Figs. 13e, 14e and Nevison et al., 2015). Similarly, the spring blooms in the North Atlantic and North Pacific are overestimated in NorESM1 5 (Figs. 13e, 14a). The improved spring bloom characteristics at high latitudes, however, comes at the cost of a too low annual mean productivity in the North Atlantic and North Pacific.
Due to unresolved physical dynamics, the model is still unable to reproduce the observed high productivity in coastal regions.
Globally, the contemporary total primary production in NorESM2 (NorESM1) is 35.3 (39.9) Pg C yr −1 . These estimates are lower than the observational (MODIS, excluding coastal areas) estimates of 46.1 Pg C yr −1 but within the broad range of 10 CMIP5 models . In our model, export production is estimated as the flux of particulate organic carbon exiting the base of the euphotic zone (here assumed to be 100 m depth), as a function of zooplankton and phytoplankton mortalities and zooplankton fecal materials. Table 3 shows that the export production in NorESM2 is 5.39 Pg C yr −1 , as compared to 7.90 Pg C yr −1 simulated by NorESM1. Current estimates of the global export production remain highly uncertain, from 5 to >12 Pg C yr −1 , with a more recent satellite-based approach that accounts for food-web processes revealing values of 6±1.2 Pg C 15 yr −1 (Siegel et al., 2014). Table 3 also summarizes other export production metrics from both model versions. As a consequence of the new particle sinking scheme, there is more POC exported into the deep ocean. This is especially reflected by the values of transfer efficiency (T e f f −1km ), which is calculated as a ratio between POC fluxes at a depth of 1000 m and of 100 m.
The T e f f −1km in NorESM2 (0.24) is increased by a factor of four relative to that in NorESM1 (0.06). Compared to a recent estimate of transfer efficiency 20 reconstructed from observed interior biogeochemistry ( Fig. 13b; Weber et al., 2016), NorESM2 still simulates too high transfer efficiencies of >50% in the eastern Equatorial Pacific (Fig. 13d). On the other hand, the transfer efficiency in the Southern Ocean and northern high latitudes are comparable with observations. This represents a significant improvement when compared to NorESM1 (Fig. 13d,f), which simulates nearly uniform T e f f −1km values of less than 0.1 in all ocean regions with the exception of eastern Equatorial Pacific and Equatorial Atlantic.
25 Figure 14 shows the seasonal cycle of biological production rates as simulated by NorESM1 and NorESM2 averaged over different ocean regions together with observation-based estimates. The regional mean values in NorESM2 are generally lower than those in NorESM1, except in the North and tropical Pacific (Fig. 14b,c). Nevertheless, Fig. 2d shows that NorESM2 simulates a lower normalized RMSE than NorESM1 in all seasons. As stated above, this bias could be reduced even more if we excluded the continental shelf regions. Nevertheless, Fig. 14 shows that the seasonal phase and amplitude of biological 30 production rates in NorESM2 in the North Atlantic, North Pacific, and the southern hemisphere regions are closer to the observations when compared to NorESM1. In the southern hemisphere (south of 20 • S), NorESM2 no longer produces the large summer bias seen in NorESM1.

DMS production and fluxes
In NorESM2, DMS production is tuned towards the climatological observations from Lana et al. (2011). Figure 15 shows the surface DMS concentration for each season together with the observation-based estimates. To first order, the DMS distribution follows the spatial pattern of the simulated primary production, with higher concentrations during spring/summer periods in high latitude regions. In the Equatorial Pacific and Atlantic upwelling regions, the model overestimates DMS concentrations.

5
Similarly, in the North Pacific, where the model underestimates productivity during the boreal spring bloom (see also Fig. 14b), the DMS concentration is lower than observed (Fig. 15b,f).
The simulated DMS concentration in the North Atlantic is comparable with observations. In the Southern Ocean, the model produces the observed high concentrations during summer months (JFM) along the 40-45 • S latitude band well Fig. 15e). In winter periods, the DMS concentration approaches zero in the model owing to the too low productivity, whilst the observations 10 still indicate values of approximately 1 µmol S m −3 . For the present day period , the simulated global DMS flux is 19.76±0.19 Tg S yr −1 , which is at the lower end of observation-based estimates (17.6-34.4 Tg S yr −1 ).

Dissolved inorganic carbon and alkalinity
Except for the polar Southern Ocean, the surface alkalinity concentrations in NorESM1 are approximately 100 µmol L −1 higher than the observational estimate. And since the spin-up was forced with constant preindustrial atmospheric CO 2 concen- 15 trations, the surface DIC concentration adjusted (also anomalously high bias; Figs. 16g and 17g) to yield the 'correct' surface pCO 2 . Consequently, biases in surface DIC and alkalinity compensate one another to yield the approximately 'correct' carbonate ion concentration (i.e., alkalinity minus DIC), seawater CO 2 buffer capacity, and air-sea CO 2 fluxes.
In NorESM2, we alleviate the surface alkalinity bias by increasing the surface sink associated with calcium carbonate production. This is done by increasing the CaCO 3 to phosphate uptake ratio (R CaCO 3 :P in Table 1). As a result, the export of 20 particulate CaCO 3 increases from 0.49 Pg C yr −1 in NorESM1 to 0.66 Pg C yr −1 , with both estimates still within the range of the observation-based synthesis of 0.52±0.15 Pg C yr −1 (Dunne et al., 2007). This modification increases the PIC:POC export ratio to 0.13 (0.06 in NorESM1, see also Table 3) and considerably improves the surface DIC and alkalinity concentrations in NorESM2 (Figs. 16a,d and 17a,d). At the same time, the spatial correlations with observations are also improved as well as the normalized RMSE (see Fig. 2a,b). We note that in few regions, e.g., the subtropical South Pacific, the model still underestimates 25 the observed DIC and alkalinity concentrations. We note that the PIC:POC ratio is still within the large range of other studies (0.03 to 0.25; Koeve, 2002;Sarmiento et al., 2002) As at the surface, the spatial distributions and normalized bias of interior DIC and alkalinity in the NorESM2 improve considerably throughout the water column relative to NorESM1 (Fig. 2a,b). In both the interior Atlantic and Pacific basins, the spatial distribution and magnitude of DIC biases closely resembles that seen in the nutrients (e.g., phosphate; Figs. 16e,f, 7e,f), 30 when multiplied by the constant stoichiometry in the model (R C:P =122). This suggests that the mechanisms driving the nutrient bias are also responsible for the bias in interior DIC. For instance, the positive anomaly in the equatorial Pacific between 1 and 3 km depths originates from too much biological remineralization in the model.
The simulated interior alkalinity concentrations in NorESM2 between 1 and 3 km depths are anomalously high, which can partially be attributed to too low interior remineralization in the Atlantic (see also AOU values Fig. 12c) or too high CaCO 3 export production (Table 3).

Surface pCO 2 and sea-air CO 2 fluxes
The NorESM2 pCO 2 spatial correlation relative to the observations is improved in almost all seasons when compared to 5 NorESM1, while the NRMSE is reduced for JFM and AMJ months (Fig. 2c,d). Within these months, improvements are mostly seen in the Southern Ocean, where too strong surface mixing in NorESM1 leads to anomalously high pCO 2 . The climatological mean of contemporary surface pCO 2 in NorESM2 compares well with the observational compilations, as seen in Fig. 18a,e. Similar improvements are also seen in the CO 2 fluxes, where NorESM2 simulates the broad spatial patterns relatively well (Fig. 18c,g). NorESM2 also produces moderately weaker annual mean carbon flux in the northern mid-to high-10 latitudes than NorESM1. Figure 18f,h shows the zonally-averaged monthly surface pCO 2 and CO 2 fluxes in NorESM2. Here, the model also agrees well with the observation-based estimates (Fig. 18b,d) in term of amplitude and temporal variability, with distinct seasonal cycle between 20 • and 45 • south/north. In the extra-tropical oceans (between 45-65 • N and south of 60 • S), the simulated winter (January-March and July-September in northern and southern hemispheres, respectively) pCO 2 is considerably lower than that seen in the observations. In these regions, nevertheless, the carbon sink during these periods is 15 comparable with observations (Fig. 18d,h). (hence a weaker carbon sink) in these regions than the observational estimates from Takahashi et al. (2009). Despite the warm SST bias in most of the tropics, NorESM2 simulates a moderately weaker than observed CO 2 outgassing throughout the year (red line as compared to blue line in Fig. 19c). A potential explanation for this is the weaker upwelling rates of DIC-rich deep watermasses in the model. In the Southern Ocean (south of 40 • S), strong bias in the seasonal amplitude in NorESM1 is 25 considerably reduced (Fig. 19e,f), due to the improvements in the biological production in this region. Nevertheless, here the seasonal phase in NorESM2 is approximately reversed compared to observations: the strongest sink is simulated in the austral winter months whereas observational estimates indicate summer months.

Transient changes
In this subsection, we describe the global mean surface air temperature and oceanic carbon uptake over the historical period 30 (1850-2014) as simulated by NorESM2. For this, we discuss the two transient simulations: (i) historical and (ii) esm-hist.
Similar to SST (see Fig. 3), there is a warm-bias in the simulated surface air temperature. For the period of 1850-1859, the NorESM2 global mean surface air temperature is 14.5 • C compared to 13.7 • C from the HadCRUT4 dataset (Morice et al., 2012). Nevertheless, the simulated warming between 2005-2014 and 1850-1859 is 0.7 • C, comparable to that from observations of 0.8 • C (Fig. 20a). The warming in esm-hist is lower, 0.6 • C. Figure 20a also shows that NorESM2 (orange and green lines) simulates warming rates that are comparable to observations (blue line) between 1970 and 2010, but shows a stronger cooling between 1950 and 1970 (historical and esm-hist).
In both simulations, the global mean surface ocean pCO 2 follows the respective atmospheric CO 2 trend over the simulation 5 periods. Towards the end of the historical period, the oceanic pCO 2 (solid green and orange lines) diverges from, i.e., is lower than, the atmospheric counterpart (dashed-green and -orange lines) by approximately 20 ppm (Fig. 20b). Consistent with the lower surface ocean (than the atmosphere) pCO 2 , the ocean carbon uptake continues to increase over most of the transient period in both simulations (Fig. 20c). In the historical simulation, the carbon sink stabilizes at roughly 0.7 Pg C yr −1 between 1910 and 1960. This flattening of the uptake strength is associated with the slow down in the atmospheric CO 2 growth rate.
10 Figure 20d confirms that the atmospheric growth rate weakens during this time window (orange bars in Fig. 20d). On the contrary, the carbon sink in esm-hist steadily increases during the same period, also consistent with the steady increase in the simulated prognostic atmospheric CO 2 (green-bars in Fig. 20d).
From 1960 onward, the atmospheric CO 2 growth rates in both simulations increase almost linearly in time from 0.8 to more than 2.0 ppm yr −1 . Consequently, the ocean carbon uptake strengths increase as well, from approximately 1 Pg C yr −1 in the 15 1960s up to 2.5 Pg C yr −1 in 2014 (in both historical and esm-hist). Rapid strengthening in ocean carbon uptake is also evident in estimates from the Global Carbon Project (GCP, blue line in Fig. 20; Le Quéré et al., 2018). For the 1980s and 1990s, the simulated carbon uptake is also well within the IPCC (Intergovernmental Panel for Climate Change, Denman et al., 2007) estimates (Table 4). In NorESM2, a new tracer DIC nat has been implemented (see also Section 3.2), and it enables us to more accurately estimate the anthropogenic carbon uptake and storage in the ocean. Using this tracer, we can also estimate the net 20 flux of anthropogenic carbon into the ocean during the historical period, as depicted by the purple line in Fig. 20c). The strong resemblance between the orange and purple lines suggests that the long-term trend in oceanic carbon sinks is associated with the increasing atmospheric CO 2 , rather than changes in climate states (as expected for the historical runs where the change in the climate state is relatively small). We note that, in the preindustrial control simulation, the ocean is a weak net carbon sink, of approximately 0.1 Pg C yr −1 . Table 4 also shows that the cumulative carbon uptake by the ocean for 1850-1994 and 25 1994-2011 is well within the observation-based estimates. The cumulative carbon uptake in NorESM2 is lower than that of NorESM1 and compares better with observations. 5.13 Distribution of δ 13 C and ∆ 14 C As stated above, the carbon isotopes tracers are not included in the coupled ESM configuration and only simulated in a preindustrial stand-alone ocean configuration. We also note that although the stand-alone ocean simulation is not a direct 21a-c shows that the spatial pattern of δ 13 C is in fair agreement with the gridded preindustrial observational estimate of Eide et al. (2017). Nevertheless, the ocean component of NorESM2 simulates weaker variability than observed, i.e., it is unable to reproduce both very low (negative) and high (positive) values. Therefore, the simulated δ 13 C concentrations in the relatively younger watermasses, such as the North Atlantic Deep Water (NADW) and the Antarctic Intermediate Water (AAIW), have negative bias relative to the observations (Fig. 21e). On the contrary, positive biases are depicted in the older Antarctic Bottom Water (AABW) and in the deep North Pacific (Fig. 21e,f). The δ 13 C can be decomposed into biological and residual (air-sea 5 gas exchange and circulation) components (Broecker and Maier-Reimer, 1992;Eide et al., 2017): where δ 13 C BIO can be estimated as a function of phosphate and DIC distribution and their global mean values: Applying this decomposition, we can attribute the majority of the differences between model and observational estimates of 10 δ 13 C to biases in the biological component (Fig. 22). The δ 13 C BIO reveals that in the Southern Ocean (south of ∼55 • S), as well as in its exported deep waters, NorESM2-OC simulates too high δ 13 C due to the too small regenerated signal in these waters.
The negative δ 13 C anomaly in the North Atlantic of about -0.5 o / oo is a combination of a too low δ 13 C BIO signature in NADW ( Fig. 22e), combined with a too negative influence from air-sea gas exchange and circulation as compared to observational estimates (the residual term δ 13 C AS ). In the Pacific the positive anomaly from the Southern Ocean persists throughout most of 15 the basin due to too weak remineralization, except for at intermediate depths where remineralization is high (see also AOU, Fig. 12).
The simulated ∆ 14 C distribution is in general agreement with the pre-industrial observational estimate by Key et al. (2004). and biases in air-sea equilibration and fractionation. Calibrated ∆ 14 C atm is by definition 0 o / oo , δ 13 C atm equilibrates at -7.5 o / oo , 25 and atmospheric pCO 2 is 294 ppm at the end of the simulation.

Summary and discussion
The ocean biogeochemical component of the Norwegian Earth System Model (NorESM) has been updated from version 1 (as used in CMIP5) to version 2 (NorESM2). These developments focus on alleviating known biases in mean states and seasonal cycles of key variables when compared to the observations. This paper describes new and improved processes, newly imple- 30 mented diagnostic and carbon isotope tracers, and highlights the model improvements relative to the earlier model version.
On the biogeochemistry side, we have introduced a revision to the particulate organic carbon vertical sinking scheme, an improved tuning of ecosystem parameters, riverine inputs of nutrients and other biogeochemical constituents, an updated air-sea gas exchange parameterization, and atmospheric deposition of nitrogen. In NorESM2, the ocean biogeochemistry also simulates (i) fully interactive oceanic DMS fluxes coupled to the atmospheric model, (ii) newly implemented preformed tracers, which enable us to perform a more detailed diagnosis of physical and biogeochemical sources and sinks in future studies, and 5 (iii) carbon isotopes that can be used e.g. in extended paleo time-scale simulations.
On the physical side, the simulated Atlantic Meridional Overturning Circulation strength is considerably improved. An hourly coupling (previously daily) across the ocean-atmosphere-ice interfaces has been implemented. Intensive fully coupled testing prompted further attempts to optimize mixed layer physics parameters, which slighly reduce the bias in MLD when compared to our earlier model version. ical state, such as alkalinity, dissolved inorganic carbon, and buffer capacity, in order to adequately simulate the long-term trends (Hauck and Völker, 2015;Fassbender et al., 2018;Lebehot et al., 2019). NorESM1 exhibits a considerable bias for the background biogeochemical state, especially in surface alkalinity concentrations. These biases have been substantially reduced in NorESM2, resulting in a more adequate representation of the background buffer factor and improved model fidelity in simulating the temporal evolution of the ocean carbon sink. 25 The seasonality of upper ocean biogeochemistry has been identified as critical for quantifying a model's uncertainty in its long-term projections of oceanic carbon uptake (Fassbender et al., 2018). Further, with ongoing anthropogenic CO 2 -emissions, the seasonal variability of regional sea-air CO 2 -fluxes is expected to amplify. This amplification, which is linear for the thermal component and nonlinear for the biophysical component of oceanic pCO 2 (Fassbender et al., 2018), can also impact the longterm evolution of sea-air CO 2 -fluxes. In this aspect, NorESM1 simulates a considerable bias in the seasonal cycle of biological 30 production and air-sea CO 2 fluxes, particularly in the Southern Ocean. The updated set of ecosystem parameters in NorESM2 is able to mitigate these biases to some extent. In the North Atlantic, North Pacific, and most parts of the Southern Ocean, the seasonal cycle (both in phase and amplitude) of primary production and CO 2 -fluxes in NorESM2 reflects observational estimates and their driving mechanisms more accurately (Landschützer et al., 2018). We note however that there is a mismatch between the CO 2 -fluxes' seasonal cycle in observations and NorESM2 in the Southern Ocean between 40 • N and 60 • N. A 35 closer attention to the associated physical (e.g., solubility) and biogeochemical (e.g., biological productivity) drivers in this region is needed when considering future scenarios in the future. However, general seasonal characteristics are improved and we are confident that NorESM2 will provide a better estimate of future changes.
Despite numerous model improvements, several biases remain in the updated model. A long-standing issue of nutrient trapping in the intermediate depth of the equatorial Pacific persists. Here, the simulated phosphate concentration is still higher 5 than observed, but the bias is less severe than that of NorESM1. Consistently, dissolved inorganic carbon is still anomalously high and oxygen too low in the same region. The simulated Southern Ocean still has a too strong ventilation, leading to biases in the newly formed bottom watermass. Other physical biases such as the Antarctic Bottom Water penetrating far north also contributes to the simulated model-data differences in the interior North Atlantic.
Currently, evaluations of ocean biogeochemical components in ESMs have been largely focused on assessing the model 10 ability in simulating the mean climatological state. However, further options exist (Eyring et al., 2016b;Heinze et al., 2019).
The ocean biogeochemical observing community has put a lot of effort in sustaining key monitoring networks, such as the Surface CO 2 Atlas (SOCAT; Bakker et al., 2014) and the Global Ocean Data Analysis Project (GLODAP; Olsen et al., 2016).
This critical observational network has provided additional valuable constraints for ESMs, such as by confronting the model ability to simulate seasonality of surface ocean pCO 2 as well as long-term trends in surface ocean pCO 2 , 15 ocean acidification , and deoxygenation . As ESMs simulate emerging climate change signals in the coming decades (Henson et al., 2017), sustaining these observations into the future is becoming more important than ever.
In order to tackle the challenges associated with climate change, ocean modules in Earth system models could be further upgraded to include interactive methane (CH 4 ), nitrous oxide (N 2 O), and biogenic volatile organic compounds (BVOCs) 20 cycles for enabling model projections in respective emission driven frameworks. Moreover, NorESM2 still has a relatively simple marine biology that would benefit from more key functional groups as well as their sensitivity to climatic stressors. In preparation for the anticipated increasing spatial resolution in the physical components of the model, different data assimilation methods (combined with state-parameter estimation on the basis of observations) are currenty being explored to enable us to perform optimization of the current ecosystem parameterization more efficiently (Tjiputra et al., 2007;Gharamti et al., 2017). 25 Further NorESM-internal improvements may consider online estimated or progonostic organic carbon emissions from the ocean biogeochemistry to the atmosphere. Currently, we use fixed particulate organic carbon emissions from the ocean, linked to an upper-ocean chlorophyll-a climatology. The spatial distribution and temporal characteristics of DMS emissions may have an impact on the strength of the feedback involving DMS and aerosol-cloud interactions. Dust and iron or phosphate contained therein as well as nitrogen deposition will be computed in the future version of the atmospheric NorESM module and may 30 provide more realistic nutrient input and may thus lead to feedbacks in a coupled ESM. The biogeochemical exchange processes are also dependent on sea ice cover and require better knowledge of air-sea-interaction processes in polar and subpolar regions.
Mineralization processes in land regions may change under climate change conditions, leading to important changes in nutrient fluxes into the ocean biogeochemical systems. Biogenic Volatile Organic Compounds (BVOC) and halocarbons emitted from the ocean have been suspected to influence atmospheric chemistry, e.g., in the stratosphere. It will certainly be the ambition of further developments to investigate the sensitivity of these processes to climate change to increase complexity where needed.
Beyond their current applications, future ESMs should be developed further to provide societal relevant information that is directly relevant for policy decisions on climate mitigation and adaptation measures. This includes potential climate hazards to societies (such as marine heatwaves, sudden pH drops, and spreading low oxygen zones in the oceans) as well as the 5 risks emerging from the combination of hazards, vulnerabilities, and exposures (Oppenheimer et al., 2014). Future inclusion of socio-economic dynamics in ESMs and their feedback to climate through societal transformative action remains a big challenge (Giupponi et al., 2013) while a new ocean related socio-economic scenario framework (OSPs) has been developed mirroring the established Socioeconomic Pathways or SSPs (Maury et al., 2017). On the other hand, refining process complexity and increasing resolution may not be the only alternative for further model development, as the results of more complex model 10 simulations become more difficult to interpret (Bony et al., 2011). Therefore, simplified models and ESMs of intermediate complexity including the respective simplified ocean biogeochemistry modules, will need to be developed and used in parallel (for instance Steinacher et al., 2013).                          Centre for Climate Research. JT also acknowledges project Biodiversa (295340) and COLUMBIA (275268). High performance computing and storage resources were provided by the Norwegian infrastructure for computational science (through projects nn2345k, nn2980k, nn1002k, nn9560k, ns2345k, ns2980k, ns1002k, and ns9560k). We also acknowledge Johan Liakka for providing technical assistance and Jöran Maerz for the Te f f − 1km data. *Disclaimer.*This article reflects only the authors' view -the funding agencies as well as their executive agencies are not responsible for any use that may be made of the information that the article contains.