AWI-CM3 coupled climate model: Description and evaluation experiments for a prototype post-CMIP6 model

. We developed a new version of the Alfred Wegener Institute Climate Model (AWI-CM3), which has higher skills in representing the observed climatology and better computational efﬁciency than its predecessors. Its ocean component FESOM2 has the multi-resolution functionality typical for unstructured-mesh models, while still featuring a scalability and efﬁciency similar to regular-grid models. The atmospheric component OpenIFS (CY43R3) enables the use of latest developments in the numerical weather prediction community in climate sciences. In this paper we describe the coupling of the model components 5 and evaluate the model performance on a variable resolution (25–125 km) ocean mesh and a 61 km atmosphere grid, which serves as a reference and starting point for other on-going research activities with AWI-CM3. This includes the exploration of high and variable resolution, the development of a full Earth System Model as well as the creation of a new sea ice prediction system. At this early development stage and with the given coarse to medium resolutions, the model already features above CMIP6-average skills in representing the climatology and competitive model throughput. Finally we identify remaining biases 10 and suggest further improvements to be made to the model.


Introduction
The evolution of coupled climate models between phases of the Coupled Model Intercomparison Project (CMIP) is advancing our ability to simulate the Earth's climate and to quantify humankind's past and future impact. applicability of the lower-resolution AWI-CM3 here, with a glance at its higher-resolution performance. What we consider 50 low resolution for the OpenIFS atmosphere TCo159L91 (61km) is already beyond the practical limits of our previous AWI-CM1 and AWI-CM2 models with ECHAM6 (100 km). The higher resolution simulation that we briefly touch on features a TCo319L137 (31km) atmosphere as well as a 5-27km ocean, making the simulation more finely resolved than the HighResMIP contribution with AWI-CM1.1 HR.

55
The AWI-CM3 coupled system encompasses two major components with the atmosphere (OpenIFS) and ocean (FESOM2), as well as an auxiliary component, the runoff mapper. The fourth component (XIOS), running in parallel, handles the output from the atmospheric model.

OpenIFS 43R3 Atmosphere
For its atmospheric component AWI-CM3 uses OpenIFS, which is based on ECMWF's Integrated Forecast System (IFS) 60 (ECMWF, 2017a, b, c). With the readily available OpenIFS, ECMWF aims to provide cutting-edge performance from the world of operational numerical weather forecasting to the research community, while in turn presenting a testbed for developments that can feed back into weather forecasting.
As such, OpenIFS contains the hydrostatic dynamical core, the physical parameterizations, as well as the H-TESSEL hydrology model (Balsamo et al., 2009) and the WAM wave model (Komen et al., 1996) from the ECMWF IFS suite. The two-way 65 OpenIFS-WAM coupling makes the surface roughness calculation dependent on the wave state, which in turn influences the calculation of momentum and sensible heat fluxes. WAM wave fields are currently not directly coupled to FESOM2 for e.g.
wave induced vertical mixing calculations. H-TESSEL provides column model type soil moisture computations, while horizontal water transport on land is simulated using a separate runoff-mapper, as shown in figure 1. In contrast to the operational IFS NWP system, the 4D-var data assimilation, the non-hydrostatic core, the adjoint/tangent-linear versions, the Meteo France 70 IO server, and the subroutine level implementation of the NEMO ocean model have been removed from OpenIFS.
The defining feature of the OpenIFS numerical core is its semi-Lagrangian (Ritchie et al., 1995) semi-implicit (Robert et al., 1972) advection scheme (Ritchie, 1987;Ritchie et al., 1995), which allows for advection distances in excess of the Courant-Friedrichs-Lewy (CFL) condition. At the same grid resolution and with the same characteristic fluid velocities this permits OpenIFS simulations to be numerically stable at much longer timesteps than equivalent Eulerian integrations. What the 75 model saves in terms of computing resources can be invested instead into higher model resolution, larger ensembles or longer simulations. We use the cycle 43R3V1 version of OpenIFS released in September 2020, that is based on IFS CY43R3, which constituted the operational NWP system at ECMWF between July 2017 and June 2018. In contrast to ECHAM6, OpenIFS allows for the use of both full and reduced Gaussian grids (Hortal and Simmons, 1991), thus reducing grid cell shape distortion near the poles. 80 OpenIFS is available at a wide variety of horizontal resolutions, ranging from a TQ21 (626km) toy model to the TCo1279 (9km) operational NWP. Even higher experimental horizontal resolutions require only minor source code changes. Three options exist for the representation of highest (truncation) resolution spherical harmonics in grid point space. The linear truncation TL grids with two gridpoints for the smallest spherical harmonics, the quadratic TQ with three and the cubic octahedral TCo with four grid points (Malardel et al., 2016). Of these, the TCo grids are the most accurate and applicable to coupled cli-85 mate simulations, as the coupling takes place in grid point space. The vertical resolution choice is much more limited than the horizontal one, as each horizontal resolution is typically paired with one optimal vertical resolution, for which the model parameterizations have been tuned.
The main experiments we present were performed at a resolution of TCo159 (61km) with 91 vertical layers (TCo159L91), with some experiments at TCo319L137 (31km). Further resolutions we successfully tested computationally are TCo95L91 90 (100km) and TCo639L137 (16km). All grids have a ceiling pressure level of 0.01 hPa, with the L137 grids resolving the vertical space in between more finely than the L91 grids.

FESOM2 Ocean
The ocean dynamics of AWI-CM3 are simulated by the Finite volumE Sea ice Ocean Model (FESOM2), the second version of the global unstructured-mesh ocean model developed at AWI .
95 Figure 2. a) Horizontal resolution of the CORE2 mesh used for the main simulations presented in this work. b) As a) but for the DART mesh with higher resolution, which is based on a mix of local Rossby radius of deformation and local sea surface height variability. The DART mesh is used in Section 5.1 for an outlook of high resolution applications.
FESOM2 is formulated with a finite volume dynamical core using ALE vertical coordinates. FESOM2 is a global unstructuredmesh ocean model that has computational performance comparable to structured-mesh models. Unstructured meshes allow for local mesh refinements without sharp resolution boundaries, as encountered by classical nesting models. In practice, the mesh can be designed to follow the patterns of local sea surface height variability or to scale corresponding to the local Rossby radius of deformation (Sein et al., 2017). FESOM2 contains the embedded FESIM sea ice model . For 100 the coupled model presented here, FESIM was modified such that it calculates prognostically the sea ice surface temperature, while in the previous AOGCMs AWI-CM1 and AWI-CM2 this computation was performed in the atmospheric component.
For the experiments presented here we employ a mesh called CORE2, which has about 127k surface nodes, as shown in Koldunov et al. (2019a). The mesh has 47 vertical layers and horizontal resolution varying from 25 to 125 km depending on latitude and distance to coastlines. We tested a second mesh called DART with shorter simulations. The DART mesh has 105 3.1 million surface nodes with 80 vertical layers and horizontal resolution of 5 to 27km. We show the spacial distribution of horizontal resolution for both meshes in Figure 2. In the vertical, for the first two layers of both meshes have 5 meters resolution, and subsequent layers are in 10 meter intervals till 100 meters depth. Below 100 meters depth the CORE2 vertical resolution decreases downwards more rapidly than the DART resolution.
The low-resolution CORE2 mesh for FESOM2 is not eddy-resolving, and we employ the Gent-McWilliams (GM) parame-ice concentration and sea ice velocity, as developed by Timmermann and Beckmann (2004) based on Lemke (1987). For the horizontal viscosity, FESOM2 applies the kinematic backscatter scheme of Juricke et al. (2020) with a backscatter coefficient of 1.5. This scheme dissipates kinetic energy on small scales, but reinjects kinetic energy on large scales, resulting in an overall reduced dissipation. Detailed information on the available mixing scheme options in FESOM2 can be found in Scholz et al. (2022). For the experiments shown below, the net evaporation -precipitation -runoff (E-P-R) integrated over the global ocean is forced to 0 at each timestep by subtracting the residual.
The FESIM sea ice model is integrated directly in FESOM2 source code on a module level. The sea ice computations are done on the ocean surface grid and the dynamics use an adaptive elastic-viscous-plastic solver (Kimmritz et al., 2016;Koldunov et al., 2019b). For the thermodynamics FESIM implements a 0-layer scheme after Parkinson and Washington (1979). The 125 standalone ocean model FESOM2 allows for the optional use of the Icepack sea ice thermodynamics module (Hunke et al., 2020;Zampieri et al., 2021), representing a potential future upgrade for AWI-CM3.
Bodies of water that are cut off by land from the world oceans, such as the Caspian Sea and the Great Lakes are not simulated via FESOM2, but are included as lakes via the OpenIFS lake module.
For the low-resolution CORE2 mesh in particular, several ocean basins with narrow outflow channels are not included.

130
These are: the White Sea, Persian Gulf, Black Sea, and the Gulf of Ob. Such narrow inlets on a coarse mesh would reduce the smallest horizontal grid spacing, and thus incur a smaller timestep in order to still fulfill the CFL condition globally. The higher resolution DART mesh does not omit these basins.

Runoff-mapper
In addition to the two major components, AWI-CM3 includes a river routing scheme. It receives from OpenIFS the difference 135 between precipitation, evaporation and soil moisture over land (P-E-S), and uses a map of river basins to deliver the water to discharge points along the coastline. The current river routing component has no water storage and thus acts instantaneously.
Separation of the routing component from the atmosphere and ocean is a design decision shared with EC-Earth, where the flexibility and ease of modification are core ideas. This design keeps open the option of swiftly replacing the basic runoff mapper with a more sophisticated hydrological model, such as mHM (Samaniego et al., 2010) or CaMa Flood (Yamazaki tutes an ever-growing fraction of the computational cost of running OpenIFS. Recently Yepes-Arbós et al. (2021) implemented the parallel XML Input/Output Server (XIOS) 2.5 into OpenIFS for this reason, and we make use of it for AWI-CM3 to reduce the computational cost and increase the integration speed.
While XIOS takes the file writing out of the critical path of the simulation, the overhead cost of XIOS is non-zero. The main reason is that XIOS works only in grid point space, and therefore requires spectral fields to be transformed inside OpenIFS 155 before they can be sent to XIOS for writing. Nevertheless it provides a significant reduction of computing cost, as shown in table 1. Furthermore, XIOS enables online data postprocessing, such as vertical and horizontal interpolation, as well as temporal operators for maximum, minimum, mean etc. In doing so, XIOS reduces the number of times that files have to be written and read from disk, saving storage space, and reducing the number of job steps in the work flow. Finally XIOS allows for output directly in NetCDF, facilitating the use of AWI-CM3 model output.

160
FESOM2 contains its own bespoke parallel IO routines and the integration of an external dedicated IO Library is currently not envisioned.

Coupled Model Description
The coupled climate model is constructed by combining and building on the approaches of Hazeleger et al. (2010) and Sidorenko et al. (2015). The OpenIFS version 43R3 is common between a number of AOGCMs and ESMs with regular 165 ocean grids currently under development, including EC-Earth4 of the EC-Earth consortium and GEOMAR's FOCI-OpenIFS (Kjellsson et al., 2020). Indeed the basic functionality of the coupling interface, as well as the future ESM component integration will be shared between EC-Earth4 and AWI-CM3. Nevertheless, some differences in the coupling strategy exist, and the setup developed for AWI-CM3 shall be detailed further here.
The ESM-Tools (version 3) infrastructure software (Barbi et al., 2021) was used to manage the configuration, compiling, 170 and runtime-scripts of the coupled model, as well as to ensure simulation reproducibility.

Coupling strategy
The surface heat, mass and momentum fluxes are calculated within OpenIFS and supplied to FESOM2. Here the state variables for ocean and sea ice surface are updated accordingly. The runoff-mapper calculates its river routing after the atmosphere component computes and provides the P -E over land for a given coupling time step. 175 We employ so called concurrent coupling, with surface condition updates that are considered to be numerically independent between ocean and atmosphere. The temporal exchange of the ocean and atmosphere surface conditions is taking place at the least common multiple of the ocean and atmosphere timesteps. For the TCo159L91-CORE2 simulations the timesteps for coupling, the atmospheric model and the oceanic model are 120, 60 and 40 minutes respectively, while for the TCo319L137-DART simulation they are 60, 15 and 4 minutes. In the production mode both the atmosphere and ocean components compute their resulting from this double-sided-lag method are small compared to those stemming from e.g. spatial and temporal truncation and are generally accepted in the climate modelling community, as they allow for parallel execution of model components (Lemarié et al., 2015;Marti et al., 2021).
AWI-CM3 can also be run in a sequential atmosphere-first mode, updating the ocean at timestep t n with atmospheric fluxes 185 from t n . In this mode climate models can get very close to what would be a converged solution of an iterative coupling at the atmosphere-ocean interface (Marti et al., 2021). Integration of a Schwarz iterative method for fully converged surface coupling is not planned, due to the high computational cost compared to small reduction in model error.
On the technical side all three components of the coupled model are compiled into their own respective executables and a parallel communication library, OASIS3-MCT 4.0 (Craig et al., 2017), is integrated into each one. The realized AOGCM 190 setup is sketched in Figure 1. Since FESOM2 has previously been coupled to the atmospheric model ECHAM6 , an interface for the data exchange already existed, and the grouping of fluxes in OpenIFS has been modeled after the grouping in ECHAM6. OpenIFS CY43R3 had not been coupled via OASIS before, but an older related model OpenIFS CY40R1 was coupled via OASIS as a test case in EC-Earth 3. The coupling of interface for OpenIFS CY43R3 is inspired by this predecessor. Future releases of OpenIFS will be published with this coupling interface already included.

Climatological Performance
In this section we will outline the ability of the new coupled system to reach a stable equilibrium with constant greenhouse gas and solar forcing from the year 1850. Thereafter we test to what degree the model can simulate the climate as observed over the period 1850 to 2014, with a particular focus on the last 25 years, when the observational coverage is most dense and reliable. Finally this is followed by a characterization of the response to two idealized future CO 2 emission scenarios, one with 200 a sudden 4x increase of CO 2 , and the other with a constant increase of 1% per year, starting from 1850 values.

Spinup drift (SPIN)
A 700-year long spinup of AWICM3 was carried out under constant greenhouse gas and solar forcing from the year 1850, starting from winter Polar Science Center Hydrographic Climatology (PHC3) (Steele et al., 2001). The forcing fields were collected from the input4MIPs data server (https://esgf-node.llnl.gov/search/input4mips/, last access: 6 November 2019). Aerosol fields 205 were kept at present day levels, as the integration of the emissions-based aerosols into OpenIFS CY43R3 through the EC-Earth Consortium with tracing via the M7 model (Vignati et al., 2004) is still ongoing. This implies a somewhat colder pre-industrial state in particular in regions of the northern hemisphere, that are cooled in present day observations due to industrial aerosol emissions.
During the first 500 years of the spinup simulation we noted positive global ocean temperature trends throughout nearly the   (Steele et al., 2001). Switching from dry to total mass fixer after 500 years reduced spurious heat production in the atmosphere and halted the ocean warming trend in the upper 1000 meters. Warming at depth continued. (McGregor, 2005). We implemented this solution starting from year 1651. Subsequently the radiative imbalance reduced to +0.7 W/m 2 , which is within the range of imbalance of CMIP6 models (Wild, 2020). 215 Further experiments showed that decreasing the OpenIFS timestep from 60 minutes to 30 minutes reduced the imbalance to +0.4 W/m 2 . An additional reduction to 15 minutes timesteps showed no more improvements. We surmise that the timestepdependent component of the error implicates the semi-lagrangian trajectory algorithm operating close to the stability limit for the given atmospheric resolution of TCo159. For our analysis we judged this additional error an acceptable price for a doubling in model integration speed, and we thus keep the 60 min timestep. The timestep independent flux imbalance of +0.4 W/m 2 220 will be targeted in future model development and tuning efforts.

2019) with the McGregor scheme
In the global mean ocean temperature Hovmöller diagram ( Figure 3) we can see that after switching to total mass conservation the reduction of spurious heat production in the atmosphere led to a stabilization of the ocean temperatures in the upper 1000 meters. The trend in the deep ocean, strongest at 4500 meters depth, on the other hand has hardly slowed down and, thus, likely has a different origin. One candidate currently under investigation is the topography-influenced equilibrium depth of the 225 Gibraltar Straight overflow. Alternatively we speculate that overestimated mixing from the KPP mixing scheme (Large et al., 1994) might be the reason. The bias pattern is very similar to that of the previous FESOM2-ECHAM6 (AWI-CM2) coupled model .
Another potential contributor to the accumulation of heat at depth is a consistent positive shortwave radiation bias of OpenIFS in the Southern Hemisphere. Some of the spuriously heated Southern Ocean surface water gets entrained into the 230 Antarctic Intermediate Water (AAIW). In recent years, the Southern Ocean shortwave radiation bias has been the subject of research which led to its reduction by 5 − 10 W/m 2 (Forbes et al., 2016). We have already back-ported these improvements originally developed for the operational IFS CY45 model into OpenIFS CY43R3 prior to our spinup simulation. Even with the improvements, a positive shortwave downward radiation bias of up to 5 − 10 W/m 2 between 45-60°S remains, and can be seen in Figure 7 e). Idealized experiments have shown that removal of the remaining shortwave radiation bias would cool the 235 Southern Ocean by roughly 1 K within a decade, with potentially larger improvements on longer timescales (not shown).

Pre-industrial control (PICT)
As evident from the Hovmöller diagram (Figure 3b), the spinup run is not yet in equilibrium at depths greater than 1000 meters.
A small residual drift can also be found at the ocean surface and in the atmosphere, which can be seen as a consequence of the still-drifting deep ocean. Based on experience with other ocean models we can estimate that a 3000-5000 year long 240 simulation would be needed for the model to reach full equilibrium (Rackow et al., 2018). Instead, we run a pre-industrial control experiment which serves as a reference for correcting the historical-period simulation with respect to the remaining trends. The pre-industrial control run thus extends the spinup run by 165 years with the same year 1850 greenhouse gas and solar forcing. We construct a simple linear regression model for the remaining drifts in PICT and subtract these when we analyze the response to historical forcing in the historical simulation.

Climatological performance during the historical period (HIST)
In the following we first characterize the most prominent bias patterns of the last 25 years of the historical simulation with regards to reanalysis and satellite data products. We then take a look at the climate response to the historical forcing in comparison to the pre-industrial control. contributing models can be seen. A complete list of the CMIP6 models serving as the evaluation set is given in Appendix A. The list of observational datasets used to calculate all mean absolute errors is also given in Appendix A. For all model and observational datasets, where available, the time period from December 1989 until November 2014 is considered. As the 255 performance of AWI-CM3 is expressed as a fraction of the errors of the CMIP6 average performance, values below 1 indicate better performance, values above 1 point to larger model errors.
For the majority of variables, seasons and regions our post-CMIP6 prototype model is already performing better than the average CMIP6 model. The lead is especially large for cloud cover (clt) and 500 hPa geopotential height (zg). For surface meridional wind (vas), surface zonal winds (uas), 300 hPa zonal wind (ua), TOA outgoing longwave radiation (rlut), and 260 precipitation (pr) the bias to observations is mostly below average. The very important near surface (2m) air temperature (tas) is simulated well in the Arctic (60°-90°N) and reasonably well in the northern mid-latitudes (30°-60°N) and tropics (30°S-30°N).
that the mixed layer depth from reanalysis we used here as the reference could be associated with uncertainties too, as it is based on the implementation of mixing parameterizations in the reanalysis model. The standard deviation of sea surface height (zos) shows reasonable variability of the ocean currents in the mid and high latitudes, with problems found in the Nino34 region (5°S-5°N, 170°W-120°W). On the other hand, the standard deviation of temperature is best represented here, with deficiencies in the Antarctic cold season and northern mid latitude summer.

270
A simple average over all individual performance indices gives AWI-CM3 a score of 0.931, with 15 out of the 30 considered CMIP6 models performing better. The overall index of the AWI-CM3 prototype simulation is improved compared to its CMIP6 predecessor with similar resolution and computational cost (AWI-ESM1.1-LR, 1.044). The performance of our medium-resolution AWI-CM1.1-MR CMIP6 contribution (Semmler et al., 2020) is better (0.894), but this model configuration is 20 times more expensive to run than the simulations presented here. Preliminary tests with higher resolution at equal 275 computational cost indicate that AWI-CM3 can achieve better climatological performance than AWI-CM1.1-MR, as we will show in Section 5.1. A major contributing factor are the faster dynamic cores and better computational scalability of the model components, allowing for higher resolutions at equal computational cost.
With this overview in mind we will limit our model bias analysis to problematic areas, knowing that for the variables we do not focus on we achieved good model performance. The sea ice thickness in the Arctic contains realistic values for both end of summer (EOS) and winter (EOW), PICT and HIST runs ( Figure 5). In the central Arctic, PICT sea ice thickness ranges from 3.5m at the EOW to 2.5m at the EOS. The mean over the last 25 years of the HIST simulation reveals a reduction of sea ice thicknesses to 2.5m and 1.5m in these two seasons. In both simulations, the maxima of EOW ice thickness can be found in the East Siberian Sea and along the coastline of northern 285 Greenland. Although, in some years, a local maximum of sea ice thickness was observed along the East Siberian Sea coast, in the multi-year-mean field it should not be as large as presented by the model. Similar issues are common in many CMIP6 models and exist even in the PIOMAS reanalysis (Watts et al., 2021). One clear bias in comparison to observations is a too wide tongue of sea ice extending eastward in the Greenland Sea during the winter months. A similar feature can be seen in our previous model versions. The annual cycle of sea ice extent is represented well with 16 to 7 million square kilometers in the The Antarctic sea ice biases require the most improvement as follows from the metrics presented in Figure 4. In both the PICT and HIST simulations the sea ice covered area is strongly underestimated during austral winter (Figure 5c&g). Notable are two spots of low sea ice thickness in the Weddell Sea and in the eastern Ross Sea. Both areas feature low EOW mean sea ice thickness of less than 20cm during the PICT run, and are partially ice free during the last 25 years of HIST. These locations 295 feature persistent large scale polynyas. In reality, polynyas were observed in the Weddell Sea (e.g., during the winters 1974 to 1976), however, the frequency of their occurrence (not presented in this paper) is clearly overestimated in our model.
In order to gain an understanding towards the reasons for the persisting polynya, we investigate the mean mixed layer depth (MLD), defined as the depth at which the potential density differs by 0.125 kg m 3 from the surface density (Monterey and Levitus, 1997). The MLD shown in Figure 6 a) features large values in the Weddell and increased values in the Ross seas that are co-300 located with low sea ice concentration values seen in figure 6 b). We therefore speculate that the large MLD could be one of the reasons for the underestimated wintertime sea ice, as the ocean heat from the warmer Circumpolar Deep Water (CDW) can be mixed up to reach sea ice from below. The salinity profile over the highlighted region confirms that the surface layer is more saline in our model than the PHC3 climatology. The exact reason for the overestimated MLD is not clear and understanding whether the salinity bias is the main cause is among our planned future research. i-l): Same as top row but for GIOMAS sea ice thickness reanalysis product. (Zhang and Rothrock, 2003). m): Evolution of sea ice extent over SPIN, as well as HIST and PICT simulations. Mass fixer switched from dry to total in years 1650. HIST and PICT forcing applied from 1850 onwards.
We hypothesize that in the subsequent austral summer the reduced sea ice cover results in a strong positive sea ice albedo feedback, further heating up the surface ocean. We examine this feedback in the following section.

Surface temperature and fluxes
The near surface (2m) air temperature bias in comparison to ERA5 largely shows a pattern similar in sign (Figure 7a), and at somewhat higher amplitude compared to the CMIP6 multi model mean bias shown in Bock et al. (2020). In the tropical 310 Pacific we find a slight cold bias located at the equator, flanked by equally sized warm biases, which is typical of a too-strong double-ITCZ found in many coupled models and associated with characteristic percipitation and short-wave radiation biases (see below). The upwelling regions off the west coasts of southern Africa and South America feature well known warm biases of up to 4°C resulting from insufficient upwelling, too little stratocumulus cloud cover and thus too much downwelling shortwaver radiation with our low resolution ocean model. Further north and south the subtropical gyres feature cold biases on the 315 order of -1 to -2°C.
In the mid-latitudes, the most prominent bias is the misplacement of the Gulf Stream which fails to represent the correct North West Corner detachment (Figure 7a). This is a well known bias in OGCMs of coarser than 10 km resolution in the North Atlantic. For the FESOM2 model, a study by Sein et al. (2017) shows that this problem can be mitigated by increasing the ocean model resolution in the region.

320
At high latitudes we find a strong warm bias of more than +8°C over areas of the Southern Ocean (SO) adjacent to Antarctica and a moderate one over Antarctica of +3 to +4°C, as well as a cold bias of around -2°C in the Arctic. The cold bias in the Arctic likely stems from the ERA5 reanalysis, which misses snow cover on sea ice (Batrak and Müller, 2019), rather than from AWI-CM3. We thus focus on the southern warm biases in our analysis, some of which are well known for IFS based climate  (2001)(2002)(2003)(2004)(2005)(2006)(2007)(2008)(2009)(2010)(2011)(2012)(2013)(2014). e) Surface downward shortwave radiation bias with respect to CERES (2001CERES ( -2014. f) Total cloud cover fraction bias with respect to MODIS. For each variable rmsd gives the area weighted root mean square distance between observations and model data, and bias gives the area weighted mean distance. models and can also be seen in Döscher et al. (2021) and Roberts et al. (2018). We hypothesize that these biases are caused by 325 the direct effect of heat released from a spuriously deep mixed layer (as noted in Section 4.3.1), a positive ice albedo feedback to the resulting reduced sea ice cover, and a remaining positive net shortwave downward heat flux bias between 45-60°S.
Nearly all of the near surface temperature biases (Figure 7a) can be associated with co-located net surface shortwave radiation biases (Figure 7c). These can be largely explained by surface downward radiation biases (Figure 7e), which in turn result from total cloud cover fraction biases (Figure 7f). This linkage pattern is well known in the modelling community (Satoh et al.,330 2019) and will likely remain a dominant source for surface temperature biases until deep convection resolving atmospheric models become readily available for multidecadal to centennial climate simulations.
There are two notable exceptions to this causal chain in our simulations. Firstly, we found a cold bias over the Greenland sea where the surface downward shortwave radiation bias is positive. The cold bias in the region is the result of a lobe of sea ice drifting into the area from the Greenland coast during winter. This results in an overestimation of surface albedo and a 335 reduction of the net surface shortwave radiation.
The second and similar exception is the previously mentioned warm bias south of 60°S in the SO. While the model overestimates the cloud and thus underestimates surface downward shortwave radiation in the region, the sea ice fraction in this area is too low. The surface net shortwave radiation and the near surface temperature biases are therefore positive. As the SO surface net shortwave radiation bias is negative the low sea ice concentrations can not originally be caused by shortwave biases.

340
Indeed, the near-surface (2m) air temperature and sea ice concentration biases south of 60°S peak during the austral winter season, while the biases in the southern mid-latitudes peak during austral summer (Figure 4). Correcting the largest biases south of 60°S in our model will probably necessitate work on non-solar heat fluxes and mixed layer depths at high-latitudes.
The precipitation biases shown in Figure 7b   Contrasted with the same winds in the ERA5 reanalysis of the equivalent time period the QBO frequency error is apparent.
The main reason for this behaviour is a lack of tuning for the gravity wave flux parameterization, as was confirmed by tests conducted with OpenIFS at GEOMAR (not shown). Tuning of the gravitational wave flux parameterization will be addressed in future model releases.

Ocean temperature
On the ocean side of the coupling interface, we find mean sea surface temperature biases with respect to PHC3 that are nearly identical to the aforementioned near surface air temperature biases (Figure 9). At a depth of 100 meters the bias looks similar to that at the surface in high-latitudes where mixed layer depths are large. In the Tropics and Subtropics this depth range is dominated by cold biases, which could be partially due to missing vertical mixing associated with Langmuir circulation in the gives the area weighted mean distance. At the surface the ocean-resolution-specific warm biases in upwelling regions and western boundary currents can be seen, in addition to a pronounced Southern Ocean warm bias. At depth, a warm bias is dominating in the Atlantic.
in the near-surface ocean in the mid-latitudes and tropics Ali et al., 2019), and the integration of a Langmuir circulation parameterization in vertical mixing schemes is planned.
At a depth of 1000 meters the Atlantic shows a strong warm bias, which we speculate results partially from spuriously warm SO surface water entrained into the AAIW. Ultimately the whole bias probably stems from multiple yet to be identified sources  These authors discuss different factors that may be responsible for the bias. Sterl et al. (2012) show that overestimation of the Mediterranean outflow can significantly increase the deep-ocean salinity bias. Delworth et al. (2012) attribute this anomaly to the insufficient eddy transport required to compensate for the wind-driven subduction in the subtropical gyres. They show that moving towards an eddy-resolving setting or a parameterization of the eddy stirring reduces the temperature biases significantly. Jungclaus et al. (2013) suggest that part of the problem arises from the improper interbasin exchange between the Indian and 380 South Atlantic oceans.
At 4000 meters depth we observe the warm biased Atlantic water mass spreading into the whole global ocean. As the meridional circulation at this depth is northward in the Antarctic Bottom Water (AABW) cell, the origin of the warm bias is presumably insufficient cold AABW formation in the Southern Ocean on this coarse mesh. The slowness of the AABW circulation in the Atlantic coincides with the slow but continued global-mean warming trends at great depths seen in Figure   385 3. Biases presented in Figure 9 explain the Hovmöller diagram in Fig 3 which we addressed in Section 4.1. The three sets of anomalies in the Hovmöller diagram at depths 100m, 1000m and 4000m stem from the biases in the mid latitudes, the North Atlantic and in the entire ocean, respectively.

El Niño-Southern Oscillation
To assess the model capabilities in representing climate variability, we investigate the representation of the El Nino Southern 390 Oscillation (ENSO). Using the entire HIST simulation and the EOFs toolsbox (Dawson, 2016) Figure 10a) depicts the spatial extent of the correlation between the first principle component and the input data set at each point in space. Also depicted is the Nino3.4 box (5°N to 5°S, 120°W to 170°W) upon which we based our further analysis.
We calculated the area mean SST within the box, applied a linear detrending, subtracted the mean seasonal cycle, and finally computed a three month running mean. The resulting Nino3.4 index timeseries is shown in Figure 10c. Comparison with the 395 Nino3.4 index based on the HadISST observational estimates (Figure 10d) reveals that our simulated amplitude of ENSO, with a range of +1.9 to -1.6°C, is lower than the observed. Indeed, the histograms in Figures 10e and 10f reveal missing tails of the distribution in the simulation.
While the overall strength of ENSO is underestimated by AWI-CM3, the normalized power spectrum density (PSD) in Figure 10b shows that the frequencies of the observed HadISST ENSO are well reproduced. We find several peaks concentrated 400 between periods of 2.8 and 12 years. Further analysis (not shown) indicated that ENSO phase locking is rather weak in the model version presented, and that both ENSO amplitude and phase locking can be improved through reduction of equatorial Pacific precipitation and temperature biases. Understanding how to do so without negatively impacting model performance in other regions is work in progress.
Future improvements for the ENSO amplitude could also potentially be achieved by activating the OpenIFS internal stochas-405 tic parameterization schemes, as shown by Yang et al. (2019). Indeed, HighResMIP simulations performed by Roberts et al. (2018) with IFS CY43 that employed the Stochastically Perturbated Parametrisation Tendencies (SPPT) scheme (Buizza et al., 1999) had an ENSO amplitude more in line with observational estimates. Since these studies used the SPPT scheme, they required additional humidity mass fixers for climate-length integrations.
Alternatively, in order to use the new -and from a mass conservation perspective more favorable -Stochastically Perturbed  OpenIFS CY43R3, or the new version OpenIFS CY47 has to be released. Noteworthy is also that CMIP6 simulations done with CNRM-CM6-1, featuring the IFS atmosphere with the ARPEGE physics package (not in use AWI-CM3, as we employ ECMWF physics) and no stochastic parameterizations, also had a high ENSO amplitude. However the frequency of ENSO was too focused with a sharp single peak in the 3-4 year band (Voldoire et al., 2019).

415
Finally experiments with stochastic coupling at the atmosphere ocean interface conducted with AWI-CM1 yielded improved ENSO phase locking (Rackow and Juricke, 2020), providing another potential development avenue.

Air temperature
After we have established that AWI-CM3 behaves reasonably for much of the globe and many important climate parameters, 420 we will now characterize the impact of historical greenhouse gas and solar forcing on the evolution of several of these variables.
The global mean near surface (2m) air temperature increases under HIST forcing, as seen in Figure 11a. Since our SPIN experiment did not establish a full equilibrium in the deep ocean, we analyze the air temperature change in our pre-industrial control experiment PICT, and obtain the residual drift of 0.00091°C/Year via linear regression. Over the 165-year-long simulation this amounts to a drift of 0.15°C in the PICT run.  to about −0.5W/m 2 . Comparing to the total anthropogenic forcing of +1.5W/m 2 , we foresee that, once transient aerosols 435 will have been included in OpenIFS CY43R3, AWI-CM3 historical global warming will be much closer to the observed value.
The map of temperature changes in Figure 12a shows that under historic well-mixed greenhouse gas and solar forcing AWI-CM3 simulates a temperature increase of approximately 1-2 • C in the Tropics, Antarctic and large parts of Eurasia. Mid-latitude oceans in both hemispheres see a weaker warming of 0.6-1 • C. Larger warming of 2-3 • C can be found in the Middle East, North America, the Caucasus, Australia and South Africa. Sea ice covered regions in the Arctic experience the strongest air 440 temperature increases, with the Arctic (65-90 • N) warming by 3-8 • C. Defining an Arctic Amplification Index (AAI) as the ratio of warming north of 65°N to whole northern hemisphere warming (Davy et al., 2018;Johannessen et al., 2016), the resulting AAI is 2.87. Note that, in contrast to observations as well as full-forcing CMIP simulations, there is no trace of a warming hole in the North Atlantic south of Greenland. Whether this might be due to the missing transient aerosol forcing in our simulation is unclear. Earlier studies have linked the warming hole rather to a weakening of the AMOC (Keil et al., 2020), but although 445 our historic simulation does exhibit such an AMOC weakening (see Figure 13), no warming hole forms.

Ocean circulation
The AWI-CM3 simulations reasonably reproduce the canonical pattern of the AMOC streamfunction, with an upper cell consisting of northward surface flow as well as southward return flow of North Atlantic Deep Water, and a lower cell representing the northward flow of AABW (Figures 13 a,b). During the spinup simulation the maximum of the northward transport between 30-45°N in AWI-CM3 fluctuates at around 20 Sv (see Figure 13c.). Initially the AMOC gradually slowed down to 18 Sv while 460 the upper ocean was spuriously warming, as described in Section 4.1. After applying the total mass fixer starting after 500 years into the SPIN experiment the AMOC is recovered to 20 Sv along with the cooling of the upper ocean (Figure 3b). Figure 13d shows a rather strong decline of the AMOC strength over the last 70 years of the HIST simulation, while the multi decadal natural variability of the AMOC is still high. As we do not have additional ensemble members, we do not make strong statements about the impact of historic forcing on the AMOC. However, we can conclude that our simulated AMOC 465 response to historical forcing is consistent with recent synthesis based on observations (Caesar et al., 2022), and the ability for rapid AMOC state changes, as described in Ollinaho et al. (2017), does seem to exist in AWI-CM3, since our low resolution is indeed higher than their high resolution case. While the upper cell diminishes considerably under historic forcing, the lower one is mostly unaffected.
AMOC variability across all latitudes is well correlated, pointing to the same (northern) origin of the signal. In the HIST run 470 the decrease of AMOC is found across all latitudes. The correlation is high but not perfect (see years between 1920 and 1930, for instance). This indicates that the recirculating cell, associated with physical and numerical mixing in the model caused by the advection operator, changes in strength (Sidorenko et al., 2021). Further conclusions would require extra analysis in the density space and the isopycnal framework (as in Sidorenko et al. (2021)) which we didn't activate in this run. However, this extra analysis was done by Sidorenko et al. (2021) and they concluded that the NAO was the main driver of AMOC variability.

475
Simulated volume transport fluxes were computed across several major ocean straits (Table C2). Historical values averaged over 1985-2014 match well the observation-based estimates for most straits. Two exceptions are the Nares and Davis Straits where the simulated transport is lower than in the observational estimates. Indeed, these very narrow straits are likely not well resolved on the CORE2 mesh. In the Arctic, the poleward inflow through the Barents Sea Opening and Bering Strait is somewhat overestimated, and counterbalanced by a southward transport at Fram Strait larger than in the observational estimates 480 (yet all remain within the uncertainty range). The simulated Antarctic Circumpolar Current (ACC) transport through the Drake Passage is within the range of observations. While being lower than the most recent measurements by Donohue et al. (2016), our AWI-CM3 estimate is comparable to the CMIP5 multi-model mean (MMM) and larger than the CMIP6 MMM (Beadling et al., 2020).
Historical transports through major straits are thus satisfactorily reproduced in the present AWI-CM3 configuration. Future 485 configurations with an increased resolution, eddy-resolving ocean are expected to provide even more accurate results within narrow straits and in eddy-rich areas such as the Southern Ocean, where mesoscale activity is key in accurately depicting the ACC behaviour (e.g., Rackow et al., 2022). To investigate the ability of AWI-CM3 to simulate warmer climate states, we conducted the 4xCO 2 experiment which prescribes a sudden permanent increase of the CO2 concentrations to 4 times (1137.27 ppm) the base value (284.32 ppm) from 1850. It can be used to analyse the model's inherent climate sensitivity. As shown by Gregory et al. (2004) the relationship between the change in net downward radiative flux and the change in near surface (2m) air temperature can be described by a linear regression model. The Gregory plot in Figure 14 a) was computed from the 4xCO 2 experiment in comparison to the 495 pre-industrial control simulation. The first axis intersection point determines the instantaneous radiative forcing F = 7.06 W m 2 resulting from the quadrupling of CO2. The linear regression line intersects the second axis at an Equilibrium Temperature difference ∆T = 6.49°C, resulting in a climate response parameter of AWI-CM3 of α = ∆T F = 0.92 K W m 2 . Equilibrium Climate Sensitivity (ECS) is given by doubling rather than quadrupling CO2 concentrations, and is thus ECS= ∆T 2 = 3.2°C. With this ECS value the AWI-CM3 prototype finds itself near the center of the range predicted by CMIP6 models (1.8 − 5.6°C) (Meehl 500 et al., 2020;Scafetta, 2021), and the same as AWI-CM1, which had a value of 3.2 • C as well.

Transient Climate Response (TCR)
We obtain the transient climate response of AWI-CM3 TCo159L91-CORE2 from an experiment that features increased CO 2 forcing by 1% per year, starting from the 1850 value of 285 ppm. Radiative forcing thus applied results in a near surface (2m) air temperature increase of 2.1°C after 70 years, when CO 2 concentrations have doubled, as shown in Figure 14b. As with the 505 ECS, the TCR of AWI-CM3 is also near the center of the CMIP6 model distribution of 1.8−5.6°C (Scafetta, 2021). Compared to the predecessor model AWI-CM1, the TCR did not change. Interestingly, a further doubling to a total of four times the 1850 CO 2 concentrations until the year 1990 results in another 2.6°C increase, and thus a larger global mean near surface (2m) air temperature rise than during the first period.

510
The computational performance of a climate model can be measured according to a variety of criteria. Balaji et al. (2017) provide a good overview of what can be considered the computational performance, but in our analysis we will focus on just two aspects, the Simulated Years Per Day (SYPD) and the computational cost measured in Core Hours per Simulated Year (CHSY). Systematic and rigorous experiment design would require that we identify all the degrees of freedom and vary them in all combinations. Unfortunately the number of degrees of freedom is large, including atmosphere and ocean vertical 515 and horizontal resolution, atmospheric spectral and grid point resolution, number of cores allocated for MPI and/or OpenMP parallelization for FESOM2, OpenIFS43 and XIOS, as well as the amount of model output for each of the main components etc.
Testing all combinations is impractical. Instead we present results for setups that have been optimized empirically, involving not only the use of analytical tools such as Dr.Hook (Saarinen et al., 2005), LUCIA (Maisonnave et al., 2020) and the XIOS internal statistics, but also educated guesswork. It may well be that better configurations exist, but AWI-CM3 can at a minimum 520 perform to the level presented here.
Tables 1 and 2 list the SYPD and CHSY values we achieved when optimization is done for speed and cost, respectively. Note, that the scaling limit of TCo319L137-DART has not been reached and simulations with upwards of 10 SYPD are likely feasible.

High resolution outlook 525
While we document mainly the first CMIP-prototype simulations of AWI-CM3 and its strengths and weaknesses at low resolution, we also tested higher resolution configurations. In Figure 15 we show the performance of a 31km atmosphere with 137 vertical layers, coupled to a high-resolution ocean with a mesh that features 3.1 million surface nodes and 80 layers. This configuration with the name TCo319L137-DART has been run for 50 years under constant 1990 forcing, with some of the insights gained while performing the set of experiments at low resolution already taken into account.

530
Key improvements of the TCo319L137-DART setup compared to TCo159L91-CORE2 are the eddy resolving ocean in the western boundary currents, with eddy permittance in the ACC region, as well as better representation of orography and related effects in the atmospheric model. In lower latitude regions the atmosphere has sufficient resolution to resolve and react to ocean eddies. Both atmosphere and ocean feature more vertical layers, allowing for better representation of vertical processes.
The ability of the TCo319L137-DART simulation to reproduce a climate as observed during the years 1990 to 2014 is 535 better than that of the TCo159-CORE2 runs we analysed so far. The improvement is almost universal with only the sea ice concentrations and mixed layer depths still showing below-average performance compared to CMIP6 models. A TCo159-CORE2 simulation of the same length and with the same forcing (not shown) showed performance almost identical to the last 25 years of the HIST experiment discussed above. We conclude that the improvement is related to the model resolution and continued model development, rather than the shorter run length or different forcing.

540
Obviously the improved climatological performance comes at a cost, as detailed in Section 5. Every simulated year with TCo319L137-DART costs 35 times the CHSY and is performed at 15 times lower SYPD. More detailed exploration of the higher resolution capabilities of AWI-CM3 will be subject of future work.

Conclusions
We for OpenIFS, the model was able to reach a near equilibrium under constant 1850 well-mixed greenhouse gas and solar forcing.
After 700 years of spinup we branched off four experiments, a historic simulation (165y), a pre-industrial control run (165y), and two idealized experiments, one with a sudden increase to 4xCO 2 (120y) and the other with 1% CO 2 (150y) increase per year.
Climate sensitivity experiments with AWI-CM3 obtained an Equilibrium Climate Sensitivity and Transient Climate Re-555 sponse of 3.2°C and 2.1°C, respectively, both of which are near the center of the CMIP6 spreads.
Using the last 25 years of the historical simulation we established that a low resolution version of AWI-CM3 provides above CMIP6-average performance for representing the climatological state of precipitation, wind speeds, cloud fraction, 500 hPa geopotential height and air temperature north of 30°S. We found that the Southern Ocean sea ice concentration and thickness were severely underestimated, leading to large positive near-surface air temperature biases in this region, and traced the sea 560 ice biases to spuriously large mixed layer depth, a positive ice albedo feedback and biases in shortwave downward radiation associated with cloud fraction between 45-60°S.
AWI-CM3 is capable of realistically simulating the Atlantic meridional overturning circulation (AMOC) in terms of both the shape and strength of the streamfunction, as well as reproducing a decreasing trend in the historical period consistent with observations. While the model produces an El Niño-Southern Oscillation (ENSO) at realistic frequencies, the amplitude of 565 ENSO is currently underestimated.
Our recommendations for resolving the Southern Ocean sea ice concentration and thickness biases in future versions of AWI-CM3 include re-tuning of the vertical mixing scheme, and the inclusion of coupling minor fluxes that are at least one magnitude smaller than the ones exchanged so far. We have identified the coupling of ocean surface currents, rain temperature, enthalpy of snow falling into the ocean, enthalpy of melting icebergs and basal melt flux as promising candidates to reduce 570 model biases. Based on literature we suggest porting and activating the Stochastically Perturbed Parametrisations scheme as a potential way to improve ENSO amplitude.
The advanced computational efficiency and scalability of AWI-CM3, combined with a very solid model and coupling physics implementation, will eventually enable us to perform full DECK and scenario simulations at resolutions of 5-25km, that were previously reserved for the high end of the HighResMIP protocol. We provide a preview with a shorter high-resolution 575 simulation, indicating that most AWI-CM3 climatological biases at future operational resolution will be about half those of the average CMIP6 model.
Code and data availability. The ocean model FESOM2 source code is available on Zenodo at 10.5281/zenodo.6335383 and at https : //github.com/F ESOM/f esom2/releases/tag/AW I − CM 3_v3.0. OpenIFS is not publicly available but rather subject to licecing by ECMWF. However licences are readily given free of charge to any academic or research institute. All modifications required to en-580 able AWI-CM3 simulations with OpenIFS CY43R3V1 as provided by ECMWF can be obtained on Zenodo at: 10.5281/zenodo.6335498.
The OASIS coupler is available upon registration at: https://oasis.cerfacs.fr/en/downloads/. The XIOS source code is available on Zen-  Appendix B: Additional figures Figure B1. HIST biases as figure 4 , but using NCEP-DOE2 re-analysis by Kistler et al. (2001) instead of ERA5 for near-surface air temperature (tas), eastward near-surface wind (uas), northward near-surface wind (vas), 300 hPa eastward wind (ua), 500 hPa geopotential height. Note that ERA5 was created using IFS CY41R2, a model closely related to the OpenIFS CY43R3 atmosphere employed in AWI-CM3. Figure B2. TCo319L137-DART biases as figure 15, but using NCEP-DOE2 re-analysis by Kistler et al. (2001)