Impact of increased resolution on long-standing biases in HighResMIP-PRIMAVERA climate models

We examine the influence of increased resolution on four long-standing biases using five different climate models developed within the PRIMAVERA project. The biases are the warm eastern tropical oceans, the double Intertropical Convergence Zone (ITCZ), the warm Southern Ocean, and the cold North Atlantic. Atmosphere resolution increases from ~100–200 km to ~25–50 km, and ocean resolution increases from ~1° (i.e., eddy-parametrized) to ~0.25° (i.e., eddy-present). For one model, ocean resolution also reaches 1/12° (i.e., eddy-rich). The ensemble mean and individual fully coupled general circulation models and their atmosphere-only versions are compared with satellite observations and the ERA5 reanalysis of near-surface temperature, precipitation, cloud cover, net cloud radiative effect, and zonal wind over the period 1980–2014. The four studied biases appear in all the low-resolution coupled models to some extent, although the Southern Ocean warm bias is the least consistent across models. Our analysis finds no clear reductions in the studied biases due to the increase in atmosphere resolution up to 25–50 km, in ocean resolution up to 0.25°, or in both. Our study thus adds to evidence that further improved model physics, tuning, and even finer resolutions might be necessary. Increased resolution, especially in the atmosphere in the HadGEM3-GC31 and MPI-ESM1-2 models, reduces the surface warm bias over the tropical upwelling regions in the coupled models, with further reductions in the cloud cover and precipitation biases, particularly over the tropical South Atlantic. Related to this and to the improvement in the precipitation distribution over the western tropical Pacific, the double ITCZ bias also weakens with resolution.
Overall, increased ocean resolution from ~1° to ~0.25° offers limited bias reductions or even degrades the biases in some models, although an eddy-rich ocean resolution seems beneficial for reducing the biases in North Atlantic temperatures and Gulf Stream path. Despite the improvements, however, large biases in precipitation and cloud cover persist over the whole tropics, as well as in the upper-troposphere zonal winds at mid-latitudes, in coupled and atmosphere-only models at higher resolutions. The Southern Ocean warm bias also increases or persists in some coupled models, and a new warm bias emerges in the Labrador Sea in all the high-resolution coupled models. The analysis of this set of PRIMAVERA models therefore suggests that, to reduce biases, i) increased atmosphere resolution up to ~25–50 km alone might not be sufficient and ii) an eddy-rich ocean resolution might be needed. The study thus adds to evidence that further improved model physics and tuning might be necessary in addition to increased resolution to mitigate biases.

contribution to the upper-ocean heat budget and offshore transport from the upwelling regions in the Atlantic (Seo et al., 2006; Xu et al., 2014b; Small et al., 2015). However, the bias persists in some models and ocean basins, such as the Pacific, even after increasing their resolution (Jochum et al., 2005; Doi et al., 2012; Delworth et al., 2012; Milinski et al., 2016; Goubanova et al., 2019), which suggests that a refinement of model physics might still be necessary to remove it (Patricola et al., 2012; Harlaß et al., 2018). A reduction in the temperature and cloud biases in the eastern tropical oceans might help reduce current uncertainty about climate sensitivity (Andrews et al., 2019), affect precipitation biases, for example over the equatorial North Atlantic (e.g., Hazeleger and Haarsma, 2005; Huang et al., 2007; Siongco et al., 2015), and further enhance models' predictive skill over the tropics (Exarchou et al., 2021).

The double ITCZ
Another long-standing bias in the tropical climate in GCMs affects the representation of the ITCZ, referred to as the double ITCZ. This bias takes the form of a tropical precipitation distribution with two distinct maxima, to the north and south of the equator, instead of a single one north of the equator, as in observations (Fig. 2a, and black line in Fig. 3; Schneider et al., 2014). The double ITCZ problem has persisted over climate model generations (e.g., Lin, 2007; Li and Xie, 2014; Oueslati and Bellon, 2015; Zhang et al., 2015; Samanta et al., 2019; Tian and Dong, 2020); it has been related to deficiencies in the tropical or global energy budget (Hwang and Frierson, 2013; Bischoff and Schneider, 2016; Adam et al., 2016, 2018), in atmospheric deep convection (Zhang and Wang, 2006; Oueslati and Bellon, 2015; Song and Zhang, 2019), in land temperature (Zhou and Xie, 2017), and in the atmosphere-ocean coupling due to sea-surface temperature (SST) biases amplified by the wind-evaporation-surface temperature and the Bjerknes feedbacks (Lin, 2007; Li and Xie, 2014; Qin and Lin, 2018; Samanta et al., 2019). The double ITCZ commonly develops together with a cold surface bias and too weak easterlies over the equatorial western Pacific, which together lead to reduced convective precipitation aloft (Lin, 2007; Li and Xie, 2014; Oueslati and Bellon, 2015; Zhang et al., 2015; Samanta et al., 2019). The double ITCZ bias can present distinct seasonal characteristics (Lin, 2007; Li and Xie, 2014; Oueslati and Bellon, 2015; Adam et al., 2018), although we focus on the annual mean in our analysis for the sake of simplicity.
Increased model resolution can alleviate the double ITCZ bias, especially over the Atlantic when the eastern tropical warm bias is reduced (Seo et al., 2006; Delworth et al., 2012; Doi et al., 2012; Harlaß et al., 2018; Song and Zhang, 2020) and orography or mesoscale systems are better resolved in models (de Souza Custodio et al., 2017; Vannière et al., 2019;

Biases in middle and high latitudes
Besides biases in the tropics, climate models also present substantial biases at higher latitudes, which have also persisted across model generations. Here, we will discuss two of the best-known: the SO surface warm bias, and the cold bias in the subpolar North Atlantic.

Southern Ocean
Both past and state-of-the-art climate models show a surface warm bias over extensive areas at mid- and higher latitudes of the SO (see, for example, the bottom left panel in Fig. 1b; Schneider and Reusch, 2016; Beadling et al., 2020). This bias has been attributed to excessive shortwave radiation reaching and warming the surface ocean because of the underestimation of the cloud cover (especially mixed-phase clouds) and errors in the cloud forcing (Hwang and Frierson, 2013; Bodas-Salcedo et al., 2012; Kay et al., 2016; Schneider and Reusch, 2016; Hyder et al., 2018). The extent and magnitude of these biases affect important aspects of the climate, not only over the SO, but globally. For example, too warm surface temperatures result in a gross underestimation of the Antarctic sea ice by models (Beadling et al., 2020). Similarly, the associated misrepresentation of the low-level temperature gradient has been linked to an equatorward shift bias in the southern hemisphere (SH) upper-troposphere jet (Ceppi et al., 2012). Biases in clouds over the SO are an important uncertainty source for climate sensitivity (McCoy et al., 2015; Tan et al., 2016). The biggest reductions in the SO warm bias have recently been achieved through a more realistic representation of cloud properties over the region (Bodas-Salcedo et al., 2014; Seiki and Roh, 2020; Varma et al., 2020), which might be better characterized in higher resolution models (Furtado and Field, 2017).

characteristics of the North Atlantic decadal variability (Menary et al., 2015); an unrealistic Gulf Stream separation can similarly affect its response to future increases in greenhouse gases (Moreno-Chamarro et al., 2021).

Models and simulations
We compare simulations generated with five different climate models participating in the PRIMAVERA project and for which all the necessary data were publicly available on the CEDA-JASMIN platform at the time of the analysis (Table 1): CNRM-CM6-1 (Voldoire et al., 2019b), EC-Earth3P (Haarsma et al., 2020), ECMWF-IFS (Roberts C.D. et al., 2018), HadGEM3-GC31 (Roberts M.J. et al., 2019), and MPI-ESM1-2 (Gutjahr et al., 2019). Two resolutions for each model are compared (details provided in Table 1): a lower one, which in most cases features a standard ~100-200-km atmosphere and an eddy-parametrized, 1° ocean; and a higher resolution version with a ~50-km atmosphere and an eddy-present, 0.25° ocean. For simplicity, the lower and higher resolution versions of each model are referred to as LR and HR, respectively. In all the models except MPI-ESM1-2, resolution increases in both the ocean and atmosphere from LR to HR (Table 1).
For MPI-ESM1-2, only the atmosphere resolution increases, from a nominal resolution of 134 km to 67 km, with both versions coupled to a 0.4° ocean. To extend the analysis and explore the benefit of an eddy-rich ocean model, we also analyze the HH coupled version of HadGEM3-GC31 (Roberts M.J. et al., 2019), which has the same atmospheric resolution as the HR version used here (41 km) but is coupled to an eddy-rich, 1/12° ocean (Table 1). However, the results of the HadGEM3-GC31-HH model are discussed only where relevant and are not included in the ensemble means, since this model has a different eddy regime compared to the other HR models.
Following the CMIP6 HighResMIP protocol, no additional tuning was applied to the HR model versions, except for a short list of parameters that explicitly change with resolution (especially for oceanic diffusion and viscosity; see, for example, Table 1 in Roberts M.J. et al., 2020b). Specific details about each model can be found in the references in Table 1. In contrast to the other models, the HR version of the ECMWF-IFS model was based on an existing configuration used Hadley Center Global Ice and Sea Surface Temperature (HadISST.2.2.0; Kennedy et al., 2017), and ii) coupled historical runs (hist-1950), which are forced by time-varying external forcing starting from a 50-year control spin-up that uses fixed 1950s forcing. Both the atmosphere-only and coupled experiments cover the period 1950-2014, although here we focus mainly on the 1980-2014 period (see below). Comparing atmosphere-only and fully coupled climate models helps isolate the biases arising from atmosphere-ocean interactions.

Observations and reanalysis
The climate models are compared against a suite of observational and reanalysis products. These include near-surface air temperature (SAT) and tropospheric zonal winds from the ERA5 reanalysis (Hersbach et al., 2020), precipitation rate from the version-2 GPCP dataset (Adler et al., 2003), cloud cover from the version-3 ESA Cloud_cci dataset (ESA CCI-CLOUD; Stengel et al., 2020), and net cloud radiative effect computed from the CERES-EBAF dataset (Kato et al., 2018; Loeb et al., 2018). The net cloud radiative effect is computed as the difference between the top-of-the-atmosphere upward net flux and the clear sky component; it represents the net effect of clouds on the radiation budget at the top of the atmosphere, with negative mean values for cloud-induced cooling, and vice versa (Fig. 5a). Biases in SAT and zonal winds with respect to the ERA-Interim reanalysis (Dee et al., 2011) are very similar to those with respect to ERA5 (not shown). Similarly, biases in SST (not shown) are very similar to those in SAT, which suggests that SAT biases are dominated by the SST ones over the ocean. The periods of comparison between models and observations are adapted to maximize the availability of observations until the last simulated year (i.e., 2014).
These periods are 1980-2014 for ERA5 and GPCP, 1982-2014 for ESA CCI-CLOUD, and 2001-2014 for CERES-EBAF. Biases are computed by adapting the ESMValTool (Eyring et al., 2020) recipe "recipe_perfmetrics_CMIP5.yml" (Gleckler et al., 2008) to analyze the PRIMAVERA models. The statistical significance of the differences between models or the ensemble means and the observations is calculated for each variable based on a two-tailed Student's t test at the 5 % level, in which the null hypothesis is that the two samples (model and observations) have the same mean over the above-mentioned periods, assuming the two samples have different variances (von Storch and Zwiers, 1999). Figs. 1, 2, 4, 5, 6 (shown as stippling) to measure the agreement in the sign of the difference of the ensemble members with respect to observations. For the global biases and each regional bias (upwelling regions, double ITCZ, SO, and North Atlantic) we compute the mean bias and the root-mean-square deviation (RMSD; Tables 2 and S1-S4). The areas where these metrics are computed are shown in Fig. S1 and are: for the tropical upwelling regions over the SH Pacific and Atlantic, between 105-70° W for the Pacific and 30° W-15° E for the Atlantic, both between 0-30° S; for the Pacific ITCZ, between 100-150° W and 0-30° S (as in Tian and Dong, 2020); for the SO, between 0-360° E and 50-70° S; and for the North Atlantic, between 80-10° W and

Global biases
Table 2 summarizes the values of the global root-mean-square deviation (RMSD) and bias of four key variables: near-surface air temperature (SAT), precipitation, cloud cover, and net cloud radiative effect. These variables are chosen to help assess the different regional biases discussed in Sections 4 and 5. On average, the ensemble presents a climate that is too cold, too wet, and slightly too cloudy, with excessive radiative cooling from clouds compared to observations.
The coupled and atmosphere-only model versions present similar global biases at both resolutions for all variables except for SAT, for which biases are smaller in the atmosphere-only runs, consistent with these being forced by observed SSTs. In terms of RMSD, the ensemble mean presents some of the smallest values, likely because of error compensation among members. In contrast to the ensemble mean, the EC-Earth3P and MPI-ESM1-2 coupled models are globally warmer compared to observations, mostly due to excessively warm SO/Antarctica and tropics, respectively (Table 2 and Fig. S3). Similarly, only and the MPI-ESM1-2 models are insufficiently cloudy compared to observations (Table 1), which is connected to their strong biases over the tropics and subtropics (Figs. S1, S2, S6, S7). The EC-Earth3P models are the only ones that consistently show a positive radiative forcing bias due to clouds, related to a widespread cloud overestimation over the SO (Figs. S8, S9). Across the ensemble, the atmosphere-only and coupled CNRM-CM6-1 models show the largest RMSD values, particularly in cloud cover and net cloud radiative effect (Table 2), whose biases are dominated by those over the tropics and high latitudes (Figs. S6-S9). This contrasts with their relatively low global mean biases, a clear sign of large error compensation between regions. The HadGEM3-GC31 and MPI-ESM1-2 models both have large global mean biases in precipitation (excessively wet) and in cloud cover (respectively, excessively cloudy especially in the tropics, and deficiently cloudy especially in the subtropics and mid-latitudes; Figs. S6, S7); however, these models have the smallest biases in net cloud radiative effect. These results highlight important differences across models within the ensemble. Compared to previous generation CMIP5 models, the global bias in net cloud radiative effect is lower in all the PRIMAVERA coupled models (Table 2; cf. Table 1 in Calisto et al., 2014). The increase in resolution from LR to HR has, on average, a mixed effect on the global biases (Table 2). The temperature and net cloud radiative effect biases are reduced particularly in the coupled models, related to improvements in the eastern tropical oceans (Section 4) and North Atlantic (Section 5), mostly in the coupled versions of the HadGEM3-GC31 and ECMWF-IFS models. The precipitation and cloud cover biases increase with increased resolution, especially the cloud excess in the CNRM-CM6-1 and HadGEM3-GC31 models. This increase in global precipitation biases at higher resolution is consistent with previous literature (Vannière et al., 2020). In most cases, nonetheless, increased resolution has a small impact on the global biases. Since the study of global biases hides large regional differences, we discuss these in the following sections.
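The mean bias and RMSD reported per region can be illustrated with a minimal sketch. It assumes time-mean model and observation fields on the same regular latitude-longitude grid and uses cos(latitude) area weights; the weighting, the function name `regional_bias_and_rmsd`, and the synthetic fields are illustrative assumptions, not the ESMValTool recipe used in the study.

```python
import numpy as np

def regional_bias_and_rmsd(model, obs, lat, lon, lat_bounds, lon_bounds):
    """Area-weighted mean bias and RMSD of `model` vs `obs` inside a lat-lon box.

    model, obs : 2-D arrays (lat, lon) of time-mean fields on the same grid.
    lat, lon   : 1-D coordinates in degrees (lon in the 0-360 convention).
    lat_bounds, lon_bounds : (min, max) tuples in degrees.
    """
    model = np.asarray(model, dtype=float)
    obs = np.asarray(obs, dtype=float)
    lat = np.asarray(lat, dtype=float)
    lon = np.asarray(lon, dtype=float)

    # Boolean mask selecting the grid cells inside the box
    in_lat = (lat >= lat_bounds[0]) & (lat <= lat_bounds[1])
    in_lon = (lon >= lon_bounds[0]) & (lon <= lon_bounds[1])
    mask = in_lat[:, None] & in_lon[None, :]

    # Assumption: cos(latitude) approximates relative cell area on a regular grid
    weights = np.cos(np.deg2rad(lat))[:, None] * np.ones(lon.size)[None, :]
    weights = np.where(mask, weights, 0.0)

    diff = model - obs
    bias = np.sum(weights * diff) / np.sum(weights)
    rmsd = np.sqrt(np.sum(weights * diff**2) / np.sum(weights))
    return bias, rmsd

# Synthetic example: a model that is uniformly 1 degree C warmer than observations,
# evaluated over the Southern Ocean box quoted in the text (0-360 E, 50-70 S).
lat = np.arange(-89.5, 90.0, 1.0)
lon = np.arange(0.5, 360.0, 1.0)
obs = np.zeros((lat.size, lon.size))
model = obs + 1.0
bias, rmsd = regional_bias_and_rmsd(model, obs, lat, lon, (-70, -50), (0, 360))
```

A uniform offset gives identical bias and RMSD; the two metrics diverge once regional errors of opposite sign compensate in the mean, which is the behavior noted for the models with large RMSD but small global mean bias.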

Upwelling regions
In the PRIMAVERA models, only the coupled configurations show a distinct warm bias in the eastern tropical oceans, of up to about 2-3 °C (Fig. 1) and of about 0.5 °C on average (Table S1). This bias is absent in the atmosphere-only models, as these are forced by observed SSTs (Fig. 1). At LR, the bias is especially persistent over the eastern tropical South Atlantic and South Pacific and extends from the coast equatorward. In the Northern Hemisphere (NH) the warm bias is less robust across models: off the Californian coast, only the CNRM-CM6-1, EC-Earth3P, and MPI-ESM1-2 models show a distinct warm bias (Fig. S3), whereas off northwest Africa, most models present a cold bias instead, likely the result of the strong cold bias over the subpolar region (discussed in Section 5.2). Increased resolution leads to a reduction in the bias over the SH ocean basins of up to about 1 °C (Fig. 1) and of about 0.3 °C on average in the ensemble mean (Table S1), which might be related to improved coastal wind systems (Small et al., 2015; Milinski et al., 2016). The warm bias is largely reduced in both HadGEM3-GC31 HR models, although using an eddy-rich ocean model (HH) leads to no further reduction compared to the eddy-present ocean (HM) for the same ~50-km atmosphere resolution (Fig. S3). For this model and bias in particular, the increase in atmosphere resolution from ~200 km to ~50 km seems to be more beneficial than the increase in ocean resolution from ~100 km to ~8 km (Roberts M.J. et al., 2019).

As with many previous-generation GCMs, the surface warm bias is associated with an underestimation of the cloud cover of up to 10-20 % (Fig. 4) and of about 7 % on average (Table S1) over the eastern subtropical ocean in the LR ensemble. The shape and magnitude of the cloud cover bias are similar in the atmosphere-only and coupled models, which points to deficiencies in the atmosphere models as its root cause. The ensemble mean is largely dominated by the bias in the CNRM-CM6-1 LR model, which shows the largest amplitude in the cloud cover bias, of about 20 % on average (Table S1) and locally above 20-30 % (Figs. S6, S7), followed by the MPI-ESM1-2 LR model, with a mean bias of about 17 % (Table S2); cloud cover biases over the upwelling regions show, by contrast, nearly half the amplitude in the EC-Earth3P, ECMWF-IFS, and HadGEM3-GC31 LR models (Table S2,  surface warming linked to cloud cover deficit (Fig. 4). Increased resolution reduces the bias in the net cloud radiative effect by about 3 W m−2 on average (Table S1) and by up to 10-15 W m−2 locally in the ensemble mean (

The double ITCZ
The PRIMAVERA LR models suffer from large biases in tropical precipitation (Fig. 2). These biases are similar in extent and magnitude to previous and contemporary models (CMIP3/5/6; cf. Fig. 2 in Tian and Dong, 2020). On average, the double ITCZ emerges over the Pacific basin in the PRIMAVERA coupled models (Fig. 2), where the bias presents the characteristic pattern with precipitation deficit over the equator and excess on the northern and southern flanks by about ∓2 mm d−1 on average, respectively. This pattern can be identified in all the LR coupled models, except for CNRM-CM6-1, in which the precipitation excess is predominantly on the southern flank. Associated with the equatorial dry bias, a cold bias of up to 1-2 °C also affects the LR coupled models over the western and central equatorial Pacific (Fig. 1). In contrast to the Pacific, the precipitation bias over the tropical Atlantic points to a southward shifted ITCZ, with dry and wet biases to the north and south of the equator, respectively, while over the Indian Ocean a wet precipitation bias extends over the western part of the basin and a dry one over the Indian subcontinent (Fig. 2). Such differences between ocean basins suggest that either different mechanisms are responsible for their biases, or that each basin responds differently to the same large-scale/global biases. Together, the tropical precipitation biases lead to a precipitation excess mainly over the SH in the LR coupled models (Fig. 3). All the areas with precipitation excess show a positive bias in cloud cover of up to about 10-20 % (Fig. 4).
In contrast to the LR coupled models, their atmosphere-only configurations show no clear double ITCZ pattern (Figs. 2 and 3). In the zonal mean, in fact, the excess in precipitation is relatively constant across all the tropics in the atmosphere-only models (Fig. 3). This result suggests that the double ITCZ arises from misrepresented atmosphere-ocean coupling, consistent with previous literature pointing to simulated air-sea interactions and SST as key players in its development (Lin, 2007; Li and Xie, 2014; Oueslati and Bellon, 2015). The LR atmosphere-only models, instead, present excessively wet (~1.5-3 mm  Table S2). Over these two basins, however, the bias reduction is larger for the eddy-present HadGEM3-GC31 model than for the eddy-rich one (Fig. S5 and Table S2). Over both the Pacific and Atlantic, the reduction in the tropical precipitation biases develops together with a reduction in the central equatorial Pacific cold bias (of up to about 1 °C) and in the eastern tropical South Atlantic warm bias (Fig. 1), in agreement with previous literature (Huang et al., 2007; Xu et al., 2014a; Siongco et al., 2015; Song and Zhang, 2019). By contrast, cloud biases over these regions increase by about 3 % on average in the ensemble mean and locally by up to 5-10 % with increased resolution in the coupled models (Fig. 4), especially in the CNRM-CM6-1, MPI-ESM1-2, and HadGEM3-GC31 models (Fig. S7 and Table S2). In most of the coupled models, increased resolution leads to modest bias reductions (overall smaller than the magnitude of the bias itself), and thus the models still exhibit large biases in precipitation and cloud cover over the tropical Pacific and Indian oceans (Figs. 2 and 4) and a clear precipitation excess over the SH tropics (Fig. 3).
In the atmosphere-only models, bias reductions due to resolution in precipitation and clouds in the tropics are mostly negligible in the ensemble mean, and only the HadGEM3-GC31 and CNRM-CM6-1 models show a slight reduction, over the western tropical North Pacific and tropical North Pacific, respectively (Figs. 2 and S4). This points to issues with the atmospheric model physics, which remained unchanged between LR and HR (Section 2), as the root of the precipitation and cloud cover biases over the tropics. Improvements seen in the HR coupled models therefore arise from increased resolution/improvements in the ocean, better represented coupling, or both.

Southern Ocean
The SO warm bias does not appear in all the PRIMAVERA LR coupled models (Figs. 1, S3). The EC-Earth3P and ECMWF-IFS models, which both use a combination of an IFS model and a NEMO model, albeit different versions (Section 2), show a mean SAT bias of about 1 °C over the entire SO (Table S3), with local values up to 2-3 °C (Fig. S3). By contrast, the CNRM-CM6-1, MPI-ESM1-2, and HadGEM3-GC31 models show a mean SO bias of about −1 °C, but with a more mixed pattern of successive regional warm and cold biases that might result from a different spatial distribution in sea ice. Together with the SO warm bias, the LR coupled ensemble (and especially the CNRM-CM6-1, EC-Earth3P, and ECMWF-IFS models; Fig. S7) shows a mean underestimation of the mid-latitude cloud cover by about 5-10 % (Figs. 4, S7, and Table S3) and a positive mean bias in the net cloud radiative effect of about 5-15 W m−2 (Figs. 5, S9 and Table S3), which is dominated by the shortwave component (not shown). By contrast, the MPI-ESM1-2 model shows the smallest (1 W m−2 on average; Table S3) and least widespread bias in its net cloud radiative effect over the SO (Fig. S9), which might explain its smaller surface temperature biases (Fig. S3). In contrast to the other models, the HadGEM3-GC31 model shows instead a positive bias in cloud cover over the SO. Associated with the SO warm bias, the LR coupled models also present a dry bias at mid-latitudes (Fig. 2).
Similarly, they exhibit an equatorward shift in the upper-level jet, even in models with a relatively small SO warm bias, with too weak zonal wind between the surface and the tropopause at around 60° S and too strong zonal wind at upper levels (~200-300 hPa) toward the equator (Fig. 6).
Increased resolution has a mixed effect on the SO warm bias and, although it seems to increase in the ensemble mean (Fig. 1), this varies substantially across models (Fig. S3 and Table S3): the CNRM-CM6-1 model experiences a reduction of a cold bias over the Weddell Sea of up to about 4 °C; the EC-Earth3P model warms along the Antarctic coast and its widespread SO warm bias persists at HR; the ECMWF-IFS model shows an increase of its temperature bias by about 1.5-2 °C on average and very strongly locally in the Weddell Sea, by more than 5 °C; the MPI-ESM1-2 model shows a mean cooling over the SO of about 0.5 °C and becomes cold biased, especially to the west of the Antarctic Peninsula; and the HadGEM3-GC31 model shows a reduction of its coastal cold bias, developing instead a more widespread warm bias with local values up to about 1-2 °C, although the cold bias over the Weddell Sea persists in the HadGEM3-GC31 eddy-rich model. In contrast to temperature, biases in cloud cover and net cloud radiative effect remain relatively unchanged between LR and HR (Figs. 4 and 5). The CNRM-CM6-1 model shows a 1 % reduction in its mean cloud cover bias over the SO, while the ECMWF-IFS and MPI-ESM1-2 models show a 1-3 % increase in their cloud cover underestimation over the SO (Table S3). Similarly, the ECMWF-IFS model shows a 1.5 W m−2 mean reduction while the MPI-ESM1-2 model shows a 4 W m−2 mean increase in their net cloud radiative effect biases over the SO (Figs. S6-S9). Given the small reductions in the cloud cover and net cloud radiative effect biases with increased resolution, the change in the temperature bias over the SO might be related to a change in the sensitivity of the HR coupled models to the similar cloud and radiation biases, or to the development of further biases, for example, in the sea ice, mixed layer depth, air-sea heat fluxes, or the strength of the Antarctic Circumpolar Current (e.g., Roberts C.D. et al., 2018b). Some of these biases might, in turn, be linked to whether or not the mesoscale eddy mixing is disabled at higher resolution (Roberts C.D. et al., 2018b), as discussed in Section 6. The dry bias over the SO remains unchanged (mean changes overall below 0.1 mm d−1) with increased resolution (Fig. 2). In agreement with previous studies, there is no obvious linkage between the magnitude of the SO bias and the double ITCZ bias in the LR and HR coupled models (Hawcroft et al., 2017). Increased resolution increases the magnitude of the zonal wind bias over the SH in all the models, although it has little impact on the overall pattern (Fig. 6).
As for the atmosphere-only models, temperature biases over most of the SO are negligible both at LR and HR (Fig. 1). Nonetheless, the LR versions of the CNRM-CM6-1, EC-Earth3P, ECMWF-IFS, and MPI-ESM1-2 models show a cold bias of up to about 2-4 °C off the Antarctic coast, which is reduced at HR only in the CNRM-CM6-1 model, by about 1-2 °C (Fig. S2); this coastal cold bias might reflect an issue in the response of the lower atmosphere to the imposed sea ice field, perhaps related to the assumed ice/snow thickness used in the land-surface scheme to calculate skin temperature over ice. Biases in precipitation, cloud cover, and cloud radiative effect are comparatively similar to those in the coupled models and  Table S3). Biases in the SH jet in atmosphere-only models are similar but of smaller amplitude compared to those in the coupled models (Fig. 6).

The North Atlantic
All the PRIMAVERA LR coupled models show a cold bias of about 3-4 °C in the central subpolar North Atlantic and a warm one of about 1-2 °C off the North American east coast, with local values up to −5 °C and 2 °C, respectively, in the ensemble mean (Figs. 1, S3). These temperature biases are absent in the atmosphere-only models, which supports the notion that they result from the misrepresentation of the Gulf Stream separation and path by the ocean model. The cold bias is especially strong in the ECMWF-IFS model, where anomalies colder than −5 °C cover large areas of the subpolar North Atlantic and Nordic Seas (Fig. S3); this strong cold bias results from an unrealistically weak Atlantic meridional overturning circulation (AMOC) and related heat transport, potentially related to the lack of re-tuning compared to its HR version (see Section 2 and Roberts C.D. et al., 2018b). The cold bias also extends northward into Arctic latitudes in the CNRM-CM6-1 and HadGEM3-GC31 models, which points to a misrepresentation of the Arctic sea ice in addition to the Gulf Stream path and the poleward oceanic heat transport. The cold bias over the subpolar North Atlantic is accompanied by a dry bias of up to about 0.5-1 mm d−1 (Fig. 2) and, in most cases, by a reduction in cloud cover of up to about 5-10 % (Fig. 4). The cold bias might also be related to the southward shifted jet in the NH in some models (Fig. 6) due to a southward shift in the maximum of the horizontal temperature gradient (not shown); however, the bias in the NH jet might also be related to a southward shift in the ITCZ/Hadley Circulation (especially in the Atlantic basin; Fig. 2) and the associated intensification of the subtropical jet.
Increased model resolution reduces the magnitude of the cold bias by about 1 °C on average (Table S4) and locally by up to about 2-3 °C in the ensemble mean (Fig. 1). There are, however, important differences across the ensemble members (Fig. S2). The EC-Earth3P and CNRM-CM6-1 HR models show relatively small local reductions of the cold bias, by about 0.5-1 °C, over the central subpolar North Atlantic. The lack of a clear improvement in these two HR models might be related to the unchanged ocean physics between the low and high resolutions (Section 2). The MPI-ESM1-2 model shows no changes in the biases between resolutions over the subpolar North Atlantic but a strong cooling of up to about 4 °C over the Nordic Seas, likely related to misrepresented local sea ice. The lack of changes in the subpolar North Atlantic biases might be because both the LR and HR MPI-ESM1-2 models use the same ocean resolution (0.4°; Table 1) and both present a too zonal North Atlantic Current (Müller et al., 2018). Especially remarkable are the ECMWF-IFS and HadGEM3-GC31 models, for which the cold bias is strongly reduced (Fig. S3).
In the ECMWF-IFS model, this results from a much more realistic AMOC heat transport and sea ice extent in the North Atlantic compared to the LR version (Roberts C.D. et al., 2018b). In the HadGEM3-GC31 model, on the other hand, the bias is reduced at increased resolution (Roberts M.J. et al., 2019; Grist et al., 2021). The increase in ocean resolution from an eddy-present to an eddy-rich model leads to a more accurate Gulf Stream representation (Moreno-Chamarro et al., 2021) and a reduced warm bias near the coast (Fig. S2; Roberts M.J. et al., 2019).
On average at HR, the cold bias over the subpolar North Atlantic is replaced by a warm bias of up to about 2-3 °C over the Labrador Sea (Fig. 1). The warming of the entire subpolar North Atlantic is, in fact, one of the most remarkable differences at increased resolution in the ensemble mean. The warming is especially prevalent in the NEMO models at 0.25° resolution, in which the warm bias is likely related to a stronger oceanic heat transport in the North Atlantic and a lower sea ice concentration than at lower resolutions (Roberts M.J. et al., 2020b); this has been linked to too strong deep ocean mixing in the Labrador Sea at 0.25° resolution (Koenigk et al., 2021). In the MPI-ESM1-2 models, by contrast, a warm bias is already present at LR and, together with the cold bias in the central North Atlantic, remains unchanged at HR (Fig. S3). It is interesting to note that these two model versions share the same ocean resolution (Table 1). These results highlight the importance of ocean resolution for the North Atlantic bias.
Changes in other biases due to resolution include a reduction of the dry bias over the subpolar North Atlantic (Fig. 2), likely related to the surface warming, and a deepening of the bias in the NH upper-troposphere jet (Fig. 6), which might be related to an intensification in eddy momentum transfer to the jet due to resolution (Willison et al., 2013) and/or to changes in the vertical structure of the temperature bias across models. The change in the cloud cover bias in the ensemble mean is relatively small, about ±5 % over the entire North Atlantic, with no clear changes in the pattern (Fig. 3).
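As an illustration of how a regional-mean bias and its change between resolutions could be computed, the following minimal sketch evaluates a cosine-latitude (area) weighted mean over a latitude-longitude box on a regular grid. The function name, the box limits, and the synthetic fields are assumptions made for illustration only; they are not the PRIMAVERA data, and this is not necessarily the procedure behind the values in Table S4.

```python
import numpy as np


def area_weighted_mean(field, lat, lon, lat_bounds, lon_bounds):
    """Cosine-latitude weighted mean of a (lat, lon) field over a box.

    Assumes a regular latitude-longitude grid with 1-D coordinate arrays.
    """
    lat_mask = (lat >= lat_bounds[0]) & (lat <= lat_bounds[1])
    lon_mask = (lon >= lon_bounds[0]) & (lon <= lon_bounds[1])
    sub = field[np.ix_(lat_mask, lon_mask)]
    # Grid-cell area on a regular grid is proportional to cos(latitude).
    weights = np.cos(np.deg2rad(lat[lat_mask]))[:, None] * np.ones(lon_mask.sum())
    return float(np.sum(sub * weights) / np.sum(weights))


# Synthetic example on a 1-degree grid (hypothetical values, not model output).
lat = np.arange(-89.5, 90.0, 1.0)
lon = np.arange(-179.5, 180.0, 1.0)
obs = np.zeros((lat.size, lon.size))
lr_model = obs - 3.0  # uniform 3 degC cold bias at low resolution
hr_model = obs - 2.0  # uniform 2 degC cold bias at high resolution

# Hypothetical box loosely covering the central subpolar North Atlantic.
box = dict(lat_bounds=(45.0, 60.0), lon_bounds=(-50.0, -20.0))
lr_bias = area_weighted_mean(lr_model - obs, lat, lon, **box)
hr_bias = area_weighted_mean(hr_model - obs, lat, lon, **box)
bias_change = hr_bias - lr_bias  # positive means the cold bias weakens
```

With real data, the model and reference fields would first need to be interpolated to a common grid and averaged over the same period (1980-2014 in this study) before the bias is taken.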

Discussion and Conclusions
This paper examines whether increased horizontal resolution alone reduces four well-known, long-standing climate biases in five global models developed within the PRIMAVERA project. Atmosphere resolution increases from a ~100-200-km grid to a ~25-50-km one, and ocean resolution from ~1° to ~0.25°. The analysis also includes an eddy-rich global coupled model at a 1/12° ocean resolution. Models are compared to observations and the ERA5 reanalysis over the period 1980-2014.
All the PRIMAVERA LR coupled models suffer from the above-mentioned four key biases, as in previous and contemporary generations (CMIP3/5/6; IPCC, 2013; Wang et al., 2014; Tian and Dong, 2020). Although increased resolution contributes to reducing some of these biases, both globally and regionally, it does so only in a few models and in a model-dependent way, for example, for surface temperature biases. In the ensemble mean, the warm eastern tropical ocean, the double ITCZ, and the cold North Atlantic biases are reduced in the coupled models at higher resolutions; by contrast, the SO warm bias increases or persists in some models, with small changes in the cloud cover and net cloud radiative effect biases aloft; finally, a new warm bias emerges in the Labrador Sea in all the models that might be related to excessive Atlantic ocean heat transport (Roberts M.J. et al., 2020b) and excessive oceanic deep mixing in the Labrador Sea in the coupled models using the NEMO ocean model at 0.25° to 1/12° resolution (Koenigk et al., 2021). Despite some improvements, large biases remain at higher resolutions, especially in precipitation and cloud cover over the tropics and in the mid-latitude upper-tropospheric zonal wind, for which the benefit from resolution is rather modest. Our results are in line with previous modeling work suggesting reductions in biases due to increased resolution (e.g., Mertens et al., 2014; Harlaß et al., 2018; Monerie et al., 2020; Vannière et al., 2020). The ensemble means hide important differences across the individual models. Compared to its LR version, the CNRM-CM6-1 HR model shows a modest reduction in most of its biases, although it still exhibits some of the largest biases in precipitation, cloud cover, net cloud radiative effect over the tropics, and zonal winds at SH mid-latitudes among the ensemble.
The EC-Earth3P HR model improves slightly in the upwelling and subpolar North Atlantic regions but still shows large biases in tropical precipitation and a widespread SO warm bias. The ECMWF-IFS HR model, the one with the finest atmospheric nominal resolution (~40 km; Table 1), shows a large reduction in the North Atlantic cold bias because of a much more realistic Atlantic ocean heat transport compared to LR, and a modest bias reduction in the tropical precipitation and the eastern tropics; however, it also shows an increase in the SO warm bias and no major changes in its global cloud cover biases. The HadGEM3-GC31 HR model improves the most among the ensemble: all its biases except for the warm SO are reduced with increased resolution. This includes notable gains in the tropical South Atlantic upwelling region, with bias reduction in surface temperature, cloud cover, and precipitation, and in the North Atlantic, associated with a more realistic Gulf Stream path (Roberts M.J. et al., 2019). In the MPI-ESM1-2 model, the LR and HR versions, both with the same ocean resolution, suffer from similar biases in the Gulf Stream path and North Atlantic temperatures. These results illustrate how strongly model-dependent the impact of increased resolution on the studied biases can be.
When additional model configurations are available, the bias reduction gained from increasing ocean resolution alone can be assessed. For the ECMWF-IFS model, increased ocean resolution from 1° to 0.25° reduces the North Atlantic, Arctic, and equatorial Pacific temperature biases but increases the SO warm biases (Roberts C.D. et al., 2018b). For the HadGEM3-GC31 model, increased ocean resolution up to an eddy-rich one (0.08°) improves the Gulf Stream separation (Roberts M.J. et al., 2019) and representation (Moreno-Chamarro et al., 2021), although the eddy-rich resolution by itself has a modest impact on reducing surface temperature biases compared to the eddy-present one (0.25°; Fig. S3). For the MPI-ESM1-2 model, the bias also improves at an eddy-rich ocean resolution (Gutjahr et al., 2019). Overall, our analysis thus finds modest reductions or even bias degradation when the ocean resolution increases from 1° to 0.25°. Reaching an eddy-rich ocean resolution might be necessary to improve biases, for example, over the eastern tropical oceans and North Atlantic, in line with previous studies (e.g., Mertens et al., 2014; Xu et al., 2014b).
As for the increase in atmosphere resolution alone, it contributes to reducing the warm bias over the eastern tropical oceans in the ECMWF-IFS (Roberts C.D. et al., 2018b), HadGEM3-GC31 (Roberts M.J. et al., 2019), and MPI-ESM1-2 (this study) coupled models. Previous studies have linked a similar bias reduction to a more realistic coastal wind system (Small et al., 2015; Milinski et al., 2016). The reduction in the surface warm bias, in turn, helps reduce the regional precipitation and cloud cover biases aloft. However, increased atmosphere resolution alone leads to very modest bias reductions over most regions in the atmosphere-only models, which still show strong biases in tropical precipitation over the western ocean basins at HR. The atmosphere-only models also show biases in cloud cover and net cloud radiative effect over the whole tropics and in the zonal winds at mid-latitudes very similar to those in the coupled models at both LR and HR. Even though we acknowledge that our conclusions might be both model and region dependent, taken together, our analysis suggests that to remove model biases i) a refinement of the atmosphere resolution up to ~50 km alone might not always be sufficient, and ii) reaching eddy-rich ocean resolutions (1/12° or finer) might be needed. The increase in ocean resolution from eddy-parametrized (~100 km) to eddy-rich (~10 km) allows models to resolve the first baroclinic Rossby radius and might therefore help improve the representation of small-scale dynamical processes and, in turn, reduce biases. In contrast, equivalent phenomena in the atmosphere are already well resolved (the first Rossby radius at mid-latitudes is about 1000 km, which corresponds to the synoptic scale). Many of the challenges of reducing atmospheric model biases are related to interactions between dynamics, radiation, and parameterized (moist) physics (clouds, convection, radiation).
These errors are much more difficult to address with increasing resolution, as they are not obviously related to errors in grid-scale dynamics but to model physics (Kay et al., 2016; Varma et al., 2020). Increased atmospheric resolution improves the representation of weather or extremes, as found, for example, for tropical cyclones (Roberts M.J. et al., 2020a; Vannière et al., 2020; Vidale et al., 2021; Zhang et al., 2021) and blocking frequency (Schiemann et al., 2020) in PRIMAVERA models and in numerical weather prediction systems (e.g., Lean et al., 2008).
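The scale argument made above for the ocean and the atmosphere can be illustrated with a back-of-the-envelope comparison of grid spacing against the first baroclinic Rossby radius. The sketch below is illustrative only: the atmospheric radius (~1000 km) and the nominal grid spacings follow the text, whereas the oceanic mid-latitude radius of 30 km is an assumed order-of-magnitude value, and the threshold of two grid points per radius is an assumed rule of thumb, not a result of this study.

```python
def grid_points_per_radius(rossby_radius_km, grid_spacing_km):
    """Number of grid spacings that fit within one Rossby radius."""
    return rossby_radius_km / grid_spacing_km


OCEAN_RADIUS_KM = 30.0    # assumed mid-latitude oceanic value (illustrative)
ATMOS_RADIUS_KM = 1000.0  # mid-latitude atmospheric value quoted in the text
MIN_POINTS = 2.0          # assumed rule-of-thumb threshold for "resolved"

# Nominal grid spacings in km (approximate).
ocean_grids_km = {
    "eddy-parametrized (~1 deg)": 100.0,
    "eddy-present (~0.25 deg)": 25.0,
    "eddy-rich (~1/12 deg)": 10.0,
}
atmos_grids_km = {"LR (~100 km)": 100.0, "HR (~25 km)": 25.0}

ocean_resolved = {
    name: grid_points_per_radius(OCEAN_RADIUS_KM, dx) >= MIN_POINTS
    for name, dx in ocean_grids_km.items()
}
atmos_resolved = {
    name: grid_points_per_radius(ATMOS_RADIUS_KM, dx) >= MIN_POINTS
    for name, dx in atmos_grids_km.items()
}

for name, ok in {**ocean_resolved, **atmos_resolved}.items():
    print(f"{name}: {'resolved' if ok else 'not resolved'}")
```

Under these assumed values, only the eddy-rich ocean grid passes the threshold, while both atmospheric grids pass it by a wide margin. This is consistent with the argument that remaining atmospheric biases are more likely tied to model physics than to grid-scale dynamics.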
In addition to increased resolution, improvements in model parametrizations and process representations, specific corrections applied to models, additional tuning, and longer spin-ups might all be essential to minimize model biases.
More realistic cloud physics based on observational constraints, for example, can reduce the SO biases in the net cloud radiative effect by about 4 W m⁻² and in the surface temperature by about 1 °C (Kay et al., 2016; Varma et al., 2020).
Corrections to the North Atlantic Current flow and North Atlantic surface freshwater budget can suppress the cold North Atlantic bias entirely (Drews et al., 2015). Further model tuning and longer spin-ups remain to be explored. For the PRIMAVERA models considered here, no additional tuning was performed with the change in resolution, in agreement with the HighResMIP protocol (Haarsma et al., 2016). For the ECMWF-IFS model in particular, the LR version might have benefited from further tuning of the ocean component to reduce biases in the AMOC and North Atlantic SST in multidecadal climate simulations (Fig. S2; Roberts C.D. et al., 2020a). However, in this case, it was an explicit decision to keep the ocean vertical physics as consistent as possible across configurations to ensure the LR ocean was a good proxy for the HR ocean in coupled projections at daily to seasonal time scales (Roberts C.D. et al., 2020a). The lack of re-tuning seems to have been critical for this model, whose HR version was first developed and tuned and later reduced in resolution but not re-tuned (Section 2 and Roberts C.D. et al., 2018b). This process resulted in a large degradation of its AMOC, which is unrealistically weak, and of its North Atlantic cold bias, even though the LR version shows a weaker SO warm bias than the HR version (Fig. S2). Regarding longer spin-ups, the PRIMAVERA models considered here also followed the HighResMIP protocol, which recommended a relatively short 50-year spin-up (Haarsma et al., 2016).
In the HadGEM3-GC31 LR coupled model, such a spin-up was found insufficient to stabilize its large-scale circulation and could therefore have contributed to accentuating some of its biases (Roberts M.J. et al., 2019). Testing the benefit of model re-tuning and longer spin-ups would, however, be extremely time- and resource-consuming if performed following traditional approaches at the highest resolutions. Further bias reduction might be gained by using new convection-permitting climate models, as computing power increases with every new model generation (Klocke et al., 2017).
To summarize, our study finds limited benefit from increased resolution alone, between the traditional ~100 km models and the ~25 km ones, for reducing long-standing biases, based on an ensemble of high-resolution models developed for the PRIMAVERA project. In this resolution range, increased resolution in both the atmosphere and ocean can to some extent reduce biases in the eastern tropical oceans, ITCZ, and North Atlantic, with further gains at an eddy-rich ocean resolution. Reductions in surface temperature biases are strongly model-dependent in the coupled models and might be subject to differences in model physics between them. In addition to further increases in resolution, we therefore propose that future efforts should also be directed toward improving model physics, for example in cloud representation, and developing innovative high-resolution model tuning approaches.
Code and data availability.