ACCESS-OM2: A Global Ocean-Sea Ice Model at Three Resolutions

We introduce a new version of the ocean-sea ice implementation of the Australian Community Climate and Earth System Simulator, ACCESS-OM2. The model has been developed with the aim of being aligned as closely as possible with the fully coupled (atmosphere-land-ocean-sea ice) ACCESS-CM2. Importantly, the model is available at three different horizontal resolutions: a coarse resolution (nominally 1◦ horizontal grid spacing), an eddy-permitting resolution (nominally 0.25◦) and an eddy-rich resolution (0.1◦ with 75 vertical levels), where the eddy-rich model is designed to be incorporated into the Bluelink 5 operational ocean prediction and reanalysis system. The different resolutions have been developed simultaneously, both to allow testing at lower resolutions and to permit comparison across resolutions. In this manuscript, the model is introduced and the individual components are documented. The model performance is evaluated across the three different resolutions, highlighting the relative advantages and disadvantages of running ocean-sea ice models at higher resolution. We find that higher resolution is an advantage in resolving flow through small straits, the structure of western boundary currents and the 10 abyssal overturning cell, but that there is scope for improvements in sub-grid scale parameterisations at the highest resolution. 1 Geosci. Model Dev. Discuss., https://doi.org/10.5194/gmd-2019-106 Manuscript under review for journal Geosci. Model Dev. Discussion started: 30 April 2019 c © Author(s) 2019. CC BY 4.0 License.


Introduction
Ocean-sea ice models have extensive applications. They form the oceanic component of coupled climate and Earth system models that are used for projecting future climate, and can incorporate biogeochemical and ecosystem dynamics which extend the realm of application. They are also needed for forecasting on shorter timescales -both forecasting in the ocean and for seasonal prediction of the ocean/sea ice/atmosphere state. As a research tool, ocean-sea ice models can be used to quantitatively 5 test, or experiment with, the dynamics of the climate system; such process studies have been invaluable in forming a broad understanding of the drivers of climate change and variability.
Modelling studies face the challenge of compromising between resolving critical processes and computational expense. For example, the standard grid spacing for the ocean component of coupled climate models is 1 • , with indications that some models being prepared for the next Coupled Model Intercomparison Project (CMIP6) will use 0.25 • horizontal spacing. However, 1 • 10 models do not resolve the ocean mesoscale, meaning that they miss key processes that can influence the climate. Higher resolution models usually have improvements in the climate state with better estimates of vertical heat transport , enhancement of boundary currents (Hewitt et al., 2016), better resolution of ocean straits and improved Southern Ocean state (Bishop et al., 2016). On the other hand, high resolution simulations consume huge computational resources, which limits the length of runs and the capacity to optimise the model configuration (or minimise biases) by testing the model over 15 a wide parameter space. There is also less experience in the coupled ocean-atmosphere-ice modelling communities in the integration of these high resolution ocean models for climate simulations. Thus, while higher resolution models are becoming computationally feasible, the additional resolution does not necessarily result in improved simulations.
One of the complexities in characterising model performance as a function of resolution is the influence of model biases governing the model state. It is well known, for example, that different models subjected to the same atmospheric state produce 20 differing mean states (e.g. Griffies et al., 2009;Danabasoglu et al., 2014). It follows that investigating the effects of model resolution requires a clean hierarchy; a model suite in which variations in resolution are available with homogeneous code, forcing and, as far as possible, parameter choices. This is a technique successfully employed by the DRAKKAR consortium (Barnier et al., 2014) as well as climate model developers (e.g. Griffies et al., 2015;Hewitt et al., 2016).
In this manuscript we outline development of the latest version of the ocean-sea ice component of the Australian Commu-

Vertical grid
The configurations use a z * vertical coordinate (Stacey et al., 1995;Adcroft and Campin, 2004) with partial cells (Adcroft et al., 1997;Pacanowski and Gnanadesikan, 1998). The vertical grids are optimised for resolving baroclinic modes, based on the KDS grids recommended by Stewart et al. (2017). The vertical grids in ACCESS-OM2 and ACCESS-OM2-025 are slightly modified versions of KDS50 with 50 levels and 2.3 m spacing at the surface, increasing smoothly to 219.6 m by the bottom at 5 5363.5 m. The vertical grid in ACCESS-OM2-01 is a slightly modified version of KDS75, with 75 levels and 1.1 m spacing at the surface, increasing smoothly to 198.4 m by the bottom at 5808.7 m. Details of the vertical discretisation can be found in Table 1.

Horizontal grid
In the horizontal direction, MOM5 and CICE5 both use the same orthogonal curvilinear Arakawa B-grid with velocity compo-10 nents co-located at the northeast corner of tracer cells. Model configurations have been developed with zonal grid spacings of Table 1. Vertical grid parameters: n levels, with spacing of ∆zmin and ∆zmax at the surface and maximum depth Hmax, respectively, and median spacing ∆zmedian. The depth at level n/2 is denoted H n/2 . The horizontal meshes are 360 × 300, 1440 × 1080 and 3600 × 2700 at 1 • , 0.25 • and 0.1 • , respectively (see Table 2). In all cases the grid extends from the North Pole to 81.1 • S, covering the global ocean to the Antarctic ice shelf edge but omitting globally balanced; thus, in these regions the sea ice salt is instead obtained from the global surface ocean at large. Over a sea ice formation and melt cycle this produces a small unphysical transport of salt from the global surface ocean to regions where such sea ice melts.

YATM and libaccessom2
ACCESS-OM2 uses a new atmospheric driver, known as YATM, which implements a file-based atmosphere and replaces 5 MATM which was used in ACCESS-OM (Bi et al., 2013b). Its purpose is to track model time and, when necessary, read the appropriate forcing fields from files and deliver them to the coupler. This is implemented via an associated library (libacces-som2, https://github.com/COSIMA/libaccessom2) that is linked into YATM, CICE and MOM to provide shared functionality and an interface to inter-model communication and synchronisation tasks.
YATM is also responsible for remapping river runoff in real-time. This is done separately from the other forcing fields 10 because it is difficult to do in a distributed memory setting, since ensuring runoff is on coastal points may require interprocess communication. Remapping is done in two steps: first runoff is moved to the destination grid using conservative interpolation, and then it is distributed from land to coastal points using an efficient nearest neighbour algorithm based on a pre-computed k-dimensional tree (k-d tree) data structure (Bentley, 1975). We use the KDTREE2 Fortran k-d tree implementation (https:// github.com/jmhodges/kdtree2). The k-d tree is also used to conservatively spread runoff into the neighbouring ocean grid points 15 to ensure it does not exceed a prescribed cap. ACCESS-OM2 and ACCESS-OM2-025 use a runoff cap of 0.03 kg m −2 s −1 globally. In ACCESS-OM2-01 there are reduced caps of 0.001 and 0.003 kg m −2 s −1 at the mouths of some Arctic rivers to produce broader spreading and avoid excessively low salinity, and a cap of 0.03 kg m −2 s −1 everywhere else.

Initial conditions
The ACCESS-OM2 and ACCESS-OM2-025 runs were started at rest, with zero sea level and with temperature and salinity 20 from World Ocean Atlas 2013 v2 Zweng et al., 2013) 0.25 • "decav" product (the average of six decadal averages spanning 1955-2012) and run for five 60-year cycles (1 Jan 1958 -31 Dec 2017) of JRA55-do. The initial sea ice concentration and thickness are 100% and about 2.5 m (respectively) in regions north of 70 • N and south of 60 • S where the sea surface temperature is less than 1 • C above freezing, with a parabolic distribution of area across the five ice thickness categories.
The initial snow thickness in each category is 0.2 m or 20% of the ice thickness in that category, whichever is smaller.

25
The ACCESS-OM2-01 experiment ran for 33 years from 1 Jan 1985 -31 Dec 2017. It was started from a 40-year spinup under repeated 1 May 1984 -30 April 1985 JRA55-do forcing (repeat-year forcing, RYF, Stewart et al., in prep.). This spinup began from the same initial condition as the ACCESS-OM2 and ACCESS-OM2-025 runs. The RYF spinup contained some parameter changes, in particular the ice-ocean stress turning angle was changed from 16.26 • to zero at the start of August in the 12th year. Before this change the Arctic ice volume built up significantly (and unrealistically) in the thickest category, but it 30 began a slow decline when the turning angle was set to zero which persisted into the first ∼6 years of the interannually-forced run (Fig. 27c). The 1984-85 repeat-year forcing contained some biases relative to climatology; for example, biases in the North Pacific wind stress curl produced late separation of the Kuroshio Current. Figure 1 shows the fields that are transferred between the coupled model components.  (Kritsikis et al., 2017;Khoei and Gharehbaghi, 2007) to produce very smooth destination fields; this is particularly important for the ACCESS-OM2-025 and ACCESS-OM2-01 configurations because they have finer resolution than the forcing dataset.

Computational performance
The computational performance of a coupled model depends upon the runtimes of each of its component models, the coupler overhead, and any potential load imbalances between each component. Here we report measurements of runtime of the MOM5 20 and CICE5 model components as a function of resolution and core count. We first present measurements of MOM5 and CICE5 runtime scalability with respect to computational core count, and consider performance over several CPU configurations. The performance of the coupled ocean/sea-ice model is inferred from the independent performance of CICE5 and MOM5, leading to recommended standard configurations.

MOM5 scalability
We conduct a series of tests into the scalability of MOM5 at each of the three model resolutions tested here, configured as considerably more computational time, MOM5 still scales outstandingly well -up to 4000 cores for ACCESS-OM2-025 and 16,000 cores for ACCESS-OM2-01 -and runtimes can often be sustained when provided with a sufficient number of CPUs.
This demonstrates efficient scaling well beyond the 512 cores investigated by Schmidt (2007). On our platform, a MOM5 configuration of 0.1 • grid spacing could achieve a maximum theoretical performance of almost 5 years per day, although startup and model I/O would reduce the speed in any practical case.

25
These results highlight the scaling efficiency of MOM5 and give us flexibility to choose different MOM5 core counts for different configurations. However, the behaviour of the coupled ocean/sea-ice model is also dependent on CICE5 performance, as discussed below.

CICE5 scalability
Analogous scaling tests for CICE5 were also undertaken, configured as described in Sect. 2.2, except that the distribution   (Fig. 2a, c). Even at higher than optimal core counts, scaling is acceptable.
However, these numbers obscure some more complex issues. First, the effective performance of CICE5 is variable, partly because ice cover is seasonal, so different tiles have a different amount of work to do at different times of year. This variability is mitigated by using the sectrobin scheme in CICE5 (Craig et al., 2015) in these scaling tests at 0.25 • and 0.1 • , which selects 5 tiles from different locations within a sector to be computed on the same core; we use four tiles per core for this scheme at 0.1 • grid spacing. While the code itself scales well with the number of cores, the total amount of work required increases more rapidly in CICE5 at higher resolutions. That is, the additional CPU time required for each MOM5 model timestep increases by a factor of 90 in going from 1 • to 0.1 • grid spacing (Fig. 2b); for CICE5 the CPU time required per timestep increases by a factor of 200, with a disproportionately large increase between 0.25 • and 0.1 • (Fig. 2c). We mainly attribute this to changing from 10 one to three dynamic timesteps per thermodynamic timestep between 0.25 • and 0.1 • , exacerbated by using the slower mushy thermodynamics, but residual load imbalances may also contribute. Consequently, we use relatively more cores for CICE5 than  for MOM5 at higher resolution -the ratio of ice to ocean cores is 0.11 in ACCESS-OM2, 0.25 in ACCESS-OM2-025 and 0.35 in ACCESS-OM2-01 (Table 2).

Coupled ocean/sea-ice model configuration
The performance of the coupled ocean-sea ice model is limited by the scaling performance and resource requirements of both components, as well as their load balancing and variability. This load balance is further complicated by the differing 5 performance of components as a function of resolution, and by the need to alter the model timestep for some simulations. It is clear from Fig. 2a that balancing the runtime of the model components requires more cores for MOM5 than CICE5, but the ratio depends on resolution. Since CICE5 has a lower core count than MOM5 at all resolutions, if imbalances are to exist, we aim to ensure that CICE5 is waiting for MOM5. The standard configurations, core counts and typical performance of differing model resolutions are shown in Table 2. 10 The configurations shown here are under continuous development and optimisation, and it is anticipated that improvements in model stability and load balancing will continue to improve performance in the future (for example, we have recently doubled ACCESS-OM2-01 performance relative to the figures in Table 2). We have also configured a minimal ACCESS-OM2-01 configuration with a total core count of ∼2000, to aid in testing and to run on smaller systems. These configurations will be continually released and documented on the ACCESS-OM2 code repository as they are developed.

Model evaluation
We now outline results from ACCESS-OM2 using simulations with each of the three horizontal resolutions outlined in Table   2. Each of the three simulations is forced by the interannual JRA55-do forcing dataset (Tsujino et al., 2018), which currently covers 60 years from 1958 until the end of 2017. The lower resolution simulations (both ACCESS-OM2 and ACCESS-OM2-025) continuously cycle through five iterations of this dataset, giving a 300-year simulation, following the CORE-II protocol 20 (e.g. Danabasoglu et al., 2014) and the CMIP6/OMIP protocol . Selected global diagnostics from these simulations are shown by the blue and orange lines (respectively) in Fig. 3, where dates have been aligned so that the last  The highest resolution simulation, using ACCESS-OM2-01, is ∼ 1000 times more computationally intensive than ACCESS-OM2, and ∼ 25 times more computationally intensive than ACCESS-OM2-025 (see Table 2). A full simulation with five 5 internannual forcing cycles of ACCESS-OM2-01 is not possible with current computing resources, and hence we use an alternative spinup strategy. As discussed in Sect. 2.4, in this case we select a Repeat Year Forcing (RYF) spinup strategy (Stewart et al., in prep.), in which the time period 1 May 1984 -30 April 1985 is repeated continuously. This spinup has been run for 40 years, after which time we conduct an interannually forced simulation from 1985 through to 2017. It is this interannual simulation period which is used for the model evaluation in this manuscript, as indicated by the green line in Fig. 3. 10 It is important to note that, in keeping with the CORE-II and OMIP protocols, we make no attempt to account for model drift in the analysis of these simulations; in the case of ACCESS-OM2-01 this means that the simulation is less well equilibrated than the lower resolution simulations, and contains some biases from the repeat year spinup strategy. Care must be taken in interpreting the influence of model drift on the different cases.
The timeseries of globally averaged quantities shown in Fig. 3 indicate that there is an initial drift due to heat uptake in ACCESS-OM2-025, while ACCESS-OM2-01 drifts cold relative to the initial state ( Fig. 3a), primarily during the repeat year 5 spinup phase. On the other hand, the global average Sea Surface Temperature (SST) in the models is dominated by the forcing field, with only weak variations between each cycle (Fig. 3b). This variation of SST within the final forcing cycle is closer to observations than that seen with CORE-II forced models (see Fig. 2 of Griffies et al., 2014), and includes a reasonable representation of the slowdown in warming in the decade preceding 2010. The high resolution model also drifts towards surface freshening, which is partly offset by the surface salinity restoring that is incorporated into the model (Fig. 3c), but this drift predominantly occurs during the Repeat Year Forcing spinup, and the rate of drift is reduced when the interannual forcing is used. As expected, the kinetic energy of the simulations is a strong function of resolution, with higher resolution models containing more turbulent processes (Fig. 3d). Each of these aspects of the simulations is investigated in greater depth by the following analysis, where we first focus on global circulation metrics, then look to better characterise important regional ocean circulation differences and finally investigate the representation of sea ice. OM2-01 is significantly lower, and is more stable. The underestimated ACC transport in these models is characteristic of this class of ocean-sea ice models (e.g. Farneti et al., 2015). It is notable that the higher resolution case does not lead to an improved transport prediction, although this could be a result of the much shorter spinup at 0.1 • .

25
To evaluate the capacity of the different model configurations to represent the mean state of the broad-scale horizontal ocean circulation, the simulated mean dynamic sea level (MDSL), averaged over years 1993-2012, is compared with the CNES-CLS13 observational product distributed by AVISO, which combines data from satellites, surface drifters and in-situ measurements to reconstruct a time mean, global Mean Dynamic Topography (MDT) field for the same time period. The reconstruction procedure is described in Rio et al. (2009). The comparisons between the simulated MDSL and observed MDT  The capacity of the different model configurations to represent the spatial patterns of sea-level variability provides a reliable proxy for surface mesoscale activity. To do this, we compute the standard deviation of the sea-level anomaly (SLA) for each configuration at each model grid point, to produce global maps of SLA deviation for the years 1993-2014 (where long-term linear trends are removed). The simulated SLA standard deviation is then compared with the objectively interpolated, multimission satellite derived, SLA product (AVISO Ssalto/Duacs), also from 1993-2014. We use the gridded observational product 5 for convenience in comparing global maps of SLA variability. However, we note that the optimal interpolation procedure used to produce the gridded product from satellite ground tracks tends to smooth the underlying fields and hence may underestimate the SLA variance by as much as 50% in certain regions (Chambers, 2018). As such, the true SLA variability may be higher than is indicated here.
The maps of SLA standard deviation are plotted in Fig. 6. Elevated SLA variability typically occurs in regions rich in   (Han et al., 2017;Mu et al., 2018). In the Indian ocean, variability is associated with both the Indian Ocean Dipole and ENSO (Li and Han, 2015).
Anomalies in the south-central Indian Ocean are driven by wind anomalies related to ENSO events (Xie et al., 2002) and are 25 simulated in each model resolution. ACCESS-OM2-01 also simulates variability from Indonesia down the West Australian coast which then propagates as Rossby waves westward (Potemra and Lukas, 1999), which is muted in the coarser resolution simulations.  data not shown). Longer simulations are required before we can firmly establish the equilibrium behaviour of the AMOC for ACCESS-OM2-01. 10 The other circulation cell of interest in Fig. 7 is the abyssal overturning cell (the negative cell centred at 1037 kg/m −3 ), which occupies a small part of density space but comprises a significant fraction of global water volume. Here, the strongest modelled overturning cell is in ACCESS-OM2-01 (12-15 Sv at 40 • S, see Fig. 8b; compared with poorly constrained observational estimates of 20-50 Sv; Sloyan and Rintoul, 2001;Lumpkin and Speer, 2007;Talley, 2013). We argue that in ACCESS-OM2-01, this abyssal cell is partly driven by the more realistic process of surface water mass transformation on the Antarctic continental  warming the fastest), although in the coupled system all models warmed at a more rapid rate . In particular, the warming tendency at mid-depths is larger when the mesoscale eddy processes (namely eddy-advection and isoneutral diffusion) are not well resolved. This occurs especially in eddy-permitting models (such as GFDL-MOM CM2.5, 0.25 degree) which neither fully resolved eddy processes nor include any eddy parameterisation. Our 0.25 degree model (ACCESS-OM2-025) includes both eddy-advection (Gent and McWilliams, 1990) and neutral diffusion (Redi, 1982;Griffies et al., 1998) pa- Model drift dominates the long-term evolution of temperature anomalies, but considerable interannual variability occurs in the upper 100 m (Fig. 9). Warming trends toward the end of the historical period are superimposed upon the cold anomalies near the surface in all ACCESS-OM2 models. Atmospheric forcing such as Coordinated Ocean-Sea ice Reference Experiment (CORE) Interannual Forcing (IAF) and JRA55-do are not designed to reproduce a long-term trend as expected from climate change due to the adjustment performed to obtain a global surface heat budget closure over the satellite era (Tsujino et al.,5 2018). However, an intermodel comparison study under the CORE-IAF protocol showed that all models experienced an increase in ocean heat content in the upper 700 m and associated sea-level rise over the 1993-2007 period similar to observations (Griffies et al., 2014). A practical approach to isolate the interannual variability from the model drift in ocean-sea ice model studies is to perform a de-drift using a control run, in a similar way as performed in fully-coupled climate models (Sen Gupta et al., 2013;Hobbs et al., 2016). Whilst the protocol of CORE-IAF/JRA55-do does not require a control run, it can be achieved

Temperature and salinity biases
Model drift can occur for a variety reasons in an ocean-sea ice model, particularly due to deficiencies in model physics and numerics, or due to unresolved processes in the model coupling (Sen Gupta et al., 2013). This drift can be further investigated by examining model sea surface temperature (SST) biases (relative to WOA13 climatology) as presented in Fig. 10; the cor- (WBCs) are found in all ACCESS-OM2 resolutions; this is also seen in many CORE models, associated with non-eddy permitting resolution and poor representation of WBC separation and fronts (Griffies et al., 2009). These biases are reduced in ACCESS-OM2-025 and particularly in ACCESS-OM2-01, however the similarities in the spatial pattern of surface temperature suggest the possibility of systematic biases in the surface forcing, or in the surface coupling.
In the Southern Hemisphere, larger biases in highly energetic regions (e.g. the Agulhas retroflection and along the ACC  (Fig. 11). Similar biases have been previously associated with a too-zonal path of the NAC (Danabasoglu et al., 2014, and Sect. 4.2.4) and deficient overflow from the Nordic Seas . ACCESS-OM2 has generally cold anomalies in the Subpolar North Atlantic but comparatively smaller biases in 15 the NAC; the weak AMOC transport is likely related to strong fresh biases around Greenland and in the Labrador Sea (Fig. 11).
ACCESS-OM2 presents a smaller warm bias near upwelling zones on the west coast of the American and African continents in comparison with CORE models (Griffies et al., 2009). This bias has been associated with coarse resolution in both model with anomalous open-ocean convection confined to a much smaller and more interannually-variable region in the northeastern Weddell Sea (but has also had less time to drift away from climatology). The warm and salty biases north of 45 • S above 1500 m are associated with weak penetration of cold and fresh Antarctic Intermediate Water, which can be caused by incorrect subduction and/or isopycnal mixing rates (Bi et al., 2013b). These biases are larger in ACCESS-OM2 and ACCESS-OM2-025, and smaller in ACCESS-OM2-01; we hypothesise that the coarser models have less isopycnal mixing (resulting from the sum of partially resolved and partially parameterised mixing) while ACCESS-OM2-01 seems to have a more realistic explicitly-resolved isopycnal mixing. Weaker along-isopycnal transport may the zonal-mean bias shows warm anomalies between 1000-3000 m at 60 • N (Fig. 12c, e). On the other hand, the weak AMOC transport in ACCESS-OM2 is translated into a strong warm bias above 1000 m, just below a large fresh bias (Fig. 12a, b).  Warm biases in this region have been linked with excessive surface deep convective mixing and overturning (Griffies et al., 2009;Bi et al., 2013b).

Heat transport
All three model configurations reproduce the large-scale features of the meridional heat transport suggested by reanalysis products (Fig. 13). ACCESS-OM2 simulates a weaker northward heat transport than the other two configurations at most 5 latitudes, associated with a weak AMOC (Figs. 7, 8). Heat transport within the Southern Ocean is consistent with observations within the spread between the observational products. Heat transport north of 40 • N within ACCESS-OM2-025 and ACCESS-OM2-01 is stronger than suggested by the reanalysis products, but consistent within error bars with the more direct estimate from WOCE (Ganachaud and Wunsch, 2003). The local maximum in heat transport at ∼50 • N is commonly seen in higher resolution models, and is thought to reflect a stronger Atlantic subpolar gyre contribution to the circulation (Griffies et al., In the tropics, the models simulate consistently weak poleward heat transport in comparison to the reanalysis products in both hemispheres. This weak transport is a feature of many ocean-only and coupled climate models (e.g. Griffies et al., 2009Griffies et al., , 2015. The ACCESS-OM2 configurations do not lie outside the range of model simulated transports in this regard.  There are well known issues with inferring poleward heat transport from reanalysis products and there are large variations between different products (e.g. Griffies et al., 2009;Valdivieso et al., 2017). Furthermore, the model simulations are more consistent with the inferred heat transport from the JRA55-do forcing itself at these low latitudes, particularly in the Southern Hemisphere (see Fig. 30 of Tsujino et al., 2018). The models still underestimate the peak in northward heat transport at 20 • N (∼ 1.3 PW in ACCESS-OM2-025 and ACCESS-OM2-01 compared to ∼ 1.5 PW from the JRA55-do forcing). The reason for 5 this mismatch remains unclear and worthy of further investigation. Nonetheless, the results show that ACCESS-OM2-025 has a clear advantage in representing heat transport over ACCESS-OM2, suggesting that 0.25 • models may have an improved climate in the upcoming CMIP6 simulations, compared with coarser resolution climate models.

Regional ocean circulation
The second part of this model evaluation involves examining the performance of the model at a selected number of key regions. In these regional analyses we will focus on the major circulation features such as the separation of western boundary currents, the average state of equatorial currents and flow through major choke points. The regional evaluation is not intended to be exhaustive; but instead will outline regions in which the model behaves well or poorly. It is envisaged that more in-depth 5 analyses will be published using this model in the near future.

Southern Ocean
A significant driver in moving towards high resolution ocean models is to better represent the dynamics of the Southern Ocean, where mesoscale variability is critical in capturing the evolution of the system (e.g. Hogg et al., 2015). An example of the improvement in water mass properties can be seen in Fig. 14, where transects of temperature and salinity along the 10 SR3 hydrographic line are compared with historical observations. Here, progressively enhancing the resolution leads to better representation of the observed surface low-salinity layer, enhanced subduction into the mid-depths, and improved Antarctic shelf properties and abyssal temperature-salinity structure (bearing in mind that the ACCESS-OM2-01 simulation is less well equilibrated, and thus has had less opportunity to drift away from the initial climatology).  For all model resolutions, mixed layer depths are very similar. Nevertheless, maximum mixed layer depths are deeper than observations suggest, especially in ACCESS-OM2 and ACCESS-OM2-025 (data not shown). Bias in the MLD may be due to a number of factors, such as bias in the surface buoyancy forcing (Sallée et al., 2013), systematic errors in the convective parameterisation, sub-grid scale turbulence and friction schemes (Dufresne et al., 2013) and the representation of submesoscales (Wenegrat et al., 2018). Determining the exact cause of the bias requires a careful analysis of the mixed layer budgets and is  increasing resolution. From our current understanding of mode water ventilation it is not clear how these differences, i.e. winter mixed layers which are too deep in the models and differences in the distribution of PV, will affect the uptake of tracers, such as heat and carbon, into the ocean interior.
Sea level variability in the region of the Agulhas Current is shown in Fig. 16 (colours) for each model resolution, including a comparison with observations. Variability follows contours of the barotropic streamfunction (white contours) down the 5 Mozambique channel (de Ruijter et al., 2002) and East Madagascar coast, continuing along the southeast coast of Southern Africa. There is a peak in variability in all simulations where the Agulhas Current retroflects at the southern tip of the African continental shelf. From here, variability continues both west into the South Atlantic Basin along the path of the Agulhas Rings (Dencausse et al., 2010), and east along the Agulhas Return Current. The peak in this variability is well captured in the ACCESS-OM2-01 simulation relative to observations, whereas variability amplitudes in the ACCESS-OM2-025 simulation 10 are about half those observed and the variability in ACCESS-OM2 is substantially less again. The path of the circulation in the region before the retroflection, as indicated by the contours of barotropic streamfunction, is consistent between the simulations and the observations. A sea level variability hotspot is well captured to the south of the main retroflection in the ACCESS-

Australasia
In the south-west Pacific Ocean the westward South Equatorial Current bifurcates at the Australian coast at about 16 • S, with the southward branch forming the southward-intensifying East Australian Current (EAC). Between 33 • S and 35 • S the EAC splits into an eddying eastward outflow (known as the Tasman Front), and the EAC extension, an alongshore southwardweakening eddy-dominated flow (Ridgway and Dunn, 2003). Sea level standard deviation in ACCESS-OM2-01 (Fig. 17c) 20 reproduces the observed spatial distribution of eddy activity in this region (Fig.17d)   variability is more severely underestimated in the coarser configurations (Fig. 17a, b). ACCESS-OM2-025 retains a weak qualitative signature of both the Tasman Front and the EAC extension, whereas these are nearly absent in ACCESS-OM2.
Contours of barotropic streamfunction converge near Australia's east coast in observations (Fig. 17d)  compensating biases in individual straits in ACCESS-OM2-01 and the coarser configurations may over-or under-estimate transport in each strait. Models may underestimate the magnitude of the total transport, or the transport in individual straits, for three primary reasons. Firstly, the ITF transport from the Pacific to the Indian Ocean is induced by the sea level gradient between these two oceans; in ACCESS-OM2 and ACCESS-OM2-025, this sea level gradient is weaker than observed and 10% smaller than in ACCESS-OM2-01 (Fig. 5); thus, it is not strong enough to reproduce the observed total transport. Secondly, 5 Lombok and Ombai Straits are narrow (minimum width 20 and 40 km, respectively) and therefore require a high horizontal resolution to faithfully represent the strait transport. For example, the width of Lombok Strait is 1 velocity cell (∼110 km) in ACCESS-OM2, 1 cell (∼28 km) in ACCESS-OM2-025 and 2 cells (∼22 km) in ACCESS-OM2-01; Rayleigh drag (Sect. 2.1.4) is used in ACCESS-OM2 to obtain more realistic transport through Lombok and Ombai Straits. Thirdly, the ITF outflow is split between the three main straits flowing first through the Lombok Strait, then the Ombai Strait and finally the Timor Strait. So if 10 more of the water that comes through the Makassar Strait goes through Lombok strait (as in ACCESS-OM2-025), less water will go through the Timor Passage. As a consequence, the resolution of straits is critical for this region, and for this reason the ACCESS-OM2-01 configuration is more appropriate to study the Indonesian Seas.

Pacific Ocean
All three versions of ACCESS-OM2 reproduce the major features of the equatorial Pacific Ocean circulation well compared with observations from Johnson et al. (2002) (Fig. 19), which are in turn similar to measurements from the TAO array on the Equator at 140 • W and 165 • E (not shown). The strength of the Equatorial Undercurrent (EUC) core is within 10% of observations and its latitudinal width at 140 • W is accurate in ACCESS-OM2-025 and ACCESS-OM2-01 but somewhat too 5 wide in ACCESS-OM2. The EUC extends too deeply in both ACCESS-OM2-025 and ACCESS-OM2-01. The strength of the thermocline is well reproduced in ACCESS-OM2 and ACCESS-OM2-025, although in ACCESS-OM2-01 it is slightly too strong. The strong Pacific thermocline in ACCESS-OM2-01 also appears in the zonal mean (Fig. 12) and Atlantic (Fig. 23) and may be a consequence of the lack of background diffusivity (e.g. Meehl et al., 2001) and reduced numerical diffusion.
The vertical temperature gradient above the thermocline appears to be too weak in all three configurations, a bias which may 10 also be linked to the weak vertical shear in the upper EUC. Further, both the northern and southern branches of the South Equatorial Current (SEC, the westward surface-intensified current south of ∼ 5 • N) are too weak in the models. These biases in the SEC and upper EUC may be associated with problems in the turbulent mixing parameterizations in this region, but a detailed sensitivity study has not yet been undertaken. The eastwards North Equatorial Counter Current (NECC) at 7 • N is  include an observational estimate of the climatological mean along the same transect, taken from the gridded WOA13 product, 5 for the period 1985-2013 (Fig. 20g, h). In general, all three model configurations produce a realistic thermal structure in this basin. In particular, the models capture the approximate depth of the thermocline and its inter-hemispheric asymmetry (with the southern hemisphere thermocline being somewhat deeper than in the northern hemisphere), the strong temperature gradients in the Southern Ocean at approximately 55 • S, coincident with the location of the Antarctic Circumpolar Current, and the weak vertical gradients to the north of the ACC in the regions associated with southern hemisphere mode-water production. However,  In contrast to the temperature structure, which was simulated reasonably well by the various models in this suite, the meridional haline structure of the central Pacific is not well simulated. In particular, none of the models reproduce the observed deep salinity minimum in either hemisphere, although there is some suggestion of penetration of relatively fresh waters into the 5 interior at approximately 55 • S and 45 • N in the ACCESS-OM2-025 and ACCESS-OM2-01 configurations. As such, it is likely that the models' representation of Pacific mode and intermediate waters will be affected by the poor representation of the deep salinity structure, which could, in turn, have implications for the local overturning circulation (Thompson et al., 2016).  decaying eastward along the Kuroshio extension. However, the observed variability continues with significant amplitude further along the extension. Peak variability here in ACCESS-OM2-01 matches the observed amplitude (∼ 0.4 m), whereas ACCESS-OM2-025 has a reasonable distribution of variability with reduced magnitude (peak ∼ 0.25 m), and ACCESS-OM2 substantially underestimates this (0.15 m). ACCESS-OM2-01 has variability upstream of the separation point, higher than that observed, where the Kuroshio's 'large meander' appears on interannual timescales (Kawabe, 1995). In ACCESS-OM2-01 5 the barotropic streamfunction has a similar structure to the observational estimate, but seems somewhat weaker in amplitude (white contours in Figs. 21c,d). The directly observed mean Kuroshio transport on the WOCE PCM-1 line between Taiwan and the southern Ryukyu Islands was 21.5±2. 5 Sv between September 1994and May 1996(Johns et al., 2001. Corresponding transports over the same period are close to this value in ACCESS-OM2-025 and ACCESS-OM2-01 (17.5 Sv and 20.1 Sv, respectively), but much weaker (7.6 Sv) in ACCESS-OM2, which is lower than in other models of this resolution under CORE-II 10 forcing (Tseng et al., 2016).

Atlantic Ocean
Accurately representing the horizontal circulation of the North Atlantic is a persistent challenge for ocean modellers. In particular, models commonly fail to simulate a Gulf Stream that separates from the North American coast at Cape Hatteras (35 • N, 35 Geosci. Model Dev. Discuss., https://doi.org/10.5194/gmd-2019-106 Manuscript under review for journal Geosci. Model Dev. Discussion started: 30 April 2019 c Author(s) 2019. CC BY 4.0 License. 75 • W), and a North Atlantic Current that flows north along the east side of the Grand Banks from 40 • to 51 • N (Rossby, 1996;Chassignet and Marshall, 2008). Consistent with other modeling studies (e.g. Bryan et al., 2007), we find considerable improvement in the Gulf Stream transport and separation latitude at eddy resolving resolution (Fig. 22). In ACCESS-OM2 and ACCESS-OM2-025, the Gulf Stream flow is too weak and overshoots the separation latitude by nearly 5 • latitude relative to observations; in ACCESS-OM2-025 the separation latitude is also highly variable. The mean Gulf Stream separation latitude 5 in ACCESS-OM2-01 is correct, but it is more variable than observed, occasionally separating 2-3 • too far north; this appears as excessive SSH variability shortly after separation and broadening of the separated jet in Fig. 22c relative to observations. In addition, the transport is characterized by an overly strong anti-cyclonic recirculation near the separation ( (Meinen et al., 2010). This reduced inertia may contribute to the poor Gulf Stream separation, as seen in idealised experiments by Özgökmen et al. (1997), but more investigation is required. Identifying a numerical recipe that accurately simulates these North Atlantic features remains an ongoing challenge for modellers. This problem is ameliorated in ACCESS-OM2-01, which produces a southern hemisphere thermocline with a 10 • C isotherm that is approximately 800m deeper at 40 • N than 40 • S, which is similar to that obtained from the WOA13 product (∼900m), although we note that the thermocline is deeper in the North Atlantic in the WOA13 observations than in the ACCESS-OM2-01 fields.

30
Similarly, the ACCESS-OM2 and ACCESS-OM2-025 models do not produce the southern hemisphere deep salinity minimum observed at a latitude of approximately 50 • S and at a depth of approximately 1000 m. However, the salinity minimum is reproduced quite well by the ACCESS-OM2-01 model, which is able to capture both the structure and approximate magnitude.
Curiously, while the high resolution ACCESS-OM2-01 is not able to capture the northern hemisphere deep salinity maximum (present in the WOA13 observations at an approximate latitude of 40 • N and an approximate depth of 1100m), both the lower  The Brazil Current flows southward along the western boundary of the South Atlantic Ocean, separating at ∼ 40 • S. The mean surface speed of this current in ACCESS-OM2-025 and ACCESS-OM2-01 is of comparable magnitude to, but weaker than, observations (Fig. 24b-d), and is strongly underestimated in ACCESS-OM2 (Fig. 24a). This underestimation is most clear 5 for the upper 400 m of this current (not shown). Weakening of the Brazil Current is expected at the lowest resolution due to the broadening of the current by the enhanced viscosity near the western boundary. The Malvinas Current, flowing northwards along the boundary south of 40 • S, is highly steered by bathymetry (Fig. 24d, h) and is well represented in ACCESS-OM2-01 ( Fig. 24c, g), including its northward penetration along the shelf break up to 40 • S. The Brazil-Malvinas Confluence mean latitude (∼38 • S, Fig. 24d, h) is captured in ACCESS-OM2-01, but is too far south in ACCESS-OM2-025 and ACCESS-  barotropic streamfunction (contours in Fig. 24h), but is weaker in ACCESS-OM2-01, indistinct in ACCESS-OM2-025 and absent in ACCESS-OM2 (Fig. 24g, f, e), consistent with the ZA being eddy-driven (Dewar, 1998;de Miranda et al., 1999); the poor representation at coarse resolution is associated with significant SST and SSS biases (Figs. 10 and 11). The sea level standard deviation forms a distinctive horseshoe pattern (Fu, 2006) around the ZA in the AVISO product (colours in Fig. 24h), which is partially captured in ACCESS-OM2-01, although at lower amplitude. The sea level variability pattern and amplitude 5 in ACCESS-OM2-025 differ significantly from AVISO, and variability is negligible in ACCESS-OM2.

Indian Ocean
As in previous sections, we plot time-mean transects of potential temperature and salinity across the central Indian ocean  We also assess the annual and seasonal mean variability of the thermocline depth (D20) in the western tropical Indian Ocean (50 • E-75 • E, 5 • S-10 • S), known as the Seychelles Dome (Yokoi et al., 2008;Hermes and Reason, 2008) for 1985-2013 using the 20 • C isotherm proxy. All three model resolutions are able to simulate the basic large-scale annual mean D20 structure of the 10 Indian Ocean (Fig. 26). In the coarse models (Fig. 26a, b), the D20 is deeper than observed (Fig. 26d)   our simulations cover a period of dramatic changes in sea ice (Fig. 27a, b): the Arctic sea ice underwent a drastic decline of the annual minimum extent (e.g. Stroeve et al., 2014), whereas the Antarctic annual maximum sea ice extent ramped up from 2012 to the maximum extent on record in 2014 before decreasing sharply from 2015 to the current (2019) minimum (e.g. Turner and Comiso, 2017). The Arctic sea ice decline during winter has been linked to an anomalous atmospheric circulation pattern bringing increased inflow of warm air masses from lower latitudes and a general polar warming, while during summer the 5 positive feedback via absorption of short-wave radiation into and warming of the ocean mixed layer contributes to the loss of summer sea ice in the Arctic (e.g. Stroeve and Notz, 2018). The interannual variability in the Antarctic sea ice extent has been attributed to a combination of thermodynamics (likely driven by increased glacial melt; Bintanja et al., 2015) and dynamics (e.g., Holland and Kwok, 2012;Schlosser et al., 2018). These observed changes in sea ice extent are closely tracked by the ACCESS-OM2 suite at all resolutions in both hemispheres (Fig. 27a, b) and are also reflected in sea ice volume (Fig. 27c,   10 d). Like CORE-II (Large and Yeager, 2009), the JRA-55 reanalysis (Kobayashi et al., 2015) incorporates observed sea ice concentration; however JRA-55 treats regions with <55% ice concentration as ice-free, and regions exceeding this threshold as 100% sea ice (unlike CORE-II, which combines ocean and ice fluxes in proportion to their concentration). We speculate that this hard ice edge causes a stronger imprint of the observed sea ice in the JRA55-do atmospheric state (e.g., reducing the 10 m air temperature over ice), which then drives the modelled ice concentration to a state resembling the observations. We 15 now assess the quality of the spatial distribution of sea ice in the two hemispheres.

Arctic Ocean
The Arctic sea ice biases in the ACCESS-OM suite appear considerably smaller than in most of the CORE-II-forced models investigated by Wang et al. (2016). The mean annual cycle of sea ice extent is close to the observed estimate ( Fig. 27e) but seems to decline slightly too slowly in late spring. At all resolutions the simulated 1993-2017 monthly mean Arctic sea ice extent agrees well with observational estimates in most regions, as does the monthly mean concentration (contours and colour in 5 Fig. 28b, c, e, f); however, some issues that warrant further investigation have been identified. At all resolutions the simulations exhibit a broad zone of sparse sea ice concentration in the eastern Sea of Okhotsk and southeast of Fram Strait at the March maximum, neither of which is observed (Fig. 28b, c). In ACCESS-OM2-01 the Canadian Archipelago, Central Arctic, and Siberian coast exhibit slightly excessive ice concentration during March, whereas in Baffin Bay simulated ice concentration is low compared to observations. These biases are also present at coarser resolution, although the Central Arctic bias is weaker closely matches observations (apart from some excess coverage in eastern Siberia), but the overall concentration is slightly too low in ACCESS-OM2-01 (Fig. 28e, f) and in the coarser resolutions (not shown). In broad agreement with results from the Pan-Arctic Ice Ocean Model and Assimilation System (PIOMAS, Zhang and Rothrock, 2003), the thickest ice in our runs is found close to the Canadian Archipelago (Fig. 28a, d). However, rather than being transported northward through the Beaufort Gyre and eventually out of the Central Arctic via the transpolar drift, much of the sea ice in our simulation remains within the 5 Canadian Arctic. Investigation of the causes of these biases is beyond the scope of this paper, but possible contributing factors include SST biases (Fig. 10), issues regarding the modelled mixed-layer depth, and bias in the 0.1 • initial condition.
Finally, we note that the sea ice in the 0.1 • simulation displays many long, narrow, linear zones of low ice concentration and high strain rate (not shown) which open and close on timescales of days, largely in response to varying wind stress. These lead-like linear kinematic features are too narrow to resolve with existing satellite passive microwave instruments (Lemieux 10 et al., 2015) and are characteristic of high-resolution models with EVP rheology (e.g. Hutchings et al., 2005) as a result of strain localisation. While their spatiotemporal scaling may only be partially correct at this resolution (Hutter et al., 2018), we consider the presence of these lead-like features to be an improvement over the very smooth fields obtained in coarser EVP models. Linear kinematic features are also evident in the Antarctic ice, but are less ubiquitous than in the Arctic.

15
At all resolutions the modelled 1993-2017 mean annual cycle of Antarctic sea ice extent closely matches observational estimates, although ice growth appears to occur slightly more rapidly than observed (Fig. 27f). The spatial structure of the modelled 1993-2017 mean Antarctic spring maximum sea ice extent agrees well with that derived from passive microwave observations at all resolutions, and is particularly realistic in the 0.1 • simulation (contours in Fig. 29b, c). During Antarctic winter and early spring the simulated sea ice concentration is too high near the coast at 0.1 • , and a fraction too low in the wider pack-ice 20 zone (Fig. 29b, c); at coarser resolutions the concentration becomes smaller in both regions, reducing the positive bias near the coast but increasing the negative bias in the outer pack (not shown). We note that the high-concentration coastal ice in the model is very thin (Fig. 29a), with most of the ice cover in the thinnest thickness category (<0.64 m; not shown) and high frazil production (not shown), consistent with the presence of newly forming sea ice in coastal polynyas. Passive microwave products are known to underestimate the concentration of thin ice, such as in polynyas or marginal ice zones during autumn 25 and winter (e.g. Meier et al., 2014;Ivanova et al., 2015), suggesting that the discrepancy with the model may be partly due to a bias in the observational estimate.
The simulated annual minimum sea ice extent is much too low in the Weddell Sea and most of East Antarctica, and too high in the Ross Sea, at all three resolutions (contours in Fig. 29e, f). This is associated with low concentrations at all resolutions in all regions other than the northern Ross Sea; this bias is also typical of a wide variety of models driven by CORE-II forcing 30 (Downes et al., 2015). The thickness distribution also reveals broad regions of very low-concentration sea ice extending well beyond the model's 15% concentration contour (Fig. 29d). This ice is mostly in the thickest two categories (>2.47 m), i.e., second-year ice. Sea ice concentration builds up rapidly from its low minimum, reaching realistically high values in the outer pack by May-June, but the concentration then declines early relative to microwave observations, apparently preconditioning resolution, with the goal of informing which aspects of the model can be used for different research or operational objectives.
This approach also has the benefit of providing a benchmark for future developments.
In general terms, the model does a good job of representing many features of the ocean, particularly at high resolution.
Historical sea ice extent trends are well-represented, and the surface properties and transects in each ocean basin compare well with the observational record. The large scale overturning circulation, flow through the Indonesian archipelago and patterns 5 of boundary currents are generally realistic, supporting the notion that this suite of models is competitive with similar models from other institutions. Areas for improvement include the Gulf Stream behaviour and associated North Atlantic SST and SSS biases, the weaker than observed Drake Passage transport and the weak AMOC in the 1 • configuration. In addition, more work is needed to understand differences between observed and modelled meridional heat transport.
One feature of the model evaluation exercise was to highlight a general improvement of many model metrics at higher 10 resolution. In particular, Southern Ocean water masses, the Antarctic shelf region, the overturning circulation and the western boundary current regions are all much improved in ACCESS-OM2-01 compared with the coarser resolutions. However, the highest resolution model also has the weakest Antarctic Circumpolar Current transport, has an overly strong Equatorial Pacific thermocline and continues to have biases in western boundary current regions, despite the high resolution. These features will continue to be investigated.

15
A feature of modern model development is the continuous and collaborative process of building software. A version of ACCESS-OM2 has been frozen to enable other users to replicate these simulations (see code availability below) but it will be continuously developed. These incremental improvements will be publicly available via the model GitHub site, allowing users to both adopt and contribute to the future evolution of the model.