Multi-grid algorithm for passive tracer transport in the NEMO ocean circulation model: a case study with the NEMO OGCM (version 3.6)

Ocean biogeochemical models are key tools for both scientific and operational applications. Nevertheless the cost of these models is often expensive because of the large number of biogeochemical tracers. This has motivated the development of multi-grid approaches where ocean dynamics and tracer transport are computed on grids of different spatial resolution. However, existing multi-grid approaches to tracer transport in ocean modelling do not allow the computation of ocean dynamics and tracer transport simultaneously. This paper describes a new multi-grid approach developed for accelerating the computation of passive tracer transport in the Nucleus for European Modelling of the Ocean (NEMO) ocean circulation model. In practice, passive tracer transport is computed at runtime on a grid with coarser spatial resolution than the hydrodynamics, which reduces the CPU cost of computing the evolution of tracers. We describe the multi-grid algorithm, its practical implementation in the NEMO ocean model, and discuss its performance on the basis of a series of sensitivity experiments with global ocean model configurations. Our experiments confirm that the spatial resolution of hydrodynamical fields can be coarsened by a factor of 3 in both horizontal directions without significantly affecting the resolved passive tracer fields. Overall, the proposed algorithm yields a reduction by a factor of 7 of the overhead associated with running a full biogeochemical model like PISCES (with 24 passive tracers). Propositions for further reducing this cost without affecting the resolved solution are discussed.

1 Introduction ties usually underestimate energy and variance with respect to observations at scales smaller than 5 to 10 times the grid spacing.
This observations is formalized with the notion of effective resolution introduced by (Skamarock (2004)). The effective res-50 olution can been seen as the smallest scale that is not affected by numerical dissipation (Soufflet et al. (2016)). In contrast, numerical resolution refers to the typical resolution of the discrete grid used for numerical integration. In a series of sensitivity experiments, (Lévy et al. (2012)) shows very little difference in simulated tracer distribution if tracer transport is computed with velocity field where scales smaller to 5-10 dx have been filtered.
Because the existing implementation of multi-grid tracer transport for NEMO ocean model is not meant to be run at the 55 runtime with the physical ocean component, its application is strongly limited by the I/O requirements and storage capacity.
Indeed, the method proposed by Aumont et al. (1998) is not a priory designed for coupled ocean biogeochemical models but rather restricted to offline applications where the biogeochemical model is using output from a previous ocean circulation model integration. With this algorithm, data from the ocean circulation model should be output and stored on a local file system.
The frequency of the coupling between ocean dynamics and the biogeochemical model is therefore limited by the frequency 60 of available ocean model output. The I/O and storage capacity required for such applications is a strong limiting factor with existing climate and operational systems. It is all the more problematic with the foreseen increase in spatial resolution of ocean components of climate and operational systems. This paper describes a new multi-grid algorithm for tracer transport, its implementation in NEMO v3.6 and documents its performance in a series of global ocean model experiments. We describe in particular detail what specific choices have 65 been made for the representation of vertical diffusive fluxes. The algorithm is then tested in a series of global ocean model simulations of varying complexity.

Evolution equation for a passive tracer
The objective of the multi-grid algorithm is to solve the evolution equation for a passive tracer on a grid in which horizontal spatial resolution is coarser than that of the associated hydrodynamical model. The passive tracer grid dimensions and hydro-70 dynamic variables used in this equation need to be computed from the dynamic grid. In this section, we present the evolution equation for a passive tracer in order to identify which terms are provided by the hydrodynamical model and which terms are computed by the transport module of the biogeochemical model itself.
Following the formalism of the heat conservation equation from (Madec (2008), equation 2.1.d in section 2.1.1 ), the evolution of a biogeochemical tracer can be written as: The influence of sub-grid-scale processes D can be decomposed in a horizontal part D l and a vertical part D v : represents the contribution of the lateral part along geopotential surfaces, R being a tensor of iso-neutral slopes, A l the lateral eddy diffusivity coefficient and i, j the grid points horizontal indices, while represents the contribution of the lateral part along the vertical direction (first term on the right hand side) and the effect of 85 vertical subgrid physics (second term on the right hand side) with A v the vertical eddy diffusivity coefficient and k the grid points vertical indice.
In order to compute Equation 1 in a multi-grid environment, we first need to be able to apply the ∇ operator on the coarsened grid, which requires to define adequately the coarsened grid coordinates and dimensions. As a second step, we have to design an operator to compute the terms which are coming from the dynamical model, such as the ocean velocity − → V , the lateral and 90 vertical diffusivity coefficient A l and A v , the iso-neutral slopes R and the surface forcing terms F on the coarsened grid. Other terms of Equation 1 (tracer tendency, sinks and sources) are computed by the biogeochemical model itself, hence provided on the coarsened grid directly.

Description of the algorithm
The multi-grid algorithm described below is suitable for Arakawa C-type grid and z-level ocean models. Some adjustments 95 could be performed to adapt it to B-grid ocean models and/or models using different vertical discretizations, but those are not discussed here.

Definition of the coarsened grid
The first step consists of defining the coordinates, the cell dimensions and the land-sea mask of the coarsened grid based on those of the finer grid. The finer grid used in the ocean dynamics and thermodynamics component is called HR hereafter and 100 the coarser grid in the passive tracer transport component is called HRCRS hereafter. The horizontal coarsening factor used here is 3, a choice that will be discussed thereafter (section 6).
As the equations are solved on an Arakawa C-type grid (Figure 1), temperature, salinity, pressure and other scalar variables are computed at the cells center T while the velocities are computed at the center of each face of the cell ( U and V ), normal to the corresponding velocity component.
105 Figure 1. Schematic diagram illustrating the definition of the coarsened grid for a coarsening factor of 3 (ocean high-resolution grid is to the left, coarsened grid for biogeochemical model to the right) and localization of tracer (T, blue dots), zonal velocity (U, green triangles) and meridional velocity (V, red triangles) points in both grids. Grid sections that are common to both grids are colored in green.

Coordinates
In order to build an HRCRS cell, we take a 3 × 3 square of HR cells ( Figure 1) and colocate the HRCRS T,U and V grid points at the same location as the corresponding HR grid points. As a result, the definition of HRCRS grid coordinates can be seen as a sub-sampling of HR grid coordinates.

110
Defining the horizontal dimensions of the coarsened grid cells is necessary to compute gradient or divergence operators.
For a coarsening factor 3, HRCRS zonal cell size is computed by summing 3 consecutive HR zonal cell sizes along the zonal direction, and HRCRS meridional cell size is computed by summing 3 consecutive HR meridional cell sizes along the meridional direction ( Figure 2).

115
Defining the land-sea mask for HRCRS is the most strenuous task of the multi-grid algorithm, which requires ad-hoc verification as it can lead to substantial errors in the final outputs. If there is at least one ocean T-point in the HR 3 × 3 pad, then the corresponding HRCRS T-point will be considered as ocean, which can be defined automatically ( Figure 3, top panel). At the interface between two coarsened grid cells,we carefully retain the shape of the HR mask by adjusting the mask at U-or V-points: when there is at least one ocean point in the HR interface (as in Figure 3, middle panel), then the interface of the

Vertical dimensions
The vertical dimension of cells at T-points (usually named e3t in NEMO) are used in vertical divergence and laplacian operators 125 throughout the biogeochemical code. For the divergence operator, it is important that the coarsening procedure preserves the HR grid volume. Hence the vertical dimension of HRCRS grid cells e3t HRCRS is defined as the sum of the volume of the corresponding HR grid cells divided by the HRCRS grid cell horizontal surface area: as illustrated in Figure 4 (top panel). Note that we assume that HR grid points which are land-masked have their vertical 130 dimension equal to 0.
For the vertical gradient operator, the coarsening procedure must preserve the HR grid thickness. Therefore, we define another vertical dimension for HRCRS grid cells, e3tmax, as the maximum of the vertical dimensions of the corresponding HR grid cells:

Definition of coarsening operators
Once the coarser grid variables have been defined, the second step of the multi-grid algorithm consists of defining the operators used to coarsen the dynamic fields necessary to solve the passive transport equation.

Coarsening of state variables
In order to preserve conservation properties through coarsening operations, the intensive state variables (temperature, salinity) of the hydrodynamic model are coarsened such as their volume integrated quantities is the same on the finer and coarser grid.
Therefore, the operator is a weighted mean over the 3 × 3 pad of HR cells corresponding to each HRCRS cell, where the weights are defined as the volume of the HR grid cells: 145 In order to preserve the non-divergence of velocities on the coarser grid, HRCRS velocities are computed so as to conserve horizontal fluxes at the edges of the cells. Hence, the horizontal velocities are coarsened using a weighted mean along the edges of the 3 × 3 pad of HR cells corresponding to each HRCRS cell, where the weights are defined as the area of the vertical faces of the HR grid cells: 150 where e12 HR = e2u HR , e3x HR = e3u HR and l = j for zonal velocities, and e12 HR = e1v HR , e3x HR = e3v HR and l = i for meridional velocities. Note that it is the effective horizontal velocities that are coarsened to HRCRS, ie including eventual sub-grid-scale parametrizations, rather than the eulerian velocities.
Vertical velocities and surface fluxes are coarsened using a weighted mean over the 3 × 3 pad of HR cells corresponding to each HRCRS cell, where the weights are defined as the horizontal area of HR grid cells:

Quantities related to the equation of state
Quantities related to the equation of state have a substantial impact on the model stability, via stratification. Therefore, it is necessary to define conscientiously the coarsening procedure of those, so as to prevent the multi-grid algorithm from generating spurious numerical instabilities.

160
The N 2 buoyancy frequency in the HRCRS domain is computed by coarsening the HR N 2 buoyancy using the mean operator (6), a solution that is considerably cheaper than computing HRCRS N 2 in the HRCRS domain. In parallel, the density in the HRCRS domain is computed from the temperature and salinity fields in the HRCRS domain. Finally, both the buoyancy frequency and density in HRCRS domain are used to compute the HRCRS slopes for neutral surfaces R. This ad-hoc solution (note that there is no well defined coarsening operator for a slope) was found to yield less numerical instabilities than others.

Sub-grid-scale vertical processes
Vertical processes affecting passive tracers are predominantly led by vertical mixing thus are very sensitive to its amplitude.
The vertical mixing coefficient A v has minimal background values of the order of 1.e-5 m 2 .s − 1 or less, but can reach up to 10 m 2 .s − 1 or more at sporadic grid points (when static instability occurs and the parametrization for convection is activated, for example). As a consequence, using a simple weighted mean of the HR A v as coarsening procedure, is expected to be 170 inappropriate. Indeed, with such a method, HRCRS A v would be exclusively reflecting occurrences of strong values of HR A v , whether they are adjacent to lower ones or not. Henceforth, we define 6 different operators to coarsen the vertical mixing coefficient and test the sensitivity of the multi-grid algorithm to those (see section 5.3.1): the MIN and MAX operators, where the coarsened A v is the minimum or, respectively, the maximum of the HR A v over the 3 × 3 pad of HR cells corresponding to each HRCRS cell;

175
the MEAN operator, where the coarsened A v is computed by a weighted mean in a 3 × 3 pad of HR A v (ie the method expected to be inappropriate); the MEDIAN operator, where the coarsened A v is the median value in a 3 × 3 pad of HR A v ; the MEANLOG operator, where the coarsened A v is computed by a weighted mean in logarithmic space in a 3 × 3 pad of HR A v .  ...
Step ... To avoid the duplication of coarsened and non coarsened grid and dynamic variables in the passive transport routine, the pointers target the coarsened grid and dynamic variables when the multi-grid algorithm is activated and the non coarsened The multi-grid algorithm has been implemented in NEMO GCM (Madec (2008)) version 3.6 in which the OPA ocean model is coupled to LIM3 sea ice model (Rousset et al. (2015), Vancoppenolle et al. (2009)) and its passive tracer transport component TOP.

205
Three different global ocean configurations have been employed to assess the multi-grid algorithm: the HR configuration, where the ocean hydrodynamic (ie open ocean and sea ice) and the passive tracer components are both running at 1/4 • resolution. This is the reference experiment using the highest possible resolution and no multi-grid algorithm for passive tracers.
the LR configuration, where the ocean hydrodynamic and the passive tracer components are both running at 3/4 • res-210 olution. This experiment provides insights on the impact of horizontal resolution with still no multi-grid algorithm for passive tracers.
the HRCRS configuration, where the ocean hydrodynamic component is running at 1/4 • resolution while the passive tracer is running at 3/4 • resolution. This experiment corresponds to the standard use of multi-grid algorithm for passive tracers.

215
It is important to state that it is not expected that the HRCRS configuration yields exactly the same results as the HR or LR configurations. Indeed, as there is some information missing in the coarsening procedure of ocean hydrodynamics from HR to HRCRS, the passive tracer distribution in HRCRS is not expected to be strictly the same as that in HR configuration. On the other hand, because the hydrodynamics in LR is running at coarser resolution than that in HRCRS, it is not expected that the passive tracer distribution in HRCRS is the same as that in LR. Yet, the multi-grid algorithm will be proven useful if the 220 results of HRCRS are closer to HR than those of LR. Hence the present experimental protocol should not be seen as a direct validation of the multi-grid algorithm, but rather as an assessment of its potential benefits and limitations.

Model configurations
The 1/4 • configuration is described in Barnier et al. (2006). The 3/4 • configuration coordinates and bathymetry are built by applying on 1/4 • configuration coordinates and bathymetry the coarsening operators described above. This operation has 225 created some fake straits in Panama and the Indonesian Throughflow, which have been manually filled. The 3/4 • configuration is close to the global Ocean ORCA 1 • configuration used in various CMIP5 (Voldoire et al. (2013)) and CMIP6 models (Voldoire et al. (2019)).
The initial state for the ocean is at rest, with temperature and salinity from WOA13 (World Ocean Atlas / www.nodc.noaa.gov).
At the surface, the model is forced by ERAinterim atmospheric reanalysis (Dee et al. (2011)) using the CORE bulk formulae 235 (Large and Yeager (2004)) to compute the turbulent air-sea fluxes. A monthly runoff climatology is built and applied to all configurations, using data for coastal runoffs and major rivers from Dai and Trenberth (Dai and Trenberth (2002)). A split-explicit time-splitting scheme is employed to compute the surface pressure gradient with a non-linear free surface (Levier et al. (2007)). These parametrizations were already used in the Irish-Biscay-Iberian regional configuration, described in Maraldi et al. (2013).

240
The momentum advection term is computed with the energy and enstrophy conserving scheme proposed by Arakawa and Lamb (Arakawa and Lamb (1981)). The advection of tracers (temperature and salinity) is computed with a total variance diminishing (TVD) advection scheme (Lévy et al. (2001), Cravatte et al. (2007)).
Lateral eddy viscosity is computed with a biharmonic operator and lateral eddy diffusivity is computed with an isopycnal laplacian operator for both active and passive tracers. As explained in Madec (2008) (section 9.1), lateral eddy diffusivity 245 coefficient A l decreases proportionally to the grid size, while the lateral eddy viscosity coefficient B l decreases poleward as the cube of the grid cell size: where A l o and B l o are the respective values at the Equator (see table Table 1) while e max is the maximum of e1 and e2 taken over the whole masked ocean domain located at the Equator for global configurations. The lateral eddy diffusivity coefficients shown here are used for active tracers when the multi-grid algorithm is activated or not, and used for passive tracers when the grid multi-grid algorithm is not activated; the choice of lateral eddy diffusivity for passive tracers when the multi-grid algorithm is activated, is discussed later (subsection 5.2).

255
The vertical eddy viscosity and diffusivity coefficients are computed from a TKE turbulent closure model (Blanke and Delecluse (1993)), with parameters as described in Reffray et al. (2015).

Numerical experiments
To assess the multi-grid algorithm for passive tracers, three types of experiments have been set up. They differ on the passive tracer initial state or evolution. In all cases, the dynamical component is the same. The comparisons of the simulated tracer In the PATCH experiments, the multi-grid algorithm is assessed on the horizontal component of the transport equation (1).
Here, the passive tracer vertical diffusivity component is set to 0 (A v = 0. ). The passive tracer C is initialized with a patch of 10 • radius centered on 60 • West / 36 • North (see top left panel in Figure 6). The value of the tracer ranges from 2 at its center 265 to 1.368 at its boundaries and is vertically uniform. Outside of the patch, the tracer initial value is set to 1. The tracer has no sources nor sinks. Hence the passive tracer evolution in the PATCH experiment can be described by the following equations: where R is a tensor of iso-neutral slopes.
The AGE-ZDIF experiments are conceived to assess the multi-grid algorithm on the vertical eddy diffusive component of 270 the passive tracer transport equation (1). Hence the advection and lateral diffusion are unplugged for the passive tracer. This experiment makes use of the so-called Age tracer of NEMO GCM: its initial value is set to zero in all the ocean and its value is damped to 0 at the ocean surface while there is a net source in the ocean interior. This tracer gives a good representation of the time evolution of a water mass, after it was in contact with the Ocean surface; hence it gives some information on the ventilation of water masses.The tracer evolution in the AGE-ZDIF experiment can be described by the following equations: Finally, the multi-grid algorithm is assessed on the full tracer transport equation (1) in the AGE-FULL experiment which also uses the Age tracer, but retaining all terms for the tracer transport equation. Hence the tracer evolution in the AGE-FULL experiment can be described by the following equations:

Coarsened velocities
Advection by ocean currents is one of the main drivers of the evolution of passive tracers. As a consequence, it is important that tracer transport by ocean currents on the coarsened grid HRCRS remains close to that on the HR grid. This implies that coarsened velocities share similarities with HR velocities on the spatial scales that are common. To test this, coarsened 285 velocities are compared to HR velocities and LR velocities (Figure 5), as done by Lévy et al. (2012) in a more idealized context.
Power spectra are computed along the horizontal coordinates of the model grids. For each spectrum, the minimum wavelength corresponds to twice model grid spacing: 2 * ∆x. HRCRS velocities are the result of the coarsening of HR velocities on LR grid. As HR grid is finer than LR grid, HR velocities contains finer scales than HRCRS and LR.
In terms of kinetic energy, vertical velocities and relative vorticity, HR, HRCRS and LR have almost the same level of 290 energy at spatial scales larger than 250 km. At smaller spatial scales, HRCRS has a level of energy comparable to HR in terms of kinetic energy and relative vorticity while vertical velocities are less energetic than in HR. For the 3 quantities, at spatial scales smaller than 250 km, the level of energy of LR is significantly below HR and HRCRS levels. This suggests that ocean currents in HRCRS are, overall, more similar to those in HR than to those in LR.

Horizontal tracer transport 295
The evaluation of the multi-grid algorithm to resolve an advection-diffusion problem is shown here with the results of the PATCH experiment described in subsection 4.2.
The first step is to evaluate the best value for the passive tracer horizontal diffusion coefficient in the HRCRS configuration.
A comparison between HR and various HRCRS configurations differing for their horizontal diffusion coefficient (set to 300, 600, 900 and 1200 m 2 /s) is presented in Appendix A. This sensitivity test suggests that 900 m 2 /s yields the best results for 300 HRCRS. Using 300 m 2 /s in HR and 900 m 2 /s in HRCRS for the passive tracer lateral diffusion respects the ratio in spatial resolution between the two configurations (of respectively 1/4 • and 3/4 • nominal resolution), which is consistent with the linear relationship between the horizontal diffusion coefficient A l and the cell size (see Equation 9).
Using this value for the lateral diffusion coefficient A l in the HRCRS configuration, and comparing outputs to those from HR and LR configurations, suggests that the HRCRS patch is closer to the HR shape than to the LR one ( Figure 6). Indeed,

305
LR has a larger area where its concentration is higher than in HR, with a difference greater than 0.3 in its center, while the difference between HRCRS and HR does not exceed 0.2 (within the red box, after one year of simulation, RMSE is 0.04 between HRCRS and HR and 0.09 between LR and HR).
Horizontal velocities influence the interior EKE distribution and so the tracer distribution. In Figure 7, we represent for a meridional section in the Gulf stream the daily vertical distribution of zonal velocities and PATCH tracer after one year of 310 simulation, for the HR, HRCRS and LR configurations. The HRCRS velocities are obtained by coarsening the HR velocities.
HR and HRCRS zonal velocities are stronger than LR velocities and present more fine structures, in particular in the northern region ( above 40 • North). HRCRS and HR tracer distribution are similar and present some differences to LR configuration. Black lines in spectra represent usual slopes to be compared with numerical spectra. All computations use daily model outputs and yearly averaged.
In the southern part of the 55 • W sections, LR configurations has a maximum of concentration, whereas HRCRS and HR have lower values. In the northern parts of the section, HR and HRCRS have some higher values of concentration in the interior and 315 deeper ocean than in LR configuration, around 40 • N latitude. Despite the degraded spatial resolution operated on the velocities, the dynamics used to transport HRCRS tracer present the same pattern as the HR dynamics. The tracer using the multi-grid algorithm reproduces the HR tracer; while the dynamics is coarsened, the HRCRS tracer benefits from the higher resolution dynamics and present a gain compared to a lower resolution experiment.     HRCRS with MIN operator has a negative flux divergence at the bottom, but its Age value is closer to HR.

Special case for convection
Here we assess the multi-grid algorithm at simulating the age tracer penetration for deep convection as in the Ross and Weddel seas. We want to know if the operators selected in the previous section (MEANLOG and MEDIAN) are appropriate for this 345 particular situation. Figure 9 represents the depth where the Age tracer reaches the value of 10 days (top) and 100 days, along a section in the Austral Ocean, at the latitude of 58.3 • south. On the two plots on the left, we compare the HR profile (black), the LR experiment (grey) and the ensemble spread of the different solutions of HRCRS (light blue). The HR profile is included in the HRCRS ensemble, whereas the LR experiment is outside of the HRCRS spread. All the HRCRS solutions have better performances 350 than the LR solution compared to HR. However, the spread is larger in two areas, between 140 • W and 80 • W and between 100 • E and 160 • E.
The 2 plots on the right of Figure 9 shows the differences between each HRCRS solutions and HR. We can see that, outside of the 2 areas of deep convection, the HRCRS solution performed with MEANLOG and MEDIAN operator are the closest to the HR solution, especially in the deeper case ( bottom right figure). Around 100 • E and 150 • W, the HRCRS solution with the In order to combine the better performance of MIN operator in the convection situation and MEANLOG operator in the global Ocean , we add here a new operator based on MEANLOG operator with a switch to MIN operator in presence of convection. The HRCRS run with MEANLOG operator and a switch to MIN operator gives the best comparison to the HR run in the Weddel and Ross seas areas and have good performances outside.

Evaluation of the full algorithm
We have validated the multi-grid algorithm to represent the tracer advection and lateral diffusion in section 5.2 and the vertical eddy diffusion in the section 5.3. The choice of the operator to coarsen the vertical diffusion coefficient A v has also been discussed in the section 5.3. Now we present some results for the full transport equation (1). We decide to continue the assessment using the MEANLOG operator to coarsen the vertical diffusion coefficient.

365
A comparison of the depth for HRCRS and LR runs to HR run is presented in Figure 10. The upper figure presents the HR depths where Age is equal to 100 days. The second figure presents the depth differences between LR and HR and the last figure the depth differences between HRCRS and HR. Higher depths values are due to deeper Age tracer penetration, caused  Table 3. RMSE with HR solution, after one year of simulation. by higher vertical mixing. Globally, there are no big differences between the 3 runs, except in the area were mesoscale activity is important: the Gulf Stream, the North Atlantic subpolar gyre, the Kuroshio and the Austral Ocean. The differences between 370 HRCRS and HR runs are largely weaker than the differences between LR and HR runs.
In order to compare on depth the three different runs, we present in Figure 11 the Age tracer penetration along 3 sections. The first section starts in ACC and goes northward until North Atlantic. For a value of 10 days, the penetration for the 3 experiments are close. For a value of 100 or 300 days, HRCRS is very close to HR run and LR is different from the 2 others experiments.
The second section goes along the Equatorial Pacific. For the 3 values of Age Tracer, the corresponding depth from HR and 375 HRCRS are very close and the depths from the LR experiment are different. The last section is inside the Antarctic Circumpolar Current. HRCRS and HR depths are similar for 10, 100 and 300 days. LR run depths are similar to the 2 other runs for 10 days, and the differences with the 2 other runs increase with depth.
The Age tracer RMSE with HR solution has a value of 0.91 for HRCRS and 2.30 for LR, after one year of simulation (Table 3).

Computational performance
Because of the extra computational cost of the coarsening of dynamical fields, the multi-grid algorithm procedure gives better performance only with large numbers of passive tracer fields. Indeed, the expected speed-up for computing tracer advection on a 3x3 coarser resolution grid is a priory a factor 9. However, in practice, the time spent for defining the grid and computing dynamical fields on the coarsened grid should also be accounted for. We present in Table 4 the elapsed time spent by the HR 385 run (without grid coarsening capacity) and HRCRS run (with grid coarsening capacity). The elapsed time spent defining the scale factors for coarsened grid and computing the dynamic fields on the coarsened grid is actually significant. This is why, for a single tracer field, the HRCRS run (with grid coarsening capacity) uses more CPU resources than the HR run. But it should be noted that the definition of the coarsened grid and the coarsening of dynamical fields are only done once for all the tracer fields. Therefore the net overhead cost of this extra operation is relatively small in situations with many tracer fields (as shown 390 in the last row of Table 4).
Overall, we estimate that using the multi-grid algorithm allows us to reduce the cost of running a full ocean model with an interactive biogeochemical component by a factor~3. Table 5 provides an estimate of the breakdown of elapsed time in a global NEMO configuration coupled to PISCES biogeochemical model (Aumont et al. (2015)), which simulates 24 passive tracers. Typically, running NEMO biogeochemical transport module (TOP) for 24 tracers on the same grid as the dynamics  Table 4, we estimate that the net elapsed time overhead of the biogeochemical component (transport + PISCES) would be reduced by a factor~7 with the multi-grid algorithm. Therefore, a full model using our multi-grid algorithm would use 1.45 times resources 400 than the ocean/sea-ice component alone, therefore reducing the cost of the overall system by a factor 3.

Limitations and future perspectives
A key limitation of the multi-grid algorithm described in this paper is its restriction to odd coarsening factors. As shown in Figure 12, two possibilities would exist for defining an averaging procedure in the case of even coarsening factors but none of them would be fully amenable for defining a robust coarsening procedure. On the one hand, Figure 12 shows that there is no 405 obvious definition for lateral fluxes at the cell boundaries if the cell resulting from the averaging of several T-cells is centered at a T-point. On the other hand, a consistent treatment of north-fold boundary condition in the HR and CRS grids requires that both grids share the same pivot point at the north-fold boundary (as described in (Madec (2008)), see in particular Fig. 8.4). overhead of BGC 3. 0.45 6.7 total 4. 1.45 2.8 Table 5. Typical expected speedup in elapsed time between a run without versus with multi-grid algorithm. Values have been extrapolated from the measured cost of the coarsening operation and an estimate of the net overhead of PISCES biogeochemical model. This is why it is not possible to use an averaging procedure where the cell resulting from the averaging of T-cells is centered at a F-point.

410
The present implementation is even more limited since only a coarsening factor of 3 is allowed. This limitation is related to constraints associated with multiprocessor applications. Indeed using a larger coarsening factor would require increasing the width of MPI overlapping areas (halo ghost cells for data exchanges with neighboring processors). This is illustrated in Figure 13, which shows that using a spatial coarsening factor of 5 would require extending the MPI overlapping area with an extra grid point in each direction. But, as for NEMO3.6, the width of MPI overlapping areas is limited to one grid point in each 415 direction. A natural perspective of this work would be to add time sub-sampling. Indeed, in its present form, our multi-grid algorithm uses the same time-stepping for both the HR and CRS grids. But the integration on the CRS grid could probably be carried out with a larger time step which would further reduce the computational cost of tracer advection. As described in (Madec (2008)) section 3.2 and (Leclair and Madec (2009)), NEMO currently uses a leap-frog time-stepping scheme with a Robert-Asselin 420 filter. In this context extending our algorithm to the discretization in time is non-trivial because it would require keeping in memory several consecutive occurrences of the prognostic variables and grid cells volumes. Allowing time coarsening would be greatly simplified with a 2 level time-stepping scheme in NEMO (which is planned for NEMO v5.0).

Conclusions
An algorithm based on multiple grids has been developed and implemented in NEMO 3.6 ocean model for accelerating the 425 integration of ocean biogeochemical models. The core of the algorithm is to compute the evolution of biogeochemical variables on a grid whose resolution is degraded by a factor 3x3 with respect to the dynamical fields. We have described in detail the operators that allow the switch from the high resolution grid (HR) to a coarser resolution grid (HR) and how to define optimally the evolution operators on the coarse grid based on information at high resolution. We have in particular described how several vertical scale factors should be introduced and described the different options for the treatment of vertical physics on the coarse 430 grid. A series of numerical tests performed under realistic conditions has been carried out for identifying how to optimally represent vertical mixing on the coarser grid.
The solutions computed with the full algorithm have been compared to solutions obtained using a high resolution grid for both dynamics and passive tracers (HR configuration, 1/4 • resolution), and solutions obtained using the lower resolution grid for both dynamics and passive tracers (LR configuration, 3/4 • resolution) in a series of global ocean model experiments. We 435 have shown that the proposed method provides solutions at global scale that are notably improved as compared to LR solutions and comparable to HR solutions. This confirms our working hypothesis that fluctuations of dynamical quantities close to the HR grid size have negligible impact on tracer transport.
The evaluation of computational performances shows that the multi-grid approach does not reduce the computing time in the case of a single passive tracer because of the overhead associated with the definition and computation of dynamical quantities 440 on the coarse grid. However, the reduction of elapsed time might be substantial when the algorithm is applied to multiple tracers as in the case of comprehensive biogeochemical models (an estimate has been provided but not proven). The proposed algorithm allows us to reduce by a factor 7 the overhead associated with running a full biogeochemical model with 24 tracer fields in NEMO simulations.
Several possible directions for further improving the performances of the algorithm have been identified but they may require 445 important changes to NEMO code. Increasing the width of MPI overlapping areas in NEMO would allow us to increase the spatial coarsening factor (now limited to 3 in the present version). In addition, the use of more selective coarsening operators (Debreu et al. (2008)) would bring the coarsened solution even closer to the uncoarsened solution. Their larger spatial stencil would however bring similar issues as for the coarsening factor limitation in a multi-grid processor environment. Extending our approach to the discretization in time is also a natural direction. Using a 2 level time-stepping scheme instead of the leap-frog 450 time-stepping scheme currently used in NEMO would greatly simplify such a development.
The algorithm we propose in this article makes it possible to run global applications of complex ocean biogeochemical models at eddying resolution over several decades. By reducing the computational overhead of running complex biogeochemical components in high resolution ocean models, our algorithm opens the possibility to explicitly account for the impact of mesoscale ocean dynamics on ocean biogeochemistry in operational and climate models. The benefit of this approach in the 455 context of climate modelling has recently been illustrated by Berthet et al. (2019).
The key key_crs is added to this list to activate the multi-grid algorithm.
The exact version of the model used to produce the results used in this paper is archived on Zenodo at https://doi.org/10. 5281/zenodo.3615356, as are input data and scripts to run the model and produce the plots for all the simulations presented in  Table A1. RMSE with HR solution, after one year of simulation.
Appendix A: Sensitivity of tracer to diffusion coefficient In a coupled ocean-biogeochemical model, where the ocean and biogechemical components are running at the same resolution, as the HR and LR configurations, the horizontal diffusion coefficient A l value is the same for active tracers (temperature and 470 salinity) and the passive tracers (the biogeochemical tracers, or the PATCH tracer here). With the multi-grid capacity, passive tracers are running at a lower resolution than active tracers. We need to evaluate the best diffusion coefficient value for the passive tracers. Here we present the results of the HR and HRCRS configurations for the PATCH experiment described in section 4.2.1.
The reference diffusion coefficient A l o value from (9) is 300 m 2 /s for HR configuration and different simulations have 475 been performed for HRCRS configurations with values of 300, 600, 900 and 1200 m 2 /s for A l o . The top figures in Figure A1 represent the HR solution at its resolution (left) and at the HRCRS resolution (right). The HRCRS solutions are represented above and their differences with HR in the bottom plots.
The HRCRS configurations runs with values of 300 and 600 m 2 /s for A l o seems to be too noisy compared to the HR configuration; the maximum value of the tracer inside the patch is also too high compared to HR configuration maximum.

480
The HRCRS configuration run with a value of 1200 m 2 /s for A l o seems to be a little bit more diffuse compared to HR configuration and the maximum value is a little bit lower. The HRCRS configuration run with a value of 900 m 2 /s gives the better reproduction to HR configuration in term of the patch distribution and fine scales representation. Table A1 gives the RMSE with HR for all experiments in the red box. The best result is obtained with a diffusion coefficient value of 900 m 2 /s and the worst with a value of 300 m 2 /s. The results obtained with 1200 m 2 /s are close to the best solution. Figure A1. Surface values of PATCH tracer after one year for HR configuration (a); differences between HRCRS and HR configurations with diffusion=300 m 2 /s (b) , diffusion=600 m 2 /s (c), diffusion=900 m 2 /s (d) and diffusion=1200 m 2 /s (e).