Development of a semi-Lagrangian advection scheme for the NEMO ocean model (3.1)

As resolutions of ocean circulation models increase, the advective Courant number – the ratio between the distance travelled by a fluid parcel in one timestep and the grid size – becomes the most stringent factor limiting model timesteps. Some atmospheric models have escaped this limit by using an implicit or semi-implicit semi-Lagrangian formulation of advection, which calculates materially-conserved fluid properties along trajectories which follow the fluid motion and end at prescribed grid-points. Unfortunately, this formulation is not straightforward in ocean contexts, where the irregular, interior boundaries 5 imposed by the shore and bottom orography are incompatible with traditional trajectory calculations. This work describes the adaptation of the semi-Lagrangian method as an advection module for an operational ocean model. We solve the difficulties of the ocean’s internal boundaries by calculating parcel trajectories using a time-exponential formulation, which ensures that all parcel trajectories remain inside the ocean domain despite strong accelerations near the boundary. Additionally, we derive this method in a way that is compatible with the leapfrog timestepping scheme used in the NEMO10 OPA (Nucleus for European Modelling of the Ocean, Océan Parallélisé) ocean model, and we present simulation results for a simplified test-case of flow past a model island and for 10-year free runs of the global ocean on the quarter-degree ORCA025 grid. Copyright statement. TEXT

intended as a drop-in replacement for the model's other advection modules, and in particular it does not interfere with NEMO's time-stepping algorithm (leapfrog).

Timestep constraints in the ocean
A numerical model with an explicit time-marching scheme must generally limit its timestep to satisfy a Courant-Friedrichs-Lwey (CFL) condition: information must not propagate more than a discretization-defined maximum number of cells in a single step, leading to a maximum stable Courant number. For systems such as the Euler equations (for the atmosphere) or hydrostatic equations (as implemented by NEMO-OPA), the information propagation speeds are controlled by the admissible 30 wave modes of the systems, which become characteristic curves.
In the atmosphere, the most restrictive wave mode is that corresponding to sound waves. These waves are fast compared to atmospheric motions, and in response atmospheric models generally treat sound waves either implicitly or through sub-cycling, especially in the most restrictive vertical direction. The second most stringent restriction comes from simple advection by winds in the upper atmosphere. At the Canadian Meteorological Centre, the atmospheric forecasting system (and atmospheric 35 component of the coupled forecasting system) uses the GEM (Geophysical Environmental Multiscale; Girard et al. (2014)) model, which addresses this timestep restriction through a semi-Lagrangian treatment of advection (Robert, 1982).
In the ocean, the Boussinesq assumption eliminates sound waves, but the model is left with the problem of surface gravity waves. Here, NEMO-OPA takes a similar approach to that used by atmospheric models for sound waves, by either treating the surface pressure gradient in a time-implicit manner (with a linearized free surface, used in this work), or by sub-cycling. 40 The ocean lacks any direct equivalent to the atmosphere's strong upper-air winds, and so advection by the background velocity and internal gravity wave modes compete as the next most limiting factor for the maximum stable timestep. Lemarié et al. (2015) finds that the Courant number associated with vertical advection is more limiting than that associated with internal (baroclininc) gravity waves at resolutions of 1 2 • , and the Courant number associated with horizontal advection catches up with that of gravity waves at resolutions of 1 4 • and finer. 45

Grid stretching
In order to cover the entire ocean in a single, continuous domain, global NEMO-OPA model configurations typically use grids based on the ORCA "tripolar" grid (Madec and Imbard, 1996;Murray, 1996). This grid is defined in the northern hemisphere by an elliptical coordinate system, where the latitude-like coordinate is defined by ellipses with a shared pair of foci and the longitude-like coordinate is defined by the hyperbolas orthogonal to these ellipses. These coordinates match continuously at 50 the equator to lines of latitude and longitude in a Mercator projection. By placing the foci of the ellipses on land, the grid contains no singularities in the ocean domain.
Unfortunately, this placement causes an abundance of small grid cells in the north polar region, especially in the Canadian Arctic Archipelago. Figure 1  Grid Spacing (km) Figure 1. Grid size (defined as min(e1t, e2t)) on the ORCA025 grid. At top: in the global view, gridpoint spacing gradually decreases from the equator towards the north and south poles. At bottom: in a detail view of the north polar region, the grid is especially high-resolution in the southern portion of the Canadian Arctic Archipelago, with gridpoint spacing as low as 3km.
to regional models such as that of Lemieux et al. (2016), which refine this grid while retaining its tripolar structure to use conforming boundary conditions.
The coordinate system is also stretched in the vertical direction. Using the z-level grid option of the NEMO-OPA model, layers near the surface are spaced much more closely together than layers nearer the ocean bottom, in order to provide adequate 60 resolution of the mixing layer. This stretching enhances the impact of vertical advection on the vertical Courant number, even if vertical-current magnitudes are low in absolute terms; Lemarié et al. (2015) notes that vertical advection provides a tighter bound on the timestep than horizontal advection.
Semi-Lagrangian advection alleviates both vertical and horizontal Courant number restrictions by tracing fluid parcels in a Lagrangian, fluid-following coordinate system. This coordinate system is defined so that the end of each timestep the fluid parcels arrive on the prescribed computational grid, and the properties of the fluid parcels (in the ocean setting, temperature, salinity, and horizontal velocity) at the end of the timestep (on the computational grid) are found by interpolating the previousstep, gridded values to the origin point of each parcel's trajectory. This method provides an implicit treatment of advection, allowing timesteps with advective Courant numbers greater than those usually permitted by explicit, Eulerian-form models.
In this work, we describe the initial implementation of a semi-Lagrangian advection routine in NEMO-OPA, based on the 70 configuration of Smith et al. (2018). This configuration uses a linear free surface where the vertical coordinate does not move in time, but we believe that the described method can be generalized.

Existing work
In the atmosphere, semi-Lagrangian advection is a standard technique (Robert, 1982) for the implicit treatment of advection, but especially at large scales the effects of topography are relatively gentle. In particular, trajectory calculations can proceed 75 under the assumption that the fluid parcel does not experience strong boundary-related acceleration. In the ocean domain this assumption is strongly violated, particularly for z-level vertical grids where the bathymetry changes abruptly at lateral cell boundaries.
Some attempts have been made previously to incorporate semi-Lagrangian advection into the ocean context. The work of Casulli and Cheng (1992), which is used as part of the ELCOM lake and estuary model (Hodges and Dallimore, 2006), 80 calculates parcel trajectories via a substepping approach, where fluid parcel trajectories are integrated via an explicit Euler method over many short steps per model timestep. The two-dimensional, unstructured shallow water model of Walters et al. (2007) takes a similar approach, where it also must take at least one substep per element boundary traversed by a fluid parcel.
In this work, we overcome this difficulty with an iterative trajectory calculation which reduces in the limit to an implicit trapezoidal rule. In addition, we also derive the semi-Lagrangian advection scheme in a form which calculates effective advec-85 tive tendencies, such that the advection routine can operate as a "plug-in" scheme for models which traditionally use Eulerian fluxes. We apply this to the NEMO-OPA model, and we believe this algorithm may be useful when applied to other ocean models with a structured grid.
In exchange, however, the semi-Lagrangian advection formulation departs from NEMO's finite-volume interpretation of its tracer and velocity components. By tracing infinitesimal fluid parcels, semi-Lagrangian advection treats gridpoint values 90 analogously to a finite-difference method, and as a consequence the scheme does not naturally offer conservation guarantees. This is not a primary concern for the short to medium-term forecasting applications that form the direct target for this work, but extensions of the semi-Lagrangian scheme to ensure conservation (Lauritzen, 2007) may be needed before the technique is applicable to longer-term climate simulations.
Additionally, Leclair and Madec (2011) has developed an "arbitrary Lagrangian-Eulerian" vertical coordinate scheme, im-95 plemented in recent versions of NEMO. This scheme splits vertical motions into fast (high temporal frequency) and slow motions, and the former are treated by co-moving vertical coordinate surfaces with a regridding step. This coordinate system reduces spurious diapycnal mixing caused by the high-frequency vertical motions, and its Lagrangian treatment of these motions relaxes the corresponding stability restriction.

100
We first introduce the time discretization of the semi-Lagrangian scheme in section 2, in order to develop a formulation that remains compatible with the common leapfrog scheme. In section 3, we begin to spatially discretize the semi-Lagrangian scheme by specifying the horizontal and vertical interpolation operators, and in section 4 we complete the discretization by defining the trajectory calculations. We present preliminary numerical examples in section 5, demonstrating the stability of the advection scheme.

2 Time discretization
The first requirement of a semi-Lagrangian advection scheme for the NEMO-OPA model is that it be consistent with the model's overall timestepping approach: the advection scheme is but one component of the full model.
In version 3 of NEMO-OPA, non-diffusive, non-damping processes such as advection are implemented via the leapfrog scheme (Mesinger and Arakawa, 1976), where at each timestep a field f receives its new value at f A (f "after" the timestep) 110 based on its value at the previous timestep and forcing terms, which are all evaluated on the reference grid x ref . This gives a schematic of: where f A is the field calculated at time t 0 + ∆t, f B is the field evaluated at the known prior time t 0 − ∆t ("before"), f N is the field at the provided time t 0 ("now"), and F is the forcing operator. The forcing operator includes advective processes at the 115 "now" time-level, but diffusive, damping, and hydrostatic pressure terms might be evaluated at either the "before" or "after" time-levels.
This is an Eulerian approach to fluid motion, where tracer and momentum values are tracked along the fixed reference grid at all times, and fluid flows through this grid.

Semi-Lagrangian advection
In contrast, the Lagrangian advection schemes consider the fluid parcel to be the fundamental unit of discretization. In this perspective, if f is a property of a fluid parcel that is conserved along a trajectory 1 , it satisfies the continuous equations: where D Dt = ∂ t + u · ∇ is the material derivative and F L (Lagrangian right-hand side) contains all the same forcing terms as F except those arising from tracer and momentum flux, which are included inside the material derivative.

125
Ordinarily, (2) is discretized so that F L is evaluated following the Lagrangian particles in the moving coordinate frame x(t), satisfying the trajectory equation: From an Eulerian point of view, (3) is a trivial identity based on the definition of the material derivative, but from the Lagrangian point of view (3) must be solved to define x over time.

130
One technique for solving (2) and (3) is the two time-level implicit semi-Lagrangian method, used in the GEM atmospheric model (Girard et al., 2014) among others. Here, the F L terms are evaluated with a trapezoidal rule, discretizing (2) and (3) as: The trajectory equation ( This off-grid, departure point evaluation of u and F L is fundamental to Lagrangian and semi-Lagrangian methods, and can be written more simply as f D (F D L ) for "departure-point f (F)." Neither the time-implicit evaluations (generally) nor the off-grid evaluations (of non-advective forcing) are compatible with the core structure of NEMO-OPA, which 140 considers advection to be just one of many independent operators influencing the F term of (1).

Reconciliation
Implementing semi-Lagrangian advection in NEMO-OPA requires adopting as much of the framework of (1) as possible, without changing the evaluation of non-advective forcing terms. Effectively, the semi-Lagrangian advection routine must ultimately supply a time-trend that, from the perspective of the leapfrog timestep algorithm, is indistinguishable from a conventional flux-145 form advection operator.
To effect this, consider (2) without forcing terms (F L = 0). The function f is preserved following the flow, so this gives the simply-written: This is approximated by taking one timestep of (1) (with only advective forcing F adv ), but the latter involves integrating 150 over the whole interval from t 0 − ∆t to t 0 + ∆t. Thus, we should identify f D (and the departure points generally) not with the "now" time-level in the leapfrog scheme, but with the "before" time-level. Doing so and equating (5) and (1) gives: Equation (7) is prescriptive, and it gives the necessary trend for the leapfrog algorithm. Evaluating it requires f only at the 155 already-known "before" time-level and calculation of the departure points x D . This calculation is further simplified by basing the departure points on the time-centered velocities u N , and the exact algorithm for this calculation will be discussed in more detail in section 4.

Effects of the Asselin filter
To prevent decoupling of odd and even timesteps (damping the computational mode), NEMO-OPA is typically configured 160 to use the Asselin time filter (Asselin, 1972), which adds a small time-damping proportional to ∂ 2 ∂t 2 f . Using the notation of Shchepetkin and McWilliams (2005) adapted to (1), the filter extends the time-marching scheme to the sequence: Equation (8a) is the direct equivalent of (1), creating a provisional "after" value f A * . Equation (8b) applies the filter (with a strength parameter ) with this value and the previous step's provisional field to define a final "now" field, and finally equations (8c) and (8d) are "bookkeeping" steps to shift field labels to become ready for the next timestep. The forcing operator F N * is evaluated based on the provisionally-defined fields.

170
In applying this filter with the semi-Lagrangian forcing, equation (7) is oblivious to the presence of the filter or the difference between f N and f N * . Substituting (7) into (8a) and applying (8c) and (8d) to (8a) and (8b) gives the update equation: In the case of one-dimensional advection by a constant velocity u 0 , the trajectory calculation is trivial and: Since (9) is linear, we can also take its Fourier decomposition in space and consider only a single, arbitrary wave mode, giving f =f (t) exp ikx for a time-varying coefficientf . Applying this to (9) casts the update in a matrix form as: The time-stability of this filter is then governed by the eigenvalues of this matrix. Using the shorthand ω = −ku 0 ∆t, these eigenvalues are: and to leading order in these eigenvalues have squared magnitudes of: signifying stability (|λ| ≤ 1) for all values of ω and thus all Courant numbers.

185
To perform the off-grid interpolations in (7) to find f B (x D ), this method fits a cubic polynomial to the underlying function.
If the single departure point This grid-cube contains up to 64 grid points where f (x) might be defined (subject to boundary conditions), and building a 190 complete interpolation stencil would be cumbersome and inefficient. Instead, the interpolation procedure takes advantage of the tensor-product nature of the grid to separate interpolation along each dimension: Algorithm 1. Three-dimensional interpolation. To find f (x d , y d , z d ) for some off-grid point (x d , y d , z d ): 1. Interpolate f (x) in the vertical to the location [x i , y j , z d ], for i ∈ [a − 1, a + 2] and j ∈ [b − 1, b + 2] 2. Interpolate along the first dimension in this two-dimensional grid to give f (x d , y j , z d ), for j ∈ [b − 1, b + 2].

195
3. Finally, interpolate along the second dimension to give f (x d , y d , z d ).
To effect the one-dimensional interpolations in algorithm 1, we make use of the cubic Hermite polynomials (Hildebrand, 1974). On the interval 0 ≤ χ ≤ 1, these polynomials are: h 10 (χ) = χ 3 − 2χ 2 + χ, and 2 If x d lies along an edge or corner of this interval, then at least one of the resulting interpolations will be trivial. In that case, the choice of which neighbouring interval x d lies "within" is arbitrary.
and a function f (χ) defined on this interval is interpolated via: Here, we prefer to use the cubic Hermite polynomials over simple Lagrange polynomial interpolation because the former choice allows greater freedom (via (15)) in implementation. If f is approximated by a four-point finite difference stencil, then (15) reduces to Lagrange interpolation. However, we can also make other choices for f to impose desirable properties: restricting f to have the same sign as the discrete difference imposes a type of slope limiting, and calculating f through a 205 three-point stencil provides for continuous derivatives. These approaches are discussed in more detail in the following sections.
Interpolation using the above algorithm involves appropriately defining the interval to be scaled to [0, 1] and approximating f at the endpoints. Because of the high aspect ratio of oceanic flows and the special character of vertical motion in a stratified ocean, these approximations differ between the horizontal and vertical interpolations.

Horizontal interpolation 210
In the horizontal, the interpolation in (15) can be directly conducted in grid-index space. Even when the underlying grid is mapped to the sphere, such as in the ORCA global grid (Madec, 2008, Ch. 16), the grid generally transitions smoothly and slowly from point to point 3 . The physical trajectory departure location x d (and y d ) can be translated into a fractional grid index offset by dividing by the appropriate grid scale factor, available inside the NEMO-OPA source code as one of e[12] [tuv].
Achieving third-order accuracy inside (15) is possible, but doing so requires an equally-accurate estimate for f . Unfor-215 tunately, interpolating successively in one dimension using the above algorithm does not allow for precomputation of these derivatives: after the vertical interpolation step, all of the function values need to be taken off-grid, so any precomputed derivatives would themselves require interpolation. Instead, sufficiently-accurate estimates of the derivative are available by applying a finite-difference formula to the function values themselves.

220
For notational simplicity, begin with the last step of the above algorithm where we have f (x d , y j , z d ) and would like to estimate If y d lies between y 0 and y 1 , then the four-point interpolation stencil implies that we have computed f (x d , y j , z d ) In this domain, g (0) and g (1) can be approximated by the finite differences: which then substitute for the appropriate derivatives in (15).  These finite differences are exact expressions for the first derivative for polynomials up to third order in j, and their use essentially converts (15) to interpolation via Lagrange polynomials. The Hermite polynomial form, however, allows for an easier imposition of boundary conditions.

Boundary conditions
On the NEMO-OPA z-level grid, the lateral boundaries coincide with uand v-points (velocity points), which are spaced halfway between t-points (tracer points). Tracer points that lie inside the land region are masked (tmask = 0) as are velocity points that are at the edge of or within the masked region. This arrangement is illustrated for a sample region in figure 2.
The physical interpretation of the boundary varies with respect to the field being interpolated. For tracers, lateral boundaries 235 imply no-flux conditions for the purposes of advection, which in turn implies a zero derivative at the boundary. The normal velocity (u with respect to a boundary along the first grid dimension, v with respect to a grid boundary along the second) is obviously constrained to zero by geometry to give a Dirichlet boundary condition, whereas the tangential velocity can be set as free-slip, no-slip, or some combination via a namelist entry. In the subsequent, we assume that velocity has a free-slip boundary condition, with boundary friction left for future work.

240
If a boundary occurs in the left portion of this interpolation stencil, there are a total of seven possible cases:

Algorithm 2. Lateral boundary conditions
To find g(j ) for 0 ≤ j ≤ 1 in the vicinity of a lateral boundary: 1. If g(−1) corresponds to a point at the boundary and the boundary is of the Dirichlet-type, then g(−1) = 0 and (16a) and (15) apply normally. 245 2. If g(−1) is inside the boundary, g(0) is inside the fluid domain (that is, the boundary is between these two points), and the boundary is of the Neumann-type, then g(−1) is taken to be equal to g(0), essentially making it a ghost point.
3. If g(−1) is inside the boundary, g(0) is inside the fluid domain, and the boundary is of the Dirichlet-type, then g(−1) is taken to be −g(0). (0) is at the boundary and the boundary is of the Dirichlet-type, then g(0) = 0 and g(−1) = −g(1). 250 5. If g(0) is inside the boundary and j < 0.5, then the interpolated point is itself inside the boundary and should be masked.

−g(2).
For boundaries that occur in the right portion of the interpolation stencil, the values taken for ghost points are given sym-255 metrically.
The combination of "the grid point is at the boundary" and "the boundary is of Neumann-type" is missing from algorithm 2. This construction is forbidden by the grid structure of NEMO-OPA, where tangential velocity is located one half-cell away from a boundary.
For two dimensional interpolation, algorithm 2 applies independently to each dimension. When interpolating along x, the 260 points f (x d , y j , z d ) will each individually be either in the ocean domain and valid or in the land domain and masked, which provides the values necessary to compute f (x d , y d , z d ). This off-grid point itself must lie in water, which imposes a strong requirement on the trajectory calculations to be discussed in section 4.

Slope limiting
As a final step, once values for the function and its derivative at the interval endpoints are specified, the derivative values are 265 limited to help prevent new maxima in the interpolated function. In particular, if g(0) is a local minimum (maximum) among itself, g(−1), and g(1), then g (0) is set to zero if the above procedure finds that it would be negative (positive). A similar procedure applies symmetrically for g (1) if g(1) is a relative extremum.
This limiting is milder than methods derived from Bermejo and Staniforth (1992), which would strictly preserve positivity for any j , but it effectively limits excursions when j is close to 0 or 1. Without such limiting, numerical testing showed 270 that semi-Lagrangian advection of temperature and salinity could cause weak instabilities near the coastline, where a locally extreme temperature or salinity could become "trapped" near the coast and slowly amplified.

Vertical interpolation
Vertical motion in the NEMO-OPA model differs from horizontal motion in a number of respects: -Vertical gradients of temperature and density are much stronger than typical horizontal gradients, especially near the 275 surface.
-Typical vertical grids used with NEMO-OPA are strongly stretched, with a higher resolution near the surface and a lower resolution in the deep ocean.
-Vertical flow is often oscillatory, where vertical motion is driven by barotropic and baroclinic waves.
The horizontal interpolation described in section 3.1 is third-order accurate; with the provided one-sided formulas for calcu-280 lating the endpoint derivatives it reduces to a four-point (cubic) Lagrangian interpolation process. However, the smooth field implied by this interpolation process is only C 0 continuous: We do not find this to be a practical concern for horizontal interpolation, since horizontal currents in most of the ocean tend to be dominated by relatively steady quasi-geostrophic motions. In the vertical, however, we found that even low-amplitude 285 oscillations caused by high-frequency gravity waves would cause the temperature and salinity fields to drift. The mechanism is that a fluid parcel displaced upwards by in one timestep and downwards by in the next timestep would see an effective diffusion proportional to the jump between the upward and downward-looking vertical derivatives.
To maintain global accuracy, we impose C 1 continuity in the vertical direction through an alternative treatment of the vertical derivative. Instead of applying equations (16), we treat the physical depth (rather than grid index) as the relevant coordinate 290 and construct a centered estimate of the derivative.
For a function f (z n ) defined at the z n levels, define ∆f and ∆z − = z n − z n−1 . These differences combine to give the estimated derivative: which is accurate to O(∆z 2 ) for the derivative and accurately reproduces quadratic functions of z. In the limiting case of a 295 constant ∆z (equispaced vertical levels), this formula reduces to the classic centered difference.
Because vertical interpolation comes first in algorithm 1, (17) need be evaluated only at grid points, and in fact it may be precomputed for the entire grid for a given function and timestep. This is a key advantage of placing vertical interpolation first in the interpolation sequence, and it avoids duplication of work.
Whereas interpolation near the horizontal boundaries is complicated by the many combinations of grid staggering and 300 physical boundary conditions, interpolation near the vertical boundaries is much simpler. On the NEMO grid, tracers and horizontal velocities lie along the same vertical level, and these levels are staggered one half-cell away from the boundaries.
Likewise, the natural vertical boundary condition for both tracer and horizontal velocity fields is a no-flux boundary condition; NEMO-OPA models boundary-layer friction in another module. Interpolation near the boundaries then proceeds in two steps.
The first step is to define f z at the top and bottom points in the water column, for which the central difference formula of (17) 305 is not directly valid. Here, we approximate the physical no-flux condition through a fictitious ghost point such that ∆f − = 0 at the top boundary and ∆f + = 0 at the bottom boundary, with the respective ∆z matching the layer thickness (e3t).
The second step is to define how (15) applies to the interval between the grid level and the physical boundary. Here, the no-flux boundary conditions reduce to even symmetry, and the derivative at the ghost points is the negative of the vertical derivative calculated for the in-boundary point. Near the free surface, if the interpolation point is above the level of the free 310 surface (above z = 0) then it is clamped to the surface itself. Near the ocean bottom, if the interpolation point is below the level of the ocean bottom (below z = z max ) then the point is masked and is treated as an "inside the boundary" point for the purposes of horizontal interpolation above.

Treatment of partial cells
Over most of the domain, this interpolation works well. Although there is no guarantee of positivity in the derivative formulation 315 of (17), overshoots and the consequent generation of spurious maxima are limited. For the tests presented in section 5, there was no need for slope-limiting for vertical interpolation over most of the domain.
One exception to this rule is at the bottom boundary. Here, vertical levels are spaced far apart, but to better-represent the ocean bottom the z-level grid of NEMO-OPA uses a partial cell configuration (Madec, 2008, sec. 5.9). For water columns where the bottom-most cell is much deeper than its neighbours, a local (small) upwelling can cause an overshoot of temperature or 320 salinity that spuriously increases the local density but does not diminish the upwelling. Over time, the maxima-increasing trend can accumulate and cause some points at the bottom boundary to reach implausibly cold temperatures (below −10 • C, for example) or high salinities. In the absence of explicit horizontal diffusion (which would mix this maximum into more dynamically-active regions), these spurious maxima do not generally corrupt the flow, although they obviously would corrupt whole-ocean (or whole-level) statistics such as average or extreme temperatures.

325
Near these boundary cells, vertical limiting is implemented in the simplest possible way: the interpolation of (15) is replaced with a constant, such that f (z) = f (z k ) over the interval from z k downwards to the physical boundary.
Implementing this limiting over the whole bottom level is possible, but that is far stronger than necessary and leads to erroneous diffusion along gentle slopes. When the bottom layer is composed of partial cells of varying thickness, even interpolation along a horizontal plane (that is, without changing physical depth) requires vertical interpolation in grid space to find that constant level in adjoining columns. Imposing vertical limiting along the whole bottom level effects undesired horizontal diffusion, even though the problem solved by limiting is observed when adjoining cells have large relative thickness variations.
As a compromise between these two errors, we only apply the described limiting to vertical interpolation for cells at the bottom boundary which have a layer thickness greater than 1.75 times that of their "thinnest" neighbour.
This exact threshold is empirical, and other grids might require a re-tuning of this parameter. Ideally, the grid generation 335 itself would avoid abrupt transitions in cell-layer thicknesses, but adding such a restriction would make this advection scheme useless as a drop-in replacement for the standard advection routines of NEMO-OPA.

A numerical example
As a simple numerical example, consider the case of a tracer being advected in a rectangular, two-dimensional domain by an internal wave and a background current. This tracer satisfies the advection equation: for some prescribed velocity field (u, w).
If this tracer field σ(x, z, t) would be a function of z alone (σ(z)) if not for the wave motion, then its motion is analytically given by:

345
where η(x, z) is the isopycnal displacement, u 0 is the x-directed background current (uniform in z), and c is the phase speed of the wave. Following Turkington et al. (1991), a streamfunction defined as: gives velocities: which are exact solutions of (18) for σ(x, z, t).
To give an internal wave that respects no-flux conditions at the top and bottom of the domain, we set: where k and m are horizontal and vertical wavenumbers respectively and A is the wave amplitude. For a domain of size L x in 355 the horizontal (periodic) and L z in the vertical, k = 2π/L x and m = π/L z give the lowest internal wave mode, used here.
In dimensional units, we take the model domain to be a channel L x = 1km long and L z = 100m deep with a background current of u 0 = 1m s −1 , and we set c = N/ √ k 2 + m 2 based on a mean buoyancy frequency of N = .03s −1 , which corresponds to a 1% density change from the surface to the bottom of the channel. With a wave amplitude of A = 10m, the maximum waveinduced current is about 10% of u 0 , and the phase speed is c ≈ 0.94m s −1 .

360
In order to represent the pycnocline found in many ocean waters, we choose 4σ (z) = tanh 1 2 − 1 10 zL −1 y . The domain is discretized by N x × N y points, defined as: where i = 1, 2, · · · , N x , j = 1, 2, · · · , N z , and: This implements a stretched vertical coordinate that increases the vertical resolution in the vicinity of the pycnocline.

Semi-Lagrangian advection
In integrating this system with semi-Lagrangian advection, the leapfrog method reduces to an Euler method of twice the timestep because there is no external forcing. The time-discrete equation is: where (x d(ij) , z d(ij) ) is the departure point of the trajectory that arrives at the gridpoint (x i , z j ), and the off-grid evaluation of σ proceeds via the interpolation processes described earlier without slope-limiting.
The departure points are given by the trapezoidal rule 5 with a time-centered evaluation of velocity: where the velocities are evaluated exactly via (21). The overall system (25) is solved via simple iteration, with an initial guess This algorithm is stable for large timesteps, so we tested this system for timesteps corresponding to Courant numbers of 0.2 and 2.1, with spatial grid resolutions between 40 × 4 and 2560 × 256. The final integration time was chosen to be 380 t f in = 5L x /u 0 , which allowed the wave to propagate through the domain several times. Since the exact solution is analytically known, we recorded the maximum error experienced over the integration, and error convergence rates are shown in figure 3. 4 Since this section tests advection alone, the scaling of σ is not dynamically relevant. In fact, the wave structure of (22) corresponds to an exact internal mode of the incompressible Navier-Stokes equations for a linear stratification. 5 The trajectory calculation scheme of section 4 could be used instead, but since the overall trajectory lengths are small compared to the length scales of the velocity field (Lx and Lz), that method would give equivalent results to the trapezoidal rule.

Flux-form advection
As a control, we also integrate this system in flux form (σ t − ∇ · (uσ) = 0) via centered differences, with σ evaluated at the midpoints between grid cells via a simple average, matching the central difference tracer advection scheme in NEMO-OPA.

385
The velocity field given by (20) is divergence free, so this form of the equation is pointwise equivalent to (18). However, this no longer holds after discretization. In order to eliminate the divergence error, the velocity field is defined by creating the streamfunction at the staggered points (x i+1/2 , z j+1/2 ) and defining discrete velocities u and w via the discrete equivalents to (21). With this modification, the discrete flux-form operator is equivalent to a discrete advection equation.
As usual, this leapfrog timestepping algorithm is only stable to a maximum Courant number of 1. With this staggered grid and vertical grid stretching, the Courant number can be defined by: For the mode-one internal wave with background current used in this section, the maximum Courant number is reached at the top and bottom of the domain (where w = 0), so max(C) = max(u)/∆x.
We present results for (26) at a maximum Courant number of 0.2, chosen to give a "small timestep" for later comparison with semi-Lagrangian results. The results are insensitive to the timestep within the stable range, with less than 5% change in 400 maximum-norm error over the range 0.2 ≤ max(C) ≤ 0.99.

Results
The error over time of this test case is shown in figure 3. As expected, each method achieves second-order convergence. For the Eulerian advection control case, this is governed by its two-point central difference scheme. For the semi-Lagrangian cases, the dominant contribution to the error field comes from the lower-order vertical interpolation. While the semi-Lagrangian method 405 has a higher order of accuracy for horizontal motion, here the problem is constructed such that horizontal and vertical motions are of equal importance.
As is often observed with semi-Lagrangian methods, the overall error of the scheme is somewhat lower for the high-CFL case than for the low-CFL case. The interpolation used to evaluate σ off the grid introduces error with each interpolation, and  Overall, this simplified test case supports the conclusion that the semi-Lagrangian treatment of advection is a viable replacement for flux-form advection. The semi-Lagrangian method achieves similar (for low-CFL flows) or better (for high-CFL flows) error, and it remains stable for CFL values substantially larger than unity.

Trajectory calculation 415
With the mechanism for evaluating the f B (x D ) term in (7) established in section 3, the remaining half of the semi-Lagrangian advection algorithm is the estimation of the x D departure points. This corresponds to the positions at the "before" time-level (t 0 − ∆t) of those fluid parcels that will arrive on the reference grid x ref the "after" time-level (t 0 + ∆t). One such upstream location exists for each valid grid location, so in general x d needs to be estimated for each t, u, and v point on the NEMO-OPA grid to provide for (respectively) the tracer and velocity advective forcings.

420
In general, calculation of the departure points is an implicit and nonlinear problem, requiring knowledge of the flow velocity at every sub-grid place and time between the "before" and "after" time-levels, before the flow at the latter has been computed.
To make this problem tractable, we make a series of simplifying assumptions.
The first such assumption is to freeze the flow, such that trajectories are computed based on strictly the "now" velocities (that is, u, v, and w at the intermediate time-level). This is consistent with the underlying leapfrog timestepping algorithm 425 and the other advection schemes in NEMO-OPA, where most fluxes are computed instantaneously with respect to the same "now" velocities. In physical terms, this constrains fluid parcels to follow paths based on estimated, instantaneous streamlines.
In exchange, this decouples the trajectory computation from the "after" velocities and makes the process time-explicit, which eliminates what would otherwise be a need to iterate the entire timestepping process.

Exponential integration 430
Ordinarily, the next assumption in the trajectory calculation is to approximate the particle paths, either by a straight line or by a low-degree polynomial. In this case, the Lagrangian equation: is integrated with an approximate quadrature. Using the trapezoidal rule gives the approximation:  Now, apply equation (29) to the fluid parcel that arrives at x a = (1, 0). Along this streamline, v = 0 by inspection so this equation reduces to one dimension and has the solution: For small values of ∆t, this solution is reasonable. For ∆t > 1, however, this solution leads an unphysical trajectory, where the departure point is found to lie in the left-half plane (and thus lies inside the boundary).

445
The failure here is a specific example of (29) failing the Lipschitz trajectory-crossing criterion (Smolarkiewicz and Pudykiewicz, 1992), which requires u x ∆t < C ≈ 1. The trajectory implied by (30) crosses the trajectories of fluid parcels that arrive at x a = (1 ± , 0), and the resulting advection loses its physical meaning.
This trajectory-crossing criterion is a physical limit for solutions which develop discontinuous shocks, such as those that can arise in simulations of the non-dispersive, nonlinear shallow water equations. However, these shocks are not typical of three-450 dimensional hydrostatic flows in the ocean, and they are certainly not universally seen at solid boundaries. The true trajectories of fluid parcels, if evaluated exactly, do not cross (and do not have origins inside the land domain), so a better approach is to directly integrate (28) without approximating the time derivative. Here, this one-dimensional system reduces to the ordinary differential equation: with the boundary condition x(t 0 + ∆t) = x a = 1. The solution to this equation is obviously of the form x(t) = C exp(t) for some constant C, and applying the boundary condition gives x(t) = exp(t − (t 0 + ∆t)) and a departure point of This solution is very well-behaved, lying exclusively in the right half-plane and asymptotically approaching the wall at x = 0 as ∆t → ∞. This approach works when that of (29) fails because the direct integration properly captures the exponential-in-460 time path of the fluid parcel.
A generalization of this approach forms the basis for trajectory calculation in this work. Since the solution of (28) is not analytically possible with an arbitrary velocity field, we exactly solve (28) based on an approximate, linearly-varying velocity field. This is similar to an approach discussed by Walters et al. (2007), where within a single, two-dimensional finite-element cell the linear velocity form is exactly-given by the underlying discretization rather than an approximation.

465
Assume that an arbitrary fluid parcel arrives at x a , and that we know the velocity there (v a ) and at another point v(x c ) = v c .
We know that the fluid parcel must arrive at x a travelling in the direction ofv a = v a /α a , with α a = v a . Then v c can be written in terms of this direction as v c = α cva + β cna , for scalar α c and β c and somen a normal tov a .
This forms a two-dimensional system spanned by vectorsv a andn a . If we additionally make the assumption that v(x) varies linearly in this plane, we can construct a simplified, two-dimensional coordinate system to solve (28). Here, the origin 470 of the coordinate system corresponds to x a , and the rotated coordinatesx andŷ align withv a andn a respectively. This implies that x c projects onto the point (x c ·v a , x c ·n a ) = (x c , y c ). The linearly-interpolated velocities lie strictly in this plane, so the equations of motion for a fluid parcel are:

Trajectory iteration
Evaluating (33)  This algorithm is ideally suited to cases that look like flow away from a stagnation point, where a fluid parcel is accelerating as it reaches the grid point at t 0 + ∆t. In those cases, the (α c − α a )/x c terms will be positive, and the exponential terms will limit the size of the trajectory for finite ∆t. In the opposite case, however, the exponential terms will tend to lengthen the 495 trajectory. For large ∆t or large deceleration, this effectively demands that (32)-(33) extrapolate beyond the velocity sample at x c , a potential source of instability.
To remedy this, a limiter is added to step 3 of algorithm 3, whereby x(t 0 − 2∆t) is constrained to the greater 6 of that from (33a) and −2∆t max(α a , α c ). When limiting is necessary it effectively reduces the timestep used for the trajectory iteration, so for consistency a revised ∆t is computed by inverting (33a) with the limited x c , which is then used to evaluate (33b).

Underrelaxation and land boundaries
While the construction of algorithm 3 guarantees that trajectories cannot converge to an out-of-boundary point, there are no guarantees that the algorithm remains in-boundary during the iteration process or that the iteration converges. The problem of a divergent or oscillatory iteration is more likely when the underlying velocity field does not resemble the linearly-approximated velocity field integrated by (32), as then each iteration might result in very different approximations.

505
Addressing the latter point first, this work heuristically applies underrelaxation when algorithm 3 is slow to converge. After 10 local iterations, step 3 is replaced by x c ← 1 2 (x c + x c ), after 20 iterations the right-hand side becomes 1 4 (3x c + x c ), and after 30 iterations the right-hand side becomes 1 8 (7x c + x c ). At 40 iterations, the trajectory is truncated by ending the iteration with the first in-domain point returned from the process; this ensures some sort of advection even if the iterative process enters a limit cycle.

510
This underrelaxation also addresses the possibility that x c might lie outside of the ocean domain. If x c is masked, then there is no valid velocity to provide via off-grid interpolation, so instead of evaluating (33) x c is set to x a in step 3 of algorithm 3.
This combines with the underrelaxation after 10 iterations to reduce the trajectory length until an in-boundary point is found, whereupon iteration resumes normally.
These values for iteration counts and underrelaxation parameters are conservatively specified. In the numerical tests dis-515 cussed in this work, the vast majority of trajectories converge after one or two iterations, without needing to resort to underrelaxation or trajectory truncation.

Velocity interpolation
The trajectory algorithm requires the off-grid interpolation of velocities at each iteration. In principle, these velocities can be interpolated using the interpolation process of section 3. Doing so would be ideal for ultimate consistency with the final off-grid 520 interpolation, but this process is also computationally expensive. In practice, it is more efficient to evaluate the off-grid velocity field in step 2 of algorithm 3 using trilinear interpolation; doing so causes little change in the numerical test cases in this work.
Trilinear interpolation proceeds with the same order of operations as algorithm (1): velocities are first interpolated in depth to the (x, y) corners of the grid-box at the off-grid level, then along the x-direction, and finally along the y-direction. Each individual interpolation respects the relevant boundary condition, so for example the u-velocity is considered to reflect symmetrically 525 around a boundary in y and z but is constrained to zero at a boundary in x.
One complication of linear interpolation, however, is that the velocity points are staggered by half a cell with respect to the This causes two problems for trajectory computation. The first problem is that after repeated one-dimensional interpolation, the boundary condition is no longer necessarily respected by the interpolated velocity, which can result in a trajectory iteration that "pushes" the departure point into the wall, causing non-convergence. The second problem is that while the interpolation 535 process guarantees continuity of the interpolated field at the cell corners, the boundary conditions can cause large discontinuities along the cell edges, again resulting in a convergence failure. In the above example, the interpolated velocity at U (+ , 0.6) would be influenced by both U (0, 1) and the zero velocity at the physical boundary of U (0, 0), but the interpolated velocity at U (− , 0.6) would be influenced by U (0, 1) and its reflection at a ghost point. These problems are illustrated in the top panel of The solution to both of these problems is to blend the linearly-interpolated function with a corner singularity solution. A bilinear function is a solution to Laplace's equation (∇ 2 f = 0), so it is reasonable to consider corner solutions that are also solutions to Laplace's equation.
The blended function exactly respects the solid boundary condition, and the discontinuity at the cell edges is significantly reduced. Blended functions for other configurations of the solid wall are given by applying the appropriate reflections and 555 rotations to (36).

Flow past an island
To demonstrate the impacts of semi-Lagrangian advection on a simple test case with a lengthened timestep, we first present the quasi-two dimensional test case of isothermal flow past an interposed island.

560
This test case consists of a 280 × 70 × 3 point grid, with grid resolution ∆x = ∆y = 5m and ∆z = 10m. A 50m × 50m region (10 × 10 points) is masked as land in the middle of the domain. The inflow boundary condition is set to u = 0.03m/s, v = 0; this was also imposed throughout the domain as an initial condition. The reference frame was also irrotational, with a Coriolis parameter of 0.
Relevant namelist parameters are given in table 1, with parameters that differ between the control and semi-Lagrangian runs 565 highlighted. The control run used flux-form velocity advection 7 via the QUICKEST scheme (Leonard, 1979(Leonard, , 1991, whereas the semi-Lagrangian run used semi-Lagrangian advection of momentum in flux form as described in sections 3 and 4. To emphasize the dynamical differences between the advection schemes, both test cases were run with no explicit horizontal diffusion of momentum. Vertical mixing terms, largely irrelevant for this quasi-two dimensional case, were set consistently with the ORCA025 simulations in section 5.2.

570
Both series of runs used the implicit free surface formulation (enabled with the compile-time key key_dynspg_flt), which damped the large initial surface gravity waves caused by the imposition of the blocking island on the steady-state flow.
After the initial gravity-wave adjustment, this test case quickly develops a set of recirculating vortices in the lee of the island.
Over time these vortices grow in extent and would begin detaching to form a vortex street, but this does not happen before the 8000s end of the simulation. Although there is no explicit horizontal diffusion of momentum in these runs, the flow regime is 575 much more laminar than would be implied by the physical Reynolds number of 1.5 · 10 6 , based on the free-stream velocity, edge-length of the island, and molecular viscosity of water.
In moving around the box, the flow locally accelerates to a maximum steady velocity of about 0.05m s −1 , and this maximum velocity is reached in the vicinity of the leading-edge corners of the box. The exact value of this maximum depends on both the simulation time and the timestep, but our expected pattern holds: the control simulation is stable with a timestep of 64 seconds, which corresponded to a maximum steady Courant number (max(u steady )/∆x) above 0.6 (and a maximum transient CFL of 0.95), but it is unstable with a timestep of 80 seconds.
Semi-Lagrangian advection maintains stability for much longer timesteps. Figure 5 shows the free surface height and flow streamlines for ∆t between 5 and 160 seconds, and only the semi-Lagrangian method remains stable for 80 and 160-second timesteps. For both advection schemes, the longer timestep is associated with a more diffuse flow pattern, with lengthening 585 (and less intense) recirculating vortices in the lee of the island.
This effect is stronger with semi-Lagrangian advection than with Eulerian advection. We attribute this to the nature of the flow at the leading edge of the island. Here, the dominant flow balance is cyclostrophic, where the pressure gradient at these corners balances the local vorticity. The operator splitting method used here treats the advective terms in a frame following the flow, but it can only apply the pressure force at the destination cell. This results in an inconsistency that grows with ∆t, related 590 to the forces in equation (2)

595
In the full ocean, the geostrophic effect predominates, with a leading-order balance between the pressure gradient and the Coriolis force (planetary vorticity), so we expect this issue to be less pronounced.

Global forced runs
To evaluate semi-Lagrangian advection in a more realistic forecasting setting, we conducted a preliminary series of free runs of the NEMO-OPA model. The runs consisted of: Lagrangian advection of momentum are at right. As the timestep increases both advection schemes show more diffuse behaviour, however the semi-Lagrangian advection scheme remains stable to ∆t = 160s whereas the Eulerian scheme becomes unstable after ∆t = 64s.
-A control run, based on the configuration of Environment and Climate Change Canada's 1/4• nominal-resolution Global Ice/Ocean Prediction System (GIOPS) (Smith et al., 2016) with a 10-minute model timestep 8 . Tracers were advected with the model's tracer variance dissipation scheme (Lévy et al., 2001), and momentums were advected in vector form with the model's energy and enstrophy conserving scheme 9 (Arakawa and Lamb, 1981), -A "semi-Lagrangian tracer" run, where momentum was advected as in the control scheme and the semi-Lagrangian 605 advection described in this work was used for advection of salinity and temperature. Additionally, this run disabled horizontal diffusion of salinity and temperature, and -A "semi-Lagrangian momentum and tracer" run, where momentum as well is advected with the semi-Lagrangian scheme. The configuration was otherwise the same as the semi-Lagrangian tracer run, save for a 15-minute model timestep.

610
The runs were all initialized at October 1, 2001 on the ORCA025 grid. The ocean was at rest, and temperature and salinity were given by the 2011 World Ocean Atlas climatology Zweng et al., 2013). Atmospheric forcing was provided at one-hour intervals from Environment and Climate Change Canada's 1/4 • global atmospheric reforecast, and sea ice was modeled via coupling with version 4.0 of the CICE model (Hunke and Dukowicz, 1997), with dynamically active (moving) ice. Selected namelist parameters are provided in table 2.

615
As with section 5.1, the test cases used NEMO-OPA's linear free surface with a time-implicit solver, and tidal forcing was not present in these configurations. In a typical timestep, the vast majority of semi-Lagrangian trajectories converged in one iteration (mean 1.004 over the "semi-Lagrangian tracer" run). A very small minority of cells required an extended number of iterations or underrelaxation as described in section 4.2, but this did not affect the overall trajectory-calculation performance because convergence was measured (and iterations limited) on a per-cell basis.

620
Each run continued through late 2009. For reasons of space efficiency, we recorded the two-dimensional sea surface height, temperature, and salinity fields for each model day, and we preserved every fifth daily-mean, three-dimensional output of temperature, salinity, and horizontal ocean velocity.
For short and medium-term forecasts, the operational coupled forecasting systems at CMC are constrained by observations and periodic re-initialization. With a focus on this forecasting horizon the objective with these long free-runs was:

625
-To provide a test of model stability with semi-Lagrangian advection, in terms of both avoiding crashes and providing plausible ocean fields; 8 This timestep is shorter than other commonly-used ORCA025 configurations, such as in the ocean reanalysis of Ferry et al. (2016). This shorter timestep is required to stabilize the coupling of ocean/ice stress with the CICE model, where following Roy et al. (2015) the ice/ocean drag coefficient is larger than typically considered. We chose to maintain this configuration and coupling approach to provide for the cleanest like-for-like comparisons against the operational configuration 9 For compatibility with the operational model, as run in this work the scheme did not include the "fix" for the Hollingsworth instability (Hollingsworth et al., 1983) reported in Ducousso et al. (2017). This instability is more prominent at higher resolutions, and we do not believe it meaningfully impacted the results as presented in this section. -To check for any large-scale conservation errors, which might be difficult to correct given the sparsity of observation data for the deep ocean, and -To note any qualitative improvement or deterioration in the effective resolution of the model.

630
This first goal of model stability was met in part by the successful completion of these runs. Use of semi-Lagrangian advection for both tracers and momentum allowed us to increase the effective timestep from 10 minutes (with typical maximum Courant number 10 of 0.2, found in the vertical direction) to 15 minutes (Courant number 0.3). Further increases led to instability and model crashes not from the advection component, but from the ice model. In this version of the model, the ocean/ice stress is coupled in a time-explicit way between the water and ice components. Concurrent work towards a time-implicit coupling 635 has given encouraging preliminary results on further timestep increases.
The use of semi-Lagrangian advection also gives global flows qualitatively similar to the control run, and average transports in the Atlantic overturning circulation and Circumpolar current are comparable between the control and semi-Lagrangian runs (figure 6). The semi-Lagrangian runs appear to result in a slightly weaker overturning circulation and a slightly stronger circumpolar current than the control run, but these results may not be robust to re-tuned physical paramterizations. Using semi-

640
Lagrangian advection for the velocity components results in a significant decrease to overall ocean kinetic energy (figure 7), both during and after the spin-up period.
The cause of this energy disparity is under investigation, but we believe the most likely cause is the application of slopelimiting to the u and v fields independently. Future work will focus on taking a more nuanced approach to filtering, but this effect may not be very significant in a shorter-term forecast setting with frequent re-initializations from an analysis.

645
The second goal of global conservation was met. Although semi-Lagrangian advection does not guarantee conservation of tracers, the impact on the global balance of temperature and salinity was small. Figure 8 shows the evolution of oceanaverage temperature and salinity over time in these runs, and the effect of non-conservation attributable to the semi-Lagrangian advection of tracers is comparable to the magnitude of uncertainty in the global balance of atmospheric forcing -the imbalance seen in the control run. Each case saw an overall temperature drift of about 0.04 K over the simulated period, with the semi-650 Lagrangian cases having a slight warming trend against the control run's slight cooling trend, and all three runs showed a very small increase in ocean average salinity, by about 0.01 PSU.
The temperature change versus depth over the simulated period is shown in figure 9. Both the control and semi-Lagrangian runs showed a warming trend in the surface layers, but the semi-Lagrangian runs showed temperature stability in fluid layers below 1000 meters depth whereas the control run showed a cooling trend in these waters.

655
Despite the energy shortfall with semi-Lagrangian advection of momentum, we see tentative signs that the method increases the model's effective resolution. Figure 10  -The Hermite interpolation form in section 3, especially combined with the C 1 -continuous estimate of the vertical derivative in section 3.2 might find application in other domains where, as in the ocean, one dimension (the vertical) is more 670 oscillatory than others.
-The exponential integration of trajectories in (4) may be useful in other applications that feature strong accelerations over trajectories. In particular, it forbids trajectory-crossing in one dimensional flows, and here that property ensures that trajectories remain inside the ocean domain.
-The "corner solution" treatment of velocity for trajectory calculations near corners might find use in other applications 675 with staggered velocity components.
Overall, we find that the semi-Lagrangian method is effective at extending the realizable timestep in the NEMO-OPA model.

Performance and implementation
In spite of this increased timestep, the semi-Lagrangian method by itself does not yet improve overall computational performance. The semi-Lagrangian momentum and tracer run of section 5.2 took approximately one hour of computational time 685 per five days of simulated time, using 128 Intel Xeon E5530 processors at 2.4GHz. With a 10-minute timestep, the semi-Lagrangian tracer run took approximately 50 minutes for the same five days of simulated time, whereas the control run took just 30 minutes. We expect these results to improve with further numerical optimization work. In particular, we did not take great care to ensure that loops were vectorized where possible, and it is much more difficult for compilers to automatically About one-third of the additional computational cost comes from trajectory iterations, and the remainder comes from the cubic interpolation. This suggests that the relative cost of semi-Lagrangian advection will be lower than presented here if tra- jectories can be reused for multiple tracer species (such as biogeochemical constituents). Additionally, it suggests that a further optimization may be to re-use tracer trajectories for momentum advection, at least away from the boundaries where interpolat-695 ing (staggering) trajectories might be reasonable. It seems unlikely that optimization will reduce the per-timestep penalty to the 20% value seen by Ritchie et al. (1995) for an atmospheric -model owing to the lack of three-dimensional implicit equations and expensive physical parameterizations elsewhere in NEMO-0PA -but we are hopeful that semi-Lagrangian advection will nonetheless improve overall system performance.
The parallel (MPI) implmenetation of this algorithm was straightforward. With the relatively modest increase in Courant 700 number for the cases in this work, we simply needed to increase the inter-processor lateral halo (parameters jpreci and jprecj) to three points, which was sufficient to allow a fluid parcel arriving at a processor's edge to apply the full interpolating stencil for the Courant numbers reached in the presented simulations. This increase in halo size was small compared to the processor tile size of about 50 × 260 grid points for the runs in section 5.2. Extending this to support very large horizontal Courant numbers, however (if another solution could be found to stabilize baroclinic waves) would require either prohibitively

Qualitative comments on results
Although the semi-Lagrangian method does not guarantee tracer conservation, we see no evidence that its implementation here leads to a degradation relevant in a weekly to seasonal forecast setting. In particular, the deep-water temperature stability shown in figure 9 is an encouraging sign that semi-Lagrangian advection will preserve the deep-water structure that is weakly con-710 strained by data. Even small imbalances, however, might become significant over the decade-to-century timescales of climate simulations. Further work will be necessary to characterize this method before we can safely recommend semi-Lagrangian advection in such settings.
For the test cases in section 5.2, semi-Lagrangian advection of tracers appears to slightly increase the effective resolution of the model. However, this effect is much more mixed when momentum is also advected with the semi-Lagrangian method, 715 in part because the underlying currents differ. Both of these differences will be the subject of future study, with the specific intention of assessing these effects in the setting of shorter-term forecasts. We speculate that the overall loss of kinetic energy with semi-Lagrangian advection of momentum is attributable to the use of the slope limiter: limiting each component of velocity separately may be causing unrealistic diffusion of smaller-scale structures in the presence of background vorticity. We hope to address this issue with more selective limiting.

Future development
Finally, the development in this paper implicitly assumes that the coordinate system is static with time. This is not the case in NEMO-OPA when using its nonlinear free surface option, which necessarily implies time-varying vertical levels. Adapting the semi-Lagrangian method to this more general coordinate system will be a focus of future work, which will be required to apply this advection scheme to higher-resolution domains that require tide-permitting simulations.

725
Additionally, future versions of NEMO intend to move to a third-order Runge-Kutta time-stepping algorithm (Shu and Osher, 1997), which constructs a full timestep as a linear combination of forward Euler steps. We expect that the semi-Lagrangian "advective trend" of (7) can be adapted to this framework in a straightforward manner by basing the calculated trend on the current-step values of tracers and velocities, but the adaptation may require care to preserve the higher-order temporal accuracy of the overall scheme.
Shu, C.-W. and Osher, S.: Efficient Implementation of Essentially Non-oscillatory Shock-Capturing Schemes, II, in: Upwind and High-