Semi-Lagrangian advection in the NEMO ocean model

As model resolutions increase, the Courant-Frederichs-Lewy (CFL) number based on advective motion becomes the limiting factor in setting the timestep of time-explicit circulation models. Some atmospheric models escape this limit by using an implicit or semi-implicit semi-Lagrangian formulation of advection. This formulation calculates fluid properties along parcel trajectories which follow the fluid motion and end, for each timestep, at prescribed grid-points. This work is the first application of the semi-Lagrangian method to an operational ocean model. In this context, we solve 5 the difficulty posed by the ocean’s irregular, interior boundaries by calculating parcel trajectories using a time-exponential formulation. This formulation ensures that all trajectories that are solutions to a fixed-point iteration have an origin point in the valid domain, and it does not require any prescribed extrapolation of the fluid velocities into the invalid (land) portion of the domain. We derive this method in a way that is compatible with the leapfrog timestepping scheme used in the NEMOOPA (Nucleus for European Modelling of the Ocean, Océan Parallélisé) ocean model, and we present simulation results for a 10 simplified test-case of flow past a model island and for 10-year free runs of the global ocean on the quarter-degree ORCA025 grid.

3 https://doi.org/10.5194/gmd-2020-9 Preprint. Discussion started: 7 April 2020 c Author(s) 2020. CC BY 4.0 License. assumption that the fluid parcel experiences relatively little acceleration. This assumption is more significantly broken by the irregular bathymetry and coastline of the ocean. 55 Some attempts have been made previously to incorporate semi-Lagrangian advection into the ocean context. The work of Casulli and Cheng (1992), which is used as part of the ELCOM lake and estuary model (Hodges and Dallimore, 2006), calculates parcel trajectories via a substepping approach, where fluid parcel trajectories are integrated via an explicit Euler method over many short steps per model timestep. The two-dimensional, unstructured shallow water model of Walters et al. (2007) takes a similar approach, where it also must take at least one substep per element boundary traversed by a fluid parcel. 60 In this work, we overcome this difficulty with an iterative trajectory calculation which reduces in the limit to an implicit trapezoidal rule. In addition, we also derive the semi-Lagrangian advection scheme in a form which calculates effective advective tendencies, such that the advection routine can operate as a "plug-in" scheme for models which traditionally use Eulerian fluxes. We apply this to the NEMO-OPA model, and we believe this algorithm may be useful when applied to other ocean models with a structured grid.

Organization
We first introduce the time discretization of the semi-Lagrangian scheme in section 2, in order to develop a formulation that remains compatible with the common leapfrog scheme. In section 3, we begin to spatially discretize the semi-Lagrangian scheme by specifying the horizontal and vertical interpolation operators, and in section 4 we complete the discretization by defining the trajectory calculations. We present preliminary numerical examples in section 5, demonstrating the stability of the 70 advection scheme.

Time discretization
The first requirement of a semi-Lagrangian advection scheme for the NEMO-OPA model is that it be consistent with the model's overall timestepping approach: the advection scheme is but one component of the full model.
In version 3 of NEMO-OPA, non-diffusive, non-damping processes such as advection are implemented via the leapfrog 75 scheme (Mesinger and Arakawa, 1976), where at each timestep a field f receives its new value at f t0+∆t based on its value at the previous timestep and forcing terms. This gives a schematic of: where RHSE f (Eulerian right-hand side) generally stands in for both true forcing terms (such as heating or evaporation) as well as time-tendency terms from the transport and momentum equations. For advective processes the RHSE terms arising 80 from tracer and momentum flux are evaluated at the current time-level, whereas RHSE terms arising from diffusive, damping, and hydrostatic pressure may be evaluated at either the previous or new time-levels.
This is an Eulerian approach to fluid motion, where tracer and momentum values are tracked at specific locations (namely the grid points) over time, and fluid flows through this grid. In contrast, the semi-Lagrangian advection scheme considers the where D Dt = ∂ t + u · ∇ is the material derivative and RHSL f (Lagrangian right-hand side) contains all the same forcing terms as RHSE except those arising from tracer and momentum flux, which are included inside the material derivative.
Ordinarily, (2) is discretized so that RHSL f is evaluated following the Lagrangian particles. One such approach is the semi-90 implicit semi-Lagrangian (SISL) method used in the GEM atmospheric model (Girard et al., 2014) among others, where the RHSL terms are evaluated using a trapezoidal rule to give: where x(t 0 +∆t) is constrained to be coincident with the computational grid (abbreviated as the unadorned x in the subsequent) and x(t) is solved for iteratively.

95
The difference between (3) and (1) is more than just cosmetic. The forcing terms in (3) are evaluated off-grid either directly or via interpolation, and doing so inside NEMO-OPA would be incompatible with its core structure, which considers advection to be just one of many operators that influences the RHSE term in (1).
To reconcile these differences, the semi-Lagrangian method described in this work adopts as much of the framing of (1) as possible, including evaluating the forcing terms strictly on-grid. To adapt the semi-Lagrangian method to this framework, 100 consider (2) without forcing terms (RHSL = 0) over the interval t 0 − ∆t to t 0 + ∆t. The function f is preserved following the flow, so this gives: Equating (1) and (4), noting that x in (1) corresponds x(t 0 + ∆t) in (4), gives: which defines the advective forcing term RHS f,adv that, in the context of the leapfrog method, emulates the behaviour of semi-Lagrangian advection.
Equation (6) forms the fundamental equation for this work, and the subsequent focuses on the interpolations necessary to approximate x(t 0 −∆t) and f (x(t 0 −∆t)). Importantly, the effects of the Earth's curvature and the curvilinear grid used are not 110 included in this equation, even when f assumes the role of the u or v components of velocity. NEMO-OPA separately includes the effects of planetary vorticity and of the coordinate metric in the governing equations, and including those pseudo-forces inside (6) would double-count these effects.
To effect the one-dimensional interpolations in algorithm 1, we make use of the cubic Hermite polynomials (Hildebrand, 1974). On the interval 0 ≤ χ ≤ 1, these polynomials are: and a function f (χ) defined on this interval is interpolated via:

130
Here, we prefer to use the cubic Hermite polynomials over simple Lagrange polynomial interpolation because the former choice allows greater freedom (via (8)) in implementation. If f is approximated by a four-point finite difference stencil, then (8) reduces to Lagrange interpolation. However, we can also make other choices for f to impose desirable properties: restricting f to have the same sign as the discrete difference imposes a type of slope limiting, and calculating f through a three-point stencil provides for continuous derivatives. These approaches are discussed in more detail in the following sections.

135
Interpolation using the above algorithm involves appropriately defining the interval to be scaled to [0, 1] and approximating f at the endpoints. Because of the high aspect ratio of oceanic flows and the special character of vertical motion in a stratified ocean, these approximations differ between the horizontal and vertical interpolations.

Horizontal interpolation
In the horizontal, the interpolation in (8) can be directly conducted in grid-index space. Even when the underlying grid is mapped to the sphere, such as in the ORCA global grid (Madec, 2008, Ch. 16), the grid generally transitions smoothly and slowly from point to point 4 . The physical trajectory departure location x d (and y d ) can be translated into a fractional grid index offset by dividing by the appropriate grid scale factor, available inside the NEMO-OPA source code as one of e[12] [tuv].
Achieving third-order accuracy inside (8) is possible, but doing so requires an equally-accurate estimate for f . Unfortunately, interpolating successively in 1D using the above algorithm does not allow for precomputation of these derivatives:

145
after the vertical interpolation step, all of the function values need to be taken off-grid, so any precomputed derivatives would themselves require interpolation. Instead, sufficiently-accurate estimates of the derivative are available by applying a finitedifference formula to the function values themselves.

Derivative estimates
For notational simplicity, begin with the last step of the above algorithm where we have f (x d , y j , z d ) and would like to estimate If y d lies between y 0 and y 1 , then the four-point interpolation stencil implies that we have computed f (x d , y j , z d ) for some j ∈ [0, 1]. In this domain, g (0) and g (1) can be approximated by the finite differences: 155 which then substitute for the appropriate derivatives in (8).
These finite differences are exact expressions for the first derivative for polynomials up to third order in j, and their use essentially converts (8) to interpolation via Lagrange polynomials. The Hermite polynomial form, however, allows for an easier imposition of boundary conditions.

160
On the NEMO-OPA z-level grid, the lateral boundaries coincide with u-and v-points (velocity points), which are spaced halfway between t-points (tracer points). Tracer points that lie inside the land region are masked (tmask = 0) as are velocity points that are at the edge of or within the masked region. This arrangement is illustrated for a sample region in figure 2.
The physical interpretation of the boundary varies with respect to the field being interpolated. For tracers, lateral boundaries imply no-flux conditions for the purposes of advection, which in turn implies a zero derivative at the boundary. The normal 165 velocity (u with respect to a boundary along the first grid dimension, v with respect to a grid boundary along the second) is obviously constrained to zero by geometry to give a Dirichlet boundary condition, whereas the tangential velocity can be set as  free-slip, no-slip, or some combination via a namelist entry. In the subsequent, we assume that velocity has a free-slip boundary condition, with boundary friction left for future work.
If a boundary occurs in the left portion of this interpolation stencil, there are a total of seven possible cases: 170 Algorithm 2. Lateral boundary conditions To find g(j ) for 0 ≤ j ≤ 1 in the vicinity of a lateral boundary: 1. If g(−1) corresponds to a point at the boundary and the boundary is of the Dirichlet-type, then g(−1) = 0 and (9a) and (8) apply normally.
2. If g(−1) is inside the boundary, g(0) is inside the fluid domain (that is, the boundary is between these two points), and 6. If g(0) is inside the boundary, j > 0.5, and the boundary is of the Neumann-type, then g(0) = g(1) and g(−1) = g(2).
For boundaries that occur in the right portion of the interpolation stencil, the values taken for ghost points are given symmetrically.

185
The combination of "the grid point is at the boundary" and "the boundary is of Neumann-type" is missing from algorithm 2. This construction is forbidden by the grid structure of NEMO-OPA, where tangential velocity is located one half-cell away from a boundary.

Slope limiting
As a final step, once values for the function and its derivative at the interval endpoints are specified, the derivative values are 190 limited to help prevent new maxima in the interpolated function. In particular, if g(0) is a local minimum (maximum) among itself, g(−1), and g(1), then g (0) is set to zero if the above procedure finds that it would be negative (positive). A similar procedure applies symmetrically for g (1) if g(1) is a relative extremum.
This limiting is milder than methods derived from Bermejo and Staniforth (1992), which would strictly preserve positivity for any j , but it effectively limits excursions when j is close to 0 or 1. Without such limiting, numerical testing showed 195 that semi-Lagrangian advection of temperature and salinity could cause weak instabilities near the coastline, where a locally extreme temperature or salinity could become "trapped" near the coast and slowly amplified.

Two-dimensional application
With the one-dimensional algorithm completely specified above, the interplay between steps 2 and 3 of Algorithm 1 is now clear. After step 2, interpolation along the first dimension, we have now specified f (x d , y j , z d ) such that each point contains 200 either an interpolated value or is found (by step 3 of Algorithm 1) to be inside the land region and hence masked. That provides the requisite four gridded function values to compute f (x d , y d , z d ).
Generally, this value should not be inside land, as by (6) this is the value that forms the advective trend of a presumably inside-water grid point. This imposes an obligation for the trajectory calculations in section 4 to return only valid departure points, and this requirement will be discussed in more length there. 205 9 https://doi.org/10.5194/gmd-2020-9 Preprint. Discussion started: 7 April 2020 c Author(s) 2020. CC BY 4.0 License.

Vertical interpolation
Vertical motion in the NEMO-OPA model differs from horizontal motion in a number of respects: -Vertical gradients of temperature and density are much stronger than typical horizontal gradients, especially near the surface.
-Typical vertical grids used with NEMO-OPA are strongly stretched, with a higher resolution near the surface and a lower 210 resolution in the deep ocean.
-Vertical flow is often oscillatory, where vertical motion is driven by barotropic and baroclinic waves.
Although the horizontal interpolation described in section 3.1 is third-order accurate, it is ill-suited for interpolation in the vertical. The worst characteristic of the horizontal interpolation is that it is only C 0 continuous, where the interpolating function generally has a discontinuous derivative at the gridpoints. This is reasonable for horizontal flow that does not often 215 change direction, but with the more-oscillatory vertical flow C 0 continuity causes the temperature and salinity fields to drift. In particular, a fluid parcel that is displaced upwards by in one timestep and downwards by on the next would see a net change of temperature and salinity proportional to times the jump between the upward and downward-directed derivatives. This drift does not just remain localized; it causes larger-scale impacts through the generation of baroclinic waves.
Maintaining global accuracy requires that the interpolation be C 1 continuous in the vertical. Fortunately, the Hermite poly-220 nomial interpolation form is already equipped to handle this scenario. For the purposes of the first step of algorithm 1, the relevant coordinate system is physical depth, with z = 0 at the surface and z = z max at the local ocean bottom.
Unlike for horizontal interpolation, where derivatives at the interval endpoints are defined in grid-coordinate space, vertical derivatives at the gridpoints are defined in the physical space using a three-point stencil that reduces to the classic centered difference when the grid spacing is uniform. For a function f (z n ) defined at the z n levels, define ∆f , ∆z + = z n+1 − z n , and ∆z − = z n − z n−1 . These differences combine to give an estimate for the derivative: which is accurate to O(∆z 2 ) for the derivative and accurately reproduces quadratic functions of z.
Because vertical interpolation comes first in algorithm 1, (10) need be evaluated only at grid points, and in fact it may be 230 precomputed for the entire grid for a given function and timestep. This is a key advantage of placing vertical interpolation first in the interpolation sequence, and it avoids duplication of work.
Whereas interpolation near the horizontal boundaries is complicated by the many combinations of grid staggering and physical boundary conditions, interpolation near the vertical boundaries is much simpler. On the NEMO grid, tracers and horizontal velocities lie along the same vertical level, and these levels are staggered one half-cell away from the boundaries.

235
Likewise, the natural vertical boundary condition for both tracer and horizontal velocity fields is a no-flux boundary condition; NEMO-OPA models boundary-layer friction in another module. Interpolation near the boundaries then proceeds in two steps.
The first step is to define f z at the top and bottom points in the water column, for which the central difference formula of (10) is not directly valid. Here, we approximate the physical no-flux condition through a fictitious ghost point such that ∆f − = 0 at the top boundary and ∆f + = 0 at the bottom boundary, with the respective ∆z matching the layer thickness (e3t).

240
The second step is to define how (8) applies to the interval between the grid level and the physical boundary. Here, the no-flux boundary conditions reduce to even symmetry, and the derivative at the ghost points is the negative of the vertical derivative calculated for the in-boundary point. Near the free surface, if the interpolation point is above the level of the free surface (above z = 0) then it is clamped to the surface itself. Near the ocean bottom, if the interpolation point is below the level of the ocean bottom (below z = z max ) then the point is masked and is treated as an "inside the boundary" point for the 245 purposes of horizontal interpolation above.

Treatment of partial cells
Over most of the domain, this interpolation works well. Although there is no guarantee of positivity in the derivative formulation of (10), overshoots and the consequent generation of spurious maxima are limited. For the tests presented in section 5, there was no need for slope-limiting for vertical interpolation over most of the domain.

250
One exception to this rule is at the bottom boundary. Here, vertical levels are spaced far apart, but to better-represent the ocean bottom the z-level grid of NEMO-OPA uses a partial cell configuration (Madec, 2008, sec. 5.9). For water columns where the bottommost cell is much deeper than its neighbours, a local (small) upwelling can cause an overshoot of temperature or salinity that spuriously increases the local density but does not diminish the upwelling. Over time, the maxima-increasing trend can accumulate and cause some points at the bottom boundary to reach implausibly cold temperatures (below −10 • C, 255 for example) or high salinities. In the absence of explicit horizontal diffusion (which would mix this maximum into more dynamically-active regions), these spurious maxima do not generally corrupt the flow, although they obviously would corrupt whole-ocean (or whole-level) statistics such as average or extreme temperatures.
Near these boundary cells, vertical limiting is implemented in the simplest possible way: the interpolation of (8) is replaced with a constant, such that f (z) = f (z k ) over the whole interval from z k downwards to the physical boundary.

260
Implementing this limiting over the whole bottom level is possible, but that is far stronger than necessary and leads to erroneous diffusion along gentle slopes. Imagine a horizontally nondivergent flow, such that the vertical velocity w = 0 everywhere, with salinity and temperature being given by a function of z alone such that the underlying stratification is stable.
Here, horizontal advection should nowhere result in a trend in salinity or temperature, and effecting this (even approximately) in the framework of semi-Lagrangian advection requires extrapolating tracer values downwards towards the boundary. As a 265 compromise, for this work we only limit vertical interpolation as described for cells that are at the bottom boundary and have a layer thickness greater than 1.75 times that of the neighbouring cell with the smallest local layer thickness.
This exact threshold is empirical, and other grids might require a re-tuning of this parameter. Ideally, the grid generation itself would avoid abrupt transitions in cell-layer thicknesses, but adding such a restriction would make this advection scheme useless as a drop-in replacement for the standard advection routines of NEMO-OPA.

A numerical example
As a simple numerical example, consider the case of a tracer being advected in a rectangular, two-dimensional domain by an internal wave and a background current. This tracer satisfies the advection equation: for some prescribed velocity field (u, w).

275
If this tracer field σ(x, z, t) would be a function of z alone (σ(z)) if not for the wave motion, then its motion is analytically given by: where η(x, z) is the isopycnal displacement, u 0 is the x-directed background current (uniform in z), and c is the phase speed of the wave. A streamfunction defined as: gives velocities: which are exact solutions of (11) for σ(x, z, t).

285
To give an internal wave that respects no-flux conditions at the top and bottom of the domain, we set: where k and m are horizontal and vertical wavenumbers respectively and A is the wave amplitude. For a domain of size L x in the horizontal (periodic) and L z in the vertical, k = 2π/L x and m = π/L z give the lowest internal wave mode, used here.
In dimensional units, we take the model domain to be a channel L x = 1km long and L z = 100m deep with a background 290 current of u 0 = 1m s −1 , and we set c = N/ √ k 2 + m 2 based on a mean buoyancy frequency of N = .03s −1 , which corresponds to a 1% density change from the surface to the bottom of the channel. With a wave amplitude of A = 10m, the maximum waveinduced current is about 10% of u 0 , and the phase speed is c ≈ 0.94m s −1 .
In order to represent the pycnocline found in many ocean waters, we choose 5σ (z) = tanh 1 2 − 1 10 zL −1 y . The domain is discretized by N x × N y points, defined as: where i = 1, 2, · · · , N x , j = 1, 2, · · · , N z , and: This implements a stretched vertical coordinate that increases the vertical resolution in the vicinity of the pycnocline.

Semi-Lagrangian advection
In integrating this system with semi-Lagrangian advection, the leapfrog method reduces to an Euler method of twice the timestep because there is no external forcing. The time-discrete equation is: where (x d(ij) , z d(ij) ) is the departure point of the trajectory that arrives at the gridpoint (x i , z j ), and the off-grid evaluation of 305 σ proceeds via the interpolation processes described earlier without slope-limiting.
The departure points are given by the trapezoidal rule 6 with a time-centered evaluation of velocity: where the velocities are evaluated exactly via (14). The overall system (18) is solved via simple iteration, with an initial guess 310 given by setting ( This algorithm is stable for large timesteps, so we tested this system for timesteps corresponding to CFL numbers of 0.2 and 2.1, with spatial grid resolutions between 40 × 4 and 2560 × 256. The final integration time was chosen to be t f in = 5L x /u 0 , which allowed the wave to propagate through the domain several times. Since the exact solution is analytically known, we recorded the maximum error experienced over the integration, and error convergence rates are shown in figure 3.

Flux-form advection
As a control, we also integrate this system in flux form (σ t − ∇ · (uσ) = 0) via centered differences, with σ evaluated at the midpoints between grid cells via a simple average, matching the central difference tracer advection scheme in NEMO-OPA.
The velocity field given by (13) is divergence free, so this form of the equation is pointwise equivalent to (11). However, this no longer holds after discretization. In order to eliminate the divergence error, the velocity field is defined by creating the 320 streamfunction at the staggered points (x i+1/2 , z j+1/2 ) and defining discrete velocities u and w via the discrete equivalents to (14). With this modification, the discrete flux-form operator is equivalent to a discrete advection equation. 6 The trajectory calculation scheme of section 4 could be used instead, but since the overall trajectory lengths are small compared to the length scales of the velocity field (Lx and Lz), that method would give equivalent results to the trapezoidal rule.
As usual, this leapfrog timestepping algorithm is only stable to a CFL number of ∆t max(u)/∆x < 1, so we tested (19) only at a CFL number of 0.2.

Results
The error over time of this test case is shown in figure 3. As expected, each method achieves second-order convergence. For the 330 Eulerian advection control case, this is governed by its two-point central difference scheme. For the semi-Lagrangian cases, the dominant contribution to the error field comes from the lower-order vertical interpolation. While the semi-Lagrangian method has a higher order of accuracy for horizontal motion, here the problem is constructed such that horizontal and vertical motions are of equal importance.
As is often observed with semi-Lagrangian methods, the overall error of the scheme is somewhat lower for the high-CFL case 335 than for the low-CFL case. The interpolation used to evaluate σ off the grid necessarily introduces error with each interpolation, and the overall contribution of this error necessarily scales in proportion to the number of interpolations, and hence inversely with the timestep.
Overall, this simplified test case supports the conclusion that the semi-Lagrangian treatment of advection is a viable replacement for flux-form advection. The semi-Lagrangian method achieves similar (for low-CFL flows) or better (for high-CFL 340 flows) error, and it remains stable for CFL values substantially larger than unity.

Trajectory calculation
With the mechanism for evaluating the f t0−∆t (x(t 0 − ∆t)) term in (6) established in section 3, the remaining half of the semi-Lagrangian advection algorithm is the estimation of the x(t 0 − ∆t) departure points (again abbreviated x d ). This corresponds to the positions at the "before" timestep (t 0 − ∆t) of those fluid parcels that will arrive at the grid points at the "after" timestep 345 (t 0 + ∆t). One such upstream location exists for each valid grid location, so in general x d needs to be estimated for each t, u, and v point on the NEMO-OPA grid to provide for (respectively) the tracer and velocity advective forcings.  In general, calculation of the departure points is an implicit and nonlinear problem, requiring knowledge of the flow velocity at every sub-grid place and time between the "before" and "after" time-levels, before the flow at the latter has been computed.
To make this problem tractable, we make a series of simplifying assumptions.

350
The first such assumption is to freeze the flow, such that trajectories are computed based on strictly the "now" velocities (that is, u, v, and w at the intermediate time-level). This is consistent with the underlying leapfrog timestepping algorithm and the other advection schemes in NEMO-OPA, where most fluxes are computed instantaneously with respect to the same "now" velocities. In physical terms, this constrains fluid parcels to follow paths based on estimated, instantaneous streamlines.
In exchange, this decouples the trajectory computation from the "after" velocities and makes the process time-explicit, which 355 eliminates what would otherwise be a need to iterate the entire timestepping process.

Exponential integration
Ordinarily, the next assumption in the trajectory calculation is to approximate the particle paths, either by a straight line or by a low-degree polynomial. In this case, the Lagrangian equation: 360 is integrated with an approximate quadrature. Using the trapezoidal rule gives the approximation: where x a = x(t 0 + ∆t) is the on-grid "arrival" point of the fluid parcel. This approximation is second-order in time, and it results in an iterative method where v(x d ) is interpolated, leading to a revised estimate of x d .
Unfortunately, this approximation is not suitable for trajectory calculations in the general ocean because it does not appro- x a = (1, 0). Here, the y-directed velocity along the streamline is zero by inspection (and so y d is also zero), so (21) reduces to a one-dimensional equation with solution: A better approach is to directly integrate (20). Here, this one-dimensional system reduces to the ordinary differential equation: with the boundary condition x(t 0 + ∆t) = x a = 1. The solution to this equation is obviously of the form x(t) = C exp(t) 380 for some constant C, and applying the boundary condition gives x(t) = exp(t − (t 0 + ∆t)) and a departure point of x d =

exp(−2∆t).
This solution is very well-behaved, lying exclusively in the right half-plane and asymptotically approaching the wall at x = 0 as ∆t → ∞. This approach works when that of (21) fails because the direct integration properly captures the exponential-intime path of the fluid parcel.

385
A generalization of this approach forms the basis for trajectory calculation in this work. Since the solution of (20) is not analytically possible with an arbitrary velocity field, we exactly solve (20) based on an approximate, linearly-varying velocity field. This is similar to an approach discussed by Walters et al. (2007), where within a single, two-dimensional finite-element cell the linear velocity form is exactly-given by the underlying discretization rather than an approximation.
Assume that an arbitrary fluid parcel arrives at x a , and that we know the velocity there (v a ) and at another point v(x c ) = v c .

390
We know that the fluid parcel must arrive at x a travelling in the direction ofv a = v a /α a , with α a = v a . Then v c can be written in terms of this direction as v c = α cva + β cna , for scalar α c and β c and somen a normal tov a .
This forms a two-dimensional system spanned by vectorsv a andn a . If we additionally make the assumption that v(x) varies linearly in this plane, we can construct a simplified, two-dimensional coordinate system to solve (20). Here, the origin of the coordinate system corresponds to x a , and the rotated coordinatesx andŷ align withv a andn a respectively. This implies 395 that x c projects onto the point (x c ·v a , x c ·n a ) = (x c , y c ). The linearly-interpolated velocities lie strictly in this plane, so the equations of motion for a fluid parcel are: subject to the boundary condition that x(t 0 + ∆t) = y(t 0 + ∆t) = 0. (24a) can be solved first, and applying the boundary 400 condition x(t 0 + ∆t) = 0 gives: Applying this to (24b) along with its boundary condition y(t 0 + ∆t) = 0 gives: When the along-trajectory acceleration is small (|(α c −α a )∆t/x c | 1), (25) reduces to a trapezoidal rule with second-order 405 accuracy in time.

Trajectory iteration
Evaluating (25)  This algorithm is ideally suited to cases that look like flow away from a stagnation point, where a fluid parcel is accelerating as it reaches the grid point at t 0 + ∆t. In those cases, the (α c − α a )/x c terms will be positive, and the exponential terms will limit the size of the trajectory for finite ∆t. In the opposite case, however, the exponential terms will tend to lengthen the trajectory. For large ∆t or large deceleration, this effectively demands that (24)-(25) extrapolate beyond the velocity sample at 420 x c , a potential source of instability.
To remedy this, a limiter is added to step 3 of algorithm 3, whereby x(t 0 − 2∆t) is constrained to the greater 7 of that from (25a) and −2∆t max(α a , α c ). When limiting is necessary it effectively reduces the timestep used for the trajectory iteration, so for consistency a revised ∆t is computed by inverting (25a) with the limited x c , which is then used to evaluate (25b).

425
While the construction of algorithm 3 guarantees that trajectories cannot converge to an out-of-boundary point, there are no guarantees that the algorithm remains in-boundary during the iteration process or that the iteration converges. The problem of a divergent or oscillatory iteration is more likely when the underlying velocity field does not resemble the linearly-approximated velocity field integrated by (24), as then each iteration might result in very different approximations.
Addressing the latter point first, this work heuristically applies underrelaxation when algorithm 3 is slow to converge. After 430 10 local iterations, step 3 is replaced by x c ← 1 2 (x c + x c ), after 20 iterations the right-hand side becomes 1 4 (3x c + x c ), and after 30 iterations the right-hand side becomes 1 8 (7x c + x c ). At 40 iterations, the trajectory is truncated by ending the iteration with the first in-domain point returned from the process; this ensures some sort of advection even if the iterative process enters a limit cycle.
This underrelaxation also addresses the possibility that x c might lie outside of the ocean domain. If x c is masked, then there 435 is no valid velocity to provide via off-grid interpolation, so instead of evaluating (25) x c is set to x a in step 3 of algorithm 3.
This combines with the underrelaxation after 10 iterations to reduce the trajectory length until an in-boundary point is found, whereupon iteration resumes normally.
These values for iteration counts and underrelaxation parameters are conservatively specified. In the numerical tests discussed in this work, the vast majority of trajectories converge after one or two iterations, without needing to resort to underre-440 laxation or trajectory truncation.

Velocity interpolation
The trajectory algorithm requires the off-grid interpolation of velocities at each iteration. In principle, these velocities can be interpolated using the interpolation process of section 3. Doing so would be ideal for ultimate consistency with the final off-grid interpolation, but this process is also computationally expensive. In practice, it is more efficient to evaluate the off-grid velocity 445 field in step 2 of algorithm 3 using trilinear interpolation; doing so causes little change in the numerical test cases in this work. Trilinear interpolation proceeds with the same order of operations as algorithm (1): velocities are first interpolated in depth to the (x, y) corners of the grid-box at the off-grid level, then along the x-direction, and finally along the y-direction. Each individual interpolation respects the relevant boundary condition, so for example the u-velocity is considered to reflect symmetrically around a boundary in y and z but is constrained to zero at a boundary in x.

450
One complication of linear interpolation, however, is that the velocity points are staggered by half a cell with respect to the physical boundary. In two dimensions, if the tracer point T (0, 0) (to use grid-cell coordinates for the tracer grid denoted T ) is a land point but T (0, 1), T (1, 0), and T (1, 1) are all ocean points, then u-velocity point U (0, 0) (denoting the u-velocity grid as U ), halfway between T (0, 0) and T (1, 0), lies along the boundary. The boundary continues to U (0, 0.5), whereupon U (0, 0.5)-U (0, 1) lies inside the ocean. This violates a basic assumption of linear interpolation, that the velocity should vary 455 smoothly (and approximately linearly) within the u-cell.
This causes two problems for trajectory computation. The first problem is that after repeated one-dimensional interpolation, the boundary condition is no longer necessarily respected by the interpolated velocity, which can result in a trajectory iteration that "pushes" the departure point into the wall, causing non-convergence. The second problem is that while the interpolation process guarantees continuity of the interpolated field at the cell corners, the boundary conditions can cause large discontinuities along the cell edges, again resulting in a convergence failure. In the above example, the interpolated velocity at U (+ , 0.6) would be influenced by both U (0, 1) and the zero velocity at the physical boundary of U (0, 0), but the interpolated velocity at U (− , 0.6) would be influenced by U (0, 1) and its reflection at a ghost point. These problems are illustrated in the top panel of for y > 0.5, the "corner" solution to Laplace's equation is: while bilinear interpolation would give: These two solutions are blended together, with (26) taking precedence along the solid boundary (x = 0 and 0 ≤ y ≤ 0.5) and (27) taking precedence along the x = 1 and y = 1 boundaries of the cell. This gives: where σ = max(1 − x, 2(y − 0.5)).
The blended function exactly respects the solid boundary condition, and the discontinuity at the cell edges is significantly reduced. Blended functions for other configurations of the solid wall are given by applying the appropriate reflections and rotations to (28).

Flow past an island
To demonstrate the impacts of semi-Lagrangian advection on a simple test case with a lengthened timestep, we first present the quasi-two dimensional test case of isothermal flow past an interposed island.
This test case consists of a 280 × 70 × 3 point grid, with grid resolution ∆x = ∆y = 5m and ∆z = 10m. A 50m × 50m 485 region (10 × 10 points) is masked as land in the middle of the domain. The inflow boundary condition is set to u = 0.03m/s, v = 0; this was also imposed throughout the domain as an initial condition. The reference frame was also irrotational, with a Coriolis parameter of 0. Relevant namelist parameters are given in table 1, with parameters that differ between the control and semi-Lagrangian runs highlighted. The control run used flux-form velocity advection 8 via the QUICKEST scheme (Leonard, 1979(Leonard, , 1991, whereas 490 the semi-Lagrangian run used semi-Lagrangian advection of momentum in flux form as described in sections 3 and 4. To emphasize the dynamical differences between the advection schemes, both test cases were run with no explicit horizontal diffusion of momentum. Vertical mixing terms, largely irrelevant for this quasi-two dimensional case, were set consistently with the ORCA025 simulations in section 5.2. Both series of runs used the implicit free surface formulation (enabled with the compile-time key key_dynspg_flt), 495 which damped the large initial surface gravity waves caused by the imposition of the blocking island on the steady-state flow.
After the initial gravity-wave adjustment, this test case quickly develops a set of recirculating vortices in the lee of the island.
Over time these vortices grow in extent and would begin detaching to form a vortex street, but this does not happen before the 8000s end of the simulation. Although there is no explicit horizontal diffusion of momentum in these runs, the flow regime is much more laminar than would be implied by the physical Reynolds number of 1.5 · 10 6 , based on the free-stream velocity, 500 edge-length of the island, and molecular viscosity of water.
In moving around the box, the flow locally accelerates to a maximum steady velocity of about 0.05m s −1 , and this maximum velocity is reached in the vicinity of the leading-edge corners of the box. The exact value of this maximum depends on both the simulation time and the timestep, but our expected pattern holds: the control simulation is stable with a timestep of 64 seconds, which corresponded to a maximum steady CFL number above 0.6 (and a maximum transient CFL of 0.95), but it is unstable 505 with a timestep of 80 seconds.
Semi-Lagrangian advection maintains stability for much longer timesteps. Figure 5 shows the free surface height and flow streamlines for ∆t between 5 and 160 seconds, and only the semi-Lagrangian method remains stable for 80 and 160-second timesteps. For both advection schemes, the longer timestep is associated with a more diffuse flow pattern, with lengthening (and less intense) recirculating vortices in the lee of the island.

510
This effect is stronger with semi-Lagrangian advection than with Eulerian advection. We attribute this to the nature of the flow at the leading edge of the island. Here, the dominant flow balance is cyclostrophic, where the pressure gradient at these corners balances the local vorticity. The operator splitting method used here treats the advective terms in a frame following the flow, but it can only apply the pressure force at the destination cell; there is an inconsistency that grows with ∆t. This is evident in the 160-second timestep case (bottom right panel of figure 5), where the maximum steady CFL number of 1.6 implies that 515 fluid parcels are advected by about three grid cells over the 2∆t leapfrog step. There, the lowest pressure region at the leading edge of the flow has moved slightly further downstream.
In the full ocean, the geostrophic effect predominates, with a leading-order balance between the pressure gradient and the Coriolis force (planetary vorticity), so we expect this issue to be less pronounced. The runs were all initialized at October 1, 2001 on the ORCA025 grid. The ocean was at rest, and temperature and salinity were given by the 2011 World Ocean Atlas climatology Zweng et al., 2013). Atmospheric forcing was provided at one-hour intervals from Environment and Climate Change Canada's 1/4 • global atmospheric reforecast, and 535 sea ice was modeled via coupling with version 4.0 of the CICE model (Hunke and Dukowicz, 1997), with dynamically active (moving) ice. Selected namelist parameters are provided in table 2.

Global forced runs
As with section 5.1, the test cases used NEMO-OPA's linear free surface with a time-implicit solver, and tidal forcing was not present in these configurations. Lagrangian advection of momentum are at right. As the timestep increases both advection schemes show more diffuse behaviour, however the semi-Lagrangian advection scheme remains stable to ∆t = 160s whereas the Eulerian scheme becomes unstable after ∆t = 64s. temperature, and salinity fields for each model day, and we preserved every fifth daily-mean, three-dimensional output of temperature, salinity, and horizontal ocean velocity.
For short and medium-term forecasts, the operational coupled forecasting systems at CMC are constrained by observations and periodic re-initialization. With a focus on this forecasting horizon the objective with these long free-runs was: -To provide a test of model stability with semi-Lagrangian advection, in terms of both avoiding crashes and providing 545 plausible ocean fields; -To check for any large-scale conservation errors, which might be difficult to correct given the sparsity of observation data for the deep ocean, and -To note any qualitative improvement or deterioration in the effective resolution of the model.
This first goal of model stability was met in part by the successful completion of these runs. Use of semi-Lagrangian 550 advection for both tracers and momentum allowed us to increase the effective timestep from 10 minutes (with typical maximum CFL number of 0.2, found in the vertical direction) to 15 minutes (CFL number 0.3). Further increases led to instability and model crashes not from the advection component, but from the ice model. In this version of the model, the ocean/ice stress is coupled in a time-explicit way between the water and ice components. Concurrent work towards a time-implicit coupling has given encouraging preliminary results on further timestep increases.

555
The use of semi-Lagrangian advection also gives global flows qualitatively similar to the control run, and average transports in the Atlantic overturning circulation and Circumpolar current are comparable between the control and semi-Lagrangian runs (figure 6). However, the use of semi-Lagrangian advection for the velocity components results in a significant decrease to overall ocean kinetic energy (figure 7), both during and after the spin-up period.
The cause of this energy disparity is under investigation, but we believe the most likely cause is the application of slope-560 limiting to the u and v fields independently. Future work will focus on taking a more nuanced approach to filtering, but this effect may not be very significant in a shorter-term forecast setting with frequent re-initializations from an analysis.
The second goal of global conservation was met. Although semi-Lagrangian advection does not guarantee conservation of tracers, the impact on the global balance of temperature and salinity was small. Figure 8 shows the evolution of oceanaverage temperature and salinity over time in these runs, and the effect of non-conservation attributable to the semi-Lagrangian 565 advection of tracers is comparable to the mangitude of uncertainty in the global balance of atmospheric forcing -the imbalance seen in the control run.
Each case saw an overall temperature drift of about 0.04 K over the simulated period, with the semi-Lagrangian cases having a slight warming trend against the control run's slight cooling trend, and all three runs showed a very small increase in ocean average salinity, by about 0.01 PSU. Overall, we find that the semi-Lagrangian method is effective at extending the realizable timestep in the NEMO-OPA model. Although we only extended the timestep from 10 to 15 minutes for the semi-Lagrangian momentum run in section 5.2, this limitation was imposed by the ice model. Disabling ice dynamics allowed us to increase the timestep to 30 minutes, but this would have made the results incomparable with those of the control and semi-Lagrangian tracer runs. Preliminary work with the CICE sea model and implicit coupling of the ice-ocean stress seems to allow us to relax the ice-related timestep restriction.

585
In spite of this increased timestep, the semi-Lagrangian method by itself does not yet improve overall computational performance. The semi-Lagrangian momentum and tracer run of section 5.2 took approximately one hour of computational time per five days of simulated time, using 128 Intel Xeon E5530 processors at 2.4GHz. With a 10-minute timestep, the semi-Lagrangian tracer run took approximately 50 minutes for the same five days of simulated time, whereas the control run took just 30 minutes. We expect these results to improve with further numerical optimization work. In particular, we did not take 590 great care to ensure that loops were vectorized where possible, and it is much more difficult for compilers to automatically vectorize the point-by-point semi-Lagrangian computations compared to volume flux calculations in the traditional advection schemes.
Although the semi-Lagrangian method does not guarantee tracer conservation, we see no evidence that its implementation here leads to a degradation relevant in a weekly to seasonal forecast setting. However, even small perturbations may become Ocean average salinity Control SL (Tracer) SL (Momentum) Figure 8. Ocean average temperature (top, • C) and salinity (bottom, PSU) over time for the test cases of section 5.2. Although conservation is not guaranteed by semi-Lagrangian advection, long-term trends are similar between the semi-Lagrangian runs and the control run. significant over decadal to century-long climate simulations, so further work will be necessary before we can safely recommend the use of semi-Lagrangian advection in such settings.
For the test cases in section 5.2, semi-Lagrangian advection of tracers appears to slightly increase the effective resolution of the model. However, this effect is much more mixed when momentum is also advected with the semi-Lagrangian method, in part because the underlying currents differ. Both of these differences will be the subject of future study, with the specific 600 intention of assessing these effects in the setting of shorter-term forecasts. We speculate that the overall loss of kinetic energy with semi-Lagrangian advection of momentum is attributable to the use of the slope limiter: limiting each component of velocity separately may be causing unrealistic diffusion of smaller-scale structures in the presence of background vorticity. We hope to address this issue with more selective limiting.
Finally, the development in this paper implicitly assumes that the coordinate system is static with time. This is not the case