Thetis coastal ocean model: discontinuous Galerkin discretization for the three-dimensional hydrostatic equations

. Unstructured grid ocean models are advantageous for simulating the coastal ocean and river-estuary-plume systems. However, unstructured grid models tend to be diffusive and/or computationally expensive which limits their applicability to real life problems. In this paper, we describe a novel discontinuous Galerkin (DG) ﬁnite element discretization for the hydrostatic equations. The formulation is fully conservative and second-order accurate in space and time. Monotonicity of the advection scheme is ensured by using a strong stability preserving time integration method and slope limiters. Compared to previous DG 5 models advantages include a more accurate mode splitting method, revised viscosity formulation, and new second-order time integration scheme. We demonstrate that the model is capable of simulating baroclinic ﬂows in the eddying regime with a suite of test cases. Numerical dissipation is well-controlled, being comparable or lower than in existing state-of-the-art structured grid models.

tion, which is currently only feasible in relatively small sub-regions (e.g. at the mouth of an estuary; Shi et al. 2016) due to its high computational cost. Historically, regional ocean models have used structured, (deformed) rectilinear lattice grids. Although structured grids offer better computational performance (Danilov et al., 2008;Danilov, 2013), unstructured grids are generally preferred in coastal domains as they can better represent the complex coastal topography and local features (Deleersnijder and Lermusiaux, 2008;Danilov, 2013;Piggott et al., 2013). Due to the large geometrical aspect ratio of the oceans (length versus depth), most models utilize computational grids that are layered in the vertical direction. Typical approaches include the terrain-following sigma levels (Blumberg and Mellor, 1987), equipotential z levels (Griffies et al., 2005), isopycnal coordinates (Bleck, 1978), and their generalizations (e.g. Song and Haidvogel, 1994;Bleck, 2002).
In this article, we focus on solving the hydrostatic equations on an unstructured grid. While many unstructured grid models exist, their drawbacks tend to be excessive numerical diffusion that smooths out important physical features (Kärnä et al., 2015;Kärnä and Baptista, 2016;Ralston et al., 2017), and/or high computational cost. To address these issues, we propose a novel finite element solver for the hydrostatic equations, based on discontinuous Galerkin discretization methods.
Maintaining high numerical accuracy is crucial in ocean applications. The ocean is a forced dissipative system where the mixing of water masses only takes place at the molecular level (Griffies, 2004). In practice, however, the finite grid resolution and numerical schemes used by the model introduce mixing rates of tracers and momentum that can be orders of magnitude larger than physical mixing (Burchard and Rennau, 2008;Rennau and Burchard, 2009;Hiester et al., 2014). Such spurious, numerical mixing is often dominated by the discretization of advection (Marchesiello et al., 2009;, but it can arise from other components as well, such as (implicit) time integration methods (Shchepetkin and McWilliams, 2005), or various filters introduced to improve numerical stability (Danilov, 2012;Zhang et al., 2016). In addition, wetting and drying schemes may introduce additional dissipation in order to stabilize the barotropic equation in the drying regime. We reserve consideration of this important latter topic for a future publication.
In global circulation models, numerical mixing is a major bottleneck as (diapycnal) diffusion is very low in the deep ocean basins and water masses can remain largely unchanged for hundreds of years (Griffies, 2004;. Numerical mixing can, however, be a major issue in coastal domains as well: coastal oceans are characterized by strong density gradients, fronts between water masses (e.g. in river plumes), small-scale dynamics (e.g. internal waves and hydraulic jumps), and baroclinic eddies. An overly diffusive model can, therefore, fail to capture many essential physical features of these domains: it can smear out fronts, underestimate the intrusion of saline waters into embayments (Burchard and Rennau, 2008;Hofmeister et al., 2010;Kärnä et al., 2015;Ralston et al., 2017), or misrepresent mixing in river plumes.
Some unstructured grid models are based on the continuous Galerkin Finite Element (FE) method or hybrid FE-FV formulations. Such models include ADCIRC (Luettich and Westerink, 2004), SELFE (Zhang and Baptista, 2008), and SCHISM (Zhang et al., 2016), and the earlier version of FESOM (Wang et al., 2014). The continuous FE method is ideal for solving elliptic equations but requires stabilization for advection (see Wang et al., 2008a, and references therein). In addition, these methods involve solving a fully-coupled global system which is less efficient in parallel applications compared to the FV method (Danilov, 2012;Danilov et al., 2016).
In recent years, discontinuous Galerkin (DG) methods have gained attention in geophysical modeling (Dawson and Aizinger, 2005;Aizinger and Dawson, 2007;Blaise et al., 2010;Comblen et al., 2010a;Kärnä et al., 2012Kärnä et al., , 2013. DG discretization resembles the FV method because it is local (i.e. elements are only connected by inter-element fluxes), fully conservative, and well-suited for advective problems, yet it offers higher-order accuracy. This article presents a DG discretization for the hydrostatic equations. Our goal is to design an efficient unstructured grid solver where numerical accuracy is not compromised.
Specifically, we aim to meet the following design criteria: a vertically extruded, layered mesh; accurate representation of free surface dynamics; second-order accurate, monotone tracer advection scheme; explicit time integration of 3D variables (except for vertical diffusion); and low numerical mixing.
Based on the advection scheme requirements, we have chosen to use linear discontinuous Galerkin elements for tracers, combined with a slope limiter (Kuzmin, 2010) and a strong stability preserving (SSP) time integration scheme (Shu, 1988;Shu and Osher, 1988;Gottlieb and Shu, 1998;Gottlieb, 2005;Gottlieb et al., 2009). This choice ensures that the scheme is second-order in smooth areas, while slope limiting combined with the SSP time integration scheme ensure monotonicity (i.e. no overshoots). The movement of the free surface is taken into account with an arbitrary Lagrangian-Eulerian (ALE) formulation (Donea et al., 2004), where the mesh moves in the vertical direction. The ALE formulation guarantees strict local and global conservation of volume and tracers and allows for the use of generic vertical grids (Petersen et al., 2015).
All numerical ocean models include some form of friction, either in the form of a numerical closure or a physical parametrization . Numerical closure involves adding a sufficient amount of dissipation to maintain numerical stability. There is a wealth of literature about stable finite volume (e.g., Danilov, 2012) and finite element discretizations (e.g., Hanert et al., 2003;Cotter et al., 2009a, b;Comblen et al., 2010b;McRae and Cotter, 2014) for rotational shallow water equations. Most of these schemes are stable for external gravity waves, and hence do not require any additional dissipation.
Solving the 3D hydrostatic equations under strong baroclinic forcing, however, generates noise at the grid-scale that does require dampening. A common approach is to add some form of viscosity proportional to the grid Reynolds number Ilıcak et al., 2012).  argue that conventional Laplacian viscosity has too wide a spectrum and tends to dissipate physically relevant (larger) scales too much. They show that biharmonic viscosity dissipates smaller scales more, and is thus more appropriate for removing noise at the grid-scale. In contrast to numerical closures, physical parametrizations aim to represent unresolved sub-grid-scale processes, such as strong lateral mixing near coasts or mixing at the bottom boundary layers. In this article, we focus on numerical closures; the presented viscosity schemes are mostly motivated by numerical stability considerations.
In this article, we present an efficient DG implementation of the three-dimensional hydrostatic equations. The model is implemented in the Thetis project -an open source coastal ocean circulation model freely available online (see thetisproject.org).
Thetis implements both a 2D depth-averaged circulation model and a full 3D hydrostatic model, the latter of which is discussed herein.
Thetis is implemented using the Firedrake finite element modeling platform (www.firedrakeproject.org; Rathgeber et al., 2016). We have chosen Firedrake because of its flexibility, and support for extruded meshes Bercea et al., 2016). Firedrake uses high-level abstractions for describing the weak formulation of partial differential equations, specifically the Unified Form Language (Alnaes et al., 2014), and automated code generation to produce efficient C code (Homolya et al., 2018;Luporini et al., 2017) and just-in-time compilation. As such it is an extremely flexible modeling framework that does not sacrifice computational efficiency; it is also an ideal platform for experimenting and benchmarking different discretizations.
Automated code generation can also support different target hardware architectures, making it attractive for current and emerging high-performance computing platforms. In addition, Firedrake can automatically derive the adjoint of the forward model (Farrell et al., 2013), permitting inverse modeling applications such as parameter optimization and data assimilation.
The governing equations are presented in Section 2, followed by their DG finite element discretization in Section 3. The second-order coupled time integration scheme is described in Section 4. Numerical tests are presented in Section 5.

Governing equations
Let Ω be the three-dimensional domain that spans from the sea floor z = −h(x, y) to the free surface z = η(x, y); the bottom and top surfaces are denoted by Γ b and Γ s , respectively. Total water column depth is thus H = η + h. The two-dimensional horizontal domain is denoted by Γ 0 .

The horizontal momentum equation reads
where u = (u, v) and w denote the horizontal and vertical velocity, respectively; ∇ h is the horizontal gradient operator; ∧ denotes the cross product operator; f is the Coriolis parameter; e z is the vertical unit vector; p is the pressure; and ν h and ν are the horizontal and vertical diffusivity, respectively. Water density is defined as ρ = ρ 0 + ρ (T, S, p), where T, S stand for temperature and salinity, respectively, and ρ 0 is a constant reference density.
Under the hydrostatic assumption the horizontal pressure gradient can be written as a combination of external, internal, and atmospheric pressure gradients: where p atm is the atmospheric pressure acting on the sea surface, and is the baroclinic head. For brevity the internal pressure gradient field is denoted as F pg = g∇ h r.
Neglecting atmospheric pressure, the full horizontal momentum equation reads Vertical velocity w is diagnosed from the continuity equation: Water temperature and salinity are modeled with an advection-diffusion equation of the form where µ h , µ stand for the horizontal and vertical (eddy) diffusivity, respectively.
At the bottom boundary we impose quadratic bottom stress where C d is the drag coefficient, and u bf is the velocity in the middle of the bottommost element. n = (n x , n y , n z ) is the outward normal vector, and n h = (n x , n y , 0) its horizontal projection. The bottom boundary condition is treated implicitly; (8) is linearized by keeping the magnitude |u bf | fixed at the "old" value while solving for u (and u bf ). Typically C d is computed from the logarithmic law of the wall (e.g. Kärnä et al., 2013).

Mode splitting
Following Higdon and de Szoeke (1997) we split the horizontal velocity field into depth-averagedū and deviation u = u −ū components. The depth-averaged momentum equation is then defined as where G is a forcing term used to couple the 2D and 3D modes. This equation is complemented with the depth-averaged continuity (free surface) equation: The 2D system (9)-(10) contains the fast-propagating, rotational surface gravity waves. The corresponding equation for u is obtained by subtracting (9) from (4) (Higdon and de Szoeke, 1997): Note that the advection and viscosity terms are included in (11) without splitting, based on the assumption that these processes are slow enough to be captured with long time steps. The Coriolis term, on the other hand, only contains the slow modes.
The vertical velocity w only appears in the advection term, which is not split, and thus there is no need to split w.

Coupling 2D and 3D modes
The 2D and 3D modes are coupled using the additional term G (Higdon and de Szoeke, 1997;Ringler et al., 2013). First, the 3D momentum equation (11) is solved with G = 0, resulting in a velocity field u that has a non-zero depth-average, generated by the advection and viscosity terms (that depend onū). We then compute the depth-average u and apply a correction: to enforce zero depth-average. By definition, the field G is a constant over the vertical, and it will be used as a forcing term in the 2D momentum equation (9) in the subsequent solve. This procedure ensures that equations (9) and (11) sum up to (4) and u dz = 0.

Equation of state
In this paper a linear equation of state is used: where α T , β S are the thermal expansion and saline contraction coefficients, respectively, and T 0 , S 0 are reference temperature and salinity. In all the test cases presented herein salinity does not contribute to water density (β S = 0). Thetis also implements a full non-linear equation of state (Jackett et al., 2006).

Viscosity and turbulence closure
Baroclinic flows require some form of viscosity to filter out grid-scale noise. In this paper we only consider Laplacian horizontal viscosity, set to a constant ν h = U ∆x/Re h corresponding to the velocity scale U , horizontal mesh resolution ∆x, and the desired grid Reynolds number Re h . Here the velocity scale U is taken as a global constant specific to each test case. Unless otherwise specified, the horizontal diffusivity of tracers is zero.
In most test cases vertical viscosity is set to a constant. In certain cases we use the gradient Richardson number dependent parametrization by Pacanowski and Philander (1981): where Ri = N 2 /M 2 is the gradient Richardson number, N is the buoyancy frequency, and M is the vertical shear frequency.

Finite element discretization
This section describes the spatial discretization of the governing equations. In Section 3.1 we define the finite element function spaces, followed by the weak forms of the underlying equations.

Function spaces
The prognostic variables of the coupled 2D-3D system (9,10,11,6) are η,ū, u , T , and S. Diagnostic variables include the vertical velocity w, water density ρ , baroclinic head r, and internal pressure gradient F pg . The choice of function spaces where these variables reside is crucial for numerical stability and accuracy. Our discretization is based on the linear discontinuous Galerkin function space, P DG 1 . The 2D system is discretized with a P DG 1 − P DG 1 velocity-pressure finite element pair: Water elevation and both components of the depth-averaged velocity are approximated in P DG When embedded with appropriate Riemann fluxes at element interfaces the P DG 1 − P DG 1 element pair is well suited for rotational shallow water problems (Comblen et al., 2010b;Kärnä et al., 2011).
Achieving an accurate and monotone 3D tracer advection scheme is one of our main design criteria. The tracers therefore are also considered within a discontinuous function space, T, S ∈ H = P DG 1 × P DG 1 (here the × operator stands for the Cartesian product of function spaces in the extruded mesh: horizontal × vertical function space). Tracer consistency (sometimes called local tracer conservation) is a necessary condition for monotonicity; it ensures that a constant tracer field does not exhibit spurious local extrema. In practice it implies that the discrete tracer equation must reduce to the discrete continuity equation for a constant tracer. In this work we satisfy this property by requiring that the vertical velocity belongs to the tracer space H (White et al., 2008b). In addition, compatibility between the 2D and 3D momentum equations requires that the 3D horizontal velocity must be P DG 1 in the horizontal direction. We therefore set u ∈ U = [P DG 1 × P DG 1 ] 2 as well. Note that this choice of function spaces is not mimetic (McRae and Cotter, 2014;Danilov, 2013): the discrete system does not preserve all the properties of the continuous equations, for example enstrophy is not conserved exactly. As the coastal ocean is generally very dissipative, maintaining mimetic properties is however not crucial. It is possible to define a mimetic discretization as well, for example using Raviart-Thomas elements for the velocity, i.e. element pair RT 1 − P DG 1 (McRae and Cotter, 2014). Our preliminary experiments however indicate that this choice significantly increases the computational cost of the system, without a corresponding improvement in accuracy. Formal assessment of the performance of mimetic discretizations in coastal ocean applications will be investigated in the future.
In the weak forms we use the following notation for volume and interface integrals In interface terms we additionally use the average { {·} } and jump [[·]] operators for scalar a and vector u fields: where the superscripts '+' and '−' arbitrarily label the values on either side of the interface and n is the outward unit normal vector of each element on the interface.

2D system
Let T stand for the triangulation of the 2D domain Γ 0 . The set of element interfaces are denoted by I = {k ∩ k |k, k ∈ T }, and n = (n x , n y ) the outward unit normal vector of an interface e ∈ I. For brevity boundary conditions are omitted from the weak forms.
Let φ 2D ∈ H 2D and ψ 2D ∈ U 2D be test functions in the 2D function spaces. The weak formulation of the 2D system then reads, find η ∈ H 2D ,ū ∈ U 2D such that Here the divergence ∇ h ·(Hū) and external gradient g∇ h η terms have been integrated by parts.

Momentum equation
Let P denote the set of prisms of the 3D domain Ω, obtained from a vertical extrusion of Γ 0 . The set of horizontal and vertical interfaces are denoted by I h and I v , respectively. Let ψ ∈ U be a test function. The weak formulation of the 3D momentum equation then reads: find u ∈ U such that Here the advection and viscosity terms have been integrated by parts (see Kärnä et al., 2013); the colon operator is the

Tracer equation
The weak formulation of the tracer equations is derived analogously: find T ∈ H such that Note that we do not employ the Lax-Friedrichs flux in the tracer equation.

Symmetric Interior Penalty stabilization
The presented discretization is unstable for elliptic operators, and the diffusion operators require additional stabilization. Here we use the Symmetric Interior Penalty Galerkin (SIPG) method (Epshteyn and Rivière, 2007). The SIPG formulation of the tracer diffusion operators read For the viscosity terms we get The penalty factor σ is defined as σ = γ p(p+1) L (Epshteyn and Rivière, 2007), where p is the degree of the basis functions, γ is a factor depending on mesh quality, and L is the local element length scale in the normal direction of the interface. Let h h and h v denote the horizontal and vertical element sizes, and Pestiaux et al., 2014). In this paper we use γ = 5.

Continuity equation
The vertical velocity w is computed diagnostically from the continuity equation (5) where both the left and right hand sides have been integrated by parts. Note that the terms on the bottom surface Γ b vanish due to the impermeability constraint u · n h + wn z = 0.

Computing the internal pressure gradient
The water density is computed diagnostically using the equation of state. We use the same P DG 1 ×P DG 1 function space for tracers and water density. In this work we use a linear equation of state (14), and consequently density can be computed locally (at each node of the tracer field). In general, however, the equation of state is non-linear, and the density is projected on the ρ field.
The baroclinic head is computed from (3) by integrating ρ over the vertical. In practice we solve equation ∂r ∂z = ρ /ρ 0 weakly with the appropriate boundary conditions: Here the left hand side has been integrated by parts, and r up denotes the value on the prism above the interface. Note that the free surface terms vanish because r = 0 on Γ s by definition. We use function space P DG 1 × P 2 for r to alleviate internal pressure gradient errors (Piggott et al., 2008).
Finally, taking a test function ψ ∈ U, we compute the internal pressure gradient with the weak form where the right hand side has been integrated by parts. Usually F pg belongs to the same space as the horizontal velocity, i.e. [P DG 1 × P DG 1 ] 2 . However, to reduce bathymetry induced internal pressure gradient errors it is possible to use a quadratic horizontal space, i.e. r ∈ P DG 2 × P 2 and F pg ∈ [P DG 2 × P DG 1 ] 2 . In this paper we use a linear F pg field unless otherwise specified.

Slope limiters
We use vertex-based P DG 1 slope limiters (Kuzmin, 2010) for three-dimensional variables to ensure positivity. The limiter is applied to both tracer and horizontal velocity fields after each update of the advection operator as discussed in the next Section.

Time integration
The coupled 2D-3D system is advanced in time with a two-stage arbitrary Lagrangian Eulerian (ALE) time integration scheme.
In this section we present the ALE formulation and summarize the final time integration scheme.

ALE mesh formulation
To accurately account for the free surface movement one must move the mesh in the vertical direction. In this work we adopt the ALE method (Donea et al., 2004). Here we describe a mesh update procedure that stretches (or compresses) the mesh uniformly over the vertical direction. The ALE formulation, however, allows more complex mesh moving methods as well, such as the (approximate) tracking of isopycnals (Hofmeister et al., 2010).
In three dimensions an ALE update consists of solving an advection-diffusion equation between two domains, Ω n and Ω n+1 .
Here the domain is uniquely defined by the surface elevation field, such that for any time level n the surface Γ n s matches η n . Due to the chosen discretization the elevation field η is discontinuous, yet we wish to maintain a conforming mesh, i.e. a continuous coordinate field z. This is achieved by projecting the elevation field η n to a continuous space and updating the geometry with the continuous field η n cg . The projection induces a small discrepancy between the elevation field and the 3D domain, but its effect remains negligible in practical applications because jumps in the elevation field are typically small.
Let Ω ref be the reference domain corresponding to unperturbed elevation field η cg = 0, and z ref ∈ [−h, 0] its vertical coordinate. Applying a uniform mesh stretching, the time dependent mesh coordinates can then be written as The mesh velocity is obtained as w m = ∂z ∂t . In practice the consecutive fields η n+1 cg and η n cg are known so we can evaluate Given the mesh velocity a conservative ALE update can be written as for a generic tracer equation ∂T ∂t = F T (T, u, w).

Coupled time integration scheme
The coupled 2D-3D system is advanced in time with a two-stage ALE time integration scheme. For convenience we re-write the 3D momentum and tracer equations as where F T and F u denote all the terms that are treated explicitly while G T and G u contain all the implicit terms. In this work only vertical diffusion (28), vertical viscosity (30), and bottom friction terms are treated implicitly.
The explicit 3D equations are advanced in time with a second-order strong stability preserving (SSP) Runge-Kutta scheme, SSPRK(2,2) (Shu and Osher, 1988;Gottlieb and Shu, 1998). For a generic problem ∂c ∂t = F (c) the scheme reads: When applied to the explicit 3D momentum and tracer equations, (25) and (26), both of these stages are ALE updates where the mesh is updated from geometry Ω n to Ω (1) and then Ω n+1 . The ALE formulation of the explicit 3D tracer equation can then be written as where the vertical velocity is adjusted by the mesh velocity w m .
After the SSPRK update, the implicit terms are advanced with the backward Euler method. This step is computed in domain The 3D momentum equation is treated analogously.
Algorithm 1 Summary of the coupled time integration algorithm.
Require: Model state variables at time tn: η n ,ū n , T n , S n , u n First stage: 1: Solve 2D system for (η (1) ,ū (1) ) (46)- (47) 2: Update mesh geometry to Ω (1) and compute mesh velocity w The 2D equations are advanced in time with an implicit scheme to circumvent the strict time step constraint imposed by surface gravity waves. To ensure consistency between the movement of the 3D mesh and the 2D mode, the 2D time integration scheme must be compatible with the aforementioned SSPRK(2,2) method. Here we use a combination of a forward Euler and trapezoidal steps: Denoting the tendencies of the 2D system (23)-(24) by F η and Fū, respectively, we can write the 2D solver as The second implicit stage is linearized by treating the total depth H explicitly in (48).
The 2D system is solved first, resulting in an updated elevation field (η (1) and η n+1 for the two stages, respectively) and consequently mesh geometry (Ω (1) and Ω n+1 ). Once the mesh geometry is known it is straightforward to compute the corresponding mesh velocity w m and perform a 3D ALE update.
The time integration method is second-order for all the terms. The whole algorithm is summarized in Algorithm 1.

Choosing the time step
The maximal admissible time step is constrained by the stability of the coupled time integrator. The presented SSPRK(2,2) scheme has a CFL (Courant-Friedrichs-Lewy) factor 1. The 2D scheme (45), and the implicit vertical solver (43), on the other hand, are unconditionally stable. This implies that the coupled system is stable under the same conditions as the explicit SSP scheme on its own.
The horizontal advection term imposes a constraint where L h is the horizontal element size, U is the maximal horizontal velocity scale, and σ h is a length scaling factor. For the presented P DG 1 discretization we take L h as the square root of the triangle area. For rectangular P DG 1 elements and 2nd order RK schemes the scaling factor is approximately σ h = 0.33 (Cockburn and Shu, 2001). In this work we use σ h = 0.05 for all the diagnostic test cases. In strongly stratified flows internal waves may impose a stricter constraint where C iw is the speed of the internal waves.
Analogously, the time step constraint for vertical advection is where L z is the element height, W is the vertical velocity scale, and σ v = 0.125 is the scaling factor.
Given a horizontal viscosity scale N h , the explicit viscosity operator imposes a constraint which may become stringent for small elements and large viscosity values. The scaling factor σ visc depends on the used stabilization scheme; here a value σ visc = 2 is used. The constraint for horizontal diffusivity is analogous.
In the simulations presented herein, the minimal admissible time step is evaluated on the mesh based on constant a-priori velocity and viscosity scales. The time step is kept constant throughout the simulation.

Test cases
We demonstrate the performance of the proposed discretization with a suite of test cases of increasing complexity. We first evaluate the conservation and convergence of the solver in a barotropic standing wave test case. The convergence of baroclinic terms is then examined in a specific steady-state test case. The baroclinic solver and its numerical mixing is then evaluated with a non-rotating lock exchange test case and a rotating baroclinic eddy test, followed by the DOME overflow test.

Standing wave
We first evaluate the performance of the solver in a barotropic standing wave test case. The domain is a L x = 60 km long rectangular channel, 625 m wide, and 100 m deep. All lateral boundaries are closed. Initially the water is at rest. A 10 m tall sinusoidal elevation perturbation is prescribed along the channel (η a = −η 0 cos(2πx/L x ), η 0 = 10 m), leading to a nonlinear wave as the simulation progresses.
The simulation is run for two full wave periods, approximately 3831.31 s. To investigate tracer conservation and consistency properties two passive tracers are included: salinity is set to a constant 4.5 psu, while temperature varies between 5.0 and 15.0 • C along the channel (T = 5 sin(2πx/L x ) + 10 • C).
The domain was discretized with a split-quad mesh using 40 elements along the channel (1500 m edge length) and 4 vertical layers. The time step is ∆t = 95.78 s, chosen to meet the horizontal advection condition.
During the simulation the volume of the 3D domain was conserved to accuracy O(10 −15 ). The "2D volume", i.e. the integral of the elevation field, was conserved to accuracy O(10 −16 ). Salinity remained at constant 4.5 psu with a small O(10 −9 ) deviation. The total mass of salinity and temperature were both conserved to accuracy O(10 −12 ). Over-and undershoots in the temperature field were negligible due to the slope limiters. Without the limiter temperature overshoots were O(10 −2 ). These results show that the model indeed fully conserves volume and tracers and does not exhibit overshoots. Moreover, the tracer consistency property is satisfied, verifying the integrity of the ALE formulation. To investigate the order of convergence of the solver, we used a smaller initial elevation perturbation η 0 = 1 cm. In this case the resulting standing wave is close to linear. At the end of the simulation the solution was compared to the analytical solution of the linear wave equation (which coincides with the initial condition) by computing the L 2 error, E(η) = ( Ω (η −η a ) 2 dx) 1/2 . We ran the simulation varying the horizontal mesh resolution between 3 km and 300 m; the number of vertical levels varied between 2 and 20. In each case the channel was made one element wide, and the time step was chosen to meet the CFL criterion for horizontal advection. At the end of the simulation the L 2 error was computed for water elevation and velocity (see Figure 1).
The velocity field shows the expected second-order convergence, whereas elevation converges at a rate of 3.2. It is known that P DG 1 shallow water equations models may exhibit superconvergence properties, especially for the elevation field (Bernard et al., 2008;Comblen et al., 2010b). Here our results verify that the solver behaves as expected and yields second-order accuracy under barotropic forcing.

Baroclinic MMS test
Verifying model accuracy under baroclinic forcing is more challenging as no analytical solutions are available. Here we use the method of manufactured solutions (MMS; Salari and Knupp, 2000) to construct a steady state test case that allows us to verify the correctness of the discrete baroclinic operators. The domain is a L x = 15 by L y = 10 km large and h = 40 m deep rectangular box. All lateral boundaries are closed. We prescribe initial velocity and temperature fields These functions were chosen to be analytic (infinitely differentiable) and fully three-dimensional to better quantify the spatial discretization error.
Salinity is set to a constant 35 psu. We use the linear equation of state (14) with ρ 0 = 1000 kg m −3 , α T = 0.2 kg m −3 • C −1 and T 0 = 5 • C −1 . For the sake of simplicity, bathymetry is constant and elevation is set to zero initially. Coriolis frequency was set to a constant f = 10 −4 s −1 . Bottom friction, viscosity, and diffusivity are omitted.
Without any additional forcing the initial conditions lead to a time-dependent solution. Following the MMS strategy, we add analytical source terms in the dynamic equations that cancel all the active terms in the equations, leading to a steady state solution. The remaining error is purely the discretization error of the advection, pressure gradient, and Coriolis operators. The source terms are derived analytically and projected to the corresponding function space. The analytical formulae are given in Appendix A.
The coarsest mesh contains 4 elements both in x and y directions and 2 vertical levels. We refine the mesh up to 10 times (40 elements and 20 vertical levels) and compute the L 2 error of the prognostic fields against the exact solutions. In each case, the model is run for 50 iterations with a time step chosen to meet the CFL condition.
The variation of the L 2 errors with resolution is shown in Figure 2. All the prognostic variables exhibit the correct secondorder convergence rate. The diagnostic vertical velocity field (which depends on the divergence of u) converges linearly as expected. Therefore we conclude that advection, pressure gradient, and Coriolis terms are discretized correctly. We have also developed similar MMS tests for the diffusivity/viscosity operators and the bottom friction term, all of which show secondorder convergence as well (not shown).
Here    Figure 3 shows the initial density field and solution after 17 h of simulation for the three cases. Higher background viscosity leads to a less noisy velocity field and therefore sharper density front. The sharpness and shape of the fronts are similar to results presented in the literature (e.g. Fig. 5 in Ilıcak et al., 2012). The low viscosity cases (Re h = 25, 250) exhibit an internal wave at the front which significantly increases the overall mixing.
Assuming that, in the absence of bottom friction, all available potential energy is transformed into kinetic energy, the maximum front propagation speed can be estimated as c = 1/2 gH∆ρ/ρ 0 (Benjamin, 1968;Jankowski, 1999).   To diagnose the role of spurious mixing we use the reference potential energy (RPE, Ilıcak et al. 2012;Petersen et al. 2015).
RPE is computed as the vertical center of mass of a sorted density field ρ * : RP E = g ρ * (z + h)dx. The ρ * field is defined  as the unique, stratified density field where the densest water parcels are distributed over the bottom, and density increases monotonically over the water column. As such, ρ * is the steady-state density distribution, and RPE represents the portion of potential energy that cannot be transformed into kinetic energy. Mixing the two water masses increases RPE (the center of mass) and thus the amount of unavailable potential energy increases. Figure 4 (c) shows the evolution of normalized RPE, RP E(t) = (RP E(t) − RP E(0))/RP E(0) during the simulation. At t = 17 h the values are 0.612, 1.13, 2.35, 3.11 × 10 −5 for the four simulations. These results are in good agreement with those reported with MPAS-Ocean model (Petersen et al., 2015): With the same mesh resolution, MPAS-Ocean shows slightly larger normalized RPE, for example, at t = 17 h RP E ≈ 3.5 × 10 −5 in the case of Re h = 25. The difference is likely due to the different spatial discretization (P DG 1 instead of finite volumes), or differences in the numerical viscosity operator. Applying slope limiters to the velocity field is not necessary for numerical stability, but it reduces high-frequency noise in the velocity field and hence results in lower RPE values.
In order to investigate the role of the Lax-Friedrichs flux on numerical mixing we ran the lock exchange test case with zero viscosity. After 17 h of simulation, the RPE value was approximately 3.2 × 10 −5 . When the Lax-Friedrichs flux was omitted, a similar RPE value was obtained with viscosity ν = 3.125 m 2 s −1 . Therefore, in this particular test case, the Lax-Friedrichs flux introduces mixing that is roughly equivalent to 3 m 2 s −1 viscosity, corresponding to Re h = 80. When viscosity was nonzero, it was evident from the numerical simulations that the Lax-Friedrichs flux has a negligible impact on numerical mixing if Re < 10 (not shown).

Baroclinic eddies
We investigate the model's ability to generate baroclinic eddies with the eddying channel test case of Ilıcak et al. (2012). This test case is an idealization of the Antarctic Circumpolar Current, the domain spanning 500 km and 160 km in the meridional and zonal directions, respectively. The domain is 1000 m deep. At the zonal boundaries, periodic boundary conditions are applied; northern and southern boundaries are closed. The Coriolis parameter is taken as a constant 1.2 × 10 −4 s −1 .
Initially, the domain is linearly stratified with warmer water at the surface. In addition, the northern half of the domain is warmer, with a narrow sinusoidal transition band separating the warm (northern) and cold (southern) water masses ( Figure 5; see Petersen et al. 2015 for the definition of the initial temperature field). Water temperature ranges between 10 and 20 • C. A linear equation of state is used with ρ 0 = 1000 kg m −3 , α T = 0.2 kg m −3 • C −1 and T 0 = 5 • C. Salinity is kept at constant 35 psu and it does not affect density (β S = 0). Bottom friction is parametrized by a constant drag coefficient C D = 0.01. The baroclinic Rossby radius of deformation is 20 km (Ilıcak et al., 2012). Horizontal mesh resolution is constant in space.
We use a regular split-quad mesh with two different mesh resolutions: eddy-permitting 10 km and a finer 4 km resolution.  In the vertical direction, 26 and 40 equidistant sigma levels are used in the two cases, respectively. Simulations are carried out with different values of horizontal viscosity, the grid Reynolds number ranging from 2 to 100. The different setups are summarized in Table 2. Vertical viscosity is set to a constant 10 −4 m 2 s −1 .
As the simulation progresses, baroclinic eddies develop at the center of the domain, quickly propagating elsewhere. This is a spin-down experiment, i.e. the domain is a closed system with no forcing at the boundaries. Therefore all the energy in the system originates from the initial potential energy, which is being dissipated during the simulation; again the RPE is used as a metric for the energy transfer, or, the loss of energy due to mixing.  The test cases were run on a Linux cluster with 16-core Intel Xeon E5620 processors and Mellanox Infiniband interconnect.
The 320-day simulation took roughly 42 hours to run on 96 cores with the 4 km resolution mesh and 140.26 s time step. It should be noted, however, that the time step employed here is smaller than the maximal allowed time step. We also carried out a strong scaling test with the 4 km mesh. In the scaling test, the simulation was run for 40 time steps, recording the total elapsed wall clock time and time spent in different parts of the solver. Figure 7 (a) shows the overall speedup up to 96 processors. The scaling efficiency drops to roughly 50% at 96 cores, when the local degree of freedom count for the tracer field is 25 000 (see black line Figure 7 b). This scaling efficiency is close to typical Firedrake performance .
The scaling efficiency of the separate solvers is plotted with colored lines in Figure 7 (b). The implicit vertical diffusion/viscosity solvers perform best due to the fact that the problem is purely local without any horizontal dependencies. The explicit momentum solver scales almost as well, whereas the explicit tracer solver scales poorer. The implicit 2D solver (assembly and linear solve) scales the poorest because the problem is relatively small; at 96 cores there are only around 940 degrees of freedom in the (ū, η) system per core. We have also experimented with explicit 2D solvers but they do not scale significantly better compared to the two-stage implicit scheme used herein.
To further assess the CPU cost, we compared Thetis timings against the SLIM 3D model (White et al., 2008a;Blaise et al., 2010;Comblen et al., 2010a;Kärnä et al., 2013) which uses a similar DG formulation but is implemented in C/C++. The wall clock time, and parallel efficiency used by both Thetis and SLIM 3D are presented in Appendix B. The setup, mesh, and time step were identical for the two models. On a single core, Thetis runs 3.3× faster. On 24 cores the ratio is 4.0×, and on 144 cores Thetis is still 2.2× faster than SLIM 3D. This highlights the fact that Firedrake can deliver good parallel performance compared to models written in lower level languages.
It should be noted, however, that Thetis performance is currently not fully optimized. We expect that the timings can be significantly improved both in terms of serial and strong scaling performance. These will be addressed in future work.

DOME
Next we investigate the model's ability to simulate density driven overflows with the Dynamics of Overflow Mixing and Entrainment (DOME) benchmark (Ezer and Mellor, 2004;Legg et al., 2006;Wang et al., 2008b;Burchard and Rennau, 2008).
The domain is a 1100 km by 600 km large basin, whose depth varies linearly from 600 m at the northern boundary to 3600 m in the middle of the domain (see Figure 8). To avoid boundary condition issues we have extended the domain to the west by 120 km. At the northern boundary, there is a 100 km wide and 200 km long inlet. Initially, the basin is stably stratified with a linear temperature variation from 10 • C in the deepest part of the basin to 20 • C at the surface. We use the linear equation of state with ρ 0 = 1000 kg m −3 , α T = 0.2 kg m −3 • C −1 and T 0 = 10 • C resulting in a ∆ρ = 2.0 kg m −3 density difference.
At the inlet, a dense inflow (temperature 10 • C) is prescribed in the bottom layer, with the surface layer being at 20 • C.
The inflow is in geostrophic balance, the thickness of the bottom layer being roughly 300 m in the eastern end of the boundary diminishing exponentially westward (Legg et al., 2006). The total inflow in the bottom layer is 5 Sv (5 × 10 6 m 3 s −1 ), the surface layer being static. During the simulation, the fate of the inflowing waters is tracked with a passive tracer that is initially zero in the basin and unity at the inlet. Initially, the tracer field is set to the inflow conditions in the northern part of the basin The domain is discretized with an unstructured grid (Figure 8). Horizontal mesh resolution is 6 km near the northern boundary, increasing southward. 24 vertical sigma levels are used. Over the slope, the mesh resolution was designed to result in a hydrostatic consistency metric r < 1.5 (Beckmann and Haidvogel, 1993). Horizontal viscosity is set to a constant 50 m 2 s −1 , which corresponds to Re h ≈ 200 at the inlet. Horizontal diffusivity is constant at 10 m 2 s −1 . Vertical viscosity and diffusivity are parametrized by the Pacanowski-Philander scheme as described in Section 2.4. Bottom friction is parametrized with a quadratic drag coefficient C d = 2 × 10 −3 (Legg et al., 2006;Wang et al., 2008b). A quadratic function space is used for the baroclinic head and internal pressure gradient as discussed in Section 3.7.
As the inflowing current reaches the basin, it turns to the west and forms a coastal plume that is approximately 150 km wide (Figure 9). The plume detaches from the lateral boundary as it flows westward and along the bottom slope. As the dense water mass meets the stratified ocean, the plume becomes unstable and starts to generate eddies and internal waves. The most vigorous eddies are found in the first 300 km after the inlet (x = 500-800 km), after which the plume is more mixed and quiescent. Overall the plume is shallow; most of the passive tracer is concentrated within 200 m of the bottom. Qualitatively, the extent and propagation of the plume, and its eddy structure are in good agreement with the literature (e.g., Burchard and Rennau, 2008;Wang et al., 2008b). The results show that Thetis is able to represent eddying flows over sloping bathymetry, generating and maintaining strong gradients between water masses. The sharpest fronts in the simulation encompass only one or two elements. Figure 10 shows the distribution of the inflowing tracer concentration as a function of water density and the x-axis. The inflowing waters are initially very dense but get mixed to lower density as the plume advances along the coast. The histogram shows that the plume volume is low in the first 150 km after the inlet (x = 650-800 km) where the plume accelerates. After The 47-day simulation took roughly 42 hours to run on 90 cores with a 39.65 s time step on the same Linux cluster.

Conclusions
This paper describes a DG implementation of an eddy-permitting, unstructured grid coastal ocean model. The solver is secondorder accurate in space and time. We have demonstrated that the formulation is fully conservative and preserves monotonicity.
The test cases indicate that the model is capable of reproducing the expected physical behavior, including baroclinic eddies.
Moreover, numerical mixing is well-controlled and comparable to other established structured grid models, such as MITgcm and ROMS, and the large-scale finite volume model MPAS-Ocean. Finding an accurate formulation is important as commonlyused unstructured grid models tend to be overly diffusive, preventing accurate modeling of certain coastal domains (e.g., Kärnä et al., 2015). The formulation presented herein thus contributes to the development of more accurate next-generation coastal ocean models. Future work will include solving the equations on a sphere, DG implementation of a biharmonic viscosity operator, twoequation turbulence closure models, wetting-drying treatment, development of an adjoint solver, as well as improving the computational efficiency and parallel scaling of the solver.

Code availability
All code used to perform the experiments in this papers is publicly available. Firedrake, and its components, may be obtained from www.firedrakeproject.org; Thetis from thetisproject.org.
For reproducibility, we also cite archives of the exact software versions used to produce the results in this paper. All major Firedrake components have been archived on Zenodo (zenodo/Firedrake, 2018). This record collates DOIs for the components, and can be installed following the instructions at www.firedrakeproject.org/download.html with firedrake-install -doi 10.5281/zenodo.1407898. Thetis itself has been archived at (zenodo/Thetis, 2018).

Data availability
No external data were used in this manuscript.

Appendix A: Source terms for the baroclinic MMS test
Using the analytical velocity and temperature fields we can derive the steady state solution for the remaining fields Now we can evaluate the different terms that appear in the momentum and tracer equations: f (e z ∧ū) x = 2f 0 3 − cos 1 2 + 1 cos πy L y , f (e z ∧ū) y = f 0 6 sin (3) sin 2π ∇ h · (Hū) = πh 3L x L y 2L x − cos 1 2 + 1 sin πy L y + L y sin (3) cos 2π (∇ h · (uu)) y = − π 9L y sin 2 z 2h sin πy L y cos πy L y , f (e z ∧ u ) x = f 0 3 − sin z 2h − 2 + 2 cos 1 2 cos πy L y , f (e z ∧ u ) y = f 0 6 3 cos 3z h − sin (3) ∇ h · (uT ) = 5π L x L y L x sin z 2h cos 2 πy L y + 3L y sin πy L y cos 3z h These terms are added as source terms to the right hand side of the equations (9), (10), (11), and (6). In the weak form this corresponds to multiplying the analytical function by the test function and integrating over the domain. The solutions were derived using the SymPy symbolic mathematics Python library (Meurer et al., 2017).

Appendix B: CPU cost comparison against SLIM
A strong scaling test was carried out with both Thetis and SLIM 3D model (White et al., 2008a;Blaise et al., 2010;Comblen et al., 2010a;Kärnä et al., 2013) using the baroclinic eddies test case. These tests were carried out on a Linux cluster with 16core Intel Xeon E5620 processors and Mellanox Infiniband interconnect. The total time spent to run 40 time steps is presented in