Geoscientific Model Development (GMD)

Geosci. Model Dev., ISSN 1991-9603

Copernicus Publications, Göttingen, Germany

DOI: 10.5194/gmd-13-735-2020

Slate: extending Firedrake's domain-specific abstraction to hybridized solvers for geoscience and beyond

Hybridization and static condensation methods for GFD

Thomas H. Gibson (t.gibson15@imperial.ac.uk), Lawrence Mitchell, David A. Ham (https://orcid.org/0000-0001-9545-9110), and Colin J. Cotter

1 Department of Mathematics, Imperial College London, London, SW7 2AZ, UK

2 Department of Computer Science, Durham University, Durham, DH1 3LE, UK

Correspondence: Thomas H. Gibson (t.gibson15@imperial.ac.uk)

25 February 2020

Volume 13, issue 2, pages 735–761. Article history: 1 April 2019, 26 April 2019, 20 November 2019, 25 November 2019

2020

This work is licensed under the Creative Commons Attribution 4.0 International License. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/

This article is available from https://gmd.copernicus.org/articles/13/735/2020/gmd-13-735-2020.html

The full text article is available as a PDF file from https://gmd.copernicus.org/articles/13/735/2020/gmd-13-735-2020.pdf

Abstract

Within the finite element community, discontinuous Galerkin (DG) and mixed finite element methods have become increasingly popular in simulating geophysical flows. However, robust and efficient solvers for the saddle point and elliptic systems arising from these discretizations remain an ongoing challenge. One possible approach to this issue is a method known as hybridization, in which the discrete equations are transformed such that classic static condensation and local post-processing methods can be employed. However, it is challenging to implement hybridization as performant parallel code within complex models whilst maintaining a separation of concerns between applications scientists and software experts. In this paper, we introduce a domain-specific abstraction within the Firedrake finite element library that permits the rapid execution of these hybridization techniques within a code-generating framework. The resulting framework composes naturally with Firedrake's solver environment, allowing for the implementation of hybridization and static condensation as runtime-configurable preconditioners via petsc4py, the Python interface to the Portable, Extensible Toolkit for Scientific Computation (PETSc). We provide examples derived from second-order elliptic problems and geophysical fluid dynamics. In addition, we demonstrate that hybridization shows great promise for improving the performance of solvers for mixed finite element discretizations of equations related to large-scale geophysical flows.

1 Introduction

The development of simulation software is an increasingly important aspect of modern scientific computing, in the geosciences in particular. Such software requires a vast range of knowledge spanning several disciplines, ranging from applications expertise to mathematical analysis to high-performance computing and low-level code optimization. Software projects developing automatic code generation systems have become quite popular in recent years, as such systems help create a separation of concerns in which each layer of complexity can be addressed independently of the rest. This allows for agile collaboration between computer scientists with hardware and software expertise, computational scientists with numerical algorithm expertise, and domain scientists such as meteorologists, oceanographers and climate scientists. Examples of such projects in the domain of finite element methods include FreeFEM++, Sundance, the FEniCS Project, Feel++, and Firedrake.

The finite element method (FEM) is a mathematically robust framework for computing numerical solutions of partial differential equations (PDEs) that has become increasingly popular in fluids and solids models across the geosciences, with a formulation that is highly amenable to code generation techniques. A description of the weak formulation of the PDEs, together with appropriate discrete function spaces, is enough to characterize the finite element problem. Both the FEniCS and Firedrake projects employ the Unified Form Language (UFL) to specify the finite element integral forms and discrete spaces necessary to properly define the finite element problem. UFL is a highly expressive domain-specific language (DSL) embedded in Python, which provides the necessary abstractions for code generation systems.

There are classes of finite element discretizations resulting in discrete systems that can be solved more efficiently by directly manipulating local tensors. For example, the static condensation technique for the reduction of global finite element systems produces smaller globally coupled linear systems by eliminating interior unknowns to arrive at an equation for the degrees of freedom defined on cell interfaces only. This procedure is analogous to the point-wise elimination of variables used in staggered finite difference codes, such as the ENDGame dynamical core of the UK Meteorological Office (Met Office), but requires the local inversion of finite element systems. For finite element discretizations of coupled PDEs, the hybridization technique provides a mechanism for enabling the static condensation of more complex linear systems. First introduced and further analyzed in earlier literature, the hybridization method introduces Lagrange multipliers enforcing certain continuity constraints. Local static condensation can then be applied to the augmented system to produce a reduced equation for the multipliers. Methods of this type are often accompanied by local post-processing techniques, which exploit the approximation properties of the Lagrange multipliers. This enables the manufacturing of fields exhibiting superconvergent phenomena or enhanced conservation properties. When implemented by hand, these procedures require invasive manual intervention during the equation assembly process in intricate numerical code.

In this paper, we provide a simple yet effective high-level abstraction for localized dense linear algebra on systems derived from finite element problems. Using embedded DSL technology, we provide a means to enable the rapid development of hybridization and static condensation techniques within an automatic code generation framework. In other words, the main contribution of this paper is in solving the problem of automatically translating from the mathematics of static condensation and hybridization to compiled code. This automated translation facilitates both the separation of concerns between applications scientists and computational/computer scientists and the automated optimization of compiled code. This framework provides an environment for the development and testing of numerics relevant to the Gung-Ho Project, an initiative by the UK Met Office in designing the next-generation atmospheric dynamical core using mixed finite element methods. Our work is implemented in the Firedrake finite element library and the PETSc solver library, accessed via the Python interface petsc4py.

The rest of the paper is organized as follows. We introduce common notation used throughout the paper in Sect. . The embedded DSL, called “Slate”, is introduced in Sect. , which allows concise expression of localized linear algebra operations on finite element tensors. We provide some contextual examples for static condensation and hybridization in Sect. , including a discussion on post-processing. We then outline in Sect. how, by interpreting static condensation techniques as a preconditioner, we can go further and automate many of the symbolic manipulations necessary for hybridization and static condensation. We first demonstrate our implementation on a manufactured problem derived from a second-order elliptic equation, starting in Sect. . The first example compares a hybridizable discontinuous Galerkin (HDG) method with an optimized continuous Galerkin method. Section illustrates the composability and relative performance of hybridization for compatible mixed methods applied to a semi-implicit discretization of the nonlinear rotating shallow water equations. Our final example in Sect. demonstrates time-step robustness of a hybridizable solver for a compatible finite element discretization of a rotating linear Boussinesq model. Conclusions follow in Sect. .

1.1 Notation

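We begin by establishing notation used throughout this paper. Let $\mathcal{T}_h$ denote a tessellation of $\Omega \subset \mathbb{R}^n$, the computational domain, consisting of polygonal elements $K$ associated with a mesh size parameter $h$, and $\partial\mathcal{T}_h = \{e \in \partial K : K \in \mathcal{T}_h\}$ the set of facets of $\mathcal{T}_h$. The set of facets interior to the domain $\Omega$ is denoted by $\mathcal{E}_h^\circ := \partial\mathcal{T}_h \setminus \partial\Omega$. Similarly, we denote the set of exterior facets as $\mathcal{E}_h^\partial := \partial\mathcal{T}_h \cap \partial\Omega$. For brevity, we denote the finite element integral forms over $\mathcal{T}_h$ and any facet set $\Gamma \subset \partial\mathcal{T}_h$ by

$$ (u, v)_K = \int_K u \cdot v \,\mathrm{d}x, \qquad \langle u, v \rangle_e = \int_e u \cdot v \,\mathrm{d}s, \tag{1} $$

$$ (u, v)_{\mathcal{T}_h} = \sum_{K \in \mathcal{T}_h} (u, v)_K, \qquad \langle u, v \rangle_{\Gamma} = \sum_{e \in \Gamma} \langle u, v \rangle_e, \tag{2} $$

where $\mathrm{d}x$ and $\mathrm{d}s$ denote appropriate integration measures. The operation $\cdot$ should be interpreted as standard multiplication for scalar functions or a dot product for vector functions.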

For any double-valued vector field $w$ on a facet $e \in \partial\mathcal{T}_h$, we define the jump of its normal component across $e$ by

$$ [\![ w ]\!]_e = \begin{cases} w|_{e^+} \cdot n_{e^+} + w|_{e^-} \cdot n_{e^-}, & e \in \mathcal{E}_h^\circ, \\ w|_e \cdot n_e, & e \in \mathcal{E}_h^\partial, \end{cases} \tag{3} $$

where $+$ and $-$ denote arbitrarily but globally defined sides of the facet. Here, $n_{e^+}$ and $n_{e^-}$ are the unit normal vectors with respect to the positive and negative sides of the facet $e$. Whenever the facet domain is clear from the context, we omit the subscripts for brevity and simply write $[\![ \cdot ]\!]$.

2 A system for localized algebra on finite element tensors

We present an expressive language for dense linear algebra on the elemental matrix systems arising from finite element problems. The language, which we call Slate, provides typical mathematical operations performed on matrices and vectors; hence the input syntax is comparable to high-level linear algebra software such as MATLAB. The Slate language provides basic abstract building blocks which can be used by a specialized compiler for linear algebra to generate low-level code implementations.

Slate is heavily influenced by the Unified Form Language (UFL), a DSL embedded in Python which provides symbolic representations of finite element forms. The expressions can be compiled by a form compiler, which translates UFL into low-level code for the local assembly of a form over the cells and facets of a mesh. In a similar manner, Slate expressions are compiled to low-level code that performs the requested linear algebra element-wise on a mesh.

2.1 An overview of Slate

To clarify conventions and the scope of Slate, we start by establishing our notation for a general finite element form, following established convention. We define a real-valued multi-linear form as an operator which maps a list of arguments $v = (v_0, \cdots, v_{\alpha-1}) \in V_0 \times \cdots \times V_{\alpha-1}$ into $\mathbb{R}$:

$$ a : V_0 \times \cdots \times V_{\alpha-1} \rightarrow \mathbb{R}, \qquad a \mapsto a(v_0, \cdots, v_{\alpha-1}) = a(v), \tag{4} $$

where $a$ is linear in each argument $v_k$. The arity of a form is $\alpha$, an integer denoting the total number of form arguments. In traditional finite element nomenclature (for $\alpha \leq 2$), $V_0$ is referred to as the space of test functions and $V_1$ as the space of trial functions. The spaces $V_k$ are referred to as argument spaces. Forms with arity $\alpha = 0$, $1$, or $2$ are best interpreted as the more familiar mathematical objects: scalars (0-forms), linear forms or functionals (1-forms), and bilinear forms (2-forms), respectively.

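If a given form $a$ is parameterized by one or more coefficients, say $c = (c_0, \cdots, c_{q-1}) \in C_0 \times \cdots \times C_{q-1}$, where $\{C_k\}_{k=0}^{q-1}$ are coefficient spaces and $q$ is the number of coefficients, then we write

$$ a : C_0 \times \cdots \times C_{q-1} \times V_0 \times \cdots \times V_{\alpha-1} \rightarrow \mathbb{R}, \qquad a \mapsto a(c_0, \cdots, c_{q-1}; v_0, \cdots, v_{\alpha-1}) = a(c; v). \tag{5} $$

From here on, we shall work exclusively with forms that are linear in $v$ and possibly nonlinear in the coefficients $c$. This is reasonable since nonlinear methods based on Newton iterations produce linear problems via Gâteaux differentiation of a nonlinear form corresponding to a PDE (also known as the form Jacobian). We refer the interested reader to Sect. 2.1.2 for more details. For clarity, we present examples of multi-linear forms of arity $\alpha = 0$, $1$, and $2$ that frequently appear in finite element discretizations:

$$ a(\kappa; v, u) := (\nabla v, \kappa \nabla u)_{\mathcal{T}_h} \equiv \sum_{K \in \mathcal{T}_h} \int_K \nabla v \cdot \kappa \nabla u \,\mathrm{d}x, \qquad \kappa \in C_0,\ u \in V_1,\ v \in V_0,\ \alpha = 2,\ q = 1, \tag{6} $$

$$ a(f; v) := (v, f)_{\mathcal{T}_h} \equiv \sum_{K \in \mathcal{T}_h} \int_K v f \,\mathrm{d}x, \qquad f \in C_0,\ v \in V_0,\ \alpha = 1,\ q = 1, \tag{7} $$

$$ a(f, g; ) := (f - g, f - g)_{\mathcal{T}_h} \equiv \sum_{K \in \mathcal{T}_h} \int_K |f - g|^2 \,\mathrm{d}x, \qquad g \in C_1,\ f \in C_0,\ \alpha = 0,\ q = 2, \tag{8} $$

$$ a(\gamma, \sigma) := \langle \gamma, [\![ \sigma ]\!] \rangle_{\partial\mathcal{T}_h} \equiv \sum_{e \in \mathcal{E}_h^\circ} \int_e \gamma [\![ \sigma ]\!] \,\mathrm{d}s + \sum_{e \in \mathcal{E}_h^\partial} \int_e \gamma \sigma \cdot n \,\mathrm{d}s, \qquad \sigma \in V_1,\ \gamma \in V_0,\ \alpha = 2,\ q = 0. \tag{9} $$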

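In general, a finite element form will consist of integrals over various geometric domains: integration over cells $\mathcal{T}_h$, interior facets $\mathcal{E}_h^\circ$, and exterior facets $\mathcal{E}_h^\partial$. Therefore, we express a general multi-linear form in terms of integrals over each set of geometric entities:

$$ a(c; v) = \sum_{K \in \mathcal{T}_h} \int_K \mathcal{I}_K^{\mathcal{T}}(c; v) \,\mathrm{d}x + \sum_{e \in \mathcal{E}_h^\circ} \int_e \mathcal{I}_e^{\mathcal{E},\circ}(c; v) \,\mathrm{d}s + \sum_{e \in \mathcal{E}_h^\partial} \int_e \mathcal{I}_e^{\mathcal{E},\partial}(c; v) \,\mathrm{d}s, \tag{10} $$

where $\mathcal{I}_K^{\mathcal{T}}$ denotes a cell integrand on $K \in \mathcal{T}_h$, $\mathcal{I}_e^{\mathcal{E},\circ}$ is an integrand on the interior facet $e \in \mathcal{E}_h^\circ$, and $\mathcal{I}_e^{\mathcal{E},\partial}$ is an integrand defined on the exterior facet $e \in \mathcal{E}_h^\partial$. The form $a(c; v)$ describes a finite element form globally over the entire problem domain.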

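Here, we will consider the case where the interior facet integrands $\mathcal{I}_e^{\mathcal{E},\circ}(c; v)$ can be decomposed into two independent parts on each interior facet $e$: one for the positive restriction ($+$) and one for the negative restriction ($-$). That is, for each $e \in \mathcal{E}_h^\circ$, we may write $\mathcal{I}_e^{\mathcal{E},\circ}(c; v) = \mathcal{I}_{e^+}^{\mathcal{E},\circ}(c; v) + \mathcal{I}_{e^-}^{\mathcal{E},\circ}(c; v)$. This allows us to express the integral over an interior facet $e$ connecting two adjacent elements, say $K^+$ and $K^-$, as the sum of integrals:

$$ \int_{e \subset \partial K^+ \cup \partial K^-} \mathcal{I}_e^{\mathcal{E},\circ}(c; v) \,\mathrm{d}s = \int_{e \subset \partial K^+} \mathcal{I}_{e^+}^{\mathcal{E},\circ}(c; v) \,\mathrm{d}s + \int_{e \subset \partial K^-} \mathcal{I}_{e^-}^{\mathcal{E},\circ}(c; v) \,\mathrm{d}s. \tag{11} $$

The local contribution of Eq. () in each cell $K$, along with its associated facets $e \subset \partial K$, is then

$$ a_K(c; v) = \int_K \mathcal{I}_K^{\mathcal{T}}(c; v) \,\mathrm{d}x + \sum_{e \subset \partial K \setminus \partial\Omega} \int_e \mathcal{I}_e^{\mathcal{E},\circ}(c; v) \,\mathrm{d}s + \sum_{e \subset \partial K \cap \partial\Omega} \int_e \mathcal{I}_e^{\mathcal{E},\partial}(c; v) \,\mathrm{d}s. \tag{12} $$

We call Eq. () the cell-local contribution of $a(c; v)$, with

$$ a(c; v) = \sum_{K \in \mathcal{T}_h} a_K(c; v). \tag{13} $$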

To make matters concrete, let us suppose $a(c; v)$ is a bilinear form with arguments $v = (v_0, v_1) \in V_0 \times V_1$. Now let $\{\Phi_i\}_{i=1}^{N}$ and $\{\Psi_i\}_{i=1}^{M}$ denote bases for $V_0$ and $V_1$, respectively. Then the global $N \times M$ matrix $A$ corresponding to $a(c; v_0, v_1)$ has its entries defined via

$$ A_{ij} = a(c; \Phi_i, \Psi_j) = \sum_{K \in \mathcal{T}_h} A_{K,ij}, \qquad A_{K,ij} = a_K(c; \Phi_i, \Psi_j). \tag{14} $$

By construction, $A_{K,ij}$ can be non-zero only if both $\Phi_i$ and $\Psi_j$ take non-zero values in $K$. Now we introduce the cell-node map $i = e(K, \hat{i})$ as the mapping from the local node number $\hat{i}$ in $K$ to the global node number $i$. Suppose there are $n$ and $m$ nodes defining the degrees of freedom for $V_0$ and $V_1$, respectively, in $K$. Then all non-zero entries of $A_{K,ij}$ arise from integrals involving basis functions with local indices corresponding to the global indices $i, j$:

$$ A^K_{\hat{i}\hat{j}} := a_K(c; \Phi_{e(K,\hat{i})}, \Psi_{e(K,\hat{j})}), \qquad \hat{i} \in \{1, \cdots, n\},\ \hat{j} \in \{1, \cdots, m\}. \tag{15} $$

These local contributions are collected in the $n \times m$ dense matrix $A^K$, which we call the element tensor. The global matrix $A$ is assembled from the collection of element tensors: $A \leftarrow \{A^K\}_{K \in \mathcal{T}_h}$. Details on the general evaluation of finite element basis functions and multi-linear forms, and on the global assembly of finite element operators with a particular focus on code generation, can be found in the literature.
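As a minimal sketch of the assembly step described above (not Firedrake or PyOP2 code), the following NumPy example scatters dense element tensors into a global matrix through a cell-node map. The 1D mesh, the P1 element mass matrix, the helper name `assemble_global`, and the use of a dense global array with 0-based indices are all assumptions made for illustration; production codes store the global operator in a sparse format.

```python
import numpy as np


def assemble_global(element_tensors, cell_node_map, num_nodes):
    """Scatter-add each dense element tensor A^K into the global matrix A."""
    A = np.zeros((num_nodes, num_nodes))
    for A_K, nodes in zip(element_tensors, cell_node_map):
        # np.ix_ builds the index grid e(K, i_hat) x e(K, j_hat) for this cell.
        A[np.ix_(nodes, nodes)] += A_K
    return A


# Hypothetical 1D mesh of [0, 1] with 4 equal cells and P1 (hat) basis functions.
num_cells = 4
h = 1.0 / num_cells
cell_node_map = [(K, K + 1) for K in range(num_cells)]  # e(K, .) for each cell K

# P1 element mass matrix on an interval of length h.
A_K = (h / 6.0) * np.array([[2.0, 1.0], [1.0, 2.0]])
element_tensors = [A_K] * num_cells

A = assemble_global(element_tensors, cell_node_map, num_cells + 1)

# Summing all entries integrates 1 * 1 over the domain, i.e. its length.
total_mass = A.sum()
```

Entries of `A` coupling nodes that share no cell remain zero, which is the sparsity that the cell-node map encodes.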

In standard finite element software packages, the element tensor is mapped entry-wise into a global sparse array using the cell-node map e(K,⋅). Within Firedrake, this operation is handled by PyOP2 and serves as the main user-facing abstraction for global finite element assembly. For many applications, one may want to produce a new global operator by algebraically manipulating different element tensors. This is relatively invasive in numerical code, as it requires bypassing direct operator assembly to produce the new tensor. This is precisely the scope of Slate.

Like UFL, Slate relies on the grammar of the host-language: Python. The entire Slate language is implemented as a Python module which defines its types (classes) and operations on said types. Together, this forms a high-level language for expressing dense linear algebra on element tensors. The Slate language consists of two primary abstractions for linear algebra:

terminal element tensors corresponding to multi-linear integral forms (matrices, vectors, and scalars) or assembled data (for example, coefficient vectors of a finite element function) and

expressions consisting of algebraic operations on terminal tensors.

The composition of binary and unary operations on terminal tensors produces a Slate expression. Such expressions can be composed with other Slate objects in arbitrary ways, resulting in concise representations of complex algebraic operations on locally assembled arrays. We summarize all currently supported Slate abstractions here.

2.1.1 Terminal tensors

In Slate, one associates a tensor with data on a cell either by using a multi-linear form, or assembled coefficient data:

Tensor(a(c; v)) associates a form, expressed in UFL, with its local element tensor

$$ A^K \leftarrow a_K(c; v), \qquad \text{for all } K \in \mathcal{T}_h. \tag{16} $$

The form arity $\alpha$ of $a_K(c; v)$ determines the rank of the corresponding Tensor; i.e., scalars, vectors, and matrices are produced from scalars, linear forms, and bilinear forms, respectively.

Similarly to UFL, Slate is capable of abstractly representing arbitrary rank tensors. However, only rank ≤2 tensors are typically used in most finite element applications, and therefore we currently only generate code for those ranks.

The shape of the element tensor is determined by both the number of arguments and total number of degrees of freedom local to the cell.

AssembledVector(f), where f is some finite element function. The function $f \in V$ is expressed in terms of the finite element basis of $V$: $f(x) = \sum_{i=1}^{N} f_i \Phi_i(x)$. The result is the local coefficient vector of $f$ on $K$:

$$ F^K \leftarrow \{ f_{e(K,\hat{i})} \}_{\hat{i}=1}^{n}, \tag{17} $$

where $e(K, \hat{i})$ is the local node numbering and $n$ is the number of nodes local to the cell $K$.
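The local coefficient vector amounts to a gather through the cell-node map. The sketch below illustrates this with plain NumPy; the three-cell map, the coefficient values, and the helper name `local_coefficients` are hypothetical, and indices are 0-based rather than 1-based as in the text.

```python
import numpy as np

# Hypothetical cell-node map e(K, i_hat) for three P1 cells on a line.
cell_node_map = np.array([[0, 1], [1, 2], [2, 3]])
# Hypothetical global coefficients f_i of a finite element function f.
f_global = np.array([0.0, 1.0, 4.0, 9.0])


def local_coefficients(f, cell_node_map, K):
    """Gather the coefficients of f attached to the nodes of cell K."""
    return f[cell_node_map[K]]


F_1 = local_coefficients(f_global, cell_node_map, 1)
```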

2.1.2 Symbolic linear algebra

Slate supports typical binary and unary operations in linear algebra, with a high-level syntax close to mathematics. At the time of this paper, these include the following.

A + B, the addition of two equal-shaped tensors: $A^K + B^K$.

A * B, a contraction over the last index of A and the first index of B. This is the usual multiplicative operation on matrices, vectors, and scalars: $A^K B^K$.

-A, the additive inverse (negation) of a tensor: $-A^K$.

A.T, the transpose of a tensor: $(A^K)^T$.

A.inv, the inverse of a square tensor: $(A^K)^{-1}$.

A.solve(B, decomposition="..."), the result, $X^K$, of solving a local linear system $A^K X^K = B^K$, optionally specifying a factorization strategy.

A.blocks[indices], where A is a tensor from a mixed finite element space. This allows for the extraction of sub-blocks, which are indexed by field (slices are allowed). For example, if a matrix $A$ corresponds to the bilinear form $a : V \times W \rightarrow \mathbb{R}$, where $V = V_0 \times \cdots \times V_n$ and $W = W_0 \times \cdots \times W_m$ are product spaces consisting of finite element spaces $\{V_i\}_{i=0}^{n}$, $\{W_i\}_{i=0}^{m}$, then the element tensors have the form

$$ A^K = \begin{bmatrix} A^K_{00} & A^K_{01} & \cdots & A^K_{0m} \\ A^K_{10} & A^K_{11} & \cdots & A^K_{1m} \\ \vdots & \vdots & \ddots & \vdots \\ A^K_{n0} & A^K_{n1} & \cdots & A^K_{nm} \end{bmatrix}. \tag{18} $$

The associated submatrix of Eq. () with indices $i = (p, q)$, $p = \{p_1, \cdots, p_r\}$, $q = \{q_1, \cdots, q_c\}$, is

$$ A^K_{pq} = \begin{bmatrix} A^K_{p_1 q_1} & \cdots & A^K_{p_1 q_c} \\ \vdots & \ddots & \vdots \\ A^K_{p_r q_1} & \cdots & A^K_{p_r q_c} \end{bmatrix} = \texttt{A}^K\texttt{.blocks[p, q]}, \tag{19} $$

where $p \subseteq \{0, \cdots, n\}$, $q \subseteq \{0, \cdots, m\}$.
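To make the meaning of these operations concrete, the following NumPy sketch applies their dense-array analogues to a single hypothetical element tensor. It is not Slate code: the field sizes, the random data, and the helper `blocks` are assumptions for illustration, with each line annotated by the Slate expression it mimics.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical mixed space with two fields carrying 3 and 2 local dofs.
field_dofs = [3, 2]
offsets = np.concatenate(([0], np.cumsum(field_dofs)))
n = offsets[-1]

M = rng.standard_normal((n, n))
A_K = M @ M.T + n * np.eye(n)  # symmetric positive definite, hence invertible
B_K = rng.standard_normal((n, n))


def blocks(T, rows, cols):
    """Extract the sub-block of T selected by lists of field indices."""
    r = np.concatenate([np.arange(offsets[i], offsets[i + 1]) for i in rows])
    c = np.concatenate([np.arange(offsets[j], offsets[j + 1]) for j in cols])
    return T[np.ix_(r, c)]


sum_K = A_K + B_K                # mimics A + B
prod_K = A_K @ B_K               # mimics A * B
neg_K = -A_K                     # mimics -A
trans_K = A_K.T                  # mimics A.T
inv_K = np.linalg.inv(A_K)       # mimics A.inv
X_K = np.linalg.solve(A_K, B_K)  # mimics A.solve(B)
A01_K = blocks(A_K, [0], [1])    # mimics A.blocks[0, 1]
```

In Slate the same expressions stay symbolic until code is generated, whereas here each line is evaluated eagerly on one array.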

Each Tensor object knows all the information about the underlying UFL form that defines it, such as form arguments, coefficients, and the underlying finite element space(s) it operates on. This information is propagated through as unary or binary transformations are applied. The unary and binary operations shown here provide the necessary algebraic framework for a large class of problems, some of which we present in this paper.

In Firedrake, Slate expressions are transformed into low-level code by a linear algebra compiler. The compiler interprets Slate expressions as a syntax tree, where the tree is visited to identify what local arrays need to be assembled and the sequence of array operations. At the time of this work, our compiler generates C++ code, using the templated library Eigen for dense linear algebra. The translation from Slate to C++ is fairly straightforward, as all operations supported by Slate have a representation in Eigen.

The compiler pass will generate a single “macro” kernel, which performs the dense linear algebra operations represented in Slate. The resulting code will also include (often multiple) function calls to local assembly kernels generated by TSFC (Two Stage Form Compiler) to assemble all necessary sub-blocks of an element tensor. All code generated by the linear algebra compiler conforms to the application programming interface (API) of the PyOP2 framework, as detailed by Sect. 3. Figure provides an illustration of the complete tool chain.

Figure 1

The Slate language wraps UFL objects describing the finite element system. The resulting Slate expressions are passed to a specialized linear algebra compiler, which produces a single macro kernel assembling the local contributions and executing the dense linear algebra represented in Slate. The kernels are passed to Firedrake's PyOP2 interface, which wraps the Slate kernel in a mesh-iteration kernel. Parallel scheduling, code generation, and compilation occur after the PyOP2 layer.

Most optimization of the resulting dense linear algebra code is handled directly by Eigen. In the case of unary and binary operations such as A.inv and A.solve(B), stable default behaviors are applied by the linear algebra compiler. For example, A.solve(B) without a specified factorization strategy will default to using an in-place LU factorization with partial pivoting. For local matrices smaller than 5×5, the inverse is translated directly into Eigen's A.inverse(), which employs stable analytic formulas. For larger matrices, the linear algebra compiler replaces A.inv with an LU factorization.
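The distinction between a closed-form inverse for very small matrices and an LU-based approach for larger ones can be sketched in NumPy as follows. This is an illustration of the idea only, not the C++ that the compiler emits: the helper names are hypothetical, and the closed form is written out for the 2×2 case alone to keep the example short, whereas the text describes analytic formulas for matrices smaller than 5×5.

```python
import numpy as np


def local_inverse(A):
    """Invert a small dense matrix: closed form for 2x2, LU-based otherwise."""
    A = np.asarray(A, dtype=float)
    if A.shape == (2, 2):
        # Analytic (adjugate) formula for the 2x2 case.
        det = A[0, 0] * A[1, 1] - A[0, 1] * A[1, 0]
        return np.array([[A[1, 1], -A[0, 1]], [-A[1, 0], A[0, 0]]]) / det
    # np.linalg.solve uses LAPACK's LU factorization with partial pivoting.
    return np.linalg.solve(A, np.eye(A.shape[0]))


def local_solve(A, B):
    """Solve A X = B without forming the inverse explicitly."""
    return np.linalg.solve(A, B)


A2 = np.array([[4.0, 1.0], [2.0, 3.0]])
A6 = np.diag(np.arange(1.0, 7.0)) + 0.1 * np.ones((6, 6))
b6 = np.ones(6)
x6 = local_solve(A6, b6)
```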

For more details on solving linear equations in Eigen, see https://eigen.tuxfamily.org/dox/group__TutorialLinearAlgebra.html (last access: 3 January 2020).

Currently, we only support direct matrix factorizations for solving local linear systems. However, it would not be difficult to extend Slate to support more general solution techniques like iterative methods.

3 Examples

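We now present examples and discuss solution methods which require element-wise manipulations of finite element systems and their specification in Slate. We stress here that Slate is not limited to these model problems; rather these examples are chosen for clarity and to demonstrate key features of the Slate language. For our discussion, we use a model elliptic equation defined in a computational domain $\Omega$. Consider the second-order PDE with both Dirichlet and Neumann boundary conditions:

$$ -\nabla \cdot (\kappa \nabla p) + c p = f, \quad \text{in } \Omega, \tag{20} $$

$$ p = p_0, \quad \text{on } \partial\Omega_D, \tag{21} $$

$$ -\kappa \nabla p \cdot n = g, \quad \text{on } \partial\Omega_N, \tag{22} $$

where $\partial\Omega_D \cup \partial\Omega_N = \partial\Omega$ and $\kappa, c : \Omega \rightarrow \mathbb{R}^+$ are positive-valued coefficients. To obtain a mixed formulation of Eqs. ()–(), we introduce the auxiliary velocity variable $u = -\kappa \nabla p$. We then obtain the first-order system of PDEs:

$$ \mu u + \nabla p = 0, \quad \text{in } \Omega, \tag{23} $$

$$ \nabla \cdot u + c p = f, \quad \text{in } \Omega, \tag{24} $$

$$ p = p_0, \quad \text{on } \partial\Omega_D, \tag{25} $$

$$ u \cdot n = g, \quad \text{on } \partial\Omega_N, \tag{26} $$

where $\mu = \kappa^{-1}$.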
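As a brief check of the step from the second-order problem to the first-order system, the substitution can be written out explicitly. The block below is a worked sketch using only the symbols already defined; it assumes $\kappa$ is strictly positive so that $\mu = \kappa^{-1}$ exists.

```latex
\begin{align*}
u = -\kappa \nabla p
  &\;\Longrightarrow\; \kappa^{-1} u + \nabla p = \mu u + \nabla p = 0, \\
-\nabla \cdot (\kappa \nabla p) + c p = f
  &\;\Longrightarrow\; \nabla \cdot u + c p = f, \\
-\kappa \nabla p \cdot n = g
  &\;\Longrightarrow\; u \cdot n = g .
\end{align*}
```

The Dirichlet condition on $p$ carries over unchanged.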

3.1 Hybridization of mixed methods

To motivate our discussion in this section, we start by recalling the mixed method for Eqs. ()–(). Methods of this type seek approximations $(u_h, p_h)$ in finite-dimensional subspaces $U_h \times V_h \subset H(\mathrm{div}; \Omega) \times L^2(\Omega)$, defined by

$$ U_h = \{ w \in H(\mathrm{div}; \Omega) : w|_K \in U(K),\ \forall K \in \mathcal{T}_h,\ w \cdot n = g \text{ on } \partial\Omega_N \}, \tag{27} $$

$$ V_h = \{ \phi \in L^2(\Omega) : \phi|_K \in V(K),\ \forall K \in \mathcal{T}_h \}. \tag{28} $$

The space $U_h$ consists of $H(\mathrm{div})$-conforming piecewise vector polynomials, where choices of $U(K)$ typically include the Raviart–Thomas (RT), Brezzi–Douglas–Marini (BDM), or Brezzi–Douglas–Fortin–Marini (BDFM) elements. The space $V_h$ is the Lagrange family of discontinuous polynomials. These spaces are of particular interest when simulating geophysical flows, since choosing the right pairing results in stable discretizations with desirable conservation properties and avoids spurious computational modes. Discussions of mixed methods relevant for geophysical fluid dynamics can be found in the literature. Two examples of such discretizations are presented in Sect. .

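The mixed formulation of Eqs. ()–() is arrived at by multiplying Eqs. ()–() by test functions and integrating by parts. The resulting finite element problem reads as follows: find $(u_h, p_h) \in U_h \times V_h$ satisfying

$$ (w, \mu u_h)_{\mathcal{T}_h} - (\nabla \cdot w, p_h)_{\mathcal{T}_h} = -\langle w \cdot n, p_0 \rangle_{\partial\Omega_D}, \quad \forall w \in U_{h,0}, \tag{29} $$

$$ (\phi, \nabla \cdot u_h)_{\mathcal{T}_h} + (\phi, c p_h)_{\mathcal{T}_h} = (\phi, f)_{\mathcal{T}_h}, \quad \forall \phi \in V_h, \tag{30} $$

where $U_{h,0}$ is the subspace of $U_h$ with functions whose normal components vanish on $\partial\Omega_N$. The discrete system is obtained by first expanding the solutions in terms of the finite element bases:

$$ u_h = \sum_{i=1}^{N_u} U_i \Psi_i, \qquad p_h = \sum_{i=1}^{N_p} P_i \xi_i, \tag{31} $$

where $\{\Psi_i\}_{i=1}^{N_u}$ and $\{\xi_i\}_{i=1}^{N_p}$ are bases for $U_h$ and $V_h$, respectively. Here, $U_i$ and $P_i$ are the coefficients to be determined. As per standard Galerkin-based finite element methods, taking $w = \Psi_j$, $j \in \{1, \cdots, N_u\}$ and $\phi = \xi_j$, $j \in \{1, \cdots, N_p\}$ in Eqs. ()–() produces the following discrete saddle point system:

$$ \begin{bmatrix} A & -B^T \\ B & D \end{bmatrix} \begin{bmatrix} U \\ P \end{bmatrix} = \begin{bmatrix} F_0 \\ F_1 \end{bmatrix}, \tag{32} $$

where $U = \{U_i\}_{i=1}^{N_u}$, $P = \{P_i\}_{i=1}^{N_p}$ are the coefficient vectors, and

$$ A_{ij} = (\Psi_i, \mu \Psi_j)_{\mathcal{T}_h}, \tag{33} $$

$$ B_{ij} = (\xi_i, \nabla \cdot \Psi_j)_{\mathcal{T}_h}, \tag{34} $$

$$ D_{ij} = (\xi_i, c \xi_j)_{\mathcal{T}_h}, \tag{35} $$

$$ F_{0,j} = -\langle \Psi_j \cdot n, p_0 \rangle_{\partial\Omega_D}, \tag{36} $$

$$ F_{1,j} = (\xi_j, f)_{\mathcal{T}_h}. \tag{37} $$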

Methods to efficiently invert such systems include $H(\mathrm{div})$-multigrid (requiring complex overlapping Schwarz smoothers), global Schur complement factorizations (which require an approximation to the inverse of the dense elliptic Schur complement $D + B A^{-1} B^T$), or auxiliary space multigrid. Here, we focus on a solution approach using a hybridized mixed method.

The Schur complement, while elliptic, is globally dense due to the fact that $A$ has a dense inverse. This is a result of velocities in $U_h$ having continuous normal components across cell interfaces.
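The Schur complement mentioned above arises from block elimination of the saddle point system: the first block row gives $A U = F_0 + B^T P$, and substituting into the second yields $(D + B A^{-1} B^T) P = F_1 - B A^{-1} F_0$. The NumPy sketch below checks this elimination against a direct solve. The matrices are small random stand-ins with the same block structure, not finite element operators, and all sizes and names are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
nu, n_p = 6, 3  # hypothetical numbers of velocity and pressure unknowns

M = rng.standard_normal((nu, nu))
A = M @ M.T + nu * np.eye(nu)       # SPD stand-in for the velocity mass matrix
B = rng.standard_normal((n_p, nu))  # stand-in for the divergence operator
D = 2.0 * np.eye(n_p)               # stand-in for the weighted pressure mass matrix
F0 = rng.standard_normal(nu)
F1 = rng.standard_normal(n_p)

# Pressure equation: (D + B A^{-1} B^T) P = F1 - B A^{-1} F0.
Ainv_BT = np.linalg.solve(A, B.T)
Ainv_F0 = np.linalg.solve(A, F0)
S = D + B @ Ainv_BT
P = np.linalg.solve(S, F1 - B @ Ainv_F0)

# Back-substitution: A U = F0 + B^T P.
U = np.linalg.solve(A, F0 + B.T @ P)

# Reference: solve the monolithic saddle point system directly.
K = np.block([[A, -B.T], [B, D]])
UP = np.linalg.solve(K, np.concatenate([F0, F1]))
```

In this toy setting $A^{-1}$ is cheap to apply; for the conforming discretization it is globally dense, which is the difficulty that hybridization is designed to avoid.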

The hybridization technique replaces the original system with a discontinuous variant, decoupling the velocity degrees of freedom between cells. This is done by replacing the discrete solution space for $u_h$ with the “broken” space $U_h^d$, defined as

$$ U_h^d = \{ w \in L^2(\Omega)^n : w|_K \in U(K),\ \forall K \in \mathcal{T}_h \}. \tag{38} $$

The vector finite element space $U_h^d$ is a subspace of $L^2(\Omega)^n$ consisting of local $H(\mathrm{div})$ functions, but normal components are no longer required to be continuous on $\partial\mathcal{T}_h$. The approximation space for $p_h$ remains unchanged.

Next, Lagrange multipliers are introduced as an auxiliary variable in the space $M_h$, defined only on cell interfaces:

$$ M_h = \{ \gamma \in L^2(\partial\mathcal{T}_h) : \gamma|_e \in M(e),\ \forall e \in \partial\mathcal{T}_h \}, \tag{39} $$

where $M(e)$ denotes a polynomial space defined on each facet. We call $M_h$ the space of approximate traces. Functions in $M_h$ are discontinuous across vertices in two dimensions and vertices or edges in three dimensions.

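Deriving the hybridizable mixed system is accomplished through integration by parts over each element $K$. Testing with $w \in U_h^d(K)$ and integrating Eq. () over the cell $K$ produces

$$ (w, \mu u_h^d)_K - (\nabla \cdot w, p_h)_K + \langle w \cdot n, \lambda_h \rangle_{\partial K} = -\langle w \cdot n, p_0 \rangle_{\partial K \cap \partial\Omega_D}. \tag{40} $$

The trace function $\lambda_h$ is introduced in the surface integral as an approximation to $p|_{\partial K}$. An additional constraint equation, called the transmission condition, is added to close the system. The resulting hybridizable formulation reads as follows: find $(u_h^d, p_h, \lambda_h) \in U_h^d \times V_h \times M_h$ such that

$$ (w, \mu u_h^d)_{\mathcal{T}_h} - (\nabla \cdot w, p_h)_{\mathcal{T}_h} + \langle [\![ w ]\!], \lambda_h \rangle_{\partial\mathcal{T}_h \setminus \partial\Omega_D} = -\langle w \cdot n, p_0 \rangle_{\partial\Omega_D}, \quad \forall w \in U_h^d, \tag{41} $$

$$ (\phi, \nabla \cdot u_h^d)_{\mathcal{T}_h} + (\phi, c p_h)_{\mathcal{T}_h} = (\phi, f)_{\mathcal{T}_h}, \quad \forall \phi \in V_h, \tag{42} $$

$$ \langle \gamma, [\![ u_h^d ]\!] \rangle_{\partial\mathcal{T}_h \setminus \partial\Omega_D} = \langle \gamma, g \rangle_{\partial\Omega_N}, \quad \forall \gamma \in M_{h,0}, \tag{43} $$

where $M_{h,0}$ denotes the space of traces vanishing on $\partial\Omega_D$. The transmission condition Eq. () enforces both the continuity of $u_h^d \cdot n$ across element boundaries and the boundary condition $u_h^d \cdot n = g$ on $\partial\Omega_N$. If the space of Lagrange multipliers $M_h$ is chosen appropriately, then the broken velocity $u_h^d$, albeit sought a priori in a discontinuous space, will coincide with its $H(\mathrm{div})$-conforming counterpart. Specifically, the formulations in Eqs. ()–() and ()–() are solving equivalent problems if the normal components of $w \in U_h$ lie in the same polynomial space as the trace functions.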

The discrete matrix system arising from Eqs. ()–() has the general form

$$ \begin{bmatrix} A_{00} & A_{01} & A_{02} \\ A_{10} & A_{11} & A_{12} \\ A_{20} & A_{21} & A_{22} \end{bmatrix} \begin{bmatrix} U^d \\ P \\ \Lambda \end{bmatrix} = \begin{bmatrix} F_0 \\ F_1 \\ F_2 \end{bmatrix}, \tag{44} $$

where the discrete system is produced by expanding functions in terms of the finite element bases for $U_h^d$, $V_h$, and $M_h$ as before. Upon initial inspection, it may not appear to be advantageous to replace our original formulation with this augmented equation set; the hybridizable system has substantially more total degrees of freedom. However, Eq. () has considerable advantages over Eq. () in the following ways.

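Since both $U_h^d$ and $V_h$ are discontinuous spaces, $U^d$ and $P$ are coupled only within the cell. This allows us to simultaneously eliminate both unknowns via local static condensation to produce a significantly smaller global (hybridized) problem for the trace unknowns, $\Lambda$:

$$ S \Lambda = E, \tag{45} $$

where $S \leftarrow \{S^K\}_{K \in \mathcal{T}_h}$ and $E \leftarrow \{E^K\}_{K \in \mathcal{T}_h}$ are assembled via the local element tensors:

$$ S^K = A^K_{22} - \begin{bmatrix} A^K_{20} & A^K_{21} \end{bmatrix} \begin{bmatrix} A^K_{00} & A^K_{01} \\ A^K_{10} & A^K_{11} \end{bmatrix}^{-1} \begin{bmatrix} A^K_{02} \\ A^K_{12} \end{bmatrix}, \tag{46} $$

$$ E^K = F^K_{2} - \begin{bmatrix} A^K_{20} & A^K_{21} \end{bmatrix} \begin{bmatrix} A^K_{00} & A^K_{01} \\ A^K_{10} & A^K_{11} \end{bmatrix}^{-1} \begin{bmatrix} F^K_{0} \\ F^K_{1} \end{bmatrix}. \tag{47} $$

Note that the inverse of the block matrix in Eqs. () and () is never evaluated globally; the elimination can be performed locally by performing a sequence of Schur complement reductions within each cell.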

The matrix $S$ is sparse, symmetric, positive-definite, and spectrally equivalent to the dense Schur complement $D + B A^{-1} B^T$ from Eq. () of the original mixed formulation.

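Once $\Lambda$ is computed, both $U^d$ and $P$ can be recovered locally in each element. This can be accomplished in a number of ways. One way is to compute $P^K$ by solving

$$ \left( A^K_{11} - A^K_{10} (A^K_{00})^{-1} A^K_{01} \right) P^K = F^K_{1} - A^K_{10} (A^K_{00})^{-1} F^K_{0} - \left( A^K_{12} - A^K_{10} (A^K_{00})^{-1} A^K_{02} \right) \Lambda^K, \tag{48} $$

followed by solving for $U^{d,K}$:

$$ A^K_{00} U^{d,K} = F^K_{0} - A^K_{01} P^K - A^K_{02} \Lambda^K. \tag{49} $$

Similarly, one could rearrange the order in which each variable is reconstructed.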

If desired, the solutions can be improved further through local post-processing. We highlight two such procedures for Ud and P, respectively, in Sect. .
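The elimination and recovery steps listed above can be checked numerically on a single generic 3×3 block system. The following NumPy sketch is not Slate or Firedrake code: it uses one hypothetical "cell", so the local and global systems coincide, and a random symmetric positive definite matrix as a stand-in for the block tensor so that every Schur complement involved is invertible. Block sizes and helper names are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical local dof counts for the velocity, pressure and trace fields.
sizes = [4, 3, 2]
off = np.concatenate(([0], np.cumsum(sizes)))
n = off[-1]

M = rng.standard_normal((n, n))
T = M @ M.T + n * np.eye(n)  # SPD stand-in for the 3x3 block element tensor
F = rng.standard_normal(n)


def blk(i, j):
    """Return block A_ij of the element tensor."""
    return T[off[i]:off[i + 1], off[j]:off[j + 1]]


def vec(i):
    """Return block F_i of the right-hand side."""
    return F[off[i]:off[i + 1]]


# Static condensation: eliminate fields 0 and 1 to obtain S and E.
A_int = T[:off[2], :off[2]]  # [[A00, A01], [A10, A11]]
A_2i = T[off[2]:, :off[2]]   # [A20, A21]
A_i2 = T[:off[2], off[2]:]   # [A02; A12]
S = blk(2, 2) - A_2i @ np.linalg.solve(A_int, A_i2)
E = vec(2) - A_2i @ np.linalg.solve(A_int, F[:off[2]])
Lam = np.linalg.solve(S, E)

# Local recovery of the pressure block by a second Schur complement.
A00inv_A01 = np.linalg.solve(blk(0, 0), blk(0, 1))
A00inv_A02 = np.linalg.solve(blk(0, 0), blk(0, 2))
A00inv_F0 = np.linalg.solve(blk(0, 0), vec(0))
lhs = blk(1, 1) - blk(1, 0) @ A00inv_A01
rhs = vec(1) - blk(1, 0) @ A00inv_F0 - (blk(1, 2) - blk(1, 0) @ A00inv_A02) @ Lam
P = np.linalg.solve(lhs, rhs)

# Local recovery of the velocity block by back-substitution.
U = np.linalg.solve(blk(0, 0), vec(0) - blk(0, 1) @ P - blk(0, 2) @ Lam)

# Reference: solve the full block system in one go.
reference = np.linalg.solve(T, F)
```

On a real mesh only the trace system is solved globally, while the two recovery solves run independently in every cell.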

Figure displays the corresponding Slate code for assembling the trace system, solving Eq. (), and recovering the eliminated unknowns. A complete reference on how to formulate the hybridized mixed system Eqs. ()–() in UFL is available in the literature. Complete Firedrake code using Slate to solve a hybridizable mixed system is also publicly available in “Code verification”. We remark that, in the case of this hybridizable system, Eq. () contains zero-valued blocks which can simplify the resulting expressions in Eqs. ()–() and ()–(). This is not true in general and therefore the expanded form using all sub-blocks of Eq. () is presented for completeness.

Figure 2

Firedrake code for solving Eq. () via static condensation and local recovery, given UFL expressions a, L for Eqs. ()–(). Arguments of the mixed space Uhd×Vh×Mh are indexed by 0, 1, and 2, respectively. Lines 8 and 9 are symbolic expressions for Eqs. () and (), respectively. Any vanishing conditions on the trace variables should be provided as boundary conditions during operator assembly (line 10). Lines 26 and 28 are expressions for Eqs. () and () (using LU). Code generation occurs in lines 10, 11, 30, and 31. A global linear solver for the reduced system is created and used in line 15. Configuring the linear solver is done by providing an appropriate Python dictionary of solver options for the PETSc library.

3.2 Hybridization of discontinuous Galerkin methods

The HDG method is a natural extension of discontinuous Galerkin (DG) discretizations. Here, we consider a specific HDG discretization, namely the LDG-H method. Other forms of HDG that involve local lifting operators can also be implemented in this software framework by the introduction of additional local (i.e., discontinuous) variables into the definition of the local solver.

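Deriving the LDG-H formulation follows exactly from standard DG methods. All prognostic variables are sought in the discontinuous spaces $U_h \times V_h \subset L^2(\Omega)^n \times L^2(\Omega)$. Within a cell $K$, integration by parts yields

$$ (w, \mu u_h)_K - (\nabla \cdot w, p_h)_K + \langle w \cdot n, \hat{p} \rangle_{\partial K} = 0, \quad \forall w \in U(K), \tag{50} $$

$$ -(\nabla \phi, u_h)_K + \langle \phi, \hat{u} \cdot n \rangle_{\partial K} + (\phi, c p_h)_K = (\phi, f)_K, \quad \forall \phi \in V(K), \tag{51} $$

where $U(K)$ and $V(K)$ are vector and scalar polynomial spaces, respectively. Now, we define the numerical fluxes $\hat{p}$ and $\hat{u}$ to be functions of the trial unknowns and a new independent unknown in the trace space $M_h$:

$$ \hat{u}(u_h, p_h, \lambda_h; \tau) = u_h + \tau (p_h - \hat{p}) n, \tag{52} $$

$$ \hat{p}(\lambda_h) = \lambda_h, \tag{53} $$

where $\lambda_h \in M_h$ is a function approximating $p$ on $\partial\mathcal{T}_h$ and $\tau$ is a positive stabilization function that may vary on each facet $e \in \partial\mathcal{T}_h$. We further require that $\lambda_h$ satisfies the Dirichlet condition for $p$ on $\partial\Omega_D$ in an $L^2$-projection sense. The full LDG-H formulation reads as follows. Find $(u_h, p_h, \lambda_h) \in U_h \times V_h \times M_h$ such that

$$ (w, \mu u_h)_{\mathcal{T}_h} - (\nabla \cdot w, p_h)_{\mathcal{T}_h} + \langle [\![ w ]\!], \lambda_h \rangle_{\partial\mathcal{T}_h} = 0, \quad \forall w \in U_h, \tag{54} $$

$$ -(\nabla \phi, u_h)_{\mathcal{T}_h} + \langle \phi, [\![ u_h + \tau (p_h - \lambda_h) n ]\!] \rangle_{\partial\mathcal{T}_h} + (\phi, c p_h)_{\mathcal{T}_h} = (\phi, f)_{\mathcal{T}_h}, \quad \forall \phi \in V_h, \tag{55} $$

$$ \langle \gamma, [\![ u_h + \tau (p_h - \lambda_h) n ]\!] \rangle_{\partial\mathcal{T}_h \setminus \partial\Omega_D} = \langle \gamma, g \rangle_{\partial\Omega_N}, \quad \forall \gamma \in M_h, \tag{56} $$

$$ \langle \gamma, \lambda_h \rangle_{\partial\Omega_D} = \langle \gamma, p_0 \rangle_{\partial\Omega_D}, \quad \forall \gamma \in M_h. \tag{57} $$

Equation () is the transmission condition, which enforces the continuity of $\hat{u} \cdot n$ on $\partial\mathcal{T}_h$ and the Neumann condition $\hat{u} \cdot n = g$ on $\partial\Omega_N$. Equation () ensures $\lambda_h$ satisfies the Dirichlet condition. As a result, the numerical flux is single-valued on the facets. Hence, the LDG-H method defines a conservative DG method. Note that the choice of $\tau$ has a significant influence on the expected convergence rates of the computed solutions.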
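To see how the numerical fluxes enter the cell-local equations, one can substitute the flux definitions into the two boundary terms. The following worked sketch uses only the symbols defined above, together with the fact that $n \cdot n = 1$ for a unit normal.

```latex
\begin{align*}
\hat{u} \cdot n
  &= \bigl( u_h + \tau (p_h - \hat{p})\, n \bigr) \cdot n
   = u_h \cdot n + \tau (p_h - \lambda_h), \\
\langle \phi, \hat{u} \cdot n \rangle_{\partial K}
  &= \langle \phi, u_h \cdot n \rangle_{\partial K}
   + \langle \phi, \tau (p_h - \lambda_h) \rangle_{\partial K}, \\
\langle w \cdot n, \hat{p} \rangle_{\partial K}
  &= \langle w \cdot n, \lambda_h \rangle_{\partial K}.
\end{align*}
```

The term $\tau (p_h - \lambda_h)$ is the stabilization contribution; it vanishes wherever $p_h$ agrees with the trace $\lambda_h$ on the facet.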

The LDG-H method retains the advantages of standard DG methods while also enabling the assembly of reduced linear systems through static condensation. The matrix system arising from Eqs. ()–() has the same general form as the hybridized mixed method in Eq. (), except all sub-blocks are now populated with non-zero entries due to the coupling of trace functions with both ph and uh. However, all previous properties of the discrete matrix system from Sect. still apply. The Slate expressions for the local elimination and reconstruction operations will be identical to those illustrated in Fig. . For the interested reader, a unified analysis of hybridization methods (both mixed and DG) for second-order elliptic equations is presented in and .

3.3 Local post-processing

For both mixed and discontinuous Galerkin methods , it is possible to locally post-process solutions to obtain superconvergent approximations (gaining 1 order of accuracy over the unprocessed solution). These methods can be expressed as local solves on each element and are straightforward to implement using Slate. In this section, we present two post-processing techniques: one for scalar fields and another for the vector unknown. The Slate code follows naturally from previous discussions in Sect. and , using the standard set of operations on element tensors summarized in Sect. .

3.3.1 Post-processing of the scalar solution

Our first example is a modified version of the procedure presented by for enhancing the accuracy of the scalar solution. This was also highlighted within the context of hybridizing eigenproblems by . This post-processing technique can be used for both the hybridizable mixed and LDG-H methods. We proceed by posing the finite element systems cell-wise.

Let $P_k(K)$ denote a polynomial space of degree $\leq k$ on a cell $K \in T_h$. Then for a given pair of computed solutions $u_h, p_h$ of the hybridized methods, we define the post-processed scalar $p_h^\star \in P_{k+1}(K)$ as the unique solution of the local problem:

$$(\nabla w, \nabla p_h^\star)_K = -(\nabla w, \kappa^{-1} u_h)_K, \quad \forall w \in P_{k+1}^{\perp,l}(K), \tag{58}$$

$$(v, p_h^\star)_K = (v, p_h)_K, \quad \forall v \in P_l(K), \tag{59}$$

where $0 \leq l \leq k$. Here, the space $P_{k+1}^{\perp,l}(K)$ denotes the $L^2$-orthogonal complement of $P_l(K)$ in $P_{k+1}(K)$. This post-processing method directly uses the definition of the flux $u_h$, the approximation of $-\kappa \nabla p$. In practice, the space $P_{k+1}^{\perp,l}(K)$ may be constructed using an orthogonal hierarchical basis, and solving Eqs. ()–() amounts to inverting a symmetric positive definite system in each cell of the mesh.

At the time of this work, Firedrake does not support the construction of such a finite element basis. However, we can introduce Lagrange multipliers to enforce the orthogonality constraint. The resulting local problem then becomes the following mixed system: find $(p_h^\star, \psi) \in P_{k+1}(K) \times P_l(K)$ such that

$$(\nabla w, \nabla p_h^\star)_K + (w, \psi)_K = -(\nabla w, \kappa^{-1} u_h)_K, \quad \forall w \in P_{k+1}(K), \tag{60}$$

$$(\phi, p_h^\star)_K = (\phi, p_h)_K, \quad \forall \phi \in P_l(K), \tag{61}$$

where $0 \leq l \leq k$. The local problems Eqs. ()–() and Eqs. ()–() are equivalent, with the Lagrange multiplier $\psi$ enforcing orthogonality of test functions in $P_{k+1}(K)$ with functions in $P_l(K)$.
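To make the structure of this local mixed system concrete, it can be written in matrix form. The following is a sketch under assumed notation that does not appear in the text: $\{w_i\}$ is a basis of $P_{k+1}(K)$, $\{\phi_j\}$ is a basis of $P_l(K)$, and the block names $K$, $B$, $F$, and $G$ are introduced here purely for illustration.

```latex
% Assumed notation (illustrative only):
%   K_{ij} = (\nabla w_i, \nabla w_j)_K,   B_{ij} = (\phi_i, w_j)_K,
%   F_i    = -(\nabla w_i, \kappa^{-1} u_h)_K,   G_j = (\phi_j, p_h)_K.
\begin{bmatrix} K & B^T \\ B & 0 \end{bmatrix}
\begin{Bmatrix} p_h^\star \\ \psi \end{Bmatrix}
=
\begin{Bmatrix} F \\ G \end{Bmatrix}
```

The block $K$ alone is singular, since constants lie in its null space. Because $P_0(K) \subset P_l(K)$ for every $l \geq 0$, the constraint block $B$ fixes the cell mean of $p_h^\star$, which is what makes the full saddle point system invertible cell by cell.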

This post-processing method produces a new approximation which superconverges at a rate of k+2 for hybridized mixed methods . For the LDG-H method, k+2 superconvergence is achieved when τ=O(1) and τ=O(h), but only k+1 convergence is achieved when τ=O(1/h) . We demonstrate the increased accuracy in computed solutions in Sect. . An abridged example using Firedrake and Slate to solve the local linear systems is provided in Fig. .

Figure 3

Example of local post-processing using Firedrake and Slate. Here, we locally solve the mixed system defined in Eqs. ()–(). The corresponding symbolic local tensors are defined in lines 9 and 11. The Slate expression for directly inverting the local system is written in line 12. In line 16, a Slate-generated kernel is produced which solves the resulting linear system in each cell. Since we are not interested in the multiplier, we only return the block corresponding to the new pressure field.

3.3.2 Post-processing of the flux

Our second example illustrates a procedure that uses the numerical flux of an HDG discretization for Eqs. ()–(). Within the context of the LDG-H method, we can use the numerical trace in Eq. () to produce a vector field that is H(div)-conforming. The technique we outline here follows that of .

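Let $T_h$ be a mesh consisting of simplices. On each cell $K \in T_h$, we define a new function $u_h^\star$ to be the unique element of the local Raviart–Thomas space $[P_k(K)]^n + x P_k(K)$ satisfying

$$(r, u_h^\star)_K = (r, u_h)_K, \quad \forall r \in [P_{k-1}(K)]^n, \tag{62}$$

$$\langle \mu, u_h^\star \cdot n \rangle_e = \langle \mu, \hat{u} \cdot n \rangle_e, \quad \forall \mu \in P_k(e), \tag{63}$$

for all facets $e$ on $\partial K$, where $\hat{u}$ is the numerical flux defined in Eq. (). This local problem produces a new velocity $u_h^\star$ with the following properties.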

$u_h^\star$ converges at the same rate as $u_h$ for all choices of $\tau$ producing a solvable system for Eqs. ()–(). However,

$u_h^\star \in H(\mathrm{div}; \Omega)$. That is, $[\![ u_h^\star ]\!]_e = 0$, $\forall e \in \mathcal{E}_h^\circ$.

Additionally, the divergence of $u_h^\star$ converges at a rate of $k+1$.

The Firedrake implementation using Slate is similar to the scalar post-processing example (see Fig. ); the cell-wise linear systems Eqs. ()–() can be expressed in UFL, and therefore the necessary Slate expressions to invert the local systems follow naturally from the set of operations presented in Sect. . We use the very sensitive parameter dependency in the post-processing methods to validate our software implementation in

“Code-verification”

4 Static condensation as a preconditioner

Slate enables static condensation approaches to be expressed very concisely. Nonetheless, the application of a particular approach to different variational problems using Slate still requires a certain amount of code repetition. By formulating each form of static condensation as a preconditioner, code can be written once and then applied to any mathematically suitable problem. Rather than writing the static condensation by hand, in many cases, it is sufficient to just select the appropriate, Slate-based, preconditioner.

It is helpful to frame the problem in the context of the solver library, PETSc. Firedrake uses PETSc as its main solver abstraction framework and can provide operator-based preconditioners for solving linear systems as PC objects expressed in Python via petsc4py . For a comprehensive overview on solving linear systems using PETSc, we refer the interested reader to Sect. 4.

Suppose we wish to solve a linear system: $Ax = b$. We can think of (left) preconditioning the system in residual form,

$$r = r(A, b) \equiv b - Ax = 0, \tag{64}$$

by an operator $P$ (which may not necessarily be linear) as a transformation into an equivalent system of the form

$$Pr = Pb - PAx = 0. \tag{65}$$

Given a current iterate $x_i$, the residual at the $i$th iteration is simply $r_i \equiv b - A x_i$, and $P$ acts on the residual to produce an approximation to the error $\epsilon_i \equiv x - x_i$. If $P$ is an application of an exact inverse, the residual is converted into an exact (up to numerical roundoff) error.

We will denote the application of a particular Krylov subspace method (KSP) for the linear system Eq. () as $\mathcal{K}_x(r(A, b))$. Upon preconditioning the system via $P$ as in Eq. (), we write

$$\mathcal{K}_x(P r(A, b)). \tag{66}$$

If Eq. () is solved directly via $P = A^{-1}$, then $P r(A, b) = A^{-1} b - x$. So Eq. () then becomes $\mathcal{K}_x(r(I, A^{-1} b))$, producing the exact solution of Eq. () in a single iteration of $\mathcal{K}$. Having established notation, we now present our implementation of static condensation via Slate by defining the appropriate operator, $P$.
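The claim that an exact inverse yields the solution in a single iteration can be checked on a small example. The sketch below swaps in a preconditioned Richardson iteration, $x_{i+1} = x_i + P r_i$, in place of a Krylov method, since it is the simplest scheme with the same "apply $P$ to the residual" structure. The 2×2 matrix, right-hand side, and function names are illustrative assumptions.

```python
from fractions import Fraction as F


def matvec(A, x):
    """Dense matrix-vector product."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def inverse_2x2(A):
    """Exact inverse of a 2x2 matrix."""
    (a, b), (c, d) = A
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def preconditioned_richardson(A, b, P, x0, iterations):
    """Iterate x <- x + P (b - A x); P maps a residual to an error estimate."""
    x = list(x0)
    for _ in range(iterations):
        r = [bi - ai for bi, ai in zip(b, matvec(A, x))]  # residual r_i
        e = matvec(P, r)                                  # approximate error
        x = [xi + ei for xi, ei in zip(x, e)]
    return x


# Illustrative data (not taken from the paper).
A = [[F(4), F(1)], [F(2), F(3)]]
b = [F(1), F(2)]
x0 = [F(0), F(0)]

x_exact = matvec(inverse_2x2(A), b)

# P = A^{-1}: the residual is converted into the exact error in one step.
x_one_step = preconditioned_richardson(A, b, inverse_2x2(A), x0, 1)

# P = diag(A)^{-1} (Jacobi): only an approximation after one step.
P_jacobi = [[1 / A[0][0], F(0)], [F(0), 1 / A[1][1]]]
x_jacobi = preconditioned_richardson(A, b, P_jacobi, x0, 1)
```

Exact rational arithmetic is used so that the one-step result can be compared with the true solution without roundoff.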

4.1 Interfacing with PETSc via custom preconditioners

The implementation of preconditioners for the systems considered in this paper requires the manipulation not of assembled matrices but rather their symbolic representation. To do this, we use the preconditioning infrastructure developed by , which gives preconditioners written in Python access to the symbolic problem description. In Firedrake, this means all derived preconditioners have direct access to the UFL representation of the PDE system. We manipulate this mathematical specification via Slate and provide operators assembled from Slate expressions to PETSc for further algebraic preconditioning. Using this approach, we have developed a static condensation interface for the hybridization of H(div)×L2 mixed problems and a generic interface for statically condensing finite element systems. The advantage of writing even the latter as a preconditioner is the ability to switch out the solution scheme for the system at runtime, even when it is nested inside a larger set of coupled equations or a nonlinear solver (e.g., Newton-based methods).

4.1.1 A static condensation interface for hybridization

As discussed in Sect. and , one of the main advantages of using a hybridizable variant of a DG or mixed method is that such systems permit the use of cell-wise static condensation. To facilitate this, we provide a PETSc PC static condensation interface: firedrake.SCPC. This preconditioner takes the discretized system as in Eq. () and performs the local elimination and recovery procedures. Slate expressions are generated from the underlying UFL problem description.

More precisely, the incoming system has the form

$$\begin{bmatrix} A_{e,e} & A_{e,c} \\ A_{c,e} & A_{c,c} \end{bmatrix} \begin{Bmatrix} X_e \\ X_c \end{Bmatrix} = \begin{Bmatrix} R_e \\ R_c \end{Bmatrix}, \tag{67}$$

where $X_e$ is the vector of unknowns to be eliminated, $X_c$ is the vector of unknowns for the condensed field, and $R_e$ and $R_c$ are the incoming right-hand sides. The partitioning in Eq. () is determined by the solver option pc_sc_eliminate_fields. Field indices are provided in the same way one configures solver options to PETSc. These indices identify the field(s) to eliminate cell-wise; the remaining field forms the condensed system. For example, on a three-field problem (with indices 0, 1, and 2), setting -pc_sc_eliminate_fields 0,1 will configure firedrake.SCPC to cell-wise eliminate fields 0 and 1; the resulting condensed system is associated with field 2.

The firedrake.SCPC preconditioner can be interpreted as a Schur complement method for Eq. () of the form

$$P = \begin{bmatrix} I & -A_{e,e}^{-1} A_{e,c} \\ 0 & I \end{bmatrix} \begin{bmatrix} A_{e,e}^{-1} & 0 \\ 0 & S^{-1} \end{bmatrix} \begin{bmatrix} I & 0 \\ -A_{c,e} A_{e,e}^{-1} & I \end{bmatrix}, \tag{68}$$

where $S = A_{c,c} - A_{c,e} A_{e,e}^{-1} A_{e,c}$ is the Schur complement operator for the $X_c$ system. The distinction here from block preconditioners via the PETSc fieldsplit option , for example, is that $P$ does not require global actions; by design $A_{e,e}$ can be inverted locally and $S$ is sparse. As a result, $S$ can be assembled or applied exactly, up to numerical roundoff, via Slate-generated kernels.

In practice, the only globally coupled system requiring iterative inversion is $S$:

$$\mathcal{K}_{X_c}(P_1 r(S, R_s)), \tag{69}$$

where $R_s = R_c - A_{c,e} A_{e,e}^{-1} R_e$ is the condensed right-hand side and $P_1$ is a preconditioner for $S$. Once $X_c$ is computed, $X_e$ is reconstructed cell-wise from the first block row of Eq. (): $X_e = A_{e,e}^{-1} (R_e - A_{e,c} X_c)$.
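The three stages (forward elimination, condensed solve, and back-substitution) can be illustrated on a tiny dense block system. This is a minimal sketch in exact rational arithmetic, not the Slate-generated kernels: the block values and helper names (`frac`, `matmul`, `sub`, `solve`) are assumptions made for illustration, and a single dense block stands in for what is done cell by cell in practice.

```python
from fractions import Fraction as F


def frac(rows):
    """Convert a nested list of integers to exact fractions."""
    return [[F(v) for v in row] for row in rows]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def sub(A, B):
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def solve(A, B):
    """Solve A X = B by Gauss-Jordan elimination (exact arithmetic)."""
    n = len(A)
    M = [list(A[i]) + list(B[i]) for i in range(n)]
    for col in range(n):
        piv = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[piv] = M[piv], M[col]
        pivot = M[col][col]
        M[col] = [v / pivot for v in M[col]]
        for r in range(n):
            if r != col:
                factor = M[r][col]
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[col])]
    return [row[n:] for row in M]


# Illustrative blocks: two eliminated unknowns, one condensed unknown.
A_ee, A_ec = frac([[4, 1], [1, 3]]), frac([[1], [2]])
A_ce, A_cc = frac([[1, 2]]), frac([[5]])
R_e, R_c = frac([[1], [2]]), frac([[3]])

# Forward elimination: Schur complement and condensed right-hand side.
S = sub(A_cc, matmul(A_ce, solve(A_ee, A_ec)))
R_s = sub(R_c, matmul(A_ce, solve(A_ee, R_e)))

# Condensed solve (the only globally coupled solve in practice).
X_c = solve(S, R_s)

# Back-substitution: X_e = A_ee^{-1} (R_e - A_ec X_c).
X_e = solve(A_ee, sub(R_e, matmul(A_ec, X_c)))

# Reference: solve the monolithic system directly.
A_full = [A_ee[0] + A_ec[0], A_ee[1] + A_ec[1], A_ce[0] + A_cc[0]]
X_full = solve(A_full, R_e + R_c)
```

The condensed and monolithic solutions agree exactly, which reflects the fact that the factorization in Eq. () is an exact inverse of the block system rather than an approximation.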

By construction, this preconditioner is suitable for both hybridized mixed and HDG discretizations. It can also be used within other contexts, such as the static condensation of continuous Galerkin discretizations or primal-hybrid methods . As with any PETSc preconditioner, solver options can be specified for inverting S via the appropriate options prefix (condensed_field). The resulting KSP for Eq. () is compatible with existing solvers and external packages provided through the PETSc library. This allows users to experiment with a direct method and then switch to a more parallel-efficient iterative solver without changing the core application code.
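As a hedged illustration of how such options might be written down, the dictionary below selects firedrake.SCPC for a three-field system and configures the condensed solve under the condensed_field prefix. The option names pc_sc_eliminate_fields and condensed_field come from the text; the matrix-free operator type, the conjugate gradient and BoomerAMG choice, and the tolerance are assumptions. The `flatten` helper is hypothetical and only shows how nested entries correspond to prefixed PETSc option names.

```python
# Sketch of solver options for a three-field system (fields 0, 1, 2).
# Assumed values: "matfree" operator type, CG + BoomerAMG, rtol of 1e-8.
scpc_parameters = {
    "mat_type": "matfree",
    "ksp_type": "preonly",             # apply the preconditioner once
    "pc_type": "python",
    "pc_python_type": "firedrake.SCPC",
    "pc_sc_eliminate_fields": "0, 1",  # condensed system lives on field 2
    "condensed_field": {               # options for inverting S
        "ksp_type": "cg",
        "ksp_rtol": 1e-8,
        "pc_type": "hypre",
        "pc_hypre_type": "boomeramg",
    },
}


def flatten(options, prefix=""):
    """Hypothetical helper: expand nested dicts into prefixed option names."""
    flat = {}
    for key, value in options.items():
        if isinstance(value, dict):
            flat.update(flatten(value, prefix + key + "_"))
        else:
            flat[prefix + key] = value
    return flat


flat_options = flatten(scpc_parameters)
```

In a Firedrake script, a dictionary of this kind would typically be passed as the solver parameters of a variational solve. Switching the condensed solve to a direct method then only requires editing the entries under condensed_field.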

4.1.2 Preconditioning mixed methods via hybridization

The preconditioner firedrake.HybridizationPC expands on the previous one, this time taking an H(div)×L2 system and automatically forming the hybridizable problem. This is accomplished through manipulating the UFL objects representing the discretized PDE. This includes replacing argument spaces with their discontinuous counterparts, introducing test functions on an appropriate trace space, and providing operators assembled from Slate expressions in a similar manner as described in Sect. .

More precisely, let $AX = R$ be the incoming mixed saddle point problem, where $R = \{R_U \; R_P\}^T$, $X = \{U \; P\}^T$, and $U$ and $P$ are the velocity and scalar unknowns, respectively. Then this preconditioner replaces $AX = R$ with the extended problem:

$$\begin{bmatrix} \hat{A} & C^T \\ C & 0 \end{bmatrix} \begin{Bmatrix} \hat{X} \\ \Lambda \end{Bmatrix} = \begin{Bmatrix} \hat{R} \\ R_g \end{Bmatrix}, \tag{70}$$

where $\Lambda$ are the Lagrange multipliers, $\hat{R} = \{\hat{R}_U \; R_P\}^T$, $\hat{R}_U$ and $R_P$ are the right-hand sides for the flux and scalar equations, respectively, and $\hat{\cdot}$ indicates modified matrices and co-vectors with discontinuous functions. Here, $\hat{X} = \{U^d \; P\}^T$ are the hybridizable (discontinuous) unknowns to be determined and $C\hat{X} = R_g$ is the matrix representation of the transmission condition for the hybridizable mixed method (see Eq. ).

The application of firedrake.HybridizationPC can be interpreted as the Schur complement reduction of Eq. ():

$$\hat{P} = \begin{bmatrix} I & -\hat{A}^{-1} C^T \\ 0 & I \end{bmatrix} \begin{bmatrix} \hat{A}^{-1} & 0 \\ 0 & S^{-1} \end{bmatrix} \begin{bmatrix} I & 0 \\ -C \hat{A}^{-1} & I \end{bmatrix}, \tag{71}$$

where $S$ is the Schur complement matrix $S = -C \hat{A}^{-1} C^T$. As before, a single global system for $\Lambda$ can be assembled cell-wise using Slate-generated kernels. Configuring the solver for inverting $S$ is done via the PETSc options prefix: -hybridization. The recovery of $U^d$ and $P$ happens in the same manner as firedrake.SCPC.
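The runtime configurability described here can be sketched as a small function that returns options for firedrake.HybridizationPC with either a direct or an iterative solver for the multiplier system. The preconditioner name and the hybridization prefix come from the text; the function name, the matrix-free operator type, and the particular solver choices and tolerance are assumptions made for illustration.

```python
def hybridization_parameters(trace_solver="direct"):
    """Return a sketch of options selecting firedrake.HybridizationPC.

    trace_solver is a hypothetical switch: "direct" uses LU on the
    multiplier system, "iterative" uses CG with algebraic multigrid.
    """
    trace_options = {
        "direct": {"ksp_type": "preonly", "pc_type": "lu"},
        "iterative": {"ksp_type": "cg", "pc_type": "gamg", "ksp_rtol": 1e-8},
    }
    return {
        "mat_type": "matfree",
        "ksp_type": "preonly",
        "pc_type": "python",
        "pc_python_type": "firedrake.HybridizationPC",
        # Options under this prefix configure the solve with S.
        "hybridization": dict(trace_options[trace_solver]),
    }


direct = hybridization_parameters("direct")
iterative = hybridization_parameters("iterative")
```

Only the entries under the hybridization prefix differ between the two configurations, so the application code that defines the mixed problem does not change when the solver is swapped.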

Since the hybridizable flux solution is constructed in the broken H(div) space $U_h^d$, we must project the computed solution into $U_h \subset H(\mathrm{div})$. This can be done cheaply via local facet averaging. The resulting solution is then updated via $U \leftarrow \Pi_{\mathrm{div}} U^d$, where $\Pi_{\mathrm{div}}: U_h^d \rightarrow U_h$ is a projection operator. This ensures that the residual for the original mixed problem is properly evaluated to test for solver convergence. With $\hat{P}$ as in Eq. (), the preconditioning operator for the original system $AX = R$ then has the form

$$P = \Pi \hat{P} \Pi^T, \quad \Pi = \begin{bmatrix} \Pi_{\mathrm{div}} & 0 & 0 \\ 0 & I & 0 \end{bmatrix}. \tag{72}$$

We note here that assembly of the right-hand side for the Λ system requires special attention. Firstly, when Neumann conditions are present, then Rg is not necessarily 0. Since the hybridization preconditioner has access to the entire Python context (which includes a list of boundary conditions and the spaces in which they are applied), surface integrals on the exterior boundary are added where appropriate and incorporated into the generated Slate expressions. A more subtle issue that requires extra care is the incoming right-hand side tested in the H(div) space Uh.

The situation we are given is that we have $R_U = R_U(w)$ for $w \in U_h$ but require $\hat{R}_U(w^d)$ for $w^d \in U_h^d$. For consistency, we also require for any $w \in U_h$ that

$$\hat{R}_U(w) = R_U(w). \tag{73}$$

We can construct such an $\hat{R}_U$ satisfying Eq. () in the following way. By construction, we have for each basis function $\Psi_i \in U_h$

$$\Psi_i = \begin{cases} \Psi_i^d & \Psi_i \text{ associated with an exterior facet node}, \\ \Psi_i^{d,+} + \Psi_i^{d,-} & \Psi_i \text{ associated with an interior facet node}, \\ \Psi_i^d & \Psi_i \text{ associated with a cell interior node}, \end{cases} \tag{74}$$

where $\Psi_i^d, \Psi_i^{d,\pm} \in U_h^d$, and $\Psi_i^{d,\pm}$ are basis functions corresponding to the positive and negative restrictions associated with the $i$th facet node.

These are the two broken parts of Ψi on a particular facet connecting two elements. That is, for two adjacent cells, a basis function in Uh for a particular facet node can be decomposed into two basis functions in Uhd defined on their respective sides of the facet.

We then define our broken right-hand side via the local definition:

$$\hat{R}_U(\Psi_i^d) = \frac{R_U(\Psi_i)}{N_i}, \tag{75}$$

where $N_i$ is the number of cells that the degree of freedom corresponding to $\Psi_i \in U_h$ is topologically associated with. Using Eq. (), Eq. (), and the fact that $R_U$ is linear in its argument, we can verify that our construction of $\hat{R}_U$ satisfies Eq. ().
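The consistency argument can be checked mechanically on a toy example. The sketch below assumes a hypothetical connectivity map from conforming degrees of freedom to the cells that share them; each (degree of freedom, cell) pair stands for one broken basis function. The data and function names are illustrative and do not correspond to Firedrake data structures.

```python
from fractions import Fraction as F

# Hypothetical connectivity: conforming dof index -> cells it touches.
# Dof 1 sits on an interior facet shared by cells 0 and 1 (N_i = 2);
# the remaining dofs belong to a single cell (N_i = 1).
dof_cells = {0: [0], 1: [0, 1], 2: [1], 3: [1]}

# Illustrative values of R_U evaluated on each conforming basis function.
R_U = {0: F(3), 1: F(5), 2: F(-2), 3: F(7)}


def break_rhs(R, dof_cells):
    """Split each conforming entry equally among its broken counterparts."""
    return {(i, cell): R[i] / len(cells)
            for i, cells in dof_cells.items() for cell in cells}


def evaluate_on_conforming(R_hat, dof_cells, i):
    """Evaluate the broken functional on Psi_i, the sum of its broken parts."""
    return sum(R_hat[(i, cell)] for cell in dof_cells[i])


R_hat = break_rhs(R_U, dof_cells)
recovered = {i: evaluate_on_conforming(R_hat, dof_cells, i) for i in R_U}
```

Summing the broken contributions for each conforming basis function returns the original value, which is the consistency requirement stated above.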

5 Numerical studies

We now present results utilizing the Slate DSL and our static condensation preconditioners for a set of test problems. Since we are using the interfaces outlined in Sect. , Slate is accessed indirectly and requires no manually written solver code for hybridization, static condensation, or local recovery. All parallel results were obtained on a single fully loaded compute node of dual-socket Intel E5-2630v4 (Xeon) processors with 2×10 cores (2 threads per core) running at 2.2 GHz. In order to avoid potential memory effects due to the operating system migrating processes between sockets, we pin MPI processes to cores.

The verification of the generated code is performed using parameter-sensitive convergence tests. The study consists of running a variety of discretizations spanning the methods outlined in Sect. . Details and numerical results are made public and can be viewed in (see “Code and data availability”). All results are in full agreement with the theory.

5.1 HDG method for a three-dimensional elliptic equation

In this section, we take a closer look at the LDG-H method for the model elliptic equation (sign-definite Helmholtz):

$$-\nabla \cdot \nabla p + p = f, \quad \text{in } \Omega = [0, 1]^3, \tag{76}$$

$$p = g, \quad \text{on } \partial\Omega, \tag{77}$$

where $f$ and $g$ are chosen such that the analytic solution is $p = \exp\{\sin(\pi x) \sin(\pi y) \sin(\pi z)\}$. We use a regular mesh consisting of $6 \cdot N^3$ tetrahedral elements ($N \in \{4, 8, 16, 32, 64\}$). First, we reformulate Eqs. ()–() as the mixed problem:

$$u + \nabla p = 0, \tag{78}$$

$$\nabla \cdot u + p = f, \tag{79}$$

$$p = g, \quad \text{on } \partial\Omega. \tag{80}$$

We start with linear polynomial approximations, up to cubic, for the LDG-H discretization of Eqs. ()–(). Additionally, we compute a post-processed scalar approximation $p_h^\star$ of the HDG solution. This raises the approximation order of the computed solution by an additional degree. In all numerical studies here, we set the HDG parameter $\tau = 1$. All results were computed in parallel, utilizing a single compute node (described previously).
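For readers who wish to reproduce the manufactured solution, the forcing $f$ follows from substituting the analytic $p$ into the elliptic equation. Writing $s = \sin(\pi x)\sin(\pi y)\sin(\pi z)$, so that $p = e^s$ and $\Delta s = -3\pi^2 s$, gives $f = e^s (1 + 3\pi^2 s - |\nabla s|^2)$. The sketch below evaluates this expression and compares it against a finite-difference Laplacian; the function names, sample point, and step size are assumptions made for illustration.

```python
import math


def s(x, y, z):
    return math.sin(math.pi * x) * math.sin(math.pi * y) * math.sin(math.pi * z)


def p_exact(x, y, z):
    """Analytic solution p = exp(sin(pi x) sin(pi y) sin(pi z))."""
    return math.exp(s(x, y, z))


def grad_s_squared(x, y, z):
    sx, sy, sz = (math.sin(math.pi * t) for t in (x, y, z))
    cx, cy, cz = (math.cos(math.pi * t) for t in (x, y, z))
    return math.pi ** 2 * ((cx * sy * sz) ** 2
                           + (sx * cy * sz) ** 2
                           + (sx * sy * cz) ** 2)


def forcing(x, y, z):
    """f = -div(grad p) + p, derived by hand from p = exp(s)."""
    sv = s(x, y, z)
    return math.exp(sv) * (1.0 + 3.0 * math.pi ** 2 * sv - grad_s_squared(x, y, z))


def forcing_fd(x, y, z, h=1e-3):
    """Same quantity using second-order central differences for the Laplacian."""
    c = p_exact(x, y, z)
    lap = (p_exact(x + h, y, z) + p_exact(x - h, y, z)
           + p_exact(x, y + h, z) + p_exact(x, y - h, z)
           + p_exact(x, y, z + h) + p_exact(x, y, z - h) - 6.0 * c) / h ** 2
    return -lap + c


point = (0.3, 0.4, 0.6)
difference = abs(forcing(*point) - forcing_fd(*point))
```

Since $s$ vanishes on the boundary of the unit cube, the same expressions show that the Dirichlet data reduce to $g = 1$ on $\partial\Omega$.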

A continuous Galerkin (CG) discretization of the primal problem Eqs. ()–() serves as a reference for this experiment. Due to the superconvergence in the post-processed solution for the HDG method, we use CG discretizations of polynomial orders 2, 3, and 4. This takes into account the enhanced accuracy of the HDG solution, despite being initially computed as a lower-order approximation. We therefore expect both methods to produce equally accurate solutions to the model problem.

Our aim here is not to compare the performance of HDG and CG, which has been investigated elsewhere (for example, see ). Instead, we provide a reference that the reader might be more familiar with in order to evaluate whether our software framework produces a sufficiently performant HDG implementation relative to what might be expected.

To invert the CG system, we use a conjugate gradient solver with Hypre's BoomerAMG implementation of algebraic multigrid (AMG) as a preconditioner . For the HDG method, we use the preconditioner described in Sect. and the same solver setup as the CG method for the trace system. While the trace operator is indeed symmetric and positive-definite, one should keep in mind that conclusions regarding the performance of off-the-shelf AMG packages on the HDG trace system are still relatively unclear. As a result, efforts on developing more efficient multigrid strategies are a topic of ongoing interest .

To avoid over-solving, we iterate to a relative tolerance such that the discretization error is minimal for a given mesh. In other words, the solvers are configured to terminate when there is no further reduction in the L2 error of the computed solution compared with the analytic solution. This means we are not iterating to a fixed solver tolerance across all mesh resolutions. Therefore, we can expect the total number of Krylov iterations (for both the CG and HDG methods) to increase as the mesh resolution becomes finer. The rationale behind this approach is to directly compare the execution time to solve for the best possible approximation to the solution given a fixed resolution.

5.1.1 Error versus execution time

The total execution time is recorded for the CG and HDG solvers, which includes the setup time for the AMG preconditioner, matrix assembly, and the time to solution for the Krylov method. In the HDG case, we include all setup costs, the time spent building the Schur complement for the traces, local recovery of the scalar and flux approximations, and post-processing. The L2 error against execution time and Krylov iterations to reach discretization error for each mesh are summarized in Fig. .

Figure 4

Comparison of continuous Galerkin and LDG-H solvers for the model three-dimensional positive-definite Helmholtz equation. Panel (a): a log–log plot showing the error against execution time for the CG and HDG with post-processing (τ=1) methods. Panel (b): a log–linear plot showing Krylov iterations of the AMG-preconditioned conjugate gradient algorithm (to reach discretization error) against number of cells.

The HDG method of order k-1 (HDGk-1) with post-processing, as expected, produces a solution which is as accurate as the CG method of order k (CGk). While the full HDG system is never explicitly assembled, the larger execution time is a result of several factors. The primary factor is that the total number of trace unknowns for the HDG1, HDG2, and HDG3 discretizations is roughly 4, 3, and 2 times larger, respectively, than the corresponding number of CG unknowns. Therefore, each iteration is more expensive. We also observe that the trace system requires more Krylov iterations to reach discretization error, which appears to improve relative to the CG method as the approximation order increases. Further analysis of multigrid methods for HDG systems is required to draw further conclusions. The main computational bottleneck in HDG methods is the global linear solver. We therefore expect our implementation to be dominated by the cost associated with inverting the trace operator. If one considers just the time-to-solution, the CG method is clearly ahead of the HDG method. However, the superior scaling, local conservation, and stabilization properties of the HDG method make it a particularly appealing choice for fluid dynamics applications . Therefore, the development of good preconditioning strategies for the HDG method is critical for its competitive use.

5.1.2 Breakdown of solver time

The HDG method requires many more degrees of freedom than CG or primal DG methods. This is largely due to the fact that the HDG method simultaneously approximates the primal solution and its velocity. The global matrix for the traces is significantly larger than the one for the CG system at low polynomial order. The execution time for HDG is then compounded by a more expensive global solve.

Figure 5

Breakdown of the CG$_k$ and HDG$_{k-1}$ execution times on a $6 \cdot 64^3$ simplicial mesh.

Table 1

Breakdown of the raw timings for the HDGk-1 (τ=1) and CGk methods; k=2, 3, and 4. Each method corresponds to a mesh size N=64 on a fully loaded compute node.

| Stage | HDG1 t_stage (s) | HDG1 % t_total | HDG2 t_stage (s) | HDG2 % t_total | HDG3 t_stage (s) | HDG3 % t_total |
|---|---|---|---|---|---|---|
| Matrix assembly (static cond.) | 1.05 | 7.49 % | 6.95 | 10.40 % | 31.66 | 10.27 % |
| Forward elimination | 0.86 | 6.13 % | 6.32 | 9.45 % | 31.98 | 10.37 % |
| Trace solve | 10.66 | 76.24 % | 43.89 | 65.66 % | 192.31 | 62.36 % |
| Back-substitution | 1.16 | 8.28 % | 8.71 | 13.03 % | 45.81 | 14.85 % |
| Post-processing | 0.26 | 1.86 % | 0.98 | 1.46 % | 6.62 | 2.15 % |
| HDG total | 13.98 | | 66.85 | | 308.37 | |

| Stage | CG2 t_stage (s) | CG2 % t_total | CG3 t_stage (s) | CG3 % t_total | CG4 t_stage (s) | CG4 % t_total |
|---|---|---|---|---|---|---|
| Matrix assembly (monolithic) | 0.50 | 12.01 % | 2.91 | 11.39 % | 26.37 | 24.11 % |
| Solve | 3.63 | 87.99 % | 22.67 | 88.61 % | 82.99 | 75.89 % |
| CG total | 4.12 | | 25.59 | | 109.36 | |

Figure displays a breakdown of total execution times on a simplicial mesh consisting of 1.5 million elements. The execution times have been normalized by the CG total time, showing that the HDG method is roughly 3 times more expensive than the CG method. This is expected given the larger degree-of-freedom count and expensive global solve. The raw numerical breakdown of the HDG and CG solvers are shown in Table . We isolate each component of the HDG method contributing to the total execution time. Local operations include static condensation (trace operator assembly), forward elimination (right-hand-side assembly for the trace system), backwards substitution to recover the scalar and velocity unknowns, and local post-processing of the scalar solution. For all k, our HDG implementation is solver-dominated as expected.
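The normalized comparison can be recomputed from the stage timings reported in Table 1. The short sketch below only re-derives ratios from those published numbers; the dictionary layout and variable names are assumptions made for illustration.

```python
# Stage timings in seconds, copied from Table 1 (mesh size N = 64).
hdg = {
    "HDG1": {"assembly": 1.05, "forward elim.": 0.86, "trace solve": 10.66,
             "back-sub.": 1.16, "post-processing": 0.26},
    "HDG2": {"assembly": 6.95, "forward elim.": 6.32, "trace solve": 43.89,
             "back-sub.": 8.71, "post-processing": 0.98},
    "HDG3": {"assembly": 31.66, "forward elim.": 31.98, "trace solve": 192.31,
             "back-sub.": 45.81, "post-processing": 6.62},
}
cg = {
    "CG2": {"assembly": 0.50, "solve": 3.63},
    "CG3": {"assembly": 2.91, "solve": 22.67},
    "CG4": {"assembly": 26.37, "solve": 82.99},
}

# HDG of order k-1 is compared against CG of order k.
pairs = [("HDG1", "CG2"), ("HDG2", "CG3"), ("HDG3", "CG4")]


def total(stages):
    return sum(stages.values())


# Total HDG time normalized by the matching CG total.
ratios = {h: total(hdg[h]) / total(cg[c]) for h, c in pairs}

# Fraction of the HDG time spent in the global trace solve.
solver_share = {h: hdg[h]["trace solve"] / total(hdg[h]) for h in hdg}
```

Summing the rounded stage times can differ from the reported totals in the last digit, so the ratios computed this way agree with the table only up to rounding.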

Both trace operator and right-hand-side assembly are dominated by the costs of inverting a local square mixed matrix coupling the scalar and velocity unknowns, which is performed directly via an LU factorization. This is also the case for backwards substitution. They should all therefore be of the same magnitude in time spent. We observe that this is the case across all degrees, with times ranging between approximately 6 % and 11 % of total execution time for assembling the condensed system. Back-substitution takes roughly the same time as the static condensation and forward elimination stages (approximately 12 % of execution time on average). Finally, the additional cost of post-processing accrues negligible time (roughly 2 % of execution time across all degrees). This is a small cost for an increase in order of accuracy.

We note that the caching of local tensors does not occur. Each pass to perform the local eliminations and backwards reconstructions rebuilds the local element tensors. It is not clear at this time whether the performance gained from avoiding rebuilding the local operators will offset the memory costs of storing the local matrices. Moreover, in time-dependent problems where the operators may contain state-dependent variables, rebuilding local matrices will be necessary in each time step regardless.

5.2 Hybridizable mixed methods for the shallow water equations

A primary motivator for our interest in hybridizable methods revolves around developing efficient solvers for problems in geophysical flows. In this section, we present some results integrating the nonlinear, rotating shallow water equations on the sphere using test case 5 (flow past an isolated mountain) from . For our discretization approach, we use the framework of compatible finite elements .

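The model equations we consider are the vector-invariant rotating nonlinear shallow water system defined on a two-dimensional spherical surface $\Omega$ embedded in $\mathbb{R}^3$:

$$\frac{\partial u}{\partial t} + (\nabla^\perp \cdot u + f) u^\perp + \nabla \left( g(D + b) + \frac{1}{2} |u|^2 \right) = 0, \tag{81}$$

$$\frac{\partial D}{\partial t} + \nabla \cdot (u D) = 0, \tag{82}$$

where $u$ is the fluid velocity, $D$ is the depth field, $f$ is the Coriolis parameter, $g$ is the acceleration due to gravity, $b$ is the bottom topography, and $(\cdot)^\perp \equiv \hat{z} \times \cdot$, with $\hat{z}$ being the unit normal to the surface $\Omega$. After discretizing in time and space using a semi-implicit scheme and Picard linearization, following , we must solve a sequence of saddle point systems, one for each Picard iteration in each time step, of the form

$$\begin{bmatrix} A & -\frac{g \Delta t}{2} B^T \\ \frac{H \Delta t}{2} B & M \end{bmatrix} \begin{Bmatrix} \Delta U \\ \Delta D \end{Bmatrix} = \begin{Bmatrix} R_u \\ R_D \end{Bmatrix}. \tag{83}$$

See Appendix for a complete description of the entire discretization strategy. The system Eq. () is the matrix equation corresponding to the linearized equations in Eqs. ()–().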

The Picard updates ΔU and ΔD are sought in the mixed finite element spaces Uh⊂H(div) and Vh⊂L2, respectively. Stable mixed finite element pairings correspond to the well-known RT and BDM mixed methods, such as RTk×DGk-1 or BDMk×DGk-1. These also fall within the set of compatible mixed spaces ideal for geophysical fluid dynamics . In particular, the lowest-order RT method (RT1×DG0) on a structured quadrilateral grid (such as the latitude–longitude grid used by many operational dynamical cores) is analogous to the Arakawa C-grid finite difference discretization.

In staggered finite difference models, the standard approach for solving Eq. () is to neglect the Coriolis term and eliminate the velocity unknown ΔU to obtain a discrete elliptic equation for ΔD, where smoothers like Richardson iterations or relaxation methods are convergent. This is more problematic in the compatible finite element framework, since A has a dense inverse. Instead, we use the preconditioner described in Sect. to form the equivalent hybridizable formulation, where both ΔU and ΔD are eliminated locally to produce a sparse elliptic equation for the Lagrange multipliers.

5.2.1 Atmospheric flow over a mountain

As a test problem, we solve test case 5 of , on the surface of a sphere with radius R=6371km. We refer the reader to and for a more comprehensive study on mixed finite elements for shallow water systems of this type. We use the mixed finite element pairs (RT1,DG0) (lowest-order RT method) and (BDM2,DG1) (next-to-lowest-order BDM method) for the velocity and depth spaces. A mesh of the sphere is generated from seven refinements of an icosahedron, resulting in a triangulation Th consisting of 327 680 elements in total. The grid information for both mixed methods is summarized in Table .

Table 2

The number of unknowns to be determined are summarized for each compatible finite element method. Resolution is the same for both methods.

Discretization properties

| Mixed method | No. of cells | Δx | Velocity unknowns | Depth unknowns | Total (millions) |
|---|---|---|---|---|---|
| RT1×DG0 | 327 680 | ≈43 km | 491 520 | 327 680 | 0.8 M |
| BDM2×DG1 | 327 680 | ≈43 km | 2 457 600 | 983 040 | 3.4 M |
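The unknown counts in Table 2 can be recovered from simple counting arguments. The sketch below assumes a closed triangulated sphere (so the number of edges is 3/2 times the number of cells) and the standard degree-of-freedom layouts: one per edge for RT1, three per edge plus three per cell interior for BDM2, one per cell for DG0, and three per cell for DG1. The function names are hypothetical.

```python
def icosphere_counts(refinements):
    """Cells and edges of an icosahedron after uniform refinement."""
    cells = 20 * 4 ** refinements   # each refinement splits a triangle in four
    edges = 3 * cells // 2          # closed surface: every edge is shared by 2 cells
    return cells, edges


def unknowns(method, refinements):
    """Velocity and depth unknown counts for the two mixed pairs in the text."""
    cells, edges = icosphere_counts(refinements)
    if method == "RT1xDG0":
        velocity, depth = edges, cells
    elif method == "BDM2xDG1":
        velocity, depth = 3 * edges + 3 * cells, 3 * cells
    else:
        raise ValueError("unknown method: " + method)
    return {"cells": cells, "velocity": velocity,
            "depth": depth, "total": velocity + depth}


rt = unknowns("RT1xDG0", 7)
bdm = unknowns("BDM2xDG1", 7)
```

With seven refinements this reproduces the 327 680 cells quoted in the text, and five refinements give the 20 480 cells used for the coarser 15-day run.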

We run for a total of 25 time steps, with a fixed number of four Picard iterations in each time step. We compare the overall simulation time using two different solver configurations for the implicit linear system. First, we use a flexible variant of the generalized minimal residual method (GMRES) acting on the system Eq. () with an approximate Schur complement preconditioner:

$$P_{SC} = \begin{bmatrix} I & \frac{g \Delta t}{2} A^{-1} B^T \\ 0 & I \end{bmatrix} \begin{bmatrix} A^{-1} & 0 \\ 0 & \tilde{S}^{-1} \end{bmatrix} \begin{bmatrix} I & 0 \\ -\frac{H \Delta t}{2} B A^{-1} & I \end{bmatrix}, \tag{84}$$

where $\tilde{S} = M + \frac{g H \Delta t^2}{4} B \, \mathrm{diag}(A)^{-1} B^T$ and $\mathrm{diag}(A)$ is a diagonal approximation to the velocity mass matrix (plus the addition of a Coriolis matrix). We use a flexible version of GMRES on the outer system since we use an additional Krylov solver to iteratively invert the Schur complement. The Schur complement system is inverted via GMRES due to the asymmetry from the Coriolis term, with the inverse of $\tilde{S}$ as the preconditioning operator. The sparse approximation $\tilde{S}$ is inverted using PETSc's smoothed aggregation multigrid (GAMG). The Krylov method is set to terminate once the preconditioned residual norm is reduced by a factor of $10^8$. $A^{-1}$ is computed approximately using a single application of incomplete LU (ILU) with zero fill-in.
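One way such a configuration might be expressed through PETSc's fieldsplit options is sketched below. This is an assumed set of options rather than the exact set used for the reported runs: the "selfp" choice builds the Schur complement approximation from the inverse diagonal of the leading block, block Jacobi with ILU stands in for the approximate application of $A^{-1}$, and the relative tolerance is attached to the inner Schur solve as one reading of the stopping criterion.

```python
# Sketch of an approximate Schur complement configuration (assumed options).
approx_schur_parameters = {
    "ksp_type": "fgmres",                         # flexible outer Krylov method
    "pc_type": "fieldsplit",
    "pc_fieldsplit_type": "schur",
    "pc_fieldsplit_schur_fact_type": "full",      # upper, diagonal, and lower factors
    "pc_fieldsplit_schur_precondition": "selfp",  # Schur approximation via diag(A)^{-1}
    "fieldsplit_0": {                             # velocity block: one ILU(0) application
        "ksp_type": "preonly",
        "pc_type": "bjacobi",
        "sub_pc_type": "ilu",
    },
    "fieldsplit_1": {                             # Schur complement block
        "ksp_type": "gmres",                      # non-symmetric due to Coriolis
        "ksp_rtol": 1e-8,
        "pc_type": "gamg",                        # smoothed aggregation multigrid
    },
}

# The outer method must tolerate a preconditioner that changes between
# iterations, because the inner block is itself solved with a Krylov method.
outer_is_flexible = approx_schur_parameters["ksp_type"] == "fgmres"
inner_is_krylov = approx_schur_parameters["fieldsplit_1"]["ksp_type"] != "preonly"
```

The pairing of a flexible outer method with an inner Krylov solve mirrors the reasoning given in the text for using flexible GMRES.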

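Next, we use only the application of our hybridization preconditioner (no outer Krylov method), which replaces the original linearized mixed system with its hybridizable equivalent. After hybridization, we have the following extended problem for the Picard updates: find $(\Delta u_h^d, \Delta D_h, \lambda_h) \in U_h^d \times V_h \times M_h$ satisfying

$$(w, \Delta u_h^d)_{T_h} + \frac{\Delta t}{2} (w, f (\Delta u_h^d)^\perp)_{T_h} - \frac{\Delta t}{2} (\nabla \cdot w, g \Delta D_h)_{T_h} + \langle [\![ w ]\!], \lambda_h \rangle_{\partial T_h} = \hat{R}_u, \quad \forall w \in U_h^d, \tag{85}$$

$$(\phi, \Delta D_h)_{T_h} + \frac{\Delta t}{2} (\phi, H \nabla \cdot \Delta u_h^d)_{T_h} = R_D, \quad \forall \phi \in V_h, \tag{86}$$

$$\langle \gamma, [\![ \Delta u_h^d ]\!] \rangle_{\partial T_h} = 0, \quad \forall \gamma \in M_h. \tag{87}$$

Note that the space $M_h$ is chosen such that the trace functions, when restricted to a facet $e \in \partial T_h$, are in the same polynomial space as $\Delta u_h \cdot n|_e$. Moreover, it can be shown that the Lagrange multiplier $\lambda_h$ is an approximation to the depth unknown $\Delta t g \Delta D / 2$ restricted to $\partial T_h$.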

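The resulting three-field problem in Eqs. ()–() produces the following matrix equation:

$$\begin{bmatrix} \hat{A} & C^T \\ C & 0 \end{bmatrix} \begin{Bmatrix} \Delta X \\ \Lambda \end{Bmatrix} = \begin{Bmatrix} \hat{R}_{\Delta X} \\ 0 \end{Bmatrix}, \tag{88}$$

where $\hat{A}$ is the discontinuous operator coupling $\Delta X = \{\Delta U^d \; \Delta D\}^T$ and $\hat{R}_{\Delta X} = \{\hat{R}_u \; R_D\}^T$ are the problem residuals. An exact Schur complement factorization is performed on Eq. (), using Slate to generate the local elimination kernels. We use the same set of solver options for the inversion of $\tilde{S}$ in Eq. () to invert the Lagrange multiplier system. The increments $\Delta U^d$ and $\Delta D$ are recovered locally, using Slate-generated kernels. Once recovery is complete, $\Delta U^d$ is projected back into the conforming H(div) finite element space via $\Delta U \leftarrow \Pi_{\mathrm{div}} \Delta U^d$. Based on the discussion in Sect. , we apply

$$P_{\mathrm{hybrid}} = \Pi \begin{bmatrix} I & -\hat{A}^{-1} C^T \\ 0 & I \end{bmatrix} \begin{bmatrix} \hat{A}^{-1} & 0 \\ 0 & S^{-1} \end{bmatrix} \begin{bmatrix} I & 0 \\ -C \hat{A}^{-1} & I \end{bmatrix} \Pi^T. \tag{89}$$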

Table 3

Preconditioner solve times for a 25-step run with Δt=100s. These are cumulative times in each stage of the two preconditioners throughout the entire profile run. We display the average iteration count (rounded to the nearest integer) for both the outer and the inner Krylov solvers. The significant speedup when using hybridization is a direct result of eliminating the outermost solver.

Preconditioner and solver details

| Mixed method | Preconditioner | t_total (s) | Avg. outer its. | Avg. inner its. | t_total^SC / t_total^hybrid. |
|---|---|---|---|---|---|
| RT1×DG0 | approx. Schur. (P_SC) | 15.137 | 2 | 8 | 3.413 |
| RT1×DG0 | hybridization (P_hybrid) | 4.434 | none | 2 | |
| BDM2×DG1 | approx. Schur. (P_SC) | 300.101 | 4 | 9 | 5.556 |
| BDM2×DG1 | hybridization (P_hybrid) | 54.013 | none | 6 | |

Table 4

Breakdown of the cost (average) of a single application of the preconditioned flexible GMRES method and hybridization preconditioner. Hybridization takes approximately the same time per iteration.

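| Preconditioner | Stage | RT1×DG0 t_stage (s) | RT1×DG0 % t_total | BDM2×DG1 t_stage (s) | BDM2×DG1 % t_total |
|---|---|---|---|---|---|
| approx. Schur (P_SC) | Schur solve | 0.07592 | 91.28 % | 0.78405 | 93.53 % |
| | invert velocity operator: $A$ | 0.00032 | 0.39 % | 0.00678 | 0.81 % |
| | apply inverse: $A^{-1}$ | 0.00041 | 0.49 % | 0.00703 | 0.84 % |
| | gmres other | 0.00652 | 7.84 % | 0.04041 | 4.82 % |
| | Total | 0.08317 | | 0.83827 | |
| hybridization (P_hybrid) | Transfer: $R_{\Delta X} \to \hat{R}_{\Delta X}$ | 0.00322 | 7.26 % | 0.00597 | 1.10 % |
| | Forward elim.: $-C \hat{A}^{-1} \hat{R}_{\Delta X}$ | 0.00561 | 12.64 % | 0.12308 | 22.79 % |
| | Trace solve | 0.02289 | 51.63 % | 0.28336 | 52.46 % |
| | Back-sub. | 0.00986 | 22.23 % | 0.12220 | 22.62 % |
| | Projection: $\Pi_{\mathrm{div}} \Delta U^d$ | 0.00264 | 5.96 % | 0.00516 | 0.96 % |
| | Total | 0.04434 | | 0.54013 | |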

Table displays a summary of our findings. The advantages of a hybridizable method versus a mixed method are more clearly realized in this experiment. When using hybridization, we observe a significant reduction in time spent in the implicit solver compared to the approximate Schur complement approach. This is primarily because we have reduced the number of “outer” iterations to zero; the hybridization preconditioner is performing an exact factorization of the global hybridizable system. This is empirically supported when considering per-application solve times. The values reported in Table show the average cost of a single outer GMRES iteration (which includes the application of $P_{\mathrm{SC}}$) and a single application of $P_{\mathrm{hybrid}}$. Hybridization and the approximate Schur complement preconditioner are comparable in terms of average execution time, with hybridization being slightly faster. This further demonstrates that the primary cause for the longer execution time of the latter is directly related to the additional outer iterations induced by using an approximate factorization. In terms of overall time-to-solution, the hybridizable methods are clearly ahead of the original mixed methods.

We also measure the relative reductions in the problem residual of the linear system Eq. (). Our hybridization preconditioner reduces the residual by a factor of $10^8$ on average, which coincides with the specified relative tolerance for the Krylov method on the trace system. In other words, the reduction in the residual for the trace system translates into an overall reduction in the residual for the mixed system by the same factor.

The test case was run up to day 15 on a coarser resolution (20 480 simplicial cells with $\Delta x \approx 210$ km) and a time-step size $\Delta t = 500$ s. Snapshots of the entire simulation are provided in Fig. using the semi-implicit scheme described in Appendix . The results we have obtained for days 5, 10, and 15 are comparable to the corresponding results of , , and . We refer the reader to for further demonstrations of shallow water test cases featuring the use of the hybridization preconditioner described in Sect. .

5.3 Hybridizable methods for a linear Boussinesq model

As a final example, we consider the simplified atmospheric model obtained from a linearization of the compressible Boussinesq equations in a rotating domain:

$$\frac{\partial \mathbf{u}}{\partial t} + 2\boldsymbol{\Omega} \times \mathbf{u} = -\nabla p + b\hat{\mathbf{z}}, \tag{90}$$

$$\frac{\partial p}{\partial t} = -c^2 \nabla \cdot \mathbf{u}, \tag{91}$$

$$\frac{\partial b}{\partial t} = -N^2 \mathbf{u} \cdot \hat{\mathbf{z}}, \tag{92}$$

where $\mathbf{u}$ is the fluid velocity, $p$ the pressure, $b$ the buoyancy, $\boldsymbol{\Omega}$ the planetary angular rotation vector, $c$ the speed of sound ($\approx 343\,\mathrm{m\,s^{-1}}$), and $N$ the buoyancy frequency ($\approx 0.01\,\mathrm{s^{-1}}$). Equations ()–() permit fast-moving acoustic waves driven by perturbations in $b$. This is the model presented in , which uses a quadratic equation of state to avoid some of the complications of the full compressible Euler equations (the hybridization of which we shall address in future work). We solve these equations subject to the rigid-lid condition $\mathbf{u} \cdot \mathbf{n} = 0$ on all boundaries.

Our domain consists of a spherical annulus, with the mesh constructed from a horizontal “base” mesh of the surface of a sphere of radius $R$, extruded upwards by a height $H_\Omega$. The vertical discretization is a structured one-dimensional grid, which facilitates the staggering of thermodynamic variables, such as $b$. We consider two kinds of meshes: one obtained by extruding an icosahedral sphere mesh and another from a cubed sphere.

Since our mesh has a natural tensor product structure, we construct suitable finite element spaces by taking the tensor product of a horizontal space with a vertical space. To ensure our discretization is “compatible,” we use the one- and two-dimensional finite element de Rham complexes: $V_h^0 \xrightarrow{\partial_z} V_h^1$ and $U_h^0 \xrightarrow{\nabla^\perp} U_h^1 \xrightarrow{\nabla\cdot} U_h^2$. We can then construct the three-dimensional complex $W_h^0 \xrightarrow{\nabla} W_h^1 \xrightarrow{\nabla\times} W_h^2 \xrightarrow{\nabla\cdot} W_h^3$, where

$$W_h^0 = U_h^0 \otimes V_h^0, \tag{93}$$

$$W_h^1 = \mathrm{HCurl}(U_h^1 \otimes V_h^0) \oplus \mathrm{HCurl}(U_h^0 \otimes V_h^1) =: W_h^{1,h} \oplus W_h^{1,v}, \tag{94}$$

$$W_h^2 = \mathrm{HDiv}(U_h^1 \otimes V_h^1) \oplus \mathrm{HDiv}(U_h^2 \otimes V_h^0) =: W_h^{2,h} \oplus W_h^{2,v}, \tag{95}$$

$$W_h^3 = U_h^2 \otimes V_h^1. \tag{96}$$

Here, HCurl and HDiv denote operators which ensure that the correct Piola transformations are applied when mapping from physical to reference element. We refer the reader to for an overview of constructing tensor product finite element spaces in Firedrake. For the analysis of compatible finite element discretizations and their relation to the complex Eqs. ()–(), we refer the reader to . Each discretization used in this section is constructed from more familiar finite element families, shown in Table .

Figure 6

Snapshots (view from the northern pole) from the isolated mountain test case. The surface height (m) at days 5, 10, and 15. The snapshots were generated on a mesh with 20 480 simplicial cells, a BDM2×DG1 discretization, and Δt=500 s. The linear system during each Picard cycle was solved using the hybridization preconditioner.

Table 5

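Vertical and horizontal spaces for the three-dimensional compatible finite element discretization of the linear Boussinesq model. The $\mathrm{RT}_k$ and $\mathrm{BDFM}_{k+1}$ methods are constructed on triangular prism elements, while the $\mathrm{RTCF}_k$ method is defined on extruded quadrilateral elements.

Compatible finite element spaces

| Mixed method | $V_h^0$ | $V_h^1$ | $U_h^0$ | $U_h^1$ | $U_h^2$ |
| --- | --- | --- | --- | --- | --- |
| $\mathrm{RT}_k$ | $\mathrm{CG}_k(0, H_\Omega)$ | $\mathrm{DG}_{k-1}(0, H_\Omega)$ | $\mathrm{CG}_k(\triangle)$ | $\mathrm{RT}_k(\triangle)$ | $\mathrm{DG}_{k-1}(\triangle)$ |
| $\mathrm{BDFM}_{k+1}$ | $\mathrm{CG}_{k+1}(0, H_\Omega)$ | $\mathrm{DG}_k(0, H_\Omega)$ | $\mathrm{CG}_{k+1}(\triangle)$ | $\mathrm{BDFM}_{k+1}(\triangle)$ | $\mathrm{DG}_k(\triangle)$ |
| $\mathrm{RTCF}_k$ | $\mathrm{CG}_k(0, H_\Omega)$ | $\mathrm{DG}_{k-1}(0, H_\Omega)$ | $\mathrm{Q}_k(\square)$ | $\mathrm{RTCF}_k(\square)$ | $\mathrm{DQ}_{k-1}(\square)$ |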

5.3.1 Compatible finite element discretization

A compatible finite element discretization of Eqs. ()–() constructs solutions in the following finite element spaces:

$$\mathbf{u}_h \in \mathring{W}_h^2, \quad p_h \in W_h^3, \quad b_h \in W_h^b, \tag{97}$$

where $\mathring{W}_h^2$ is the subspace of $W_h^2 \subset H(\mathrm{div})$ whose functions $\mathbf{w}$ satisfy $\mathbf{w} \cdot \mathbf{n} = 0$ on $\partial\Omega$, $W_h^3 \subset L^2$, and $W_h^b \equiv U_h^2 \otimes V_h^0$. Note that $W_h^b$ is just the scalar version of the vertical velocity space. That is, $W_h^b$ and $W_h^{2,v}$ have the same number of degrees of freedom but differ in how they are pulled back to the reference element.

The choice of $W_h^b$ in Eq. () corresponds to a Charney–Phillips vertical staggering of the buoyancy variable, which is the desired approach for the UK Met Office's Unified Model . One could also collocate $b_h$ with $p_h$ ($b_h \in W_h^3$), which corresponds to a Lorenz staggering. This, however, supports a computational mode which is exacerbated by fast-moving waves. We restrict our discussion to the former case.

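To obtain the discrete system, we simply multiply Eqs. ()–() by test functions $\mathbf{w} \in \mathring{W}_h^2$, $\phi \in W_h^3$, and $\eta \in W_h^b$ and integrate by parts. We introduce the increments $\delta\mathbf{u}_h \equiv \mathbf{u}_h^{n+1} - \mathbf{u}_h^n$, and set $\mathbf{u}_0 \equiv \mathbf{u}_h^n$ (similarly for $\delta p_h$, $p_0$, $\delta b_h$, and $b_0$). Using an implicit midpoint rule discretization, we need to solve the following mixed problem at each time step: find $\delta\mathbf{u}_h \in \mathring{W}_h^2$, $\delta p_h \in W_h^3$, and $\delta b_h \in W_h^b$ such that

$$\left(\mathbf{w}, \delta\mathbf{u}_h\right)_{\mathcal{T}_h} + \frac{\Delta t}{2}\left(\mathbf{w}, 2\boldsymbol{\Omega} \times \delta\mathbf{u}_h\right)_{\mathcal{T}_h} - \frac{\Delta t}{2}\left(\nabla \cdot \mathbf{w}, \delta p_h\right)_{\mathcal{T}_h} - \frac{\Delta t}{2}\left(\mathbf{w}, \delta b_h \hat{\mathbf{z}}\right)_{\mathcal{T}_h} = r_u, \quad \forall \mathbf{w} \in \mathring{W}_h^2, \tag{98}$$

$$\left(\phi, \delta p_h\right)_{\mathcal{T}_h} + \frac{\Delta t}{2} c^2 \left(\phi, \nabla \cdot \delta\mathbf{u}_h\right)_{\mathcal{T}_h} = r_p, \quad \forall \phi \in W_h^3, \tag{99}$$

$$\left(\eta, \delta b_h\right)_{\mathcal{T}_h} + \frac{\Delta t}{2} N^2 \left(\eta, \delta\mathbf{u}_h \cdot \hat{\mathbf{z}}\right)_{\mathcal{T}_h} = r_b, \quad \forall \eta \in W_h^b, \tag{100}$$

where the residuals are $r_u = -\Delta t \left(\mathbf{w}, 2\boldsymbol{\Omega} \times \mathbf{u}_0\right)_{\mathcal{T}_h}$, $r_p = -c^2 \Delta t \left(\phi, \nabla \cdot \mathbf{u}_0\right)_{\mathcal{T}_h}$, and $r_b = -N^2 \Delta t \left(\eta, \mathbf{u}_0 \cdot \hat{\mathbf{z}}\right)_{\mathcal{T}_h}$.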
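To see how the increment form of the implicit midpoint rule treats the Coriolis term, it can help to strip the problem down to a single velocity vector. The sketch below is an illustration only: it assumes a two-component velocity, a constant Coriolis parameter `f`, and no pressure or buoyancy coupling, so the velocity equation reduces to $(I + \tfrac{\Delta t}{2} f J)\,\delta\mathbf{u} = -\Delta t\, f J \mathbf{u}_0$ with $J$ the 90-degree rotation. The function name and the numerical values are hypothetical.

```python
import math


def midpoint_increment(u0, f, dt):
    """Increment of du/dt = -f k x u under the implicit midpoint rule.

    Solves (I + a J) delta = -2 a J u0 with a = f * dt / 2 and
    J = [[0, -1], [1, 0]], i.e. k x (u, v) = (-v, u).
    """
    a = 0.5 * f * dt
    # Right-hand side: -dt * f * (k x u0).
    rx, ry = 2.0 * a * u0[1], -2.0 * a * u0[0]
    # Inverse of [[1, -a], [a, 1]] is [[1, a], [-a, 1]] / (1 + a^2).
    det = 1.0 + a * a
    return ((rx + a * ry) / det, (-a * rx + ry) / det)


# Illustrative values: 20 m/s initial wind, mid-latitude f, 100 s time step.
u = (20.0, 0.0)
f = 1.0e-4
dt = 100.0
for _ in range(1000):
    du = midpoint_increment(u, f, dt)
    u = (u[0] + du[0], u[1] + du[1])

speed = math.hypot(u[0], u[1])
```

In this reduced setting the update is a Cayley transform of a skew-symmetric operator, so the speed stays at its initial value up to round-off for any time step. The nonsymmetric block that makes this possible is the same one that later complicates sparse approximations of the pressure operator.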

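The resulting matrix equations have the form

$$\begin{bmatrix} A_u & -\frac{\Delta t}{2} D^T & -\frac{\Delta t}{2} Q^T \\ \frac{\Delta t}{2} c^2 D & M_p & 0 \\ \frac{\Delta t}{2} N^2 Q & 0 & M_b \end{bmatrix} \begin{Bmatrix} U \\ P \\ B \end{Bmatrix} = \begin{Bmatrix} R_u \\ R_p \\ R_b \end{Bmatrix}, \tag{101}$$

where $A_u = M_u + \frac{\Delta t}{2} C_\Omega$, $C_\Omega$ is the asymmetric matrix associated with the Coriolis term, $M_u$, $M_p$, and $M_b$ are mass matrices, $D$ is the weak divergence term, and $Q$ is an operator containing the vertical components of $\delta\mathbf{u}_h$. In the absence of orography, we can use the point-wise expression for the buoyancy,

$$\delta b_h = r_b - \frac{\Delta t}{2} N^2 \delta\mathbf{u}_h \cdot \hat{\mathbf{z}}, \tag{102}$$

and eliminate $\delta b_h$ from Eq. () by substituting Eq. () into Eq. (). This produces the following mixed velocity–pressure system:

$$\mathcal{A} \begin{Bmatrix} U \\ P \end{Bmatrix} = \begin{bmatrix} \tilde{A}_u & -\frac{\Delta t}{2} D^T \\ c^2 \frac{\Delta t}{2} D & M_p \end{bmatrix} \begin{Bmatrix} U \\ P \end{Bmatrix} = \begin{Bmatrix} \tilde{R}_u \\ R_p \end{Bmatrix}, \tag{103}$$

where $\tilde{A}_u = A_u + \frac{\Delta t^2}{4} N^2 Q^T M_b^{-1} Q$ and $\tilde{R}_u = R_u + \frac{\Delta t}{2} Q^T M_b^{-1} R_b$ are the modified velocity operator and right-hand side, respectively. Note that in our elimination strategy, $\tilde{A}_u$ corresponds to the bilinear form obtained after eliminating the buoyancy at the equation level:

$$\tilde{A}_u \leftarrow \left(\mathbf{w}, \delta\mathbf{u}\right)_{\mathcal{T}_h} + \frac{\Delta t}{2}\left(\mathbf{w}, 2\boldsymbol{\Omega} \times \delta\mathbf{u}\right)_{\mathcal{T}_h} + \frac{\Delta t^2}{4} N^2 \left(\mathbf{w} \cdot \hat{\mathbf{z}}, \delta\mathbf{u} \cdot \hat{\mathbf{z}}\right)_{\mathcal{T}_h}. \tag{104}$$

A similar construction holds for $\tilde{R}_u$. Once Eq. () is solved, $\delta b_h$ is reconstructed by solving

$$M_b B = R_b - \frac{\Delta t}{2} N^2 Q U. \tag{105}$$

Equation () can be efficiently inverted using the conjugate gradient method.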
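For reference, the buoyancy elimination can also be written out at the matrix level. The following worked steps use only the blocks defined in the text and assume that the mass matrix $M_b$ is invertible; they are a sketch of the algebra rather than a description of how the implementation assembles the operators.

```latex
\begin{align*}
% Third block row, solved for B
B &= M_b^{-1}\left(R_b - \frac{\Delta t}{2} N^2 Q U\right), \\
% Substitute B into the first block row
R_u &= A_u U - \frac{\Delta t}{2} D^T P
      - \frac{\Delta t}{2} Q^T M_b^{-1}\left(R_b - \frac{\Delta t}{2} N^2 Q U\right), \\
% Collect the terms multiplying U and move the known term to the right
\left(A_u + \frac{\Delta t^2}{4} N^2 Q^T M_b^{-1} Q\right) U - \frac{\Delta t}{2} D^T P
    &= R_u + \frac{\Delta t}{2} Q^T M_b^{-1} R_b .
\end{align*}
```

The bracketed operator and the right-hand side of the last line are the modified velocity operator and modified residual of the reduced velocity–pressure system, and the first line is the buoyancy reconstruction step.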

5.3.2 Preconditioning the mixed velocity–pressure system

The primary difficulty is finding efficient solvers for Eq. (). This was studied by within the context of developing a preconditioner which is robust against parameters, like Δt and mesh resolution. However, the implicit treatment of the Coriolis term was not taken into account. We consider two preconditioning strategies.

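One strategy proposed by is to build a preconditioner based on the Schur complement factorization of $\mathcal{A}$ in Eq. ():

$$\mathcal{A}^{-1} = \begin{bmatrix} I & \frac{\Delta t}{2} \tilde{A}_u^{-1} D^T \\ 0 & I \end{bmatrix} \begin{bmatrix} \tilde{A}_u^{-1} & 0 \\ 0 & H^{-1} \end{bmatrix} \begin{bmatrix} I & 0 \\ -c^2 \frac{\Delta t}{2} D \tilde{A}_u^{-1} & I \end{bmatrix}, \tag{106}$$

where $H = M_p + c^2 \frac{\Delta t^2}{4} D \tilde{A}_u^{-1} D^T$ is the dense pressure Helmholtz operator. Because we have chosen to include the Coriolis term, the operator $H$ is nonsymmetric and has the form

$$H = M_p + c^2 \frac{\Delta t^2}{4} D \left(\tilde{M}_u + \frac{\Delta t}{2} C_\Omega\right)^{-1} D^T, \tag{107}$$

where $\tilde{M}_u$ is a modified mass matrix. As $\Delta t$ increases, the contribution of $C_\Omega$ becomes more prominent in $H$, making sparse approximations of $H$ more challenging. We shall elaborate on this further below when we present the results of our second solver strategy.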

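Our preferred strategy solves the hybridizable formulation of the system Eq. (). Let $W_h^{2,d}$ denote the broken version of $W_h^2$ and $M_h$ the space of Lagrange multipliers. Then the hybridizable formulation for the velocity–pressure system reads as follows: find $\delta\mathbf{u}_h^d \in W_h^{2,d}$, $\delta p_h \in W_h^3$, and $\lambda_h \in M_h$ such that

$$\left(\mathbf{w}, \delta\mathbf{u}_h^d\right)_{\mathcal{T}_h} + \frac{\Delta t}{2}\left(\mathbf{w}, 2\boldsymbol{\Omega} \times \delta\mathbf{u}_h^d\right)_{\mathcal{T}_h} + \frac{\Delta t^2}{4} N^2 \left(\mathbf{w} \cdot \hat{\mathbf{z}}, \delta\mathbf{u}_h^d \cdot \hat{\mathbf{z}}\right)_{\mathcal{T}_h} - \frac{\Delta t}{2}\left(\nabla \cdot \mathbf{w}, \delta p_h\right)_{\mathcal{T}_h} + \left\langle [\![\mathbf{w}]\!], \lambda_h \right\rangle_{\partial\mathcal{T}_h} = \tilde{r}_u, \quad \forall \mathbf{w} \in W_h^{2,d}, \tag{108}$$

$$\left(\phi, \delta p_h\right)_{\mathcal{T}_h} + \frac{\Delta t}{2} c^2 \left(\phi, \nabla \cdot \delta\mathbf{u}_h^d\right)_{\mathcal{T}_h} = r_p, \quad \forall \phi \in W_h^3, \tag{109}$$

$$\left\langle \gamma, [\![\delta\mathbf{u}_h^d]\!] \right\rangle_{\partial\mathcal{T}_h} = 0, \quad \forall \gamma \in M_h. \tag{110}$$

The system Eqs. ()–() is automatically formed by the Firedrake preconditioner firedrake.HybridizationPC. We then locally eliminate the velocity and pressure after hybridization, producing the following condensed problem:

$$H_\partial \Lambda = E, \quad H_\partial = \begin{bmatrix} C & 0 \end{bmatrix} \begin{bmatrix} \widehat{\tilde{A}}_u & -\frac{\Delta t}{2} \widehat{D}^T \\ c^2 \frac{\Delta t}{2} \widehat{D} & M_p \end{bmatrix}^{-1} \begin{bmatrix} C^T \\ 0 \end{bmatrix}, \quad E = \begin{bmatrix} C & 0 \end{bmatrix} \begin{bmatrix} \widehat{\tilde{A}}_u & -\frac{\Delta t}{2} \widehat{D}^T \\ c^2 \frac{\Delta t}{2} \widehat{D} & M_p \end{bmatrix}^{-1} \begin{Bmatrix} \widehat{R}_u \\ R_p \end{Bmatrix}, \tag{111}$$

where $\widehat{\cdot}$ denotes matrices or vectors with discontinuous test and trial functions.

The nonsymmetric operator $H_\partial$ is inverted using a preconditioned generalized conjugate residual (GCR) Krylov method, as suggested in . For our choice of preconditioner, we follow strategies outlined in and employ an algebraic multigrid method (V cycle) with GMRES (five iterations) smoothers on the coarse levels. The GMRES smoothers are preconditioned with block ILU on each level. For the finest level, block ILU produces a line smoother (necessary for efficient solution on thin domains) when the trace variable nodes are numbered in vertical lines, as is the case in our Firedrake implementation. On the coarser levels, less is known about the properties of ILU under the AMG coarsening strategies, but as we shall see, we observe performance that suggests ILU is still behaving as a line smoother. More discussion on multigrid for nonsymmetric problems can be found in and . A gravity wave test using our solution strategy and hybridization preconditioner is illustrated in Fig. for a problem on a condensed Earth (radius scaled down by a factor of 125) and 10 km lid.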
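A solver configuration of this kind is typically passed to Firedrake as a nested dictionary of PETSc options. The fragment below is a hedged sketch rather than the authors' archived configuration: the option names are standard PETSc and Firedrake keys, but the specific combination (GAMG as the algebraic multigrid, `bjacobi` with `ilu` sub-blocks as the block ILU, and a relative tolerance of 1e-5 on the trace solve) is an assumption made for illustration. The `flatten` helper is hypothetical and only shows how the nesting maps onto prefixed option names.

```python
# Options for the condensed trace system (prefix "hybridization_").
trace_solver = {
    "ksp_type": "gcr",        # generalized conjugate residual
    "ksp_rtol": 1e-5,         # assumed tolerance for the trace solve
    "pc_type": "gamg",        # algebraic multigrid (assumed choice of AMG)
    "mg_levels": {
        "ksp_type": "gmres",  # GMRES smoother on each level
        "ksp_max_it": 5,      # five smoother iterations
        "pc_type": "bjacobi", # block Jacobi across processes ...
        "sub_pc_type": "ilu", # ... with ILU on each block
    },
}

# Outer options: apply the hybridization preconditioner once, with no
# outer Krylov iterations on the mixed system.
solver_parameters = {
    "mat_type": "matfree",
    "ksp_type": "preonly",
    "pc_type": "python",
    "pc_python_type": "firedrake.HybridizationPC",
    "hybridization": trace_solver,
}


def flatten(options, prefix=""):
    """Flatten nested option dictionaries into prefixed PETSc option names."""
    flat = {}
    for key, value in options.items():
        name = prefix + key
        if isinstance(value, dict):
            flat.update(flatten(value, name + "_"))
        else:
            flat[name] = value
    return flat


flat = flatten(solver_parameters)
```

With Firedrake installed, a dictionary such as `solver_parameters` would be supplied through the `solver_parameters` argument of a variational solver; the flattened form shows the equivalent command-line option names, for example `hybridization_mg_levels_sub_pc_type`.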

Figure 7

Buoyancy perturbation ($y$–$z$ cross section) at $t = 3600$ s from a simple gravity wave test ($\Delta t = 100$ s). The initial conditions (in lat–long coordinates) for the velocity are a simple solid-body rotation: $\mathbf{u} = 20\,\mathbf{e}_\lambda$, where $\mathbf{e}_\lambda$ is the unit vector pointing in the direction of decreasing longitude. A buoyancy anomaly is defined via $b = \frac{d^2}{d^2 + q^2} \sin(\pi z / 10\,000)$, where $q = R \cos^{-1}(\cos(\phi) \cos(\lambda - \lambda_\phi))$, $d = 5000$ m, $R = 6371\,\mathrm{km}/125$ is the planet radius, and $\lambda_\phi = 2/3$. The equations are discretized using the lowest-order method $\mathrm{RTCF}_1$, with 24 576 quadrilateral cells in the horizontal and 64 extrusion levels. The velocity–pressure system is solved using hybridization.

5.3.3 Robustness against acoustic Courant number with implicit Coriolis

In this final experiment, we repeat a similar study to that presented in . Our setup closely resembles the gravity wave test of extended to a spherical annulus. We initialize the velocity in a simple solid-body rotation and introduce a localized buoyancy anomaly. A Coriolis term is introduced as a function of the Cartesian coordinate $z$ and is constant along lines of latitude ($f$ plane): $2\boldsymbol{\Omega} = 2\Omega_r \frac{z}{R} \hat{\mathbf{z}}$, with angular rotation rate $\Omega_r = 7.292 \times 10^{-5}\,\mathrm{s^{-1}}$. We fix the resolution of the problem and run the solver over a range of $\Delta t$. We control $\Delta t$ through the horizontal acoustic Courant number $\lambda_C = c \frac{\Delta t}{\Delta x}$, where $c$ is the speed of sound and $\Delta x$ is the horizontal resolution.
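The relation between time step and acoustic Courant number is simple enough to tabulate directly. The short sketch below assumes the speed of sound quoted earlier (343 m/s) and, purely as an example, a horizontal resolution of 86 km; the function names and the list of Courant numbers are illustrative choices, not values taken from the experiment scripts.

```python
SPEED_OF_SOUND = 343.0  # m/s, the approximate value of c quoted in the text


def acoustic_courant(dt, dx, c=SPEED_OF_SOUND):
    """Horizontal acoustic Courant number lambda_C = c * dt / dx."""
    return c * dt / dx


def time_step_for(courant, dx, c=SPEED_OF_SOUND):
    """Time step (s) that yields the requested Courant number."""
    return courant * dx / c


dx_example = 86.0e3  # m, example horizontal resolution (assumed)
time_steps = {lam: time_step_for(lam, dx_example) for lam in (2, 4, 8, 16, 32, 64)}

for lam, dt in time_steps.items():
    print(f"lambda_C = {lam:2d}  ->  dt = {dt:8.1f} s")
```

At this resolution a Courant number of 2 corresponds to a time step of roughly 500 s, so sweeping the Courant number upward at fixed resolution quickly reaches time steps of several hours.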

Note that the range of Courant numbers used in this paper exceeds what is typical in operational forecast settings (typically between $\mathcal{O}(2)$ and $\mathcal{O}(10)$). The grid setup mirrors that of actual global weather models; we extrude a spherical mesh of the Earth upwards to a height of 85 km. The setup for the different discretizations (including degrees of freedom for the velocity–pressure and hybridized systems) is presented in Table .

Table 6

Grid setup and discretizations for the acoustic Courant number study. The total number of degrees of freedom (dofs) for the mixed (velocity and pressure) and hybridizable (velocity, pressure, and trace) discretizations are shown in the last two columns (millions). The vertical resolution is fixed across all discretizations.

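Discretizations and grid information

| Mixed method | No. of horiz. cells | No. of vert. layers | $\Delta x$ | $\Delta z$ | U–P dofs | Hybrid. dofs |
| --- | --- | --- | --- | --- | --- | --- |
| $\mathrm{RT}_1$ | 81 920 | 85 | 86 km | 1 000 m | 24.5 M | 59.3 M |
| $\mathrm{RT}_2$ | 5 120 | 85 | 346 km | 1 000 m | 9.6 M | 17.4 M |
| $\mathrm{BDFM}_2$ | 5 120 | 85 | 346 km | 1 000 m | 10.5 M | 18.3 M |
| $\mathrm{RTCF}_1$ | 98 304 | 85 | 78 km | 1 000 m | 33.5 M | 83.7 M |
| $\mathrm{RTCF}_2$ | 6 144 | 85 | 312 km | 1 000 m | 16.7 M | 29.3 M |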
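The degree-of-freedom counts for the two lowest-order methods can be cross-checked by counting facets and cells of the extruded mesh. The bookkeeping below is an illustrative reconstruction, not the authors' code. It assumes closed horizontal meshes (a triangulated sphere has 3/2 edges per cell, a quadrilateral cubed sphere has 2), one velocity or trace unknown per facet, one pressure unknown per cell, and that facets on the top and bottom boundaries are included in the count. The function name is hypothetical.

```python
def lowest_order_dofs(n_cells, n_edges, n_layers, facets_per_cell):
    """Count unknowns for a lowest-order extruded velocity-pressure method.

    n_cells, n_edges: cells and edges of the horizontal base mesh
    n_layers:         number of vertical layers
    facets_per_cell:  facets of one 3-D cell (5 for a prism, 6 for a hexahedron)
    Returns (mixed velocity-pressure dofs, hybridized dofs).
    """
    n_levels = n_layers + 1
    horiz_velocity = n_edges * n_layers   # one dof per vertical facet
    vert_velocity = n_cells * n_levels    # one dof per horizontal facet
    pressure = n_cells * n_layers         # one dof per 3-D cell
    mixed = horiz_velocity + vert_velocity + pressure

    broken_velocity = facets_per_cell * n_cells * n_layers  # no sharing between cells
    trace = horiz_velocity + vert_velocity                  # one multiplier per facet
    hybrid = broken_velocity + pressure + trace
    return mixed, hybrid


# RT1 on triangular prisms: 81 920 triangles, 85 layers.
rt1 = lowest_order_dofs(81920, 3 * 81920 // 2, 85, facets_per_cell=5)
# RTCF1 on hexahedra: 98 304 quadrilaterals, 85 layers.
rtcf1 = lowest_order_dofs(98304, 2 * 98304, 85, facets_per_cell=6)

for name, (mixed, hybrid) in (("RT1", rt1), ("RTCF1", rtcf1)):
    print(f"{name}: U-P dofs = {mixed / 1e6:.1f} M, hybrid dofs = {hybrid / 1e6:.1f} M")
```

Under these assumptions the counts round to the tabulated values for the lowest-order methods. The hybridized system carries more unknowns in total, but only the trace unknowns take part in the global solve; the broken velocity and pressure are eliminated and recovered cell by cell.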

It was shown in that using a sparse approximation of the pressure Schur complement of the form

$$\tilde{H} = M_p + c^2 \frac{\Delta t^2}{4} D\,\mathrm{Diag}(\tilde{A}_u)^{-1} D^T \tag{112}$$

served as a good preconditioner, leading to a system that was amenable to multigrid methods and resulted in a Courant-number-independent solver. However, when the Coriolis term is included, this is no longer the case: the diagonal approximation $\mathrm{Diag}(\tilde{A}_u)$ becomes worse with increasing $\lambda_C$. To demonstrate this, we solve the gravity wave system on a low-resolution grid (10 km lid, 10 vertical levels, maintaining the same cell aspect ratio as in Table ) using the Schur complement factorization Eq. (). LU factorizations are used to apply both $\tilde{A}_u^{-1}$ and $\tilde{H}^{-1}$. The Schur complement $H$ is inverted using preconditioned GMRES iterations, and a flexible-GMRES algorithm is used on the full velocity–pressure system. If $\tilde{H}^{-1}$ is a good approximation to $H^{-1}$, then we should see low iteration counts in the Schur complement solve. Figure shows the results of this study for a range of Courant numbers.

Figure 8

Number of Krylov iterations to invert the Helmholtz system using $\tilde{H}^{-1}$ as a preconditioner. The preconditioner is applied using a direct LU factorization within a GMRES method on the entire pressure equation. While the lowest-order methods grow slowly over the Courant number range, the higher-order (by only 1 approximation order) methods quickly degrade and diverge after the critical range $\lambda_C = \mathcal{O}(2)$–$\mathcal{O}(10)$. At $\lambda_C > 32$, the solvers take over 150 iterations.

Figure 9

Courant number parameter test run on a fully loaded compute node. Both figures display the hybridized solver for each discretization, described in Table . Panel (a) displays the total iteration count (preconditioned GCR) to solve the trace system to a relative tolerance of $10^{-5}$. Panel (b) displays the relative work of each solver, which takes into account the time required to forward eliminate and locally recover the velocity and pressure.

For the lower-order methods, the number of iterations to invert $H$ grows slowly but remains under control. Increasing the approximation degree by 1 results in degraded performance. As $\Delta t$ increases, the number of Krylov iterations needed to invert the system to a relative tolerance of $10^{-5}$ grows rapidly. It is clear that this sparse approximation is not robust against Courant number. This can be explained by the fact that diagonalizing the velocity operator fails to take into account the effects of the Coriolis term (which appear in off-diagonal positions in the operator). Even if one were to use traditional mass lumping (row-wise sums), the Coriolis effects are effectively canceled out due to asymmetry.
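The effect of taking the diagonal can be seen in a toy setting. The sketch below assumes a diagonal 2x2 mass matrix and a skew-symmetric 2x2 Coriolis matrix with a constant parameter `f`; real finite element operators are neither this small nor this simple, and all names and values here are hypothetical. It shows only that the diagonal of $M + \tfrac{\Delta t}{2} C_\Omega$ carries no information about the Coriolis term, while the neglected off-diagonal part grows linearly with $\Delta t$.

```python
def velocity_operator(mass_diag, coriolis, dt):
    """Return A = M + (dt / 2) * C for diagonal M and skew-symmetric C."""
    n = len(mass_diag)
    return [[(mass_diag[i] if i == j else 0.0) + 0.5 * dt * coriolis[i][j]
             for j in range(n)] for i in range(n)]


def diagonal(A):
    return [A[i][i] for i in range(len(A))]


def offdiag_ratio(A):
    """Largest off-diagonal magnitude relative to the smallest diagonal entry."""
    n = len(A)
    off = max(abs(A[i][j]) for i in range(n) for j in range(n) if i != j)
    return off / min(abs(A[i][i]) for i in range(n))


f = 1.0e-4                       # illustrative Coriolis parameter (1/s)
C = [[0.0, -f], [f, 0.0]]        # skew-symmetric, zero diagonal
M = [1.0, 1.0]                   # toy (already diagonal) mass matrix
dts = (100.0, 1000.0, 10000.0)   # illustrative time steps (s)

diagonals = [diagonal(velocity_operator(M, C, dt)) for dt in dts]
ratios = [offdiag_ratio(velocity_operator(M, C, dt)) for dt in dts]
```

Every entry of `diagonals` equals the mass diagonal regardless of the time step, whereas `ratios` increases with it. A preconditioner built from the diagonal therefore approximates the same operator at every Courant number while the true operator drifts away from it.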

Hybridization avoids this problem entirely: we always construct an exact Schur complement and only have to worry about solving the trace system Eq. (). We now show that this approach (described in Sect. ) is much more robust to changes in $\Delta t$. We use the same workstation as for the three-dimensional CG/HDG problem in Sect. (executed with a total of 40 MPI processes). Figure shows the parameter test for all the discretizations described in Table . We see that, in terms of the total number of GCR iterations needed to invert the trace system, hybridization is far more controlled as the Courant number increases. The iteration counts largely remain constant throughout the entire parameter range, only varying by an iteration or two. It is not until $\lambda_C > 32$ that we begin to see a larger jump in the number of GCR iterations. This is expected, since the Coriolis operator causes the problem to become singular for very large Courant numbers. However, unlike with the approximate Schur complement solver, iteration counts are still under control. In particular, each method (lowest and higher order) remains constant throughout the critical range (shaded in gray in Fig. ).

In Fig. b, we display the ratio of the execution time to the time-to-solution at the lowest Courant number of 2. We perform this normalization to better compare the lower- and higher-order methods (and discretizations on triangular prisms vs. extruded quadrilaterals). The calculation of the ratios includes the time needed to eliminate and reconstruct the hybridized velocity–pressure variables. The fact that the ratio for the hybridization solver remains close to 1 demonstrates that the entire solution procedure is largely $\lambda_C$-independent until around $\lambda_C = 32$. The overall trend is largely the same as that of the number of Krylov iterations needed to reach solver convergence. This is due to our hybridization approach being solver dominated, with local operations like forward elimination together with local recovery taking approximately one-third of the total execution time for each method. The percentage breakdown of the hybridization solver is similar to what is already presented in Sect. .

Implicitly treating the Coriolis term has been discussed for semi-implicit discretizations of large-scale geophysical flows . The addition of the Coriolis term presents a particular challenge to the solution of finite element discretizations of these equations since it increases the difficulty of finding a good sparse approximation of the nonsymmetric elliptic operator. Hybridization shows promise here, as it allows for the assembly of an elliptic equation that both captures the effects of rotation and results in a sparse linear system.

6 Conclusions

We have presented Slate, and shown how this language can be used to create concise mathematical representations of localized linear algebra on the tensors corresponding to finite element forms. We have shown how this DSL can be used in tandem with UFL in Firedrake to implement solution approaches making use of automated code generation for static condensation, hybridization, and localized post-processing. Setup and configuration are done at runtime, allowing one to switch in different discretizations at will. In particular, this framework alleviates much of the difficulty in implementing such methods within intricate numerical code and paves the way for future low-level optimizations. In this way, the framework in this paper can be used to help enable the rapid development and exploration of new hybridization and static condensation techniques for a wide class of problems. We remark here that the reduction of global matrices via element-wise algebraic static condensation, as described in and is also possible using Slate, including other more general static condensation procedures outside the context of hybridization.

Our approach to preconditioner design revolves around its composable nature, in that these Slate-based implementations can be seamlessly incorporated into complicated solution schemes. In particular, there is current research in the design of dynamical cores for numerical weather prediction using implementations of hybridization and static condensation with Slate . The performance of such methods for geophysical flows is a subject of ongoing investigation.

In this paper, we have provided some examples of hybridization procedures for compatible finite element discretizations of geophysical flows. These approaches avoid the difficulty in constructing sparse approximations of dense elliptic operators. Static condensation arising from hybridizable formulations can best be interpreted as producing an exact Schur complement factorization on the global hybridizable system. This eliminates the need for outer iterations from a suitable Krylov method to solve the full mixed system and replaces the original global mixed equations with a condensed elliptic system. More extensive performance benchmarks, which require detailed analysis of the resulting operator systems arising from hybridization, are a necessary next step to determine whether hybridization provides a scalable solution strategy for compatible finite elements in operational settings.

Appendix A Semi-implicit method for the shallow water system

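For some tessellation, $\mathcal{T}_h$, our semi-discrete mixed method for Eqs. ()–() seeks approximations $(\mathbf{u}_h, D_h) \in U_h \times V_h \subset H(\mathrm{div}) \times L^2$ satisfying

$$\left(\mathbf{w}, \frac{\partial \mathbf{u}_h}{\partial t}\right)_{\mathcal{T}_h} - \left(\nabla^\perp(\mathbf{w} \cdot \mathbf{u}_h^\perp), \mathbf{u}_h^\perp\right)_{\mathcal{T}_h} + \left(\mathbf{w}, f \mathbf{u}_h^\perp\right)_{\mathcal{T}_h} + \left\langle [\![\mathbf{n}^\perp(\mathbf{w} \cdot \mathbf{u}_h^\perp)]\!], \tilde{\mathbf{u}}_h^\perp \right\rangle_{\partial\mathcal{T}_h} - \left(\nabla \cdot \mathbf{w}, g(D_h + b) + \tfrac{1}{2}|\mathbf{u}_h|^2\right)_{\mathcal{T}_h} = 0, \quad \forall \mathbf{w} \in U_h, \tag{A1}$$

$$\left(\phi, \frac{\partial D_h}{\partial t}\right)_{\mathcal{T}_h} - \left(\nabla\phi, \mathbf{u}_h D_h\right)_{\mathcal{T}_h} + \left\langle [\![\phi \mathbf{u}_h]\!], \tilde{D}_h \right\rangle_{\partial\mathcal{T}_h} = 0, \quad \forall \phi \in V_h, \tag{A2}$$

where $\tilde{\cdot}$ indicates that the value of the function should be taken from the upwind side of each facet. The discretization of the velocity advection operator is an extension of the energy-conserving scheme of to the shallow water equations.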

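The time-stepping scheme follows a Picard iteration semi-implicit approach, where predictive values of the relevant fields are determined via an explicit step of the advection equations and corrective updates are generated by solving an implicit linear system (linearized about a state of rest) for $(\Delta\mathbf{u}_h, \Delta D_h) \in U_h \times V_h$, given by

$$\left(\mathbf{w}, \Delta\mathbf{u}_h\right)_{\mathcal{T}_h} + \frac{\Delta t}{2}\left(\mathbf{w}, f \Delta\mathbf{u}_h^\perp\right)_{\mathcal{T}_h} - \frac{\Delta t}{2}\left(\nabla \cdot \mathbf{w}, g \Delta D_h\right)_{\mathcal{T}_h} = -R_u[\mathbf{u}_h^{n+1}, D_h^{n+1}; \mathbf{w}], \quad \forall \mathbf{w} \in U_h, \tag{A3}$$

$$\left(\phi, \Delta D_h\right)_{\mathcal{T}_h} + \frac{H \Delta t}{2}\left(\phi, \nabla \cdot \Delta\mathbf{u}_h\right)_{\mathcal{T}_h} = -R_D[\mathbf{u}_h^{n+1}, D_h^{n+1}; \phi], \quad \forall \phi \in V_h, \tag{A4}$$

where $H$ is the mean layer depth and $R_u$ and $R_D$ are residual linear forms that vanish when $\mathbf{u}_h^{n+1}$ and $D_h^{n+1}$ are solutions to the implicit midpoint rule time discretization of Eqs. ()–(). The residuals are evaluated using the predictive values of $\mathbf{u}_h^{n+1}$ and $D_h^{n+1}$.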

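The implicit midpoint rule time discretization of the nonlinear rotating shallow water Eqs. ()–() is

$$\left(\mathbf{w}, \mathbf{u}_h^{n+1} - \mathbf{u}_h^n\right)_{\mathcal{T}_h} - \Delta t\left(\nabla^\perp(\mathbf{w} \cdot \mathbf{u}_h^{*\perp}), \mathbf{u}_h^{*\perp}\right)_{\mathcal{T}_h} + \Delta t\left(\mathbf{w}, f \mathbf{u}_h^{*\perp}\right)_{\mathcal{T}_h} + \Delta t\left\langle [\![\mathbf{n}^\perp(\mathbf{w} \cdot \mathbf{u}_h^{*\perp})]\!], \tilde{\mathbf{u}}_h^{*\perp} \right\rangle_{\partial\mathcal{T}_h} - \Delta t\left(\nabla \cdot \mathbf{w}, g(D_h^* + b) + \tfrac{1}{2}|\mathbf{u}_h^*|^2\right)_{\mathcal{T}_h} = 0, \quad \forall \mathbf{w} \in U_h, \tag{A5}$$

$$\left(\phi, D_h^{n+1} - D_h^n\right)_{\mathcal{T}_h} - \Delta t\left(\nabla\phi, \mathbf{u}_h^* D_h^*\right)_{\mathcal{T}_h} + \Delta t\left\langle [\![\phi \mathbf{u}_h^*]\!], \tilde{D}_h^* \right\rangle_{\partial\mathcal{T}_h} = 0, \quad \forall \phi \in V_h, \tag{A6}$$

where $\mathbf{u}_h^* = (\mathbf{u}_h^{n+1} + \mathbf{u}_h^n)/2$ and $D_h^* = (D_h^{n+1} + D_h^n)/2$.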

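One approach to construct the residual functionals $R_u$ and $R_D$ would be to simply define these from Eqs. ()–(). However, this leads to a small critical time step for the stability of the scheme. To make the numerical scheme more stable, we define residuals as follows. For $R_u$, we first solve for $\mathbf{v}_h \in U_h$ such that

$$\left(\mathbf{w}, \mathbf{v}_h - \mathbf{u}_h^n\right)_{\mathcal{T}_h} - \Delta t\left(\nabla^\perp(\mathbf{w} \cdot \mathbf{u}_h^{*\perp}), \mathbf{v}_h^{\sharp\perp}\right)_{\mathcal{T}_h} + \Delta t\left(\mathbf{w}, f \mathbf{v}_h^{\sharp\perp}\right)_{\mathcal{T}_h} + \Delta t\left\langle [\![\mathbf{n}^\perp(\mathbf{w} \cdot \mathbf{u}_h^{*\perp})]\!], \tilde{\mathbf{v}}_h^{\sharp\perp} \right\rangle_{\partial\mathcal{T}_h} - \Delta t\left(\nabla \cdot \mathbf{w}, g(D_h^* + b) + \tfrac{1}{2}|\mathbf{u}_h^*|^2\right)_{\mathcal{T}_h} = 0, \quad \forall \mathbf{w} \in U_h, \tag{A7}$$

where $\mathbf{v}_h^\sharp = (\mathbf{v}_h + \mathbf{u}_h^n)/2$. This is a linear variational problem. Then,

$$R_u[\mathbf{u}_h^{n+1}, D_h^{n+1}; \mathbf{w}] = \left(\mathbf{w}, \mathbf{v}_h - \mathbf{u}_h^{n+1}\right)_{\mathcal{T}_h}. \tag{A8}$$

Similarly, for $R_D$ we first solve for $E_h \in V_h$ such that

$$\left(\phi, E_h - D_h^n\right)_{\mathcal{T}_h} - \Delta t\left(\nabla\phi, \mathbf{u}_h^* E_h^\sharp\right)_{\mathcal{T}_h} + \Delta t\left\langle [\![\phi \mathbf{u}_h^*]\!], \tilde{E}_h^\sharp \right\rangle_{\partial\mathcal{T}_h} = 0, \quad \forall \phi \in V_h, \tag{A9}$$

where $E_h^\sharp = (E_h + D_h^n)/2$. This is also a linear problem. Then,

$$R_D[\mathbf{u}_h^{n+1}, D_h^{n+1}; \phi] = \left(\phi, E_h - D_h^{n+1}\right)_{\mathcal{T}_h}. \tag{A10}$$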

This process can be thought of as iteratively solving for the average velocity and depth that satisfy the implicit midpoint rule discretization. Both Eqs. () and () can be solved separately, since there is no coupling between them. The fields $\mathbf{v}_h$ and $E_h$ are then used to construct the right-hand side for the implicit linearized system in Eqs. ()–(). Once the system is solved, the solution $(\Delta\mathbf{u}_h, \Delta D_h)$ is then used to update the iterative values of $\mathbf{u}_h^{n+1}$ and $D_h^{n+1}$ according to $(\mathbf{u}_h^{n+1}, D_h^{n+1}) \leftarrow (\mathbf{u}_h^{n+1} + \Delta\mathbf{u}_h, D_h^{n+1} + \Delta D_h)$, having initially chosen $(\mathbf{u}_h^{n+1}, D_h^{n+1}) = (\mathbf{u}_h^n, D_h^n)$.

Code and data availability

The contribution in this paper is available through open-source software provided by the Firedrake Project: https://www.firedrakeproject.org/ (last access: 30 January 2020). We cite archives of the exact software versions used to produce the results in this paper. For all components of the Firedrake project used in this paper, see . The numerical experiments, full solver configurations, code verification (including local processing), and raw data are available in .

Author contributions

THG is the principal author and developer of the software presented in this paper and main author of the text. Authors LM and DAH assisted and guided the software abstraction as a domain-specific language and edited text. CJC contributed to the formulation of the geophysical fluid dynamics and the design of the numerical experiments and edited text.

Competing interests

David A. Ham is an executive editor of the journal. The other authors declare they have no other competing interests.

Acknowledgements

The authors would like to acknowledge funding from the Engineering and Physical Sciences Research Council (EPSRC) and the Natural Environment Research Council (NERC). The authors also wish to thank Andrew T. T. McRae for providing thoughtful comments on early drafts of this paper.

Financial support

This research has been supported by the Engineering and Physical Sciences Research Council (grant nos. EP/M011054/1, EP/L000407/1, and EP/L016613/1) and the Natural Environment Research Council (grant no. NE/K008951/1).

Review statement

This paper was edited by Simone Marras and reviewed by two anonymous referees.

References Alnæs et al.(2014)

Alnæs, M. S., Logg, A., Ølgaard, K. B., Rognes, M. E., and Wells, G. N.: Unified form language: A domain-specific language for weak formulations of partial differential equations, ACM Trans. Mathe. Softw. (TOMS), 40, 1–37, 2014.

Arnold and Brezzi(1985)

Arnold, D. N. and Brezzi, F.: Mixed and nonconforming finite element methods: implementation, postprocessing and error estimates, ESAIM: Mathe. Modell. Num. Anal., 19, 7–32, 1985.

Arnold et al.(2000)

Arnold, D. N., Falk, R. S., and Winther, R.: Multigrid in H(div) and H(curl), Num. Mathe., 85, 197–217, 10.1007/s002110000137, 2000.

Balay et al.(1997)

Balay, S., Gropp, W. D., McInnes, L. C., and Smith, B. F.: Efficient management of parallelism in object-oriented numerical software libraries, in: Modern software tools for scientific computing, 163–202, Springer, 1997.

Balay et al.(2019)

Balay, S., Abhyankar, S., Adams, M. F., Brown, J., Brune, P., Buschelman, K., Dalcin, L., Eijkhout, V., Gropp, W. D., Karpeyev, D., Kaushik, D., Knepley, M. G., May, D. A., McInnes, L. C., Mills, R. T., Munson, T., Rupp, K., Sanan, P., Smith, B. F., Zampini, S., Zhang, H., and Zhang, H.: PETSc Users Manual, Tech. Rep. ANL-95/11 – Revision 3.11, Argonne National Laboratory, 2019.

Bauer and Cotter(2018)

Bauer, W. and Cotter, C.: Energy-enstrophy conserving compatible finite element schemes for the rotating shallow water equations with slip boundary conditions, J. Comput. Phys., 373, 171–187, 10.1016/j.jcp.2018.06.071, 2018.

Boffi et al.(2013)

Boffi, D., Brezzi, F., and Fortin, M.: Mixed finite element methods and applications, vol. 44 of Springer Series in Computational Mathematics, Springer-Verlag New York, 2013.

Bramble and Xu(1989)

Bramble, J. H. and Xu, J.: A local post-processing technique for improving the accuracy in mixed finite-element approximations, SIAM J. Num. Anal., 26, 1267–1275, 1989.

Bramble et al.(1988)

Bramble, J. H., Pasciak, J. E., and Xu, J.: The analysis of multigrid algorithms for nonsymmetric and indefinite elliptic problems, Math. Comput., 51, 389–414, 1988.

Bramble et al.(1994)

Bramble, J. H., Kwak, D. Y., and Pasciak, J. E.: Uniform convergence of multigrid V-cycle iterations for indefinite and nonsymmetric problems, SIAM J. Num. Anal., 31, 1746–1763, 1994.

Brezzi and Fortin(1991)

Brezzi, F. and Fortin, M.: Mixed and hybrid finite element methods, vol. 15 of Springer Series in Computational Mathematics, Springer-Verlag New York, 1991.

Brezzi et al.(1985)

Brezzi, F., Douglas, J., and Marini, L. D.: Two families of mixed finite elements for second order elliptic problems, Num. Mathe., 47, 217–235, 1985.

Brezzi et al.(1987)

Brezzi, F., Douglas, J., Durán, R., and Fortin, M.: Mixed finite elements for second order elliptic problems in three variables, Num. Mathe., 51, 237–250, 1987.

Brown et al.(2012)

Brown, J., Knepley, M. G., May, D. A., McInnes, L. C., and Smith, B.: Composable linear solvers for multiphysics, in: Parallel and Distributed Computing (ISPDC), 2012 11th International Symposium on, 55–62, IEEE, 2012.

Cockburn(2016)

Cockburn, B.: Static condensation, hybridization, and the devising of the HDG methods, in: Building Bridges: Connections and Challenges in Modern Approaches to Numerical Partial Differential Equations, 129–177, Springer, 2016.

Cockburn et al.(2009a)

Cockburn, B., Gopalakrishnan, J., and Lazarov, R.: Unified hybridization of discontinuous Galerkin, mixed, and continuous Galerkin methods for second order elliptic problems, SIAM J. Num. Anal., 47, 1319–1365, 2009a.

Cockburn et al.(2009b)

Cockburn, B., Guzmán, J., and Wang, H.: Superconvergent discontinuous Galerkin methods for second-order elliptic problems, Mathe. Comput., 78, 1–24, 2009b.

Cockburn et al.(2010a)

Cockburn, B., Gopalakrishnan, J., Li, F., Nguyen, N.-C., and Peraire, J.: Hybridization and postprocessing techniques for mixed eigenfunctions, SIAM J. Num. Anal., 48, 857–881, 2010a.

Cockburn et al.(2010b)

Cockburn, B., Gopalakrishnan, J., and Sayas, F.-J.: A projection-based error analysis of HDG methods, Mathe. Comput., 79, 1351–1367, 2010b.

Cockburn et al.(2014)

Cockburn, B., Dubois, O., Gopalakrishnan, J., and Tan, S.: Multigrid for an HDG method, IMA J. Num. Anal., 34, 1386–1425, 2014.

Cotter and Shipton(2012)

Cotter, C. J. and Shipton, J.: Mixed finite elements for numerical weather prediction, J. Comput. Phys., 231, 7076–7091, 2012.

Cotter and Thuburn(2014)

Cotter, C. J. and Thuburn, J.: A finite element exterior calculus framework for the rotating shallow-water equations, J. Comput. Phys., 257, 1506–1526, 2014.

Cullen(2001)

Cullen, M.: Alternative implementations of the semi-Lagrangian semi-implicit schemes in the ECMWF model, Q. J. Roy. Meteorol. Soc., 127, 2787–2802, 2001.

Dalcin et al.(2011)

Dalcin, L. D., Paz, R. R., Kler, P. A., and Cosimo, A.: Parallel distributed computing using python, Adv. Water Resour., 34, 1124–1139, 2011.

Devloo et al.(2018)

Devloo, P., Faria, C., Farias, A., Gomes, S., Loula, A., and Malta, S.: On continuous, discontinuous, mixed, and primal hybrid finite element methods for second-order elliptic problems, Int. J. Num. Method. Eng., 115, 1083–1107, 10.1002/nme.5836, 2018.

Elman et al.(2001)

Elman, H. C., Ernst, O. G., and O'leary, D. P.: A multigrid method enhanced by Krylov subspace iteration for discrete Helmholtz equations, SIAM J. Sci. Comput., 23, 1291–1315, 2001.

Falgout et al.(2006)

Falgout, R. D., Jones, J. E., and Yang, U. M.: The design and implementation of hypre, a library of parallel high performance preconditioners, in: Numerical solution of partial differential equations on parallel computers, 267–294, Springer, 2006.

Fraeijs de Veubeke(1965)

Fraeijs de Veubeke, B.: Displacement and equilibrium models in the finite element method, in: Stress Analysis, edited by: Zienkiewicz, O. and Holister, G. S., John Wiley & Sons, reprinted in Internat, J. Numer. Methods Engrg., 52, 287–342, 1965.

Gopalakrishnan(2003)

Gopalakrishnan, J.: A Schwarz preconditioner for a hybridized mixed method, Comput. Methods Appl. Math., 3, 116–134, 2003.

Guennebaud et al.(2015)

Guennebaud, G., Jacob, B., Avery, P., Bachrach, A., and Barthelemy, S.: Eigen v3, 2010, available at: http://eigen.tuxfamily.org (last access: 7 February 2020), 2015.

Guyan(1965)

Guyan, R. J.: Reduction of stiffness and mass matrices, AIAA J., 3, 380, 1965.

Hecht(2012)

Hecht, F.: New development in FreeFem++, J. Num. Mathe., 20, 251–266, 2012.

Hiptmair and Xu(2007)

Hiptmair, R. and Xu, J.: Nodal auxiliary space preconditioning in H(curl) and H(div) spaces, SIAM J. Num. Anal., 45, 2483–2509, 10.1137/060660588, 2007.

Homolya et al.(2018)

Homolya, M., Mitchell, L., Luporini, F., and Ham, D. A.: TSFC: a structure-preserving form compiler, SIAM J. Sci. Comput., 40, C401–C428, 10.1137/17M1130642, 2018.

Irons(1965)

Irons, B.: Structural eigenvalue problems - elimination of unwanted variables, AIAA J., 3, 961–962, 1965.

Kang et al.(2020)

Kang, S., Giraldo, F. X., and Bui-Thanh, T.: IMEX HDG-DG: A coupled implicit hybridized discontinuous Galerkin and explicit discontinuous Galerkin approach for shallow water systems, J. Comput. Phys., 401, 109010, 2020.

Kirby(2004)

Kirby, R. C.: Algorithm 839: FIAT, a new paradigm for computing finite element basis functions, ACM Trans. Mathe. Softw. (TOMS), 30, 502–516, 2004.

Kirby and Logg(2006)

Kirby, R. C. and Logg, A.: A compiler for variational forms, ACM Trans. Mathe. Softw. (TOMS), 32, 417–444, 2006.

Kirby and Mitchell(2018)

Kirby, R. C. and Mitchell, L.: Solver composition across the PDE/linear algebra barrier, SIAM J. Sci. Comput., 40, C76–C98, 10.1137/17M1133208, 2018.

Kirby et al.(2012)

Kirby, R. M., Sherwin, S. J., and Cockburn, B.: To CG or to HDG: a comparative study, J. Sci. Comput., 51, 183–212, 2012.

Kronbichler and Wall(2018)

Kronbichler, M. and Wall, W. A.: A performance comparison of continuous and discontinuous Galerkin methods with fast multigrid solvers, SIAM J. Sci. Comput., 40, A3423–A3448, 2018.

Logg and Wells(2010)

Logg, A. and Wells, G. N.: DOLFIN: Automated finite element computing, ACM Trans. Mathe. Softw. (TOMS), 37, 20, 2010.

Logg et al.(2012a)

Logg, A., Mardal, K.-A., and Wells, G.: Automated solution of differential equations by the finite element method: The FEniCS book, Vol. 84, Springer Science & Business Media, 2012a.

Logg et al.(2012b)

Logg, A., Ølgaard, K. B., Rognes, M. E., and Wells, G. N.: FFC: the FEniCS form compiler, in: Automated Solution of Differential Equations by the Finite Element Method, 227–238, 2012b.

Long et al.(2010)

Long, K., Kirby, R., and van Bloemen Waanders, B.: Unified embedded parallel finite element computations via software-based Fréchet differentiation, SIAM J. Sci. Comput., 32, 3323–3351, 2010.

Mandel(1986)

Mandel, J.: Multigrid convergence for nonsymmetric, indefinite variational problems and one smoothing step, Appl. Mathe. Comput., 19, 201–216, 1986.

Markall et al.(2013)

Markall, G., Slemmer, A., Ham, D., Kelly, P., Cantwell, C., and Sherwin, S.: Finite element assembly strategies on multi-core and many-core architectures, Int. J. Num. Method. Fluids, 71, 80–97, 2013.

McRae et al.(2016)

McRae, A. T. T., Bercea, G.-T., Mitchell, L., Ham, D. A., and Cotter, C. J.: Automated generation and symbolic manipulation of tensor product finite elements, SIAM J. Sci. Comput., 38, S25–S47, 2016.

Melvin et al.(2010)

Melvin, T., Dubal, M., Wood, N., Staniforth, A., and Zerroukat, M.: An inherently mass-conserving iterative semi-implicit semi-Lagrangian discretization of the non-hydrostatic vertical-slice equations, Q. J. Roy. Meteorol. Soc. A, 136, 799–814, 2010.

Melvin et al.(2019)

Melvin, T., Benacchio, T., Shipway, B., Wood, N., Thuburn, J., and Cotter, C.: A mixed finite-element, finite-volume, semi-implicit discretization for atmospheric dynamics: Cartesian geometry, Q. J. Roy. Meteorol. Soc., 145, 2835–2853, 10.1002/qj.3501, 2019.

Mitchell and Müller(2016)

Mitchell, L. and Müller, E. H.: High level implementation of geometric multigrid solvers for finite element problems: Applications in atmospheric modelling, J. Comput. Phys., 327, 1–18, 2016.

Nair et al.(2005)

Nair, R. D., Thomas, S. J., and Loft, R. D.: A discontinuous Galerkin global shallow water model, Mon. Weather Rev., 133, 876–888, 2005.

Natale and Cotter(2017)

Natale, A. and Cotter, C. J.: A variational H(div) finite-element discretization approach for perfect incompressible fluids, IMA J. Num. Anal., 38, drx033, 10.1093/imanum/drx033, 2017.

Natale et al.(2016)

Natale, A., Shipton, J., and Cotter, C. J.: Compatible finite element spaces for geophysical fluid dynamics, Dynam. Stat. Clim. Syst., 1, dzw005, 10.1093/climsys/dzw005, 2016.

Nechaev and Yaremchuk(2004)

Nechaev, D. and Yaremchuk, M.: On the approximation of the Coriolis terms in C-grid models, Mon. Weather Rev., 132, 2283–2289, 2004.

Nédélec(1980)

Nédélec, J.-C.: Mixed finite elements in ℝ³, Num. Mathe., 35, 315–341, 1980.

Prud'Homme et al.(2012)

Prud'Homme, C., Chabannes, V., Doyeux, V., Ismail, M., Samake, A., and Pena, G.: Feel++: A computational framework for Galerkin methods and advanced numerical methods, in: ESAIM: Proceedings, 38, 429–455, EDP Sciences, 2012.

Rathgeber et al.(2012)

Rathgeber, F., Markall, G. R., Mitchell, L., Loriant, N., Ham, D. A., Bertolli, C., and Kelly, P. H. J.: PyOP2: A high-level framework for performance-portable simulations on unstructured meshes, in: High Performance Computing, Networking, Storage and Analysis (SCC), 2012 SC Companion, 1116–1123, IEEE, 2012.

Rathgeber et al.(2016)

Rathgeber, F., Ham, D. A., Mitchell, L., Lange, M., Luporini, F., McRae, A. T. T., Bercea, G.-T., Markall, G. R., and Kelly, P. H. J.: Firedrake: automating the finite element method by composing abstractions, ACM Trans. Mathe. Softw. (TOMS), 43, 1–27, 2016.

Raviart and Thomas(1977)

Raviart, P.-A. and Thomas, J.-M.: A mixed finite element method for 2-nd order elliptic problems, in: Mathematical aspects of finite element methods, 292–315, Springer, 1977.

Shipton et al.(2018)

Shipton, J., Gibson, T. H., and Cotter, C. J.: Higher-order compatible finite element schemes for the nonlinear rotating shallow water equations on the sphere, J. Comput. Phys., 375, 1121–1137, 2018.

Skamarock and Klemp(1994)

Skamarock, W. C. and Klemp, J. B.: Efficiency and accuracy of the Klemp-Wilhelmson time-splitting technique, Mon. Weather Rev., 122, 2623–2630, 1994.

Stenberg(1991)

Stenberg, R.: Postprocessing schemes for some mixed finite elements, ESAIM: Mathe. Modell. Num. Anal., 25, 151–167, 1991.

Temperton(1997)

Temperton, C.: Treatment of the Coriolis terms in semi-Lagrangian spectral models, Atmos.-Ocean, 35, 293–302, 1997.

Thomas et al.(2003)

Thomas, S. J., Hacker, J. P., Smolarkiewicz, P. K., and Stull, R. B.: Spectral preconditioners for nonhydrostatic atmospheric models, Mon. Weather Rev., 131, 2464–2478, 2003.

Ullrich et al.(2010)

Ullrich, P. A., Jablonowski, C., and Van Leer, B.: High-order finite-volume methods for the shallow-water equations on the sphere, J. Comput. Phys., 229, 6104–6134, 2010.

Williamson et al.(1992)

Williamson, D. L., Drake, J. B., Hack, J. J., Jakob, R., and Swarztrauber, P. N.: A standard test set for numerical approximations to the shallow water equations in spherical geometry, J. Comput. Phys., 102, 211–224, 1992.

Wood et al.(2014)

Wood, N., Staniforth, A., White, A., Allen, T., Diamantakis, M., Gross, M., Melvin, T., Smith, C., Vosper, S., Zerroukat, M., et al.: An inherently mass-conserving semi-implicit semi-Lagrangian discretization of the deep-atmosphere global non-hydrostatic equations, Q. J. Roy. Meteorol. Soc., 140, 1505–1520, 2014.

Yakovlev et al.(2016)

Yakovlev, S., Moxey, D., Kirby, R. M., and Sherwin, S. J.: To CG or to HDG: a comparative study in 3D, J. Sci. Comput., 67, 192–220, 2016.

Zenodo/Firedrake(2019)

Zenodo/Firedrake: Software used in 'Slate: extending Firedrake's domain-specific abstraction to hybridized solvers for geoscience and beyond', Zenodo, 10.5281/zenodo.2587072, 2019.

Zenodo/Tabula-Rasa(2019)

Zenodo/Tabula-Rasa: Tabula Rasa: experimentation framework for hybridization and static condensation, Zenodo, 10.5281/zenodo.2616031, 2019.