Within the finite element community, discontinuous Galerkin (DG) and mixed finite element methods have become increasingly popular in simulating geophysical flows. However, robust and efficient solvers for the saddle point and elliptic systems arising from these discretizations remain an ongoing challenge. One possible approach for addressing this issue is to employ a method known as hybridization, where the discrete equations are transformed such that classic static condensation and local post-processing methods can be employed. It is challenging, however, to implement hybridization as performant parallel code within complex models whilst maintaining a separation of concerns between applications scientists and software experts. In this paper, we introduce a domain-specific abstraction within the Firedrake finite element library that permits the rapid execution of these hybridization techniques within a code-generating framework. The resulting framework composes naturally with Firedrake's solver environment, allowing for the implementation of hybridization and static condensation as runtime-configurable preconditioners via the Python interface to the Portable, Extensible Toolkit for Scientific Computation (PETSc), petsc4py. We provide examples derived from second-order elliptic problems and geophysical fluid dynamics. In addition, we demonstrate that hybridization shows great promise for improving the performance of solvers for mixed finite element discretizations of equations related to large-scale geophysical flows.

The development of simulation software is an increasingly important
aspect of modern scientific computing, in the geosciences in particular.
Such software requires a vast range of knowledge spanning several
disciplines, ranging from applications expertise to mathematical
analysis to high-performance computing and low-level
code optimization. Software projects developing automatic code
generation systems have become
quite popular in recent years, as such systems help create a
separation of concerns, allowing each complexity
to be addressed independently of the rest.
This allows for agile collaboration between computer scientists
with hardware and software expertise, computational scientists
with numerical algorithm expertise, and domain scientists such
as meteorologists, oceanographers and climate scientists.
Examples of such projects in the domain of finite element methods
include FreeFEM++

The finite element method (FEM) is a mathematically robust
framework for computing numerical
solutions of partial differential equations (PDEs) that has
become increasingly popular in fluids and solids models across
the geosciences, with a formulation that is highly
amenable to code generation techniques. A description of the
weak formulation of the PDEs,
together with appropriate discrete function spaces, is enough
to characterize the finite element problem. Both the FEniCS and Firedrake projects employ
the Unified Form Language
(UFL)

There are classes of finite element discretizations resulting
in discrete systems that can be
solved more efficiently by directly manipulating local tensors.
For example, the static condensation
technique for the reduction of global finite element
systems

In this paper, we provide a simple yet effective high-level
abstraction for localized dense linear algebra
on systems derived from finite element problems. Using
embedded DSL technology, we provide a means to
enable the rapid development of hybridization and static
condensation techniques within an automatic
code generation framework. In other words, the main
contribution of this paper is in solving the problem
of automatically translating from the mathematics of
static condensation and hybridization to compiled code.
This automated translation facilitates the separation of
concerns between applications scientists and
computational/computer scientists and enables the
automated optimization of compiled code.
This framework provides an environment for the
development and testing of numerics relevant
to the Gung-Ho Project, an initiative by the UK Met
Office to design the next-generation atmospheric
dynamical core
using mixed finite element methods

The rest of the paper is organized as follows. We
introduce common notation used throughout the paper
in Sect.

We begin by establishing notation used throughout this
paper. Let

For any double-valued vector field

We present an expressive language for dense linear algebra on the elemental matrix systems arising from finite element problems. The language, which we call Slate, provides typical mathematical operations performed on matrices and vectors; hence the input syntax is comparable to high-level linear algebra software such as MATLAB. The Slate language provides basic abstract building blocks which can be used by a specialized compiler for linear algebra to generate low-level code implementations.

Slate is heavily influenced by the Unified Form Language (UFL)

To clarify conventions and the scope of Slate, we start by establishing
our notation for a general finite element form following the convention of

If a given form

In general, a finite element form will consist of integrals
over various geometric domains: integration over cells

Here, we will consider
the case where the interior facet integrands

To make matters concrete, let us suppose

In standard finite element software packages, the element tensor is
mapped entry-wise into a global sparse array using
the cell-node map

Like UFL, Slate relies on the grammar of the host language, Python.
The entire Slate language is implemented as a Python module
which defines its types (classes) and operations on said types. Together,
this forms a high-level language for expressing dense linear algebra
on element tensors. The Slate language consists of two primary
abstractions for linear algebra:

terminal element tensors corresponding to multi-linear integral forms (matrices, vectors, and scalars) or assembled data (for example, coefficient vectors of a finite element function) and

expressions consisting of algebraic operations on terminal tensors.
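To illustrate how terminals and expressions fit together in an embedded DSL, the following is a minimal, self-contained Python sketch. It is not Slate's implementation: the class names `Expr`, `Terminal`, and `BinaryOp` and the `evaluate` helper are hypothetical, and terminals here hold plain scalars rather than element tensors. The point is only that operator overloading in the host language builds a symbolic expression tree, which a separate pass can later traverse.

```python
class Expr:
    """Base class: operator overloading builds a symbolic tree, nothing is computed."""

    def __add__(self, other):
        return BinaryOp("+", self, other)

    def __sub__(self, other):
        return BinaryOp("-", self, other)

    def __mul__(self, other):
        return BinaryOp("*", self, other)


class Terminal(Expr):
    """Leaf node: a named tensor with associated data (a plain float here)."""

    def __init__(self, name, value):
        self.name = name
        self.value = value

    def __str__(self):
        return self.name


class BinaryOp(Expr):
    """Interior node: an algebraic operation on two sub-expressions."""

    def __init__(self, op, left, right):
        self.op = op
        self.left = left
        self.right = right

    def __str__(self):
        return f"({self.left} {self.op} {self.right})"


def evaluate(expr):
    """Stand-in for a compiler pass: walk the tree and compute the result."""
    if isinstance(expr, Terminal):
        return expr.value
    lhs, rhs = evaluate(expr.left), evaluate(expr.right)
    if expr.op == "+":
        return lhs + rhs
    if expr.op == "-":
        return lhs - rhs
    return lhs * rhs


A = Terminal("A", 2.0)
B = Terminal("B", 3.0)
C = Terminal("C", 4.0)
expr = A - B * C          # builds a tree; no arithmetic happens yet
print(expr)               # symbolic form of the expression
print(evaluate(expr))     # value obtained by traversing the tree
```

In this sketch the traversal evaluates the tree directly; a code-generating framework would instead emit low-level code from the same tree.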

In Slate, one associates a
tensor with data on a cell either by using a multi-linear form,
or assembled coefficient data:

Similarly to UFL,
Slate is capable of abstractly
representing arbitrary rank tensors.
However, only rank

Slate supports typical binary and
unary operations in linear algebra,
with a high-level syntax close to mathematics. At the time of
this paper, these include the following.

In Firedrake, Slate expressions are transformed into
low-level code by a

The compiler pass will generate a single “macro” kernel, which performs
the dense linear algebra operations represented in Slate.
The resulting code will also include (often multiple) function
calls to local assembly kernels generated by TSFC (Two Stage Form Compiler)

The Slate language wraps UFL objects describing the finite element system. The resulting Slate expressions are passed to a specialized linear algebra compiler, which produces a single macro kernel that assembles the local contributions and executes the dense linear algebra represented in Slate. The kernels are passed to Firedrake's PyOP2 interface, which wraps the Slate kernel in a mesh-iteration kernel. Parallel scheduling, code generation, and compilation occur after the PyOP2 layer.

Most optimization of the resulting dense linear algebra code is handled
directly by Eigen. In the case of unary and binary operations such
as

For
more details on solving linear equations
in Eigen, see

We now present examples and discuss solution methods which require
element-wise manipulations of finite element systems and their specification
in Slate. We stress here that Slate is not limited to these model problems;
rather these examples are chosen for clarity and to demonstrate key
features of the Slate language.
For our discussion,
we use a model elliptic equation defined in a computational domain

To motivate our discussion in this section, we start by recalling the mixed method for Eqs. (

The mixed formulation of Eqs. (

Methods to efficiently invert such systems include

The Schur complement, while elliptic, is globally dense due to the fact that

The hybridization technique replaces the original system with a discontinuous
variant, decoupling the velocity degrees of freedom between cells.
This is done by replacing the discrete solution space
for

Next, Lagrange multipliers are introduced as an auxiliary variable in the space

Deriving the hybridizable mixed system is accomplished
through integration by parts over each element

The discrete matrix system arising from Eqs. (

Since both

The matrix

Once

If desired, the solutions can be improved further through local
post-processing. We highlight two such procedures
for

Figure

Firedrake code for solving Eq. (

The HDG method is a natural extension of discontinuous Galerkin (DG) discretizations. Here, we consider a specific HDG discretization, namely the
LDG-H method

Deriving the LDG-H formulation follows exactly from standard DG methods. All prognostic
variables are sought in the discontinuous spaces

The LDG-H method retains the advantages of standard DG methods
while also enabling the assembly of reduced linear systems through
static condensation. The matrix system arising from
Eqs. (

For both mixed

Our first example is a modified version of the procedure presented by

Let

At the time of this work, Firedrake does not
support the construction of such a finite element basis. However,
we can introduce Lagrange multipliers to enforce the orthogonality constraint.
The resulting local problem then becomes the following mixed system: find

This post-processing method produces a new approximation which
superconverges at a rate of

Example of local post-processing using Firedrake and Slate. Here, we locally solve the mixed system defined in Eqs. (

Our second example illustrates a procedure that uses the numerical flux of an HDG
discretization for Eqs. (

Let

Additionally, the divergence of

Slate enables static condensation approaches to be expressed very concisely. Nonetheless, the application of a particular approach to different variational problems using Slate still requires a certain amount of code repetition. By formulating each form of static condensation as a preconditioner, code can be written once and then applied to any mathematically suitable problem. Rather than writing the static condensation by hand, in many cases, it is sufficient to select the appropriate Slate-based preconditioner.
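As an illustration of what selecting such a preconditioner at runtime can look like, the dictionary below sketches a set of PETSc-style options. The option names follow standard PETSc conventions, and `firedrake.HybridizationPC` is the name under which Firedrake exposes its hybridization preconditioner. The inner choices for the trace system (conjugate gradients with algebraic multigrid) and the tolerance are assumptions made for this example only, not a recommended configuration, and the `flatten` helper is hypothetical.

```python
# Sketch of runtime solver options; plain Python data, no Firedrake import needed.
hybridization_parameters = {
    "mat_type": "matfree",            # operator kept in symbolic (unassembled) form
    "ksp_type": "preonly",            # apply the preconditioner once, no outer Krylov method
    "pc_type": "python",
    "pc_python_type": "firedrake.HybridizationPC",
    # Options for the condensed trace system (assumed choices, for illustration)
    "hybridization": {
        "ksp_type": "cg",
        "pc_type": "gamg",
        "ksp_rtol": 1e-8,
    },
}


def flatten(options, prefix=""):
    """Flatten nested options into PETSc-style 'prefix_key' entries."""
    flat = {}
    for key, value in options.items():
        if isinstance(value, dict):
            flat.update(flatten(value, prefix + key + "_"))
        else:
            flat[prefix + key] = value
    return flat


for name, value in sorted(flatten(hybridization_parameters).items()):
    print(f"-{name} {value}")
```

In Firedrake, a dictionary of this kind would typically be supplied through the `solver_parameters` argument of a solve call, so switching between solution strategies requires no change to the problem definition.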

It is helpful to frame the problem
in the particular context of the solver library, PETSc.
Firedrake uses PETSc as its main solver abstraction
framework and can provide

Suppose we wish to solve a linear system:

We will denote the application of a particular Krylov subspace method (

The implementation of preconditioners for the systems considered in
this paper requires the manipulation not of assembled matrices but
rather their symbolic representation. To do this, we use the
preconditioning infrastructure developed by

As discussed in Sect.

More precisely, the incoming system has the form

The

In practice, the only globally coupled system requiring
iterative inversion is

By construction, this preconditioner is suitable for both hybridized mixed and HDG discretizations. It can also be used within other contexts, such as the static condensation
of continuous Galerkin discretizations

The preconditioner

More precisely, let

The application of

Since the hybridizable flux solution is constructed in the broken

We note here that assembly of the right-hand side for the

The situation we are given is that we have

These are the two broken parts of

We now present results utilizing the Slate DSL and our static
condensation preconditioners for a set of test problems. Since we
are using the interfaces outlined in Sect.

The verification of the generated code is performed using
parameter-sensitive convergence tests. The study consists
of running a variety of discretizations spanning the methods
outlined in Sect.

In this section, we take a closer look at the LDG-H method for the model elliptic
equation (sign-definite Helmholtz):

A continuous Galerkin (CG) discretization of the primal problem Eqs. (

Our aim here is not to compare the performance of HDG and CG, which has been investigated
elsewhere (for example, see

To invert the CG system, we use a conjugate gradient solver with
Hypre's BoomerAMG implementation of algebraic multigrid
(AMG) as a preconditioner

To avoid over-solving,
we iterate to a relative tolerance such that the

The total execution time is recorded for the CG and HDG solvers,
which includes the setup time for the AMG preconditioner, matrix assembly, and
the time to solution for the Krylov method. In the HDG case, we include all
setup costs, the time spent building the Schur complement for the traces,
local recovery of the scalar and flux approximations, and post-processing.
The

Comparison of continuous Galerkin and LDG-H solvers for the model three-dimensional
positive-definite Helmholtz equation. Panel

The HDG method of order

The HDG method requires many more degrees of freedom than CG or primal DG methods. This is largely due to the fact that the HDG method simultaneously approximates the primal solution and its velocity. The global matrix for the traces is significantly larger than the one for the CG system at low polynomial order. The execution time for HDG is then compounded by a more expensive global solve.

Breakdown of the CG

Breakdown of the raw timings for the HDG

Figure

Both trace operator and right-hand-side assembly are dominated by the costs of inverting a local square mixed matrix coupling the scalar and velocity unknowns, which is performed directly via an LU factorization. This is also the case for backwards substitution. They should all therefore be of the same magnitude in time spent. We observe that this is the case across all degrees, with times ranging between approximately 6 % and 11 % of total execution time for assembling the condensed system. Back-substitution takes roughly the same time as the static condensation and forward elimination stages (approximately 12 % of execution time on average). Finally, the additional cost of post-processing accrues negligible time (roughly 2 % of execution time across all degrees). This is a small cost for an increase in order of accuracy.
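The three local stages discussed here (forming the condensed operator, forward elimination of the right-hand side, and back-substitution) can be sketched on a small block system. The example below is a dependency-free illustration, not the generated code: the helper names `solve`, `matmul`, and `sub` are hypothetical, the block values are arbitrary, and exact rational arithmetic is used only so that the result can be checked against a monolithic solve. The block `A` plays the role of the local matrix that is inverted element-wise, and `lam` plays the role of the globally coupled unknowns.

```python
from fractions import Fraction


def solve(M, rhs):
    """Solve M X = rhs by Gauss-Jordan elimination (rhs may have several columns)."""
    n = len(M)
    aug = [[Fraction(v) for v in M[i]] + [Fraction(v) for v in rhs[i]] for i in range(n)]
    for col in range(n):
        # Swap in a row with a nonzero pivot, then normalize it
        pivot_row = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot_row] = aug[pivot_row], aug[col]
        pivot = aug[col][col]
        aug[col] = [v / pivot for v in aug[col]]
        # Eliminate this column from every other row
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def sub(X, Y):
    return [[x - y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


# Block system [[A, B], [C, D]] [x; lam] = [f; g]
A = [[4, 1], [1, 3]]
B = [[1], [2]]
C = [[1, 2]]
D = [[5]]
f = [[1], [2]]
g = [[3]]

AinvB = solve(A, B)                       # local solves with the same matrix A
Ainvf = solve(A, f)
S = sub(D, matmul(C, AinvB))              # condensed operator: D - C A^{-1} B
g_cond = sub(g, matmul(C, Ainvf))         # forward elimination: g - C A^{-1} f
lam = solve(S, g_cond)                    # the only coupled solve
x = sub(Ainvf, matmul(AinvB, lam))        # back-substitution: A^{-1} (f - B lam)

# Check against solving the full system in one go
direct = solve([[4, 1, 1], [1, 3, 2], [1, 2, 5]], [[1], [2], [3]])
print(x + lam == direct)
```

All three stages reuse solves with the same local matrix, which is consistent with the observation that their costs are of similar magnitude.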

We note that the caching of local tensors does not occur. Each pass to perform the local eliminations and backwards reconstructions rebuilds the local element tensors. It is not clear at this time whether the performance gained from avoiding rebuilding the local operators will offset the memory costs of storing the local matrices. Moreover, in time-dependent problems where the operators may contain state-dependent variables, rebuilding local matrices will be necessary in each time step regardless.

A primary motivation for our interest in hybridizable methods
is the development of efficient solvers for problems in geophysical
flows. In this section, we present some results integrating the nonlinear,
rotating shallow water equations on the sphere using test case 5
(flow past an isolated mountain) from

The model equations we consider are the vector-invariant
rotating nonlinear shallow water system defined on
a two-dimensional spherical surface

The Picard updates

In staggered finite difference models, the standard
approach for solving Eq. (

As a test problem, we solve test case 5 of

The number of unknowns to be determined is summarized for each compatible finite element method. Resolution is the same for both methods.

We run for a total of 25 time steps, with a fixed number of four Picard iterations
in each time step. We compare the overall simulation time using two different solver
configurations for the implicit linear system. First, we use a flexible variant of the generalized minimal residual method (GMRES)

We use a flexible version of GMRES on the outer system since we use an additional Krylov solver to iteratively invert the Schur complement.

acting on the system Eq. (

Next, we use only the application of our hybridization
preconditioner (no outer Krylov method),
which replaces the original linearized mixed system
with its hybridizable equivalent. After
hybridization, we have the following extended problem
for the Picard updates: find

The resulting three-field problem in
Eqs. (

Preconditioner solve times for a 25-step run with

Breakdown of the cost (average) of a single application of the preconditioned flexible GMRES method and hybridization preconditioner. Hybridization takes approximately the same time per iteration.

Table

We also measure the relative reductions in the problem residual of the linear system Eq. (

The test case was run up to day 15 on a coarser resolution
(20 480 simplicial cells with

As a final example, we consider the simplified atmospheric model obtained
from a linearization of the compressible Boussinesq equations in a
rotating domain:

Our domain consists of a spherical annulus, with the
mesh constructed from a horizontal “base” mesh of the
surface of a sphere of radius

Since our mesh has a natural tensor product structure, we
construct suitable finite element spaces by taking
the tensor product of a horizontal space with a vertical space.
To ensure our discretization is “compatible,” we use the
one- and two-dimensional finite element de Rham complexes:

Snapshots (view from the northern pole) from the isolated mountain
test case. The surface height (m) at days 5, 10, and 15.
The snapshots were generated on a mesh with 20 480 simplicial cells, a

Vertical and horizontal spaces for the three-dimensional
compatible finite element discretization of the linear Boussinesq
model. The

A compatible finite element discretization of Eqs. (

The choice of

To obtain the discrete system, we simply multiply Eqs. (

The resulting matrix equations have the form

The primary difficulty is finding efficient solvers for Eq. (

One strategy proposed by

Our preferred strategy solves the hybridizable formulation
of the system Eq. (

Buoyancy perturbation (

In this final experiment, we repeat a similar study to that
presented in

Note that the range of Courant numbers used
in this paper exceeds what is typical in operational forecast settings
(typically between

Grid setup and discretizations for the acoustic Courant number study. The total numbers of degrees of freedom (dofs) for the mixed (velocity and pressure) and hybridizable (velocity, pressure, and trace) discretizations are shown in the last two columns (in millions). The vertical resolution is fixed across all discretizations.

It was shown in

Number of Krylov iterations to invert the Helmholtz system using

Courant number parameter test run on a fully loaded compute node. Both figures display the hybridized
solver for each discretization, described in Table

For the lower-order methods, the number of iterations to invert

Hybridization avoids this problem entirely: we always construct an
exact Schur complement and only have to worry about solving the trace
system Eq. (

In Fig.

Implicitly treating the Coriolis term has been
discussed for semi-implicit discretizations
of large-scale geophysical flows

We have presented Slate and shown how this language can be used
to create concise mathematical representations of localized linear algebra on the tensors
corresponding to finite element forms. We have shown how this DSL can be
used in tandem with UFL in Firedrake to implement
solution approaches making use of automated code generation for
static condensation, hybridization, and localized post-processing. Setup and configuration are done at runtime, allowing one to switch in different discretizations at will. In particular,
this framework alleviates much of the difficulty in implementing such methods
within intricate numerical code and paves the way for future low-level optimizations.
In this way, the framework in this paper can be used to help enable the rapid
development and exploration of new hybridization and static condensation techniques
for a wide class of problems.
We remark here that the reduction of global matrices via element-wise algebraic static
condensation, as described in

Our approach to preconditioner design revolves around its composable nature, in that
these Slate-based implementations can be seamlessly incorporated into complicated solution
schemes. In particular, there is current research in the design of dynamical cores for
numerical weather prediction using implementations of hybridization and static condensation
with Slate

In this paper, we have provided some examples of hybridization procedures for
compatible finite element discretizations of geophysical flows. These approaches
avoid the difficulty in constructing sparse approximations of dense elliptic operators.
Static condensation arising from hybridizable formulations can best be interpreted as
producing an

For some tessellation,

The time-stepping scheme follows a Picard iteration semi-implicit approach, where
predictive values of the relevant fields are determined via an explicit step of the
advection equations and corrective updates are generated by solving
an implicit linear system (linearized about a state of rest) for

The implicit midpoint rule time discretization of the nonlinear
rotating shallow water Eqs. (

One approach to construct the residual functionals

This process can be thought of as iteratively solving for the average velocity and depth
that satisfy the implicit midpoint rule discretization. Both Eqs. (
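The idea of iterating towards the time-averaged state can be sketched on a scalar ordinary differential equation. This is a deliberate simplification and not the scheme used in this paper: it uses a plain fixed-point (Picard) substitution in place of the implicit linear solve for the updates, and the function name `picard_midpoint_step`, the test equation, and the time step are all assumptions chosen for illustration.

```python
def picard_midpoint_step(f, y_n, dt, n_picard=4):
    """One implicit midpoint step, solved approximately by fixed-point iteration."""
    y_new = y_n  # initial guess: the state at the previous time level
    for _ in range(n_picard):
        y_avg = 0.5 * (y_n + y_new)   # current estimate of the time-averaged state
        y_new = y_n + dt * f(y_avg)   # update using the right-hand side at the average
    return y_new


def decay(y):
    """Right-hand side of the test equation dy/dt = -y."""
    return -y


dt = 0.1
approx = picard_midpoint_step(decay, 1.0, dt)

# For this linear test equation the implicit midpoint update is known in closed form
exact_midpoint = (1 - 0.5 * dt) / (1 + 0.5 * dt)
print(abs(approx - exact_midpoint))
```

For this test equation each iteration reduces the error in the midpoint update by a factor of `dt / 2`, so a small fixed number of iterations already gives a close approximation.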

The contribution in this paper is available through open-source software provided by the Firedrake Project:

THG is the principal author and developer of the software presented in this paper and main author of the text. Authors LM and DAH assisted and guided the software abstraction as a domain-specific language and edited text. CJC contributed to the formulation of the geophysical fluid dynamics and the design of the numerical experiments and edited text.

David A. Ham is an executive editor of the journal. The other authors declare they have no other competing interests.

The authors would like to acknowledge funding from the Engineering and Physical Sciences Research Council (EPSRC) and the Natural Environment Research Council (NERC). The authors also wish to thank Andrew T. T. McRae for providing thoughtful comments on early drafts of this paper.

This research has been supported by the Engineering and Physical Sciences Research Council (grant nos. EP/M011054/1, EP/L000407/1, and EP/L016613/1) and the Natural Environment Research Council (grant no. NE/K008951/1).

This paper was edited by Simone Marras and reviewed by two anonymous referees.