One of the fundamental factors contributing to the spatiotemporal inaccuracy in climate modeling is the
mapping of solution field data between different discretizations and numerical grids used in the coupled component
models.
The typical climate computational workflow involves evaluation and serialization of the remapping weights during
a preprocessing step; these weights are then consumed by the coupled driver infrastructure during the simulation
to compute field projections.
Tools like the Earth System Modeling Framework (ESMF)


Understanding Earth's climate evolution through robust and accurate modeling
of the intrinsically complex, coupled ocean–atmosphere–land–ice–biosphere models
requires extreme-scale computational power

An important consideration is that in addition to maintaining the overall discretization
accuracy of the solution during remapping, global conservation
and sometimes local element-wise conservation for critical quantities

Adaptive block-structured cubed-sphere or unstructured refinement of icosahedral/polygonal
meshes

E3SM coupled climate solver:

The hub-and-spoke centralized model as shown in Fig.

Compute the projection or remapping weights for a solution field from a source component physics to a target component physics as an “offline process”.

During runtime, the CIME coupled solver loads the remapping weights from a file and handles the partition-aware “communication and weight matrix” application to project coupled fields between components.

The first task in this workflow is currently accomplished through a variety of standard state-of-the-science
tools such as the Earth System Modeling Framework (ESMF)

The paper is organized as follows. In Sect.

Conservative remapping of nonlinearly coupled solution fields is a critical task to
ensure consistency and accuracy in climate and numerical weather prediction simulations

Depending on whether (global or local) conservation is important, and whether higher-order, monotone interpolators
are required, there are several consistent algorithmic options that can be used

NC/GC: solution interpolation approximations:

NC: (approximate or exact) nearest-neighbor interpolation;

NC/GC: radial basis function (RBF)

GC: consistent finite element (FE) interpolation (bilinear, biquadratic, etc.) with area renormalization.

LC: mass- (

embedded finite element (FE), finite difference (FD), and finite volume (FV) meshes in adaptive computations;

intersection-based field integrators with consistent higher-order discretization

constrained projections to ensure conservation

Typically, in climate applications, flux fields are interpolated using first-order (locally)
conservative interpolation, while other scalar fields use non-conservative but higher-order
interpolators (e.g., bilinear or biquadratic).
For scalar solutions that do not need to be conserved, consistent FE interpolation,
patch-wise reconstruction schemes

In general, remapping implementations have three distinct steps to accomplish the projection of solution fields from a source to a target grid. First, the target points of interest are identified and located in the source grid, such that the target cells are a subset of the covering (source) mesh. Next, an intersection between this covering (source) mesh and the target mesh is performed, in order to calculate the individual weight contribution to each target cell, without approximations to the component field discretizations, which can be defined with arbitrary-order FV or FE bases. Finally, application of the weight matrix yields the projection required to conservatively transfer the data onto the target grid.
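The final step above is a sparse matrix–vector product over the precomputed weights. A minimal sketch in Python (the CSR layout and function name are illustrative, not the MOAB/TempestRemap API):

```python
def apply_remap_weights(row_ptr, col_idx, vals, source_field):
    """Apply a remapping weight matrix stored in CSR form:
    target[i] = sum_j W[i,j] * source[j]."""
    n_target = len(row_ptr) - 1
    target = [0.0] * n_target
    for i in range(n_target):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += vals[k] * source_field[col_idx[k]]
        target[i] = acc
    return target

# Hypothetical first-order conservative weights: two target cells,
# each averaging two source cells with equal area fractions.
row_ptr = [0, 2, 4]
col_idx = [0, 1, 1, 2]
vals = [0.5, 0.5, 0.5, 0.5]
print(apply_remap_weights(row_ptr, col_idx, vals, [1.0, 3.0, 5.0]))  # [2.0, 4.0]
```

Note that the weight rows sum to one for a consistent (constant-preserving) operator, while column sums weighted by source areas govern conservation.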

To illustrate some key differences between some NC to GC or LC schemes,
we show a 1-D Gaussian hill solution, projected onto
a coarse grid through linear basis interpolation and weighted least-squares (

An illustration comparing point interpolation vs.
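The contrast between point interpolation and conservative projection can be reproduced with a small 1-D analogue of the Gaussian hill experiment (an illustrative sketch; the grid sizes and Gaussian parameters are arbitrary choices, not those used in the figure):

```python
import math

def gauss(x, mu=0.5, sigma=0.08):
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2))

# Fine source grid: cell-centered samples of the Gaussian hill
n_src, n_tgt = 64, 8          # n_src divisible by n_tgt
h_src = 1.0 / n_src
src = [gauss((i + 0.5) * h_src) for i in range(n_src)]

# (a) Non-conservative: point interpolation at coarse cell centers
tgt_interp = [gauss((i + 0.5) / n_tgt) for i in range(n_tgt)]

# (b) Locally conservative: exact averaging of the fine cells that
#     tile each coarse cell (a trivial 1-D "supermesh")
r = n_src // n_tgt
tgt_cons = [sum(src[i * r:(i + 1) * r]) / r for i in range(n_tgt)]

mass_src = sum(src) / n_src
mass_cons = sum(tgt_cons) / n_tgt
print(abs(mass_src - mass_cons) < 1e-12)   # True: averaging conserves mass
```

The averaged projection preserves the total mass exactly, while the point-interpolated values differ near the peak, where the cell mean and the center value of a curved profile disagree.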

While there is a delicate balance in optimizing the computational efficiency of these operations without sacrificing the
numerical accuracy or consistency of the procedure, several researchers have implemented
algorithms that are useful for a variety of problem domains.
In recent years, growing interest in rigorously tackling coupled multiphysics applications
has led to research efforts focused on developing new regridding algorithms. The Data Transfer Kit
(DTK;

In Earth science applications, the state-of-the-science regridding tool that is often used by many
researchers is the ESMF (

Currently, the E3SM components are integrated together in a hub-and-spoke model
(Fig.

Similar to the CIME-MCT driver used by E3SM, OASIS3-MCT

ESMF and SCRIP traditionally handle only cell-centered data that target finite volume
discretizations (FV-to-FV projections), with first- or second-order conservation constraints.
Hence, generating remapping weights for
atmosphere–ocean grids with a spectral element (SE) source grid definition requires
generation of an intermediate and spectrally equivalent “dual” grid, which
matches the areas of the polygons to the weight of each Gauss–Lobatto–Legendre (GLL) node

To calculate remapping weights directly for high-order spectral element grids, E3SM uses the
TempestRemap C++ library (

Even though ESMF and OASIS3-MCT have been used in online remapping studies, weight generation as part of a preprocessing step currently remains the preferred workflow for many production climate models. While this decoupling provides flexibility in the choice of remapping tools, managing the mapping files for different discretizations, field constraints, and grids can make provenance, reproducibility, and experimentation difficult. It also precludes handling moving or dynamically adaptive meshes in coupled simulations. However, it should be noted that shifting the remapping computation from the preprocessing stage of the workflow to the simulation stage imposes additional onus on users to better understand the underlying component grid properties, their decompositions, the solution fields being transferred, and the preferred options for computing the weights. This also motivates workflow modifications to verify the online weights, so that consistency, conservation, and dissipation of key fields remain within user-specified constraints. In the implementation discussed here, the online remapping computation uses the same input grids and specifications as the offline workflow, along with the ability to write the weights to file, which can be used to run detailed verification studies as needed.

There are several challenges in scalably computing the regridding operators in parallel, since
it is imperative to have both a mesh- and partition-aware data structure to handle this part of the regridding workflow.
A few climate models have begun to calculate weights online as part of their regular operation.
The ICON GCM

In the E3SM workflow supported by CIME, the ESMF regridder understands the component grid definitions and generates the weight matrices (offline). The CIME driver loads these operators at runtime and places them in MCT data types, which treat them as discrete operators to compute the interpolation or projection of data on the target grids. Additional changes in conservation requirements or monotonicity of the field data cannot be imposed as a runtime or post-processing step in such a workflow. In the current work, we present a new infrastructure with scalable algorithms implemented using the MOAB mesh library and TempestRemap package to replace the ESMF-E3SM-MCT remapper/coupler workflow. A detailed review of the algorithmic approach used in the MOAB-TempestRemap (MBTR) workflow, along with the software interfaces exposed to E3SM, is presented next.

Efficient, conservative, and accurate multi-mesh solution transfer workflows

Fully online remapping capability within a complex ecosystem such as E3SM requires
a flexible infrastructure to generate the projection weights. In order to fulfill
these needs, we utilize the MOAB mesh data structure combined with the TempestRemap
libraries in order to provide an in-memory remapping layer to dynamically compute
the weight matrices during the setup phase of the simulations for static source–target
grid combinations. For dynamically adaptive and moving grids, the remapping operator
can be recomputed at runtime as needed. The introduction of such a software stack allows
higher-order conservation of fields while being able to transfer and maintain field
relations in parallel, within the context of the fully decomposed mesh view. This is
an improvement over the current E3SM workflow, in which MCT is oblivious to the underlying mesh
data structure in the component models. Having a fully mesh-aware implementation
with element connectivity and adjacency information, along with parallel ghosting
and decomposition information, also provides opportunities to implement dynamic load-balancing
algorithms to gain optimal performance on large-scale machines. Without the mesh topology,
MCT is limited to performing trivial decompositions based on global ID spaces during mesh
migration operations. YAC interpolator

MOAB is a fully distributed, compact, array-based mesh data structure; the local
entity lists are stored as ranges rather than explicit lists, along with connectivity
and ownership information, leading to a high degree of memory compression.
The memory constraints per process scale well in parallel

In order to illustrate the online remapping algorithm implemented with the MOAB-TempestRemap
infrastructure, we define the following terms. Let

In the following sections, the new E3SM online remapping interface implemented with a combination of the MOAB and TempestRemap libraries is explained. Details regarding the algorithmic aspects to compute conservative, high-order remapping weights in parallel, without sacrificing discretization accuracy on next-generation hardware, are presented.

Within the E3SM simulation ecosystem, there are multiple component models (atmosphere–ocean–land–ice–runoff)
that are coupled to each other. While the MCT infrastructure primarily manages the global degree-of-freedom
(DoF) partitions without a notion of the underlying mesh, the new MOAB-based coupler infrastructure provides
the ability to natively interface to the component mesh and intricately understand the field DoF data layout
associated with each model. MOAB can recognize the difference between values on a cell center and values on a
cell edge or corner. In the current work, the MOAB mesh database has been used to create the relevant
integration abstraction for the High-Order Methods Modeling Environment (HOMME) atmosphere model

MOAB can handle the finite element zoo of elements on a sphere (triangles, quadrangles, and
polygons), making it an appropriate layer to store both the mesh layout (vertices, elements,
connectivity, adjacencies) and the parallel decomposition for the component models along with
information on shared and ghosted entities. While having a uniform partitioning methodology
across components may be advantageous for improving the efficiency of coupled climate simulations,
the parallel partitions of the meshes are chosen according to the requirements of the individual component
solvers. Figure

MOAB representation of partitioned component meshes.

The coupled field data that are to be remapped from the source grid to the target grid also need
to be serialized as part of the MOAB mesh database in terms of an internally contiguous MOAB
data storage structure named a “tag”

E3SM's driver supports multiple modes of partitioning the various components in the global processor space. This is
usually fine-tuned based on the estimated computational load in each physics component, according to the problem case definition.
A sample process-execution (PE) layout for an E3SM run on 9000 processes with atmosphere (ATM)
on 5400 and ocean (OCN) on 3600 processes is
shown in Fig.

Example E3SM process execution layout for a problem case.

For illustration, let

Migration strategies to
repartition from

We show an example of a decomposed ocean mesh (polygonal MPAS mesh) that is replicated in an E3SM problem case run
on two processes in Fig.

mesh migration from component to coupler involving communication between

computing the coverage mesh requiring gather/scatter of source mesh elements to cover local target elements.

Standard approaches to compute the intersection of two convex polygonal meshes involve the creation of a
Kd-tree

The intersection algorithm used in this paper follows the ideas from

While the advancing-front algorithm is not restricted to convex cells, the intersection computation is simpler if they are strictly convex. If concave polygons exist in the initial source or target meshes, they are recursively decomposed into simpler convex polygons by splitting along interior diagonals. Note that the intersection of two convex polygons is itself a strictly convex polygon. Hence, the underlying intersection algorithm remains robust even for arbitrary non-convex meshes covering the same domain space.
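The elementary operation on the resulting convex cells is a convex–convex clip. A minimal planar sketch using the classic Sutherland–Hodgman algorithm (illustrative only; the actual implementation operates on gnomonically projected spherical cells with tolerance handling):

```python
def clip_convex(subject, clipper):
    """Sutherland-Hodgman clipping of convex polygon `subject` against
    convex polygon `clipper`; both in CCW vertex order. The result of
    intersecting two convex polygons is itself convex."""
    def inside(p, a, b):
        # True if p lies left of (or on) the directed edge a->b
        return (b[0]-a[0])*(p[1]-a[1]) - (b[1]-a[1])*(p[0]-a[0]) >= 0.0
    def intersect(p, q, a, b):
        # Intersection of segment pq with the infinite line through ab
        x1, y1, x2, y2 = p[0], p[1], q[0], q[1]
        x3, y3, x4, y4 = a[0], a[1], b[0], b[1]
        den = (x1-x2)*(y3-y4) - (y1-y2)*(x3-x4)
        t = ((x1-x3)*(y3-y4) - (y1-y3)*(x3-x4)) / den
        return (x1 + t*(x2-x1), y1 + t*(y2-y1))
    out = subject
    for i in range(len(clipper)):
        a, b = clipper[i], clipper[(i+1) % len(clipper)]
        inp, out = out, []
        for j in range(len(inp)):
            p, q = inp[j], inp[(j+1) % len(inp)]
            if inside(q, a, b):
                if not inside(p, a, b):
                    out.append(intersect(p, q, a, b))
                out.append(q)
            elif inside(p, a, b):
                out.append(intersect(p, q, a, b))
    return out

# Two overlapping unit squares; the clipped area should be 0.25
sq1 = [(0, 0), (1, 0), (1, 1), (0, 1)]
sq2 = [(0.5, 0.5), (1.5, 0.5), (1.5, 1.5), (0.5, 1.5)]
poly = clip_convex(sq1, sq2)
area = 0.5 * abs(sum(poly[i][0]*poly[(i+1) % len(poly)][1]
                     - poly[(i+1) % len(poly)][0]*poly[i][1]
                     for i in range(len(poly))))
print(area)  # 0.25
```

The accumulated clipped areas are exactly the per-cell weight contributions referred to above.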

Illustration of the advancing-front intersection algorithm.

Figure

This flooding-like advancing front needs a stable and robust methodology of intersecting
edges/segments in two cells that belong to different meshes. Any pair of segments that
intersect can appear in four different pairs of cells. A list of intersection points
is maintained on each target edge, so that the intersection points are unique. Also,
a geometric tolerance is used to merge intersection points that are close to each
other or proximal to the original vertices in both meshes. Decisions
regarding whether points are inside, outside, or at the boundary of a convex enclosure
are handled separately. If necessary, more robust techniques such as adaptive
precision arithmetic procedures used in Triangle
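The tolerance-based deduplication of intersection points along a target edge can be sketched as follows (the function and the parameterization of points by edge coordinate are hypothetical, not MOAB's internal representation):

```python
import bisect

def merge_point(ts, t, tol=1e-10):
    """Insert parameter t (position of an intersection point along a target
    edge, 0 <= t <= 1) into the sorted list ts, unless an existing point lies
    within tol; return the representative parameter actually stored."""
    i = bisect.bisect_left(ts, t)
    for j in (i - 1, i):
        if 0 <= j < len(ts) and abs(ts[j] - t) <= tol:
            return ts[j]          # merged with an existing nearby point
    ts.insert(i, t)
    return t

ts = []
merge_point(ts, 0.25)
merge_point(ts, 0.75)
merge_point(ts, 0.25 + 1e-12)     # within tolerance: merged, not inserted
print(ts)                          # [0.25, 0.75]
```

Because any pair of intersecting segments can appear in four different cell pairs, storing a single merged point per edge location keeps the supermesh vertices unique.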

Meshes that appear in climate applications are often on a sphere. Cell edges are
considered to be great circle arcs. A simple gnomonic projection is used to
project the edges on one of the six planes parallel to the coordinate axis and
tangent to the sphere
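A sketch of such a gnomonic projection onto the six cube-face tangent planes (face selection by dominant axis; the axis conventions here are arbitrary and not necessarily those used in MOAB):

```python
import math

def gnomonic(p):
    """Project a unit-sphere point p = (x, y, z) onto the tangent plane of
    the cube face selected by its dominant coordinate axis. Returns the
    face axis, the face sign, and the two in-plane coordinates."""
    x, y, z = p
    ax, ay, az = abs(x), abs(y), abs(z)
    if ax >= ay and ax >= az:          # +/- X face
        return ('x', math.copysign(1.0, x), y / ax, z / ax)
    if ay >= ax and ay >= az:          # +/- Y face
        return ('y', math.copysign(1.0, y), x / ay, z / ay)
    return ('z', math.copysign(1.0, z), x / az, y / az)

# A point near the +X face: (4, 1, 0) normalized to the unit sphere
face, s, u, v = gnomonic((0.9701425, 0.2425356, 0.0))
print(face, s)   # x 1.0
```

The key property exploited by the intersection algorithm is that great circle arcs project to straight segments in the tangent plane, so edge intersections reduce to planar segment predicates.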

Existing infrastructure from MOAB

We select the target mesh as the driver for redistribution of the source mesh.
On each task, we first compute the bounding box of the local target mesh. This
information is then gathered and communicated to all coupler PEs and used for redistributing the local source mesh.
Cells that intersect the bounding boxes of other processors are sent to the
corresponding owner task using the aggregating crystal router algorithm
that is particularly efficient in performing all-to-all strategies with
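The bounding-box test that decides which local source cells must be sent to which coupler task can be sketched serially (the data layout is hypothetical; the actual exchange is MPI-based and uses the crystal-router algorithm):

```python
def coverage_sends(source_cells, target_boxes):
    """For each task, collect the local source cells whose bounding box
    overlaps that task's target-mesh bounding box. source_cells maps
    cell id -> vertex list; target_boxes maps rank -> (xmin, ymin, xmax, ymax)."""
    def bbox(cell):
        xs = [v[0] for v in cell]
        ys = [v[1] for v in cell]
        return (min(xs), min(ys), max(xs), max(ys))
    def overlaps(a, b):
        return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]
    sends = {rank: [] for rank in target_boxes}
    for cid, cell in source_cells.items():
        cb = bbox(cell)
        for rank, tb in target_boxes.items():
            if overlaps(cb, tb):
                sends[rank].append(cid)
    return sends

cells = {0: [(0, 0), (1, 0), (1, 1)], 1: [(3, 3), (4, 3), (4, 4)]}
boxes = {0: (0, 0, 2, 2), 1: (2.5, 2.5, 5, 5)}
print(coverage_sends(cells, boxes))   # {0: [0], 1: [1]}
```

Cells overlapping several boxes are deliberately sent to every matching task, which is what guarantees full coverage of each local target mesh.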

This workflow guarantees that the target mesh on each processor is completely
enveloped by the covering mesh repartitioned from its original source mesh decomposition,
as shown in Fig.

Source coverage mesh fully covers local target mesh; local intersection proceeds between the source atmosphere (quadrangle) and the target ocean (polygonal) grids.

Once the relevant covering mesh is accumulated locally on each process, the
intersection computation can be carried out in parallel, completely independently,
using the advancing-front algorithm (Sect.

The parallel advancing-front algorithm presented here to globally compute the intersection supermesh can
be extended to expose finer-grained parallelism using hybrid-threaded (OpenMP) programming or a task-based
execution model, where each task handles a unique front in the computation queue. Such task or
hybrid-threaded parallelism can be employed in combination with the MPI-based mesh decompositions. Using local
partitions computed with Metis

For illustration, consider a scalar field

MOAB supports point-wise FEM interpolation (bilinear and higher-order spectral) with
local or global subset normalization

It is also possible to use the transpose of the remapping operator computed between a particular source and target component combination to project the solution back to the original source grid. Such an operation has the advantage of preserving the consistency and conservation metrics originally imposed in finding the remapping operator and reduces computation cost by avoiding recomputation of the weight matrix for the new directional pair. For example, when computing the remap operator between atmosphere and ocean models (with holes), it is advantageous to use the atmosphere model as the source grid, since the advancing-front seed computation may require multiple trials if the initial front begins within a hole in the source mesh. Given that the seed or the initial cell determination on the target mesh is chosen at random, the corresponding intersecting cell on the source mesh found through a linear search could be contained within a hole in the source mesh. In such a case, a new target cell is then chosen and the source cell search is repeated. Hence, multiple trials may be required for the advancing-front algorithm to start propagating, depending on the mesh topology and decomposition. Note that the linear search in the source mesh can easily be replaced with a Kd-tree data structure to provide better computational complexity for cases where both source and target meshes have many holes. Additionally, such transpose vector applications can also make the global coupling symmetric, which may have favorable implications when pursuing implicit temporal integration schemes.
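Applying the transpose requires no second weight matrix: the forward CSR arrays can be reused by scattering each row's contributions back to its source columns (an illustrative layout, not the MOAB API):

```python
def apply_transpose(row_ptr, col_idx, vals, target_field):
    """Apply W^T to project a field back from target to source using the
    CSR storage of the forward operator: source[j] = sum_i W[i,j] * target[i]."""
    n_src = max(col_idx) + 1
    src = [0.0] * n_src
    for i in range(len(row_ptr) - 1):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            src[col_idx[k]] += vals[k] * target_field[i]
    return src

# Reusing the hypothetical forward weights: two target cells, three source cells
row_ptr = [0, 2, 4]
col_idx = [0, 1, 1, 2]
vals = [0.5, 0.5, 0.5, 0.5]
print(apply_transpose(row_ptr, col_idx, vals, [2.0, 4.0]))  # [1.0, 3.0, 2.0]
```

Since both directions traverse the same stored operator, the consistency and conservation properties built into the forward weights carry over to the reverse projection.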

The remapping algorithms presented in the previous section are exposed through a
combination of implementations in MOAB and TempestRemap libraries. Since both
libraries are written in C++, direct inheritance of key data structures such as the
GridElements (mesh) and OfflineMap (projection weights) is available to minimize
data movement between the libraries. Additionally, Fortran codes such as E3SM
can invoke computations of the intersection mesh and the remapping weights through
specialized language-agnostic interfaces in MOAB:

Using the

Once the remapping operator is serialized in memory for each coupled scalar and flux field, this operator is then used at every time step to compute the actual projection of the data.

Additionally, to facilitate offline generation of projection weights, a MOAB-based parallel
tool

Evaluating the performance of the in-memory MBTR remapping infrastructure requires repeated profiling and optimization to ensure scalability for large-scale simulations. In order to showcase the advantage of using the mesh-aware MOAB data structure as the MCT coupler replacement, we need to understand the per-task performance of the regridder in addition to the parallel point locator scalability and the overall time for remapping weight computation. Note that except for the weight application for each solution field from a source grid to a target grid, the in-memory copy of the component meshes, migration to coupler PEs, computation of intersection elements, and generation of remapping weights are performed only once during the setup phase in E3SM, per coupled component model pair.

We compare the total cost for computing the supermesh and the remapping weights for several source and target grid combinations through three different methods to determine the serial computational complexity.

ESMF: Kd-tree-based regridder and weight generation for first-/second-order
FV

TempestRemap: Kd-tree-based supermesh generation and conservative, monotonic,
high-order remap operator for FV

MBTempest: advancing-front intersection with MOAB and conservative weight generation with TempestRemap interfaces.

Comparison of serial regridding computation (supermesh and projection weight generation) between ESMF, TempestRemap, and MBTempest.

Figure

In addition to being able to compute the supermesh between

Setup phase: use Kd tree to build the search data structure to locate points corresponding to vertices in the target mesh on the source mesh.

Run phase: use the elements containing the located points to compute consistent interpolation onto target mesh vertices.
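The two-phase pattern above can be sketched with a 1-D stand-in for the Kd tree (a sorted-coordinate search replaces the tree; the class and method names are illustrative, not MOAB's API):

```python
import bisect

class PointLocator:
    """Setup phase builds the search structure once; the run phase reuses
    it for every target point (MOAB uses a Kd tree over source elements)."""
    def __init__(self, src_x, src_f):
        # Setup phase: sort source points so location is O(log n)
        order = sorted(range(len(src_x)), key=lambda i: src_x[i])
        self.x = [src_x[i] for i in order]
        self.f = [src_f[i] for i in order]

    def interpolate(self, xt):
        # Run phase: locate the containing element, then apply
        # consistent linear interpolation within it
        i = bisect.bisect_right(self.x, xt)
        i = min(max(i, 1), len(self.x) - 1)
        x0, x1 = self.x[i - 1], self.x[i]
        w = (xt - x0) / (x1 - x0)
        return (1 - w) * self.f[i - 1] + w * self.f[i]

loc = PointLocator([0.0, 1.0, 2.0], [0.0, 10.0, 20.0])
print(loc.interpolate(0.5))   # 5.0
```

Separating the two phases matters because the setup cost is amortized over the many target points (and, for static grids, over every coupling step).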

Studies were performed to evaluate the strong and weak scalability of the parallel
Kd-tree point search implementation in MOAB. The scalability results were generated
with the CIAN2 (

The performance tests were executed on the IBM BlueGene/Q Mira at 16 MPI ranks
per node, with 2 GB RAM per MPI rank, at up to 500 K MPI processes. The
strong scaling results and error convergence were computed with a grid size of

MOAB 3-D Kd-tree point location: strong scaling on Mira (BG/Q).

First, the root-mean-square (rms) error in the bilinearly interpolated solution was measured
against the analytical solution and plotted for different source and target mesh resolutions.
Figure
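For reference, the rms error metric used in this comparison is the standard one (a trivial sketch):

```python
import math

def rms_error(approx, exact):
    """Root-mean-square error between interpolated and analytical values
    sampled at the same target points."""
    n = len(approx)
    return math.sqrt(sum((a - e) ** 2 for a, e in zip(approx, exact)) / n)

# Hypothetical sampled values: one point off by 1.0 out of three samples
print(round(rms_error([1.0, 2.0, 3.0], [1.0, 2.0, 2.0]), 5))  # 0.57735
```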

The full 3-D point location and interpolation operations provided by MOAB are comparable
to the implementation in the common remapping component used in the C-Coupler

The MBTR online weight generation workflow within E3SM was employed to verify and test the projection of real simulation data generated during the coupled atmosphere–ocean model runs. We chose the model-computed temperature on the lowest level of the atmosphere, since the heat fluxes that nonlinearly couple the atmosphere and ocean models are directly proportional to this interface temperature field. By convention, the fluxes are computed on the ocean mesh, and hence the atmosphere temperature must be interpolated onto the MPAS polygonal mesh. We use this scenario as a test case for demonstrating the strong scalability results in this section.

The atmosphere run with approximately 4

Projection of the NE11 SE bottom atmospheric temperature field onto the MPAS ocean grid.

The strong scaling studies for computation of remapping weights to project a FV
solution field between CS grids of varying resolutions were performed on the Blues
large-scale cluster (with 16 Sandy Bridge Xeon E5-2670 2.6 GHz cores and 32 GB RAM
per node) at ANL and the Cori supercomputer at NERSC (with 64 Haswell Xeon E5-2698v3
2.3 GHz cores and 128 GB RAM per node). Figure

CS (

The relatively better scaling for MOAB on the Blues cluster is due to faster
hardware and memory bandwidth compared to the Cori machine. The strong scaling
efficiency approaches a plateau on Cori Haswell nodes as communication costs
for the coverage mesh computation start dominating the overall remapping process,
especially in the limit of

To further evaluate the characteristics of in-memory remapping computation, along with the cost of applying the weights during a transient simulation, a series of further studies was executed on the NERSC Cori system to determine the spectral projection of a real dataset between atmosphere and ocean components in E3SM. The source mesh contains fourth-order spectral element temperature data defined on Gauss–Lobatto quadrature nodes (cGLL discretization) of the CS mesh, and the projection is performed onto an MPAS polygonal mesh with holes (FV discretization). A direct comparison to ESMF was infeasible in this study, since the traditional workflow requires the computation of a dual mesh transformation of the spectral grid. Hence, only timings for the MBTR workflow are shown here.

Two specific cases were considered for this SE

Case A (NE30): 1

Case B (NE120):

The performance tests for each of these cases were launched with three different
process execution layouts for the atmosphere, ocean components, and the coupler.

Fully colocated PE layout:

Disjoint-ATM model PE layout:

Disjoint-OCN model PE layout:

Strong scaling on Cori for SE

A breakdown of computational time for key tasks on Cori with up to 1024
processes for both cases is tabulated in Table

The component-wise breakdown for the advancing-front intersection mesh, the parallel
communication graph for sending and receiving data between component and coupler,
and finally, the remapping weight generation for the SE

Strong scaling study for the NE30 and NE120 cases for spectral projection with Zoltan repartitioner on Cori.

Scaling of the communication kernels driven with the parallel graph computed with a
trivial redistribution and the Zoltan geometric (RCB) repartitioner for the NE120 case with

Another key aspect of the results shown in Fig.

In order to determine the effect of partitioning strategies described in
Fig.

Generally, operations involving SpMV products are memory bandwidth limited
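A back-of-the-envelope arithmetic-intensity estimate illustrates why CSR SpMV is bandwidth bound (the cache model and byte counts below are simplifying assumptions, not measurements):

```python
def csr_spmv_intensity(nnz, n_rows, bytes_val=8, bytes_idx=4):
    """Estimate arithmetic intensity (flops/byte) of a CSR SpMV:
    2*nnz flops; traffic ~ values + column indices + row pointers +
    the input and output vectors, assuming each is streamed once
    (an optimistic, perfect-cache model)."""
    flops = 2.0 * nnz
    bytes_moved = (nnz * (bytes_val + bytes_idx)
                   + (n_rows + 1) * bytes_idx
                   + 2 * n_rows * bytes_val)
    return flops / bytes_moved

# Remap operators are very sparse (a handful of entries per target cell),
# so the intensity sits far below typical machine balance points.
print(round(csr_spmv_intensity(nnz=5_000_000, n_rows=1_000_000), 3))  # 0.125
```

At roughly one-eighth of a flop per byte, the weight-application kernel is limited by memory bandwidth rather than by floating-point throughput on essentially all current hardware.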

SE

Understanding and controlling primary sources of errors in a coupled system dynamically will be key to achieving predictable and verifiable climate simulations on emerging architectures. Traditionally, the computational workflow for coupled climate simulations has involved two distinct steps, with an offline preprocessing phase using remapping tools to generate solution field projection weights (ESMF, TempestRemap, SCRIP), which are then consumed by the coupler to transfer field data between the component grids.

The offline steps include generating grid description files and running the offline tools with problem-specific options. Additionally, many state-of-the-science tools such as ESMF and SCRIP require additional steps to specially handle interpolators from SE grids. Such workflows create bottlenecks that do not scale and can inhibit scientific research productivity. When experimenting with refined grids, a goal for E3SM, this tool chain has to be exercised repeatedly. Additionally, when component meshes are dynamically modified, either through mesh adaptivity or dynamic mesh movement to track moving boundaries, the underlying remapping weights must be recomputed on the fly.

To overcome some of these limitations, we have presented scalable algorithms and software interfaces to create a direct component coupling with online regridding and weight generation tools. The remapping algorithms utilize the numerics exposed by TempestRemap and leverage the parallel mesh handling infrastructure in MOAB to create a scalable in-memory remapping infrastructure that can be integrated with existing coupled climate solvers. Such a methodology obviates the need for dual grids, preserves higher-order spectral accuracy, and locally conserves the field data, in addition to satisfying monotonicity constraints, when transferring solutions between grids with non-matching resolutions.

The serial and parallel performance of the MOAB advancing-front intersection algorithm with
linear complexity (

Information on the availability of source code for the algorithmic infrastructure and models featured in this paper is tabulated below.

E3SM

MOAB

TempestRemap

The video supplements for the serial and parallel advancing-front mesh intersection
algorithms to compute the supermesh (

Serial advancing-front mesh intersection: intersection between CS and MPAS grids on a single
task is illustrated;

Parallel advancing-front mesh intersection: simultaneous parallel intersections between CS and
MPAS grids on two different tasks are illustrated side by side;

VSM and RJ wrote the paper (with comments from IG and JS). VSM and IG designed and implemented the MOAB integration with TempestRemap library, along with exposing the necessary infrastructure for online remapping through iMOAB interfaces. IG and JS configured the MOAB-TempestRemap remapper within E3SM and verified weight generation to transfer solution fields between atmosphere and ocean component models. VSM conducted numerical verification studies and executed both the serial and parallel scalability studies on Blues and Cori LCF machines to quantify performance characteristics of the remapping algorithms. The broader project idea was conceived by Andy Salinger (SNL), RJ, VSM, and IG.

The authors declare that they have no conflict of interest.

This research used resources of the Argonne Leadership Computing Facility at Argonne National Laboratory, which is supported by the Office of Science of the US Department of Energy under contract DEAC02-06CH11357 and resources of the National Energy Research Scientific Computing Center, a DOE Office of Science User Facility supported by the Office of Science of the US Department of Energy under contract no. DEAC02-05CH11231. We gratefully acknowledge the computing resources provided on Blues, a high-performance computing cluster operated by the Laboratory Computing Resource Center at Argonne National Laboratory. We would also like to thank Dr. Paul Ullrich at University of California, Davis, for several helpful discussions regarding remapping schemes and implementations in TempestRemap.

This research has been supported by the Department of Energy, Labor and Economic Growth (grant no. FWP 66368). Funding for this work was provided by the Climate Model Development and Validation – Software Modernization project, and partially by the SciDAC Coupling Approaches for Next Generation Architectures (CANGA) project, which is funded by the US Department of Energy (DOE) and Office of Science Biological and Environmental Research programs. CANGA is also funded by the DOE Office of Advanced Scientific Computing Research (ASCR).

This paper was edited by Sophie Valcke and reviewed by two anonymous referees.