GMD

Geoscientific Model Development

Geosci. Model Dev.

1991-9603

Copernicus GmbH

Göttingen, Germany

10.5194/gmd-8-1005-2015

libmpdata++ 1.0: a library of parallel MPDATA solvers for systems of generalised transport equations

A. Jaruga (ajaruga@igf.fuw.edu.pl), Arabas (https://orcid.org/0000-0003-2361-0082), Jarecka, H. Pawlowska (hanna.pawlowska@igf.fuw.edu.pl, https://orcid.org/0000-0002-5345-778X), P. K. Smolarkiewicz, Waruszewski

1 Institute of Geophysics, Faculty of Physics, University of Warsaw, Warsaw, Poland
2 National Center for Atmospheric Research, Boulder, CO, USA
3 European Centre for Medium-Range Weather Forecasts, Reading, UK

Correspondence: A. Jaruga (ajaruga@igf.fuw.edu.pl) and H. Pawlowska (hanna.pawlowska@igf.fuw.edu.pl)

8 April 2015

Volume 8, issue 4, pages 1005–1032. Dates: 18 August 2014, 26 November 2014, 2 March 2015, 3 March 2015

This work is licensed under a Creative Commons Attribution 3.0 Unported License. To view a copy of this license, visit http://creativecommons.org/licenses/by/3.0/

This article is available from https://gmd.copernicus.org/articles/8/1005/2015/gmd-8-1005-2015.html

The full text article is available as a PDF file from https://gmd.copernicus.org/articles/8/1005/2015/gmd-8-1005-2015.pdf

This paper accompanies the first release of libmpdata++, a C++ library implementing the multi-dimensional positive-definite advection transport algorithm (MPDATA) on regular structured grids. The library offers basic numerical solvers for systems of generalised transport equations. The solvers are forward-in-time, conservative and non-linearly stable. The libmpdata++ library covers the basic second-order-accurate formulation of MPDATA, its third-order variant, the infinite-gauge option for variable-sign fields and a flux-corrected transport extension to guarantee non-oscillatory solutions. The library is equipped with a non-symmetric variational elliptic solver for implicit evaluation of pressure gradient terms. All solvers offer shared-memory parallelisation through domain decomposition.

The paper describes the library programming interface and serves as a user guide. Supported options are illustrated with benchmarks discussed in the MPDATA literature. Benchmark descriptions include code snippets as well as quantitative representations of simulation results. Examples of applications include homogeneous transport in one, two and three dimensions in Cartesian and spherical domains; a shallow-water system compared with an analytical solution (originally derived for a 2-D case); and a buoyant convection problem in an incompressible Boussinesq fluid with interfacial instability. All the examples are implemented out of the library tree. Regardless of the differences in problem dimensionality, right-hand-side terms, boundary conditions and parallelisation approach, all the examples use the same unmodified library, which is a key goal of the libmpdata++ design. The design, based on the principle of separation of concerns, prioritises user and developer productivity. The libmpdata++ library is implemented in C++, making use of the Blitz++ multi-dimensional array containers, and is released as free/libre and open-source software.

Introduction

The MPDATA advection scheme has grown into a family of numerical algorithms for geosciences and beyond. MPDATA stands for multi-dimensional positive-definite advection transport algorithm (in fact, MPDATA is sign-preserving, rather than merely positive-definite, but for historical reasons the name remains unchanged). It is a finite-difference/finite-volume algorithm for solving the generalised transport equation

$$\partial_t (G\psi) + \nabla\cdot(G\boldsymbol{u}\psi) = GR.$$

This equation describes the advection of a scalar field $\psi$ in a flow with velocity $\boldsymbol{u}$. The field $R$ on the right-hand side (rhs) is a total of source/sink terms. The scalar field $G$ can represent the fluid density, the Jacobian of coordinate transformation or their product, and satisfies the equation

$$\partial_t G + \nabla\cdot(G\boldsymbol{u}) = 0.$$

In the homogeneous case ($R\equiv 0$), MPDATA is at least second-order-accurate in space and time, conservative and non-linearly stable.

The history of MPDATA spans three decades and is widely documented in the literature. Nevertheless, in the authors' experience, the software engineering aspects still overshadow the benefits of MPDATA. To facilitate the use of MPDATA schemes, we present here a new implementation of the MPDATA family of algorithms for regular structured grids: libmpdata++.

In the development of libmpdata++ we strive to comply with the best practices sought after in the scientific community, in particular with the paradigm of maximising code reuse. This paradigm is embodied in the “open source computational libraries – the main foundation upon which academic and also a significant part of industrial computational research rests”.

The libmpdata++ library has been developed in C++ (in the C++11 revision of the language), making extensive use of object-oriented programming (OOP) and template programming. The primary goals when designing libmpdata++ were to maintain strict separation of concerns and to reproduce within the code the mathematical “blackboard abstractions” used for documenting numerical algorithms. The adopted design contributes to the readability, maintainability and conciseness of the code. The current development of libmpdata++ extends earlier research on an OOP implementation of the basic MPDATA scheme.

The goal of this article is twofold: first, to document the library interface by providing usage examples and, second, to validate the correctness of the implementation by verifying the results against published benchmarks.

The structure of the paper is as follows. Section outlines the library design. The four sections that follow correspond to four types of equation systems solved by the implemented algorithms, namely homogeneous advective transport, inhomogeneous transport, transport with prognosed velocity, and systems featuring elliptic pressure equation. Each of these sections outlines the implemented algorithms, describes the library interface and provides usage examples. Each example is accompanied with a definition of the solved problem, description of the program code and discussion of the results. An index of libmpdata++ options documented in the article is provided in Appendix A.

The paper structure reflects the solver inheritance hierarchy in libmpdata++. All features discussed in preceding sections apply to the one that follows. The set of discussed problems was selected to match the tutorial structure of the paper. The presentation begins with simple examples focusing on the basic library interface. Subsequent examples use increasingly more complicated cases with the most complex reflecting potential for applications to cloud dynamics .

The library and programs used to generate all results presented in the paper are released as free and open-source software – see the section on code availability at the end of the paper.

Library design

Dependencies and supported platforms

The libmpdata++ package is a header-only C++ library. It is built upon the Blitz++ (see http://sf.net/projects/blitz/) array containers. We refer the reader to the Blitz++ documentation for a description of the Blitz++ interface, to which the user is exposed while working with libmpdata++. The libmpdata++ core also depends on several components of the Boost (see http://boost.org/) library collection; however, these are used internally only. Output handlers included in the library additionally depend on gnuplot-iostream (see http://gitorious.org/gnuplot-iostream/) and HDF5 (see http://hdfgroup.org/HDF5/), but their use is optional. Example programs discussed within this article require gnuplot (see http://gnuplot.info/), ParaView (see http://paraview.org/), Python, and the following Python packages: h5py (see http://h5py.org/), matplotlib (see http://matplotlib.org/) and scipy (see http://scipy.org/).

The library code requires a C++11-compliant compiler. In the current development workflow, we employ continuous integration on Linux with the GNU g++ (see http://gcc.gnu.org/) and LLVM clang++ (see http://llvm.org/) compilers and on Apple OSX with the Apple clang++ (see http://apple.com/xcode) compiler. Consequently, these are considered the supported platforms.

Components

Inheritance diagram of classes mentioned in the paper. Classes defined within libmpdata++ have their names surrounded with black frames. The coupled_harmosc class is an example of a user-defined class defined out of the library tree. The solid black lines show the inheritance relations. The output label depicts any of the output handlers available in libmpdata++.

Components of the library are grouped as follows.

Solvers:

mpdata intended for solving homogeneous transport problems (Sect. )

mpdata_rhs extending the above with rhs term handling (Sect. )

mpdata_rhs_vip adding prognosed-velocity support (Sect. )

mpdata_rhs_vip_prs further extending the above with elliptic pressure equation solvers (Sect. ).

Output handlers:

gnuplot offering direct communication with the gnuplot program with no intermediate output files

hdf5 offering basic HDF5 output compatible with netCDF (see http://www.unidata.ucar.edu/software/netcdf/) readers

hdf5_xdmf implementing the eXtensible Data Model and Format (see http://xdmf.org/) standard supported for instance by the ParaView visualisation tool.

Boundary conditions:

cyclic implementing periodic boundaries

open giving zero-divergence condition on domain edges

polar applicable with spherical coordinates.

Concurrency handlers:

serial for single-thread operation

cxx11_thread for multi-threading using the C++11 thread support library

boost_thread for multi-threading using Boost.Thread

openmp for multi-threading using OpenMP

threads that defaults to openmp if supported by the compiler and falls back to boost_thread otherwise.

Performing integration with libmpdata++ requires choosing one of the solvers, one output handler, one boundary condition for each domain edge and one concurrency handler.

The inheritance diagram in Fig. shows relationships between libmpdata++ solvers defined within the library. The diagram also includes an example user-defined class, coupled_harmosc, defined out of the library tree. The mpdata solver is displayed at the top, as it is the base class for all other classes.

Computational domain and grid

The arrangement of the computational domain used in libmpdata++ is shown in Fig. . The initial condition for the dependent variable ψ is assumed to be known in nx×ny data points. The outermost data points are located at the boundaries of the domain.

The dual, staggered Arakawa-C grid used in libmpdata++ is shown in Fig. . In this spatial discretisation approach, the cell-mean values of the scalar fields ψ and G reside in the centres of computational cells (corresponding to the data points of the primary grid in Fig. ), whereas the components of the velocity field u are specified at the cell edges of the dual grid in Fig. 3.

Schematic of a 2-D computational domain. Bullets mark the data points for the dependent variable ψ in Eq. (), solid lines depict edges of primary grid and dashed lines mark edges of dual grid in Fig. .

Error and progress reporting

There are several error-handling mechanisms used within libmpdata++.

First, there are sanity checks within the code implemented using static_assert() calls. These are reported during compilation, for instance when invalid values of compile-time parameters are supplied.

Second, numerous run-time sanity checks are available, implemented using assert() calls. These are often time-consuming and are not intended to be executed in production runs. To disable them, one needs to compile the program using libmpdata++ with the -DNDEBUG compiler flag. Examples of such checks include detection of NaN values within the model state variables, which may be useful to trace origins of numerical instability problems.

Third, the user may choose to activate the Blitz++ debug mode that enables run-time array range checks. Activating Blitz++ debug mode requires compiling the program using libmpdata++ with the -DBZ_DEBUG flag and linking with libblitz.

Finally, libmpdata++ reports run-time errors by throwing std::runtime_error exceptions.
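Three of these mechanisms rely on standard C++ facilities only and can be illustrated without the library. The following is a minimal sketch, not code from libmpdata++: the structure params_t and the functions check_finite() and check_grid_size() are hypothetical names introduced here for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <stdexcept>
#include <vector>

// Hypothetical compile-time parameter structure (not the library's own).
struct params_t { enum { n_dims = 2 }; };

// 1) Compile-time check: reported by the compiler, no run-time cost.
static_assert(params_t::n_dims >= 1 && params_t::n_dims <= 3,
              "n_dims must be 1, 2 or 3");

// 2) Run-time sanity check: compiled out when -DNDEBUG is given.
inline void check_finite(const std::vector<double> &psi)
{
  for (double v : psi)
    assert(std::isfinite(v) && "NaN/Inf in model state");
}

// 3) Run-time error: always active, reported by throwing an exception.
inline void check_grid_size(int nx)
{
  if (nx < 2) throw std::runtime_error("grid size must be at least 2");
}
```

Compiling such a program with -DNDEBUG removes the loop body of check_finite() but leaves the static_assert() and the exception in place, which mirrors the division between development and production runs described above.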

Simulation progress is communicated to the user by continuously updating the process threads' names with the percentage of work completed (this can be observed, e.g., by invoking top -H).

A schematic of a 2-D Arakawa-C grid. Bullets denote the cell centres and dashed lines denote the cell walls corresponding to the dual grid in Fig. .

Advective transport

The focus of this section is on the advection algorithm used within libmpdata++. Section provides a short introduction to the implemented MPDATA scheme. Section describes the library interface needed for the homogeneous transport cases. The following Sects. – show examples of usage of libmpdata++ along with the references to other MPDATA benchmarks.

Implemented algorithms

This subsection is intended to provide the reader with an outline of selected MPDATA features that correspond to the options presently available in libmpdata++. For the full derivation of the scheme and its options see the reviews in and , whereas for an extended discussion of stability, positivity and convexity see .

In the present implementation, it is assumed that $G$ is constant in time. Consequently, the governing homogeneous transport equation can be written as

$$\partial_t \psi + \frac{1}{G}\nabla\cdot(G\boldsymbol{u}\psi) = 0.$$

This particular form is solved by the mpdata solver of libmpdata++.

The following paragraphs will focus on the algorithms used for handling Eq. (). The rules for applying source and sink terms are presented in Sect. .

Basic MPDATA

MPDATA is an at least second-order-accurate iterative scheme in which all iterations take the form of a first-order-accurate donor-cell pass (alias upwind, upstream; cf. Sect. 20.1.3). For the one-dimensional case (chosen for simplicity; multi-dimensional MPDATA formulæ can be found in Sect. 2.2), after the discretisation in space (subscripts $i$) and time (superscripts $n$), the donor-cell pass applied to the homogeneous transport equation yields

$$\psi_i^{n+1} = \psi_i^n - \frac{1}{G_i}\left[F\left(\psi_i^n, \psi_{i+1}^n, G_{i+1/2}, u_{i+1/2}^{n+1/2}\right) - F\left(\psi_{i-1}^n, \psi_i^n, G_{i-1/2}, u_{i-1/2}^{n+1/2}\right)\right].$$

The flux function $F$ is defined as

$$F(\psi_L, \psi_R, G, u) \equiv \left([u]^+\psi_L + [u]^-\psi_R\right) G\,\frac{\Delta t}{\Delta x},$$

where $[u]^+ \equiv \max(u, 0)$ and $[u]^- \equiv \min(u, 0)$.

In the case of a time-varying velocity field, the velocity components are evaluated at an intermediate time level denoted by the n+1/2 superscript in Eq. (). Association of the velocity components with dual-cell edges is denoted by fractional indices i+1/2 and i-1/2; see Fig. .

Hereafter, $G u \frac{\Delta t}{\Delta x}$ is written compactly as $GC$, where $C$ denotes the Courant number. $GC$ is referred to as the advector, and the scalar field $\psi$ as the advectee, following nomenclature adopted from earlier work.
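The donor-cell pass can be sketched in a few lines of stand-alone C++. The sketch below is not the library's implementation: it uses std::vector instead of Blitz++ arrays, assumes periodic boundaries and a positive G, and stores the advector GC[i] at the edge i+1/2. The function names flux() and donorcell() are chosen here for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Donor-cell flux through one cell edge; GC = G*u*dt/dx is the advector
// (G > 0 is assumed, so the sign of GC equals the sign of u).
inline double flux(double psi_l, double psi_r, double GC)
{
  return std::max(GC, 0.) * psi_l + std::min(GC, 0.) * psi_r;
}

// One donor-cell pass on a periodic 1-D grid.
// GC[i] is assumed to sit on the edge i+1/2, between cells i and i+1.
inline std::vector<double> donorcell(const std::vector<double> &psi,
                                     const std::vector<double> &GC,
                                     const std::vector<double> &G)
{
  const std::size_t n = psi.size();
  assert(GC.size() == n && G.size() == n);
  std::vector<double> out(n);
  for (std::size_t i = 0; i < n; ++i)
  {
    const std::size_t im = (i + n - 1) % n, ip = (i + 1) % n;
    out[i] = psi[i] - (flux(psi[i], psi[ip], GC[i]) -
                       flux(psi[im], psi[i], GC[im])) / G[i];
  }
  return out;
}
```

For a unit spike advected with GC = 0.5 and G = 1, one pass leaves half of the signal in the original cell and moves the other half to the downstream neighbour, which illustrates both the conservation and the implicit diffusion of the scheme.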

Evaluation of the donor-cell formula concludes the first pass of MPDATA. To compensate for the implicit diffusion of the donor-cell pass, the subsequent passes of MPDATA reuse the donor-cell formula and the flux function, but with $\psi$ replaced with the result of the preceding pass and $u$ replaced with the “anti-diffusive” pseudo-velocity. The pseudo-velocity is analytically derived by expanding the donor-cell formula in the second-order Taylor series about spatial point $i$ and time level $n$, and representing the leading, dissipative truncation error as an advective flux. A single corrective pass ensures second-order accuracy in time and space. Subsequent corrective passes decrease the amplitude of the leading error, within second-order accuracy. The one-dimensional formula for the basic antidiffusive advector is written as

$$GC^{k+1}_{i+1/2} = \left[\left|GC^{k}_{i+1/2}\right| - \frac{\left(GC^{k}_{i+1/2}\right)^2}{0.5\,(G_{i+1}+G_i)}\right]\frac{\psi^{k}_{i+1}-\psi^{k}_{i}}{\psi^{k}_{i+1}+\psi^{k}_{i}},$$

where $k$ numbers MPDATA passes. For $k=1$, $C^k$ is the flow-velocity-based Courant number, whereas for $k>1$, $C^k$ is the pseudo-velocity-based Courant number. The number of corrective passes can be chosen within libmpdata++.
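The antidiffusive advector of the basic scheme can likewise be sketched in stand-alone C++. The function antidiff() below is a hypothetical illustration, not the library's code: it assumes a fixed-sign field, periodic boundaries and edge-centred storage of GC at i+1/2, and it sets the ψ-fraction factor to zero where both neighbouring values vanish (a guard chosen for this sketch).

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Antidiffusive advector on a periodic 1-D grid for a fixed-sign field psi.
// GC[i] and the result sit on the edge i+1/2; G[i] sits in the cell centre.
inline std::vector<double> antidiff(const std::vector<double> &psi,
                                    const std::vector<double> &GC,
                                    const std::vector<double> &G)
{
  const std::size_t n = psi.size();
  assert(GC.size() == n && G.size() == n);
  std::vector<double> out(n);
  for (std::size_t i = 0; i < n; ++i)
  {
    const std::size_t ip = (i + 1) % n;
    const double den = psi[ip] + psi[i];
    // psi-fraction factor; zero where both neighbours vanish (sketch guard)
    const double frac = den != 0. ? (psi[ip] - psi[i]) / den : 0.;
    out[i] = (std::abs(GC[i]) - GC[i] * GC[i] / (.5 * (G[ip] + G[i]))) * frac;
  }
  return out;
}
```

A corrective pass then amounts to a donor-cell pass that uses the returned values in place of the original advector. The result vanishes wherever the field is locally uniform, so the correction acts only where the donor-cell pass has smeared gradients.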

The library features two implementations of the donor-cell algorithm defined by the donor-cell formula and the flux function. The default one is a “straightforward” summation. The alternative, more resource-intensive one is a compensated summation algorithm, which reduces the round-off error arising when summing numbers of different magnitudes.
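The effect of compensated summation can be demonstrated with a short stand-alone sketch. The code below assumes the classic Kahan formulation, which may differ in detail from the variant implemented in the library; the function names sum_compensated() and sum_naive() are introduced here for illustration.

```cpp
#include <cassert>
#include <vector>

// Kahan compensated summation: c accumulates the low-order bits lost
// when a small term is added to a large running sum.
inline double sum_compensated(const std::vector<double> &x)
{
  double sum = 0., c = 0.;
  for (double v : x)
  {
    const double y = v - c;   // apply the correction carried from last step
    const double t = sum + y; // low-order bits of y may be lost here
    c = (t - sum) - y;        // recover what was lost (with opposite sign)
    sum = t;
  }
  return sum;
}

// Straightforward summation, for comparison.
inline double sum_naive(const std::vector<double> &x)
{
  double sum = 0.;
  for (double v : x) sum += v;
  return sum;
}
```

When a value of 1 is followed by many terms of magnitude 1e-16, the straightforward sum never departs from 1, whereas the compensated sum retains their accumulated contribution. Note that aggressive floating-point optimisation flags (such as -ffast-math) may remove the compensation.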

Third-order-accurate variant

Accounting for third-order terms in the Taylor series expansion while deriving the pseudo-velocity improves the accuracy of MPDATA. When G≡1, u=constant and three or more corrective passes are applied, the procedure ensures third-order accuracy in time and space. The formulæ for the third-order scheme, derived analytically in , can be found in Eq. 36.

Divergent-flow variant

In case of a divergent flow, the pseudo-velocity formulæ are augmented with an additional term proportional to the flow divergence. This additional term is implemented in libmpdata++ following Sect. 3.2(3).

Non-oscillatory option

Solutions obtained with the basic MPDATA are sign-preserving, and thus non-oscillatory near zero. Generally, however, they feature dispersive ripples characteristic of higher-order numerical schemes. These can be suppressed by limiting the pseudo-velocities, in the spirit of flux-corrected transport. Application of the limiters somewhat reduces the accuracy of the scheme, yet this loss is generally outweighed by ensuring non-oscillatory (or ripple-free) solutions. Notably, because MPDATA is built upon the donor-cell scheme characterised by small phase error, the non-oscillatory corrections have to deal with errors in signal amplitude only. The non-oscillatory option is a default option within libmpdata++. For the derivation and further discussion of the multi-dimensional non-oscillatory option see .

Variable-sign scalar fields

The basic MPDATA formulation assumes that the advected field $\psi$ is exclusively either non-negative or non-positive. In particular, this assumption is evident in the ψ-fraction factor $\frac{\psi^{k}_{i+1}-\psi^{k}_{i}}{\psi^{k}_{i+1}+\psi^{k}_{i}}$ of the antidiffusive advector formula, which can become unbounded in the case of a variable-sign field. The libmpdata++ library includes implementations of two MPDATA options intended for simulating advection of variable-sign fields.

The first method replaces ψ with |ψ| in all ψ-fraction factors that enter the pseudo-velocity expressions. This approach is robust but it reduces the solution quality where ψ crosses through zero; see Sect. 3.2(4) in .
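The difference between the basic ψ-fraction factor and its absolute-value counterpart can be checked with a two-function sketch. The names frac_basic() and frac_abs() are hypothetical and serve this illustration only.

```cpp
#include <cassert>
#include <cmath>

// Basic psi-fraction factor for neighbouring values psi_i and psi_ip.
inline double frac_basic(double psi_i, double psi_ip)
{
  assert(psi_ip + psi_i != 0.);
  return (psi_ip - psi_i) / (psi_ip + psi_i);
}

// Variant with psi replaced by |psi|; bounded by 1 in magnitude.
inline double frac_abs(double psi_i, double psi_ip)
{
  assert(std::abs(psi_ip) + std::abs(psi_i) != 0.);
  return (std::abs(psi_ip) - std::abs(psi_i)) /
         (std::abs(psi_ip) + std::abs(psi_i));
}
```

For the pair of values -1 and 1.5, the basic factor equals 5, whereas the absolute-value variant gives 0.2. For values of the same sign both functions agree in magnitude, so the modification matters only near zero crossings.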

The default method is the “infinite-gauge” variant of the algorithm, a generalised one-step Lax–Wendroff (linear, oscillatory) limit of MPDATA at infinite constant background, discussed in Sect. 4.2. In practice, the infinite-gauge option of MPDATA is used with the non-oscillatory enhancement.

Library interface

Compile-time parameters

Compile-time parameters include number of dimensions, number of equations and algorithm options. Most of the compile-time parameters are declared by defining integer constants within the compile-time parameter structure. Listing depicts a minimal definition that inherits from the ct_params_default_t structure containing default values for numerous parameters.

Example definition of compile-time parameters structure.

All solvers expect a structure with compile-time parameters as their first template parameter, as exemplified in Listing .

Example alias declaration combining solver- and compile-time parameters choice.
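The pattern of inheriting default compile-time parameters and passing the resulting structure to a solver template can be sketched without the library. In the code below, everything inside the sketch namespace is a stand-in written for illustration: the numeric values of the option flags, the contents of the default structure and the body of the mpdata template are assumptions, not the library's definitions. Only the names ct_params_default_t, real_t, n_dims, n_eqns, opts and mpdata follow those used in this paper.

```cpp
#include <cassert>
#include <type_traits>

// Stand-ins so that the sketch compiles without libmpdata++ installed;
// in real code the corresponding definitions come from the library headers.
namespace sketch
{
  namespace opts { enum { fct = 1, abs = 2, tot = 4, iga = 8 }; } // arbitrary values

  struct ct_params_default_t
  {
    using real_t = double;                 // assumed default
    enum { opts = opts::iga | opts::fct }; // default quoted in the text
  };

  template <class ct_params_t>
  struct mpdata // stand-in for a solver class template
  {
    using real_t = typename ct_params_t::real_t;
    static_assert(ct_params_t::n_dims >= 1 && ct_params_t::n_dims <= 3,
                  "n_dims out of range");
    static_assert(ct_params_t::n_eqns >= 1, "n_eqns must be positive");
    enum { n_dims = ct_params_t::n_dims, n_eqns = ct_params_t::n_eqns };
  };
}

// User code: inherit the defaults, define what has no default
// and override what should differ from the default.
struct ct_params_t : sketch::ct_params_default_t
{
  using real_t = float;
  enum { n_dims = 1 };
  enum { n_eqns = 2 };
  enum { opts = sketch::opts::abs };
};

using solver_t = sketch::mpdata<ct_params_t>;
```

Because all parameters are compile-time constants, an invalid choice (for example n_dims set to 4 in this sketch) is rejected by the compiler rather than detected at run time.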

Choosing library components

The library components listed in Sect. are chosen through template parameters. First, the solver is equipped with an output mechanism by passing the solver type as a template parameter to the output type, as exemplified in Listing . The output classes inherit from solvers.

Example alias declaration of an output mechanism.

Second, the concurrency handlers expect solver class (equipped with output) as the first template parameter. Subsequent template parameters control boundary condition types on each of the domain edges (see Listing ).

Example alias declaration of a concurrency handler.

Run-time parameters

Run-time parameters include the grid size, number of MPDATA passes and output file name. The list of applicable run-time parameters is defined by fields of the rt_params_t structure. This structure is defined within each solver and extended when equipping the solver with an output mechanism. The concurrency handlers expect an instance of the run-time parameters structure as their constructor argument. Example code depicting how to set the run-time parameters and then instantiate a concurrency handler is presented in Listing .

Example run-time parameter structure declaration followed by a concurrency handler instantiation.

Public methods

The concurrency handlers act as controlling logic for the other components and, hence, the user is exposed to the public interface of these handlers only.

Listing contains signatures of methods implemented by each of the concurrency handlers.

Signatures of all the methods within libmpdata++ application programming interface.

The advectee() is an accessor method for the advected scalar fields. It can be used for setting the initial condition as well as for examining the solver state. It expects an index of the requested advectee as the argument (advected scalar fields are numbered from zero). This provides choice between different advected variables. The returned blitz::Array is zero-base indexed and has the same size as the computational grid (set with the grid_size field of the run-time parameters structure, see Listing ).

The advector() method allows accessing the components of the vector field of Courant numbers multiplied by the G factor (i.e. a Jacobian of coordinate transformation, a fluid density field or their product). The argument selects the vector field components numbered from zero. The size of the returned array depends on the component. It equals the grid size in all but the selected dimension in which it is reduced by 1 (i.e. nx×(ny-1) for the “y” component and so forth; cf. Fig. ).

The g_factor() is an accessor method for the G field. The returned array has the same size as the one returned by advectee(). The default value is set to G≡1 (for details, see Sect. ).

The advance() method launches the time-stepping logic of the solver advancing the solution by the number of time steps given as argument.

The panic_ptr() method returns a pointer to a Boolean variable that if set to true will cause the solver to stop the computations after the currently computed time step. This method may be used, for instance, to implement signal handling within programs using libmpdata++.

All multi-dimensional arrays used in libmpdata++ use the default Blitz++ “row-major” memory layout with the last dimension varying fastest. Domain decomposition for parallel computations is done over the first dimension only.
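The consequences of this layout can be sketched as follows. With the last index varying fastest, a slab of consecutive first-dimension indices occupies a contiguous block of memory, which makes the first dimension a natural choice for decomposition. The functions offset() and slab() below are hypothetical; in particular, the rule used for distributing the remainder among sub-domains is an assumption of this sketch and is not taken from the library.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

// Linear offset of element (i, j) in a row-major nx-by-ny array:
// the last index j varies fastest in memory.
inline std::size_t offset(std::size_t i, std::size_t j, std::size_t ny)
{
  return i * ny + j;
}

// Half-open range [first, last) of first-dimension indices assigned to
// sub-domain `rank` out of `size`; the remainder goes to the leading ranks.
inline std::pair<std::size_t, std::size_t>
slab(std::size_t nx, std::size_t rank, std::size_t size)
{
  assert(size > 0 && rank < size);
  const std::size_t base = nx / size, rem = nx % size;
  const std::size_t first = rank * base + (rank < rem ? rank : rem);
  return {first, first + base + (rank < rem ? 1 : 0)};
}
```

For nx = 10 split among three sub-domains, the sketch yields the index ranges [0, 4), [4, 7) and [7, 10).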

Basic example

The source code presented in this subsection is intended to serve as a minimal complete example on how to use libmpdata++. In other examples presented throughout the paper, only the fragments of code that differ significantly from the minimal example will be presented.

Simulation results generated by the code in Listing .

A usage example of libmpdata++. The listing contains the code needed to generate Fig. .

The example consists of an elemental transport problem for a one-dimensional, variable-sign field advected with a constant velocity. The simulation results using code in Listing are shown in Fig. . Spatial and temporal directions are depicted on the abscissa and ordinate, respectively. Cell-mean values of the transported field are shown on the applicate and are presented in compliance with the assumption of data points representing grid-cell means of the transported field.

The code in Listing begins with three include statements that reflect the choice of the library components: solver, concurrency handler and output mechanism. All compile-time parameters are grouped into a structure passed as a template parameter to the solver. Here, this structure is named ct_params_t and inherits from ct_params_default_t, which results in assigning default values to parameters not defined within the inheriting class. The solvers expect the structure to contain a type real_t which controls the floating-point format used. The two constants that do not have default values and need to be explicitly defined are n_dims and n_eqns. They control the dimensionality of the problem and the number of equations to be solved, respectively.

Choice between different solver types, output mechanisms and concurrency handlers is done via type alias declaration. Here, the basic mpdata solver is chosen which is then equipped with the gnuplot output mechanism. All output classes expect a solver class as their first template parameter, which is used to define the parent class (i.e. output classes inherit from solvers).

Classes representing concurrency handlers expect the output class and the boundary conditions as their template parameters. In the example, a basic serial handler is used and open boundary conditions on both ends of the domain are chosen.

The choice of run-time parameters is done by assigning values to the member fields of the rt_params_t structure defined within the solver class and augmented with additional fields by the output class. In this example, the instance of rt_params_t structure is named p, the grid size is set to 101 points and the output is set to be done every 20 time steps. An instance of the rt_params_t structure is expected as the constructor parameter for concurrency handlers.

The grid step dx is set to 0.1 and the number of time steps to 100. Initial values of the Courant number and the transported scalar fields are set by assigning them to the arrays returned by the advector() and advectee() methods. In this example, the Courant number equals 0.5 and the advected shape is described by the Witch of Agnesi formula y(x)=8a3/(x2+4a2) with the coefficient a=0.5. Initial shape is centred in the middle of the computational domain and is shifted downwards by 0.5. Finally, the actual integration is performed by calling the advance() method with the number of time steps as argument.
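The initial condition described above can be reproduced with a few lines of stand-alone C++. The sketch uses std::vector in place of the Blitz++ array returned by advectee() and evaluates the formula at the data points; the function name witch_of_agnesi() and its default arguments are chosen here to match the values quoted in the text.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Witch of Agnesi profile y(x) = 8a^3 / (x^2 + 4a^2), centred in the
// middle of an nx-point domain with grid step dx and shifted vertically.
inline std::vector<double> witch_of_agnesi(std::size_t nx = 101,
                                           double dx = .1, double a = .5,
                                           double shift = -.5)
{
  assert(nx > 0 && dx > 0 && a > 0);
  std::vector<double> psi(nx);
  const double x0 = .5 * (nx - 1) * dx; // middle of the domain
  for (std::size_t i = 0; i < nx; ++i)
  {
    const double x = i * dx - x0;
    psi[i] = 8 * a * a * a / (x * x + 4 * a * a) + shift;
  }
  return psi;
}
```

With a = 0.5 the unshifted maximum equals 2a = 1, so after the downward shift the profile peaks at 0.5 in the centre and falls below zero towards the domain edges, which makes the advected field variable-sign.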

Example: advection scheme options

The following example is intended to present MPDATA advection scheme options described in Sect. . The way of choosing different options is discussed, and the calling sequence of the library interface is shown for the case of advecting multiple scalar fields.

The example consists of transporting two boxcar signals with different MPDATA options. In all tests, the first signal extends from 2 to 4 and the second signal extends from -1 to 1, to observe the solution for fixed-sign and variable-sign signals. Listing shows the compile-time parameters structure fields common to all cases presented within this example. The number of dimensions is set to 1 and the number of equations to solve is set to 2. Consistent with Listing from the basic example, p shown in Listing is an instance of rt_params_t structure with run-time parameters of the simulation. Setting the outfreq field to the number of time steps results in plotting the initial condition and the final state. The outvars field contains a map with a structure containing a variable name, here left empty, and unit defined for each of the advected scalar fields. Listing shows how to set initial values to multiple scalar fields using the advectee() method with an integer argument specifying the index of the equation in the solved system.

Compile-time parameters for the example presented in Sect. .

Run-time parameters for the example presented in Sect. .

Initial condition and velocity field for the example presented in Sect. .

Variable-sign scalar fields

The libmpdata++ library is equipped with two options for handling variable-sign fields; recall the discussion in Sect. . The option using absolute values is named abs, whereas the “infinite-gauge” option is dubbed iga. The option flags are defined in the opts namespace. The option choice is made by defining the opts field of the compile-time parameters structure, in analogy to n_dims or n_eqns.

Result of the simulation with the advection scheme option for variable-sign signal set to absolute value; cf. Listing .

Advection scheme options for Fig. , variable-sign option is set to absolute value.

In the first test, the choice of handling variable-sign signal is set to abs (Listing ). Figure shows the result of simulation with parameters set in Listings , , and . The final signal shows dispersive ripples characteristic of higher-order schemes. It is also evident that the ripple magnitude depends on the constant background, a manifestation of the scheme non-linearity. Furthermore, the final variable-sign signal features a bogus saddle point at the zero crossings (cf. Sect. ), and this can be eliminated by using the infinite-gauge (alias iga) option. Listing shows how to choose the iga option. Figure shows the result of the simulation with parameters set in Listings , , and . Although iga evinces more pronounced oscillations, their magnitudes do not depend on the constant background. This, together with the robust behaviour of iga when crossing zero, substantiates the discussion of Sect. on iga amounting to a linear limit of MPDATA.

Third-order-accurate variant

Choosing the third-order variant enhances the accuracy of the scheme when used with more than two passes of MPDATA or with iga; recall Sect. . Option tot enables the third-order variant of the MPDATA scheme. Figure shows the result of the same test as in Figs. and but with MPDATA options set as in Listing . The resulting signal is evidently more accurate and symmetric, but the oscillations are still present.

As in Fig. but with variable-sign option set to “infinite-gauge”; cf. Listing .

Advection scheme options for Fig. , variable-sign option is set to “infinite-gauge”.

As in Fig. but with variable-sign option set to “infinite-gauge” and third-order-accurate variant; cf. Listing .

Advection scheme options for Fig. , variable-sign option is set to “infinite-gauge” and third-order accuracy variant is chosen.

Non-oscillatory option

To eliminate oscillations apparent in the preceding tests, the non-oscillatory (fct) option (Sect. ) needs to be chosen. This option can be used together with all other MPDATA options, such as basic scheme, variable-sign signals (abs or iga) and the third-order-accurate variant (tot).

Here, fct is selected together with iga; cf. Listing . This is the default setting, i.e. when inheriting from the default parameters structure, and not overriding the opts setting, as illustrated in Listing . Figure shows the corresponding results. The solutions for both fixed-sign and variable-sign signals have indistinguishable profiles and all of the dispersive ripples have been suppressed.

As in Fig. but with options set to infinite-gauge treatment of variable-sign signal and flux corrections; cf. Listing .

Advection scheme options for Fig. , variable-sign option is set to “infinite-gauge” and non-oscillatory option is enabled. This is the default setting in libmpdata++.

As in Fig. but with options set to infinite-gauge treatment of variable-sign signal, non-oscillatory option and third-order accuracy variant; cf. Listing .

Advection scheme options for Fig. , variable-sign option is set to “infinite-gauge”, non-oscillatory option is enabled and third-order accuracy variant is chosen.

To further enhance the accuracy of the solution, fct and iga can be combined with the tot variant; cf. Listing . The corresponding result is shown in Fig. . Enabling the third-order-accurate variant improves the symmetry of the solution, as compared to the results presented in Fig. .

Example: convergence tests in 1-D

In this subsection the convergence test originated in is used to quantify the accuracy of various MPDATA options.

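The test consists of a series of one-dimensional simulations with Courant numbers $C \in (0.05, 0.1, 0.15, 0.2, \ldots, 0.85, 0.9, 0.95)$ and grid increments $\Delta x \in \left(\frac{\Delta x_m}{2^0}, \frac{\Delta x_m}{2^1}, \frac{\Delta x_m}{2^2}, \frac{\Delta x_m}{2^3}, \frac{\Delta x_m}{2^4}, \frac{\Delta x_m}{2^5}, \frac{\Delta x_m}{2^6}, \frac{\Delta x_m}{2^7}\right)$, where $\Delta x_m = 1$ is the maximal increment. The series amounts to 152 simulations for each option (19 Courant numbers times 8 grid increments). In each simulation, the number of time steps $NT$ and the number of grid cells $NX$ are adjusted so that the total time $T$ and total length of the domain $X$ remain constant. The domain size $X = 44\,\Delta x_m$ and simulation time $T = 1$ are selected. The advective velocity is set to $u = \Delta x_m / T = 1$.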
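The parameter space of the test can be enumerated with a short sketch. The function series() below is a hypothetical helper written for this illustration; it only lists the pairs of Courant number and grid increment and does not run any simulation.

```cpp
#include <cassert>
#include <cmath>
#include <utility>
#include <vector>

// All (C, dx) pairs of the test series; dx_m is the maximal grid increment.
inline std::vector<std::pair<double, double>> series(double dx_m = 1.)
{
  assert(dx_m > 0);
  std::vector<std::pair<double, double>> out;
  for (int c = 1; c <= 19; ++c)   // C = 0.05, 0.10, ..., 0.95
    for (int m = 0; m <= 7; ++m)  // dx = dx_m / 2^m
      out.emplace_back(.05 * c, std::ldexp(dx_m, -m));
  return out;
}
```

The returned container holds 19 × 8 = 152 entries, in agreement with the number of simulations quoted above.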

In each simulation, a Gaussian profile $\psi_{ex}(x)|_{t=0} = \frac{1}{\sigma\sqrt{2\pi}}\exp\left(-\frac{(x-x_0)^2}{2\sigma^2}\right)$ is advected, and the result of the simulation is compared with the exact solution $\psi_{ex}$. The initial profiles and the exact solutions are calculated by analytically integrating function () over the grid-cell extents, to comply with the inherent MPDATA assumption of a data point representing the grid-cell mean of the transported field. The dispersion parameter of the initial profile () is set to $\sigma = 1.5\Delta x_m$, while the profile is centred in the middle of the domain, $x_0 = 0.5X$.
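The analytic integration over a grid cell can be expressed with the error function. The sketch below assumes a normalised Gaussian with mean x0 and dispersion sigma; the function name is illustrative and does not come from the library.

```cpp
#include <cmath>

// Mean value of a normalised Gaussian over the cell [x_left, x_right],
// obtained from the difference of its cumulative distribution function.
double gaussian_cell_mean(double x_left, double x_right, double x0, double sigma)
{
  const double s = sigma * std::sqrt(2.);
  const double integral =
    0.5 * (std::erf((x_right - x0) / s) - std::erf((x_left - x0) / s));
  return integral / (x_right - x_left);
}
```

For a very narrow cell the result approaches the point value of the profile, while summing the cell means multiplied by the cell width over the whole domain recovers the unit integral of the Gaussian.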

As a measure of accuracy, a truncation-error function is introduced: $\mathrm{err}(C,\Delta x) \equiv \frac{1}{T}\left.\sqrt{\sum_{i=1}^{NX}\left[\psi_{ex}(x_i)-\psi(x_i)\right]^2/NX}\,\right|_{t=T}$. The results of the convergence test for the generic first-order-accurate donor-cell scheme, the basic MPDATA and its third-order-accurate variant are shown in Fig. a–c. Each figure displays, in polar coordinates, the base-two logarithm of the truncation-error function () for the entire series of 152 simulations. The radius and angle, respectively, $r = \log_2\frac{\Delta x}{\Delta x_m} + 8$ and $\phi = C\frac{\pi}{2}$, indicate changes in grid increment and Courant number. Thus, closer to the origin are simulation results for finer grids, closer to the abscissa are points for small Courant numbers, and closer to the ordinate are points with Courant numbers approaching unity. The contour interval of dashed isolines and of the colour map is set to 1, corresponding to error reduction by the factor of 2. Lines of constant grid-cell size and constant Courant number are overlaid with white contours.
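A minimal sketch of the two quantities defined above, the truncation-error function and the polar coordinates used for plotting, is given below. The function and type names are illustrative assumptions and are not taken from libmpdata++.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Root-mean-square difference between the exact and the numerical
// cell-mean values at t = T, divided by the total time T.
double truncation_error(const std::vector<double> &psi_ex,
                        const std::vector<double> &psi, double T)
{
  double sum = 0;
  for (std::size_t i = 0; i < psi.size(); ++i)
    sum += (psi_ex[i] - psi[i]) * (psi_ex[i] - psi[i]);
  return std::sqrt(sum / psi.size()) / T;
}

// Position of one simulation in the polar convergence plot.
struct polar_t { double r, phi; };

polar_t polar_coords(double C, double dx, double dx_max)
{
  const double pi = std::acos(-1.);
  return { std::log2(dx / dx_max) + 8, C * pi / 2 };
}
```

With these definitions the finest grid (dx equal to dx_max/128) lands at r = 1 and the coarsest at r = 8, while the Courant number spans the first quadrant.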

The result of the convergence test (a) for the donor-cell scheme, (b) for the basic MPDATA and (c) for the third-order-accurate variant.

As in Fig. (a) for three passes of MPDATA, (b) for two passes with non-oscillatory option, (c) for infinite-gauge option, and (d) for infinite-gauge with third-order-accurate variant.

The figures contain information on the convergence rate of MPDATA options. When moving along the lines of constant Courant number towards the origin, thus increasing the spatial and temporal resolution, the number of crossed dashed isolines determines the order of the scheme; cf. Sect. 8.1 in . Therefore, the results in Fig. a–c attest to the first-, second- and third-order asymptotic convergence rates, respectively. Furthermore, the shape of dashed isolines conveys the dependency of the solution accuracy on the Courant number. In particular, they show that at fixed spatial resolution the solution accuracy increases with the Courant number. Moreover, as the order of the convergence increases the isolines become more circular indicating more isotropic solution accuracy in the Courant number.
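The isoline-counting argument can be written out explicitly. Assuming, as an idealisation valid only asymptotically, that at fixed Courant number the error scales as $\mathrm{err} \propto \Delta x^{p}$, halving the grid increment moves a point by one unit of radius and changes the plotted quantity by p units:

```latex
\log_2 \mathrm{err}(C,\Delta x) - \log_2 \mathrm{err}(C,\Delta x/2)
  = \log_2 \frac{\Delta x^{p}}{(\Delta x/2)^{p}} = p ,
\qquad
r(\Delta x) - r(\Delta x/2) = \log_2 2 = 1 .
```

Since the isolines are drawn with an interval of 1, crossing p isolines per unit of radius corresponds to a scheme of order p.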

Figure b reproduces the solution in Fig. 1 of and, thus, verifies the libmpdata++ implementation. For further verification, Fig. a and b show results of the convergence test for (i) three-pass MPDATA (run-time solver parameter n_iters = 3) and (ii) two-pass MPDATA with the fct option. These results reproduce Figs. 2 and 3 from . A noteworthy feature of Fig. a is the groove of the third-order convergence rate formed around $\phi = 45^\circ$, characteristic of MPDATA with three or more passes . Next, comparing Fig. b with Fig. b shows that the price to be paid for an oscillation-free result is a reduction in the convergence rate from 2 to ∼1.8; cf. Sect. 4 in .

Figure c and d documents original results for the convergence test applied to the “infinite-gauge” limit of MPDATA. In particular, Fig. c shows that iga is as accurate as three-pass MPDATA, cf. Sect. 4 in, whereas Fig. d reveals that the third-order-accurate iga is more anisotropic in Courant number than the third-order-accurate standard MPDATA in Fig. c.

The convergence test results for the default setting of libmpdata++ (iga plus fct) are not shown, because they resemble results from Fig. b with somewhat enhanced accuracy for well-resolved fields (i.e. small grid cells).

The results of the example presented in Sect. ; only a quarter of the domain, centred over the cone's initial location, is shown. Abscissa and ordinate mark the spatial dimensions. Colours correspond to the amplitude of the advected field. Panel (a) shows initial condition of Sect. , (b) results for basic MPDATA with fct, (c) for MPDATA with three passes with fct and tot and (d) for the default setting of libmpdata++ (iga and fct).

Example: rotating cone in 2-D

This example introduces the libmpdata++ programming interface for two-dimensional simulations with the velocity field varying in space. Test results are compared with published MPDATA benchmarks. The example is based on the classical solid-body rotation test . The current setup follows . The initial condition features a cone centred around the point $(x_0, y_0) = (50\Delta x, 75\Delta y)$. The grid interval is $\Delta x = \Delta y = 1$, and the domain size is $100\Delta x \times 100\Delta y$, thus containing 101×101 data points (cf. Fig. ). The height of the cone is set to 4, the radius to $15\Delta x$, and the background level to 1. The flow velocity is specified as $(u, v) = \omega\,(y - y_c,\, -(x - x_c))$, where the angular velocity is $\omega = 10^{-1}$ and $(x_c, y_c)$ denotes the coordinates of the domain centre. With time interval $\Delta t = 0.1$, one full rotation requires 628 time steps. The total integration time corresponds to six full rotations.
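The numbers quoted in the setup can be reproduced with a few lines of code. This is a sketch only: the helper names are hypothetical, and the Courant number components are computed here with plain arithmetic rather than through the library interface.

```cpp
#include <cmath>

struct vec2_t { double x, y; };

// Solid-body rotation about (xc, yc) with angular velocity omega.
vec2_t velocity(double x, double y, double xc, double yc, double omega)
{
  return { omega * (y - yc), -omega * (x - xc) };
}

// Courant number components for a given velocity, time step and grid intervals.
vec2_t courant(vec2_t vel, double dt, double dx, double dy)
{
  return { vel.x * dt / dx, vel.y * dt / dy };
}

// Number of time steps per full rotation, rounded to the nearest integer.
long steps_per_rotation(double omega, double dt)
{
  const double pi = std::acos(-1.);
  return std::lround(2 * pi / (omega * dt));
}
```

With omega = 0.1 and dt = 0.1 the rotation period corresponds to 628 time steps, and the largest Courant number component, reached at the domain corners, equals 0.5.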

Compile-time parameter settings for the rotating-cone test.

Run-time parameter responsible for setting the number of MPDATA passes in Fig. c.

Implementation of the setup using the libmpdata++ interface begins with the definition of the compile-time parameters structure. The test features a single scalar field in a two-dimensional space, which is reflected in the values of n_dims and n_eqns set in Listing . In one of the test runs, the number of MPDATA passes (n_iters) is set to 3, instead of the default value of 2. The corresponding field of the run-time parameters structure is shown in Listing . During instantiation of the concurrency handler, four boundary-condition settings (two per dimension) are passed as template arguments. In this example, open boundary conditions (bcond::open) are set in both dimensions; see Listing .

The choice of the threads concurrency handler in Listing results in multi-threaded calculations, using OpenMP if the compiler supports it and Boost.Thread otherwise. The number of computational subdomains (and hence threads) is controlled by the OMP_NUM_THREADS environment variable, regardless of whether the OpenMP or the Boost.Thread implementation is used. The default is to use all CPUs/cores available in the system. Notably, replacing concurr::serial from the previous examples with concurr::threads is the only modification needed to enable domain decomposition via shared-memory parallelism.

Concurrency handler instantiation for the rotating-cone test.

The way the initial condition and the velocity field are set is shown in Listing . The Courant number components are specified using calls to the advector() method with the argument defining the component index.

Initial condition for the rotating-cone test.

The initial condition is displayed in Fig. a, and the results after the total integration time are shown in Fig. b–d. All plots are centred around the cone's initial location and show only a quarter of the computational domain. The isolines of the advected cone are plotted with 0.25 intervals. The results in Fig. b and c were obtained with the fct and the three-pass tot + fct MPDATA, respectively, whereas Fig. d shows test results for the default setting of libmpdata++. These results match those presented in Fig. 1 and Fig. 4 and Table 1. In particular, the rms errors, defined on the rhs of Eq. (), are $0.37\times10^{-3}$, $0.11\times10^{-3}$ and $0.27\times10^{-3}$ for the fct, three-pass tot + fct and the default libmpdata++ options, respectively.

Example: revolving sphere in 3-D

This example extends Sect. to three spatial dimensions. It shows how to specify a three-dimensional setup using libmpdata++ and describes the option for saving the simulation results to HDF5 files with XDMF annotations.

The setup follows : the domain size is 100×100×100, with a uniform grid consisting of 59 grid points in each direction. The time step is $0.036\pi$. The initial condition is a sphere of radius 15 centred around the point $(x_0, y_0, z_0) = (50 - 25/\sqrt{3},\, 50 + 25/\sqrt{3},\, 50 + 25/\sqrt{3})$ with a constant density of 4. The sphere is rotating with constant angular velocity $\boldsymbol{\Omega} = \frac{\omega}{\sqrt{3}}(1, 1, 1)$ of magnitude $\omega = 0.1$. The components of the advecting velocity field are $(u, v, w) = (-\Omega_z(y - y_c) + \Omega_y(z - z_c),\; \Omega_z(x - x_c) - \Omega_x(z - z_c),\; -\Omega_y(x - x_c) + \Omega_x(y - y_c))$, where the coordinates of the rotation centre are $(x_c, y_c, z_c) = (50, 50, 50)$. The test lasts for one revolution, which takes 556 time steps.

Specifying the 3-D setup through the libmpdata++ programming interface starts with setting the n_dims field to 3 (Listing ). Listing shows the choice of the recommended three-dimensional output handler hdf5_xdmf. This results in output consisting of HDF5 files with XDMF annotation that can be viewed, for example, with the ParaView visualisation software. This output is saved in a directory specified by the outdir field of the run-time parameters; see Listing .

Compile-time parameter setting for the revolving-sphere test.

Alias declaration of an output mechanism for the revolving-sphere test.

Run-time parameters field specifying output directory for the revolving-sphere test.

Figure a shows the initial condition, whereas Fig. b shows the results after one revolution for the default libmpdata++ options. The grey volume is composed of dual-grid cells (Sect. ) encompassing data points with cell-mean values of density greater than or equal to 1.

The obtained results can be compared with those presented in Figs. 9–13 and Table 4. In particular, for the default libmpdata++ setting, the rms error is $2.8\times10^{-3}$, and it compares favourably with the L2 norm in their Table 4.

The results of the example presented in Sect. . The whole computational domain is shown. The grey volume encompasses data points with values of density greater than or equal to 1. Panel (a) shows the initial condition, (b) results for the default libmpdata++ options.

Example: 2-D advection on a sphere

This subsection concludes homogeneous transport examples with a 2-D solid-body rotation test on a spherical surface . The purpose of this example is to present methods for setting up the simulations in spherical coordinates.

The same method, used here to specify a Jacobian of coordinate transformation, can be applied to prescribe a variable-in-space fluid density.

Following only the case when the initial field rotates over the poles is presented. The initial condition is a cone centred around the point $(3\pi/2, 0)$ with height and radius equal to 1 and $7\pi/64$, respectively. The wind field is given by $u = -U\sin\phi\cos\lambda$, $v = U\sin\lambda$, where $\lambda$ and $\phi$ denote longitude and latitude, respectively, and $U = \pi/128$. The computational domain $[0, 2\pi]\times[-\pi/2, \pi/2]$ is resolved with 128×64 grid increments $\Delta\lambda = \Delta\phi$ and is shifted by $0.5\Delta\phi$ so that there are no data points on the poles. The test is run for 5120 time steps corresponding to one revolution around the globe.

The advection equation in spherical coordinates has the form of the generalised transport Eq. () with the Jacobian of coordinate transformation $G = \cos\phi$. In order to solve the generalised transport equation with $G \not\equiv 1$ the nug option has to be set; see Listing .

Compile-time parameter field for the example presented in Sect. .

Boundary conditions in this example incorporate principles of differential geometry cf. chapter XIV in in the classical spherical latitude–longitude framework . They are cyclic (bcond::cyclic) in the zonal direction, whereas in the meridional direction they represent two degenerated charts (of the atlas composed of three) defining differentiation of dependent variables in vicinity of the poles (bcond::polar; Listing ). The setting of G is done using the g_factor() accessor method as shown in Listing ; note the shift in latitude by Δϕ/2.
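The effect of the half-cell shift in latitude on the Jacobian can be illustrated independently of the library. The sketch below only evaluates cos φ at the shifted latitudes of one meridional column; it does not use the g_factor() accessor, and the function name is hypothetical.

```cpp
#include <cmath>
#include <vector>

// Values of G = cos(phi) at ny latitudes spanning [-pi/2, pi/2],
// shifted by half a grid increment so that no point lies on a pole.
std::vector<double> jacobian_column(int ny)
{
  const double pi = std::acos(-1.), dphi = pi / ny;
  std::vector<double> G(ny);
  for (int j = 0; j < ny; ++j)
    G[j] = std::cos(-pi / 2 + (j + 0.5) * dphi);
  return G;
}
```

Thanks to the shift, G stays strictly positive at every data point, so the division by G in the generalised transport equation remains well defined.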

Concurrency handler for the example presented in Sect. .

The Jacobian setting for the example presented in Sect. .

The results of the example presented in Sect. . The plots are centred over the cone's initial location and show the advected field plotted in spherical coordinates. Colours mark the amplitude of the advected field. Panel (a) shows the initial condition, (b) results for the default libmpdata++ options and (c) results for the three-pass MPDATA with fct and tot.

The initial condition for the test is plotted in Fig. a, whereas the results are displayed in Fig. b and c. All figures use orthographic projection, with the perspective centred at the initial condition (the true solution), with the contour interval 0.1. Figure b shows the result for the default libmpdata++ options. There is a visible deformation in the direction of motion, consistent with earlier Cartesian rotational tests. The result in Fig. c, obtained using three passes of MPDATA with fct and tot, shows reduced deformation and reproduces Fig. 6 in . Error norms were calculated following Eqs. 24a–e to take into account the effects of coordinate transformation. For instance, the “energy” conservation error (their ERR2) is -0.066 for the default libmpdata++ setting and -0.11 for the three-pass MPDATA with tot and fct, which agrees with the values presented in Table 1.

Inhomogeneous advective transport

Implemented algorithms

As of the current release, libmpdata++ provides three ways of handling source terms in the inhomogeneous extension of Eq. (): $\partial_t\psi + \frac{1}{G}\nabla\cdot(G\mathbf{u}\psi) = R$. The available time integration schemes include two variants of the first-order-accurate Euler-forward scheme (hereafter referred to as euler_a and euler_b) and the second-order-accurate Crank–Nicolson scheme (trapez). The Euler schemes are implemented to account for parameterised forcings (e.g. due to cloud microphysics), whereas the Crank–Nicolson scheme is standard for basic dynamics (e.g. pressure gradient, Coriolis and buoyancy forces). In both Euler schemes, calculating the solver state at the time level n+1 requires only the right-hand side at the time level n. In the euler_a option (Eq. ), the source terms are computed and applied in the standard way, after the advection: $\psi^{n+1} = \mathrm{ADV}(\psi^n) + \Delta t R^n$. In the euler_b option (Eq. ), the source terms are computed and applied before the advection, arguably in the Lagrangian spirit (Sect. 3.2 in ): $\psi^{n+1} = \mathrm{ADV}(\psi^n + \Delta t R^n)$. In the trapez option (Eq. ), half of the source terms are computed and applied as in euler_a and half as in euler_b, arguably in the spirit of the Lagrangian trapezoidal rule (Sect. 2.2 in ): $\psi^{n+1} = \mathrm{ADV}(\psi^n + 0.5\Delta t R^n) + 0.5\Delta t R^{n+1}$.

Library interface

The logic for handling source terms is implemented in the mpdata_rhs solver that inherits from the mpdata class (Fig. ). Consequently, all options discussed in the preceding section apply. The choice of the source-term integration scheme is controlled by the rhs_scheme compile-time parameter with the valid values of euler_a, euler_b or trapez.

The user is expected to provide information on the source terms by defining a derived class of mpdata_rhs with the update_rhs() method overloaded. The update_rhs() signature is given in Listing , whereas the usage example is given in Sect. . The method is called by the solver with the following arguments:

a vector of arrays rhs storing the source terms for each equation of the integrated system,

a floating-point value dt with the time-step value,

an integer number at indicating if the source terms are to be computed at time level n (if at =0) or n+1 (if at =1).

Signature of the method used for defining source terms.

Calculation of forcings at the n+1 time level is needed if the rhs_scheme=trapez option is chosen. The case of at equal to zero is used in the Euler schemes and in the very first time step when using the trapez option (i.e. once per simulation). When the trapez option is used, the dt passed to the update_rhs() method equals half of the original time step.

The update_rhs() method is expected to first call parent_t::update_rhs() to zero out the source and sink terms stored in rhs. Later, it is expected to calculate the rhs terms in a given time step by summing all sources and sinks and “augment assign” them to the rhs field (e.g. using the += operator).

All elements of the rhs vector corresponding to subsequent equations in the system are expected to be modified in a single update_rhs() call.

Example: translating oscillator

The purpose of this example is to show how to include rhs terms in libmpdata++ by creating a user-defined class out of the library tree.

A system of two one-dimensional advection equations, $\partial_t\psi + \partial_x(u_o\psi) = \omega\phi$, $\partial_t\phi + \partial_x(u_o\phi) = -\omega\psi$, represents a harmonic oscillator translating with $u_o = \mathrm{constant}$; see Sect. 4.1 in for a discussion.

The implicit manner of prescribing forcings, similar to the one presented herein, is an archetype for integrating Coriolis force in .

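Applying the trapezoidal rule to integrate the PDE (partial differential equation) system () leads to the following system of coupled implicit algebraic equations: $\psi_i^{n+1} = \psi_i^{*} + 0.5\Delta t\,\omega\,\phi_i^{n+1}$, $\phi_i^{n+1} = \phi_i^{*} - 0.5\Delta t\,\omega\,\psi_i^{n+1}$, where $\psi_i^{*}$ and $\phi_i^{*}$ stand for $\psi_i^{*} = \mathrm{MPDATA}\left(\psi_i^{n} + 0.5\Delta t\,\omega\,\phi_i^{n}, C\right)$, $\phi_i^{*} = \mathrm{MPDATA}\left(\phi_i^{n} - 0.5\Delta t\,\omega\,\psi_i^{n}, C\right)$. Substituting in Eq. () the expression for $\phi_i^{n+1}$ into the equation for $\psi_i^{n+1}$, and vice versa, and then regrouping leads to $\psi_i^{n+1} = \frac{\psi_i^{*} + 0.5\Delta t\,\omega\,\phi_i^{*}}{1 + (0.5\Delta t\,\omega)^2}$, $\phi_i^{n+1} = \frac{\phi_i^{*} - 0.5\Delta t\,\omega\,\psi_i^{*}}{1 + (0.5\Delta t\,\omega)^2}$. Implementation of forcing terms prescribed in Eq. () is presented in Listing . A new solver coupled_harmosc is defined by inheriting from the mpdata_rhs class. A member field omega is defined to store the frequency of oscillations.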

Simulation results of the example presented in Sect. . Abscissa marks the spatial dimension and ordinate represents the oscillator amplitude. The oscillator state is plotted every 20 time steps.

Definition of the solver used in the example presented in Sect. .

The rhs terms are defined for both variables, ix::psi and ix::phi, within the update_rhs() method. The method implements both implicit and explicit formulæ; the two cases are selected by the at argument. Defining forcings for both the n and n+1 cases allows using this class with both the Euler and trapez options. The current state of the model is obtained via a call to the state() method. Note how the formulæ defined in update_rhs() in case (1) loosely resemble the mathematical notation presented in Eq. (). The 0.5 is absent because the Δt passed as an argument in the trapez option is already divided by 2.

Next, the rt_params_t structure is augmented (by inheriting from parent's rt_params_t) with the omega. Lastly, the coupled_harmosc constructor is defined. Within it, the choice of the omega is handled by copying its value from the p.omega to omega member field and then checking if the user has altered the default value of 0.

Compile-time parameter structure for the example presented in Sect. .

For inhomogeneous transport, the rhs_scheme within the ct_params_t structure needs to be defined. In this example it is set to trapez (Listing ). MPDATA advection scheme options are set to default by inheriting from the ct_params_t_default structure. The structure ix allows calling advected variables by their labels, phi and psi, rather than integer numbers. Lastly, when defining the rt_params_t structure a value is assigned to the member field p.omega; see Listing .

Run-time parameter structure for the example presented in Sect. .

In the present example, the initial condition for ψ is defined as $\psi(x) = 0.5\left[1 + \cos(2\pi x/100)\right]$ for $x \in (50, 150)$ and zero elsewhere. The initial condition for ϕ is set to zero.

The result of 1400 s of simulated time is shown in Fig. . Note that the solutions for both ψ and ϕ remain in phase and feature no substantial amplitude error. This contrasts with calculations using Euler-forward schemes (not shown). In particular, at the end of the simulation, the rms error is $1\times10^{-7}$, whereas it is $1\times10^{-18}$ for the analogous experiment with $u_o \equiv 0$ (not shown).

Transport with prognosed velocity

Implemented algorithms

Whenever the velocity field changes in time, the second-order accuracy of the solution at n+1 requires an estimate of the advector at n+1/2. This is provided by linear extrapolations from n and n-1 values . Furthermore, when the velocity is a dependent variable of the model, Eq. () embodies equations of motion. Then the velocity (or momentum) components are treated as advected scalars (i.e. advectees) and are predicted at the centres of the dual-grid cells (Fig. ). The advector field is then interpolated linearly to the centres of the cell walls.

Library interface

The algorithms for interpolating in space and extrapolating in time the advector field from the model variables are defined in the mpdata_rhs_vip class and all user-created solvers with time-varying velocity must inherit from this class.

The transported fields may represent either velocity or momenta. In the latter case the prognosed velocity components are calculated as ratios of two advectee fields (e.g. momentum components and density). The index of the advectee that forms the common denominator for all velocity components should be assigned to vip_den. The vip_i, vip_j and vip_k store the index of the advected fields appearing in the numerators for each velocity component. These velocity components are then used to calculate the advector field. In cases when the velocity components are model variables (as in the example of Sect. ), the common denominator is redundant and the value -1 should be assigned to vip_den.

For systems where numerators and denominators can uniformly approach zero, the vip_eps value is defined to prevent divisions by zero. If the denominator at a given grid point is less than vip_eps, the resulting advector is set to zero there. The default setting (represented with vip_eps set to 0) gives no protection from divisions by zero. Any user-defined vip_eps > 0 activates the above algorithm.
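The cut-off logic described above amounts to a guarded division. The sketch below follows the textual description literally for a single grid point; it is not the library implementation, and the function name is hypothetical.

```cpp
// Velocity diagnosed as a ratio of two advected fields at one grid point.
// For eps > 0, a denominator smaller than eps yields zero velocity;
// eps == 0 (the default described above) disables the protection.
double prognosed_velocity(double numerator, double denominator, double eps)
{
  if (eps > 0 && denominator < eps) return 0;
  return numerator / denominator;
}
```

For instance, with eps set to 1e-8, a momentum of 1e-12 over a depth of 1e-12 gives zero velocity instead of a spurious value of order one.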

The vip_i, vip_j, vip_k and vip_den are expected to be members of the compile-time parameters structure ct_params_t of the mpdata_rhs_vip class. The vip_eps value is a run-time parameter.

As of the current release, the prognosed-velocity features of libmpdata++ are implemented for constant G≡1 only.

Example: 1-D shallow-water system

The aim of this example is to show how to define simulations with a prognosed velocity field. The necessary compile-time and run-time parameters as well as the user-defined class with source and sink terms are discussed. The obtained results are compared with the analytical solution and a published MPDATA benchmark.

The idealised system of 1-D inviscid shallow-water equations is considered, with both the surface friction and background rotation neglected. The simulated physical scenario is a slab-symmetric parabolic drop spreading under gravity; see for a general context and for the bespoke analytical solutions. The corresponding governing equations take the dimensionless form $\partial_t h + \partial_x(uh) = 0$, $\partial_t(uh) + \partial_x(uuh) = -h\,\partial_x h$, where h is a normalised depth of the fluid layer and u is a normalised velocity. Following , the selected velocity scale is $u_o = (g h_o)^{1/2}$, where $h_o$ is the initial height of the drop and g denotes the gravitational acceleration. The characteristic timescale is $t_o = a/u_o$, where a denotes the initial half-width of the drop. At the initial time the drop is confined to $|x| \le 1$ and centred about $x = 0$: $h(x, t=0) = 1 - x^2$ for $|x| \le 1$ and $h(x, t=0) = 0$ for $|x| > 1$. The time step is set to 0.01 and the grid spacing is set to 0.05. The crux of the test is a synchronous solution for the depth and momentum near the drop edge that accurately diagnoses the velocity.

The definition of the rhs terms for Sect. is presented in Listing . Only the method for calculating the forcing terms is shown; for the full out-of-the-library-tree definition of source-terms see Listing . As in Listing , the definition in Listing attempts to follow the mathematical notation. Because of the use of the grad function, the nabla namespace is included.

Method for calculating source and sink terms in the example presented in Sect. .

Compile-time parameters for the example presented in Sect. .

Listing specifies the compile-time parameters structure. Because the fluid flow in this example is divergent, the opts::dfl correction is enabled; cf. Sect. . The rhs_scheme is set to trapez.

Because the equation for h is homogeneous, the momentum forcing at n+1 time level can be readily evaluated after advecting h.

Within the ix structure, the equation indices are assigned. Furthermore, the recipe for calculating the velocity is defined by assigning the indices to vip_i and vip_den. Lack of the rhs terms is specified by toggling the nth bit of the hints_norhs field, where n is the index of the homogeneous equation. This prevents the unnecessary summation of zeros.

Listing shows the run-time parameters structure. The value of gravitational acceleration p.g is set to 1 to follow the dimensionless notation of Eq. (), and the vip_eps is set arbitrarily to $10^{-8}$.

Run-time parameters for the example presented in Sect. .

The results of the test are plotted in Fig. . Figure a shows the initial condition (black) and the analytical solution for t=3 (blue). Solid lines mark the fluid depth and the dashed line the velocity. The remaining two panels show numerical results at t=3 for different MPDATA options (red) plotted over the top panel. Figure b shows the solution with options abs and fct, whereas Fig. c shows the solution obtained with options iga and fct.

Similar to the advector field evaluation discussed in Sect. , the vip_eps value was used as a cut-off value to prevent divisions by zero when calculating the velocity field.

All presented results are free of apparent artefacts near the drop edge. The abs+fct in the central panel compares well with Fig. 7b in , whereas the iga+fct solution in the bottom panel closely reproduces the analytical result. The rms error, on the rhs of Eq. (), at the end of the simulation is $5.77\times10^{-4}$ for abs+fct and $1.87\times10^{-4}$ for iga+fct options.

Simulation results of the example presented in Sect. . Solid lines represent fluid height and dashed lines represent fluid velocity. Initial condition is plotted in black, analytical solution in blue and test results in red. (a) shows the initial condition and analytical solution at t=3. (b) and (c) show numerical results plotted over (a) obtained with options abs + fct and iga + fct, respectively.

Example: 2-D shallow-water system

The 2-D shallow-water test discussed here is an original axis-symmetric extension of the 1-D slab-symmetric test in Sect. . The corresponding dimensionless equations take the form $\partial_t h + \partial_x(uh) + \partial_y(vh) = 0$, $\partial_t(uh) + \partial_x(uuh) + \partial_y(vuh) = -h\,\partial_x h$, $\partial_t(vh) + \partial_x(uvh) + \partial_y(vvh) = -h\,\partial_y h$. As in the 1-D case, h stands for the fluid height and u and v are the velocity components in x and y directions, respectively. Again, the initial condition consists of a parabolic drop centred at the origin and confined to $x^2 + y^2 \le 1$: $h(x, y, t=0) = 1 - x^2 - y^2$ for $x^2 + y^2 \le 1$ and $h(x, y, t=0) = 0$ for $x^2 + y^2 > 1$. Following the method presented by and , the analytical solution of the system () can be obtained as $h(x, y, t) = \frac{1}{\lambda^2} - \frac{x^2 + y^2}{\lambda^4}$, $u(x, t) = x\frac{\dot\lambda}{\lambda}$, $v(y, t) = y\frac{\dot\lambda}{\lambda}$. Here $\lambda(t)$ is the half-width of the drop, evolving according to $\lambda(t) = \sqrt{2t^2 + 1}$, and $\dot\lambda \equiv \mathrm{d}\lambda/\mathrm{d}t$ is the velocity of the leading edge.
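The analytical solution is straightforward to evaluate, for example when computing error norms. The sketch below uses hypothetical function names and makes one assumption beyond the formulae above: the depth is clipped to zero outside the drop, in line with the initial condition.

```cpp
#include <cmath>

// Half-width of the drop and its time derivative (leading-edge velocity).
double lambda(double t)     { return std::sqrt(2 * t * t + 1); }
double lambda_dot(double t) { return 2 * t / lambda(t); }

// Fluid depth; set to zero outside the drop (assumption, see lead-in).
double depth(double x, double y, double t)
{
  const double l2 = lambda(t) * lambda(t);
  const double h = 1 / l2 - (x * x + y * y) / (l2 * l2);
  return h > 0 ? h : 0;
}

// Velocity component along x inside the drop; v follows by replacing x with y.
double vel_x(double x, double t) { return x * lambda_dot(t) / lambda(t); }
```

At t = 3 the half-width equals the square root of 19 (about 4.36), the depth at the centre has dropped to 1/19, and the velocity at the edge of the drop equals the leading-edge velocity.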

Figure shows a perspective display of drop height at t=3, whereas Fig. shows the profiles of velocity and height of the drop. Similarly to Fig. , the top panel shows the initial condition (black) and analytical solution for t=3 (blue). Central and bottom panels show corresponding numerical results at t=3 (red). Solid lines represent the fluid height and the dashed lines the velocity. The central panel shows the solution with options abs and fct, whereas the bottom panel shows the solution with options iga and fct. As in the 1-D case, the velocity field near the drop edge is regular and the iga+fct result closely follows the analytical solution. The rms error equals $1.60\times10^{-4}$ for abs and fct, and $0.70\times10^{-4}$ for iga and fct; see for a discussion.

Drop height at t=3 of the example presented in Sect. .

Systems with elliptic pressure equation

Implemented algorithms

The libmpdata++ library includes an implicit representation of pressure gradient terms for incompressible fluid equations. This necessitates the solution of an elliptic Poisson problem for pressure. The elliptic problem is solved after applying all explicitly known forcings to ensure a non-divergent velocity field at the end of each time step. As of the current release, the library is equipped with the minimal- and conjugate-residual variational iterative solvers. For the derivation of used schemes and further discussion of the elliptic problem see , and references therein.

The same as in Fig. but for a cross section of the two-dimensional case.

Library interface

The methods for solving the elliptic problem are implemented in the mpdata_rhs_vip_prs class (Fig. ). This class inherits from the mpdata_rhs_vip class. Therefore, the way to specify other source terms as well as the time-varying velocity field remains unchanged.

The choice of elliptic solver is controlled by setting the compile-time parameter prs_scheme to mr or cr for the minimal-residual or conjugate-residual solver, respectively. The iterations within the elliptic solver stop when the divergence of the velocity field is lower than a threshold tolerance set by the run-time parameter prs_tol; cf. .

Example: Boussinesq convection

The goal of this example is to show the user interface for simulations featuring an elliptic pressure equation. The governing PDE system consists of momentum, potential temperature, and mass-continuity equations for an ideal, 2-D, incompressible Boussinesq fluid: $\partial_t\mathbf{v} + \nabla\cdot(\mathbf{v}\otimes\mathbf{v}) = -\nabla\pi - \mathbf{g}\frac{\theta'}{\theta_o}$, $\partial_t\theta + \nabla\cdot(\mathbf{v}\theta) = 0$, $\nabla\cdot\mathbf{v} = 0$. Here, $\mathbf{v} = (u, w)$ denotes the velocity field, $\pi$ is the pressure perturbation about the hydrostatic reference state normalised by the reference density $\rho_o$ (constant in the Boussinesq model) and $\mathbf{g}$ is the gravitational acceleration. Furthermore, $\theta'$ represents the potential temperature perturbation about the reference state $\theta_o = \mathrm{constant}$, and $\otimes$ denotes the tensor product.

Combining the velocity prediction from the momentum equation, according to Eq. (), with the mass continuity Eq. () leads to the elliptic Poisson problem: $-\frac{1}{\rho_o}\nabla\cdot\left[\rho_o\left(\hat{\mathbf{v}} - 0.5\Delta t\nabla\pi\right)\right] = 0$, where $\hat{\mathbf{v}}$ is the velocity field after the advection summed with all the explicitly known source terms at time level n+1, namely buoyancy in this example.

Because the potential temperature Eq. () is homogeneous, the buoyancy at the n+1 time level can be readily evaluated after advecting θ.

In Eq. () the pressure perturbation field $\pi$ is unknown, and it needs to be adjusted such that the final velocity field $\hat{\mathbf{v}} - 0.5\Delta t\nabla\pi$ satisfies the mass continuity Eq. (). Denoting $0.5\Delta t\,\pi$ as $\phi$ allows symbolising Eq. () using standard notation for linear sparse problems : $\mathcal{L}(\phi) - \mathcal{R} = 0$. The setup of the test follows . It consists of a circular potential temperature anomaly of radius 250 m, embedded in a neutrally stratified quiescent environment, with $\theta_o = 300$ K, in the domain resolved with 200×200 grid cells of the size dx=dy=10 m. The initial anomaly $\theta' = 0.5$ K is centred in the horizontal, 260 m above the lower boundary. The value of g is set to 9.81 m s$^{-2}$. The time step is set to $\Delta t = 0.75$ s and the simulation takes 800 time steps.

Compile-time parameters for the example presented in Sect. .

Listing shows the compile-time parameters structure. The time integration scheme for the buoyancy forcing is set to trapez, as the user has a choice of the algorithm. However, as of the current release, the elliptic problem formulation requires forcings to be independent of velocity if handled using the trapez scheme. The implicit pressure gradient terms are always integrated with the trapezoidal rule (), regardless of the rhs_scheme setting. In Listing the elliptic solver option is set to the conjugate-residual scheme cr. The vip_den is set to -1, because here the velocity components are the model kinematic variables; cf. the discussion in the second paragraph of Sect. .

The convergence threshold of the elliptic solver, $\nabla\cdot(\mathbf{v}) \le \varepsilon$, is set to $10^{-7}$ via the run-time parameter prs_tol (Listing ).

Run-time parameter field setting the accuracy of the pressure solver.

Listing shows the buoyancy forcing definition.

Method for calculating source and sink terms in the example presented in Sect. .
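As a rough illustration of what such a forcing method computes, the sketch below adds a Boussinesq buoyancy term to the right-hand side of the vertical velocity equation. It assumes the standard form g(θ - θo)/θo and uses hypothetical free functions (`buoyancy`, `update_rhs_w`) operating on plain vectors; it is not the library's method signature.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Boussinesq buoyancy acceleration (assumed form): g * (theta - theta_ref) / theta_ref.
double buoyancy(double tht, double tht_ref, double g = 9.81)
{
  return g * (tht - tht_ref) / tht_ref;
}

// Adds the buoyancy forcing to the vertical-velocity right-hand side,
// grid point by grid point.
void update_rhs_w(std::vector<double> &rhs_w, const std::vector<double> &tht,
                  double tht_ref, double g = 9.81)
{
  assert(rhs_w.size() == tht.size());
  for (std::size_t i = 0; i < rhs_w.size(); ++i)
    rhs_w[i] += buoyancy(tht[i], tht_ref, g);
}
```

For the initial anomaly of this test (θ′ = 0.5 K, θo = 300 K) the forcing evaluates to 9.81 × 0.5 / 300 ≈ 0.0164 m s$^{-2}$, and it vanishes in the unperturbed environment.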

The results of the example presented in Sect. . Abscissa and ordinate mark the spatial dimensions. Colours correspond to potential temperature values. Panel (a) shows results from the 200th, (b) from the 600th and (c) from the 800th time step.

The evolved θ fields after 200, 600 and 800 time steps are shown in Fig. a–c. These results correspond to plots from Fig. 3 in and illustrate that libmpdata++ captures the interfacial instabilities and sharp gradients, including small turbulent structures in Fig. c. Yet, the solutions contain small (imperceptible in the plots) under- and overshoots, developing at the rate of Δθ/Δt∼Δtθo∇⋅(v). These oscillations depend on the magnitude of the residual errors, ∇⋅v≠0, controlled by the convergence threshold prs_tol. For substantiation, Table displays the magnitude of such spurious extrema δθmax (defined as the larger of the maximal magnitudes of normalised under- and overshoots with respect to their initial values) against prs_tol at the time of Fig. c. Note that δθmax is bounded by prs_tol × 800Δt, i.e. the tolerance multiplied by the total simulated time.

The conservation errors for θ′ and (θ′)$^2$ are defined as

$$\mathrm{err}_1 = \frac{\sum\theta' - \sum\theta'_o}{\sum\theta'_o}\,100\,\%, \qquad \mathrm{err}_2 = \frac{\sum(\theta')^2 - \sum(\theta'_o)^2}{\sum(\theta'_o)^2}\,100\,\%,$$

where $\theta'_o$ indicates the initial perturbation and ∑ stands for summing over the whole computational domain. At the end of the simulation err1 ≈ $1\times10^{-11}$ is orders of magnitude smaller than in semi-Lagrangian calculations of , whereas err2 = -14 matches their value, reflecting the implicit LES (ILES) property of non-oscillatory numerics; see , and references therein.
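The two error measures defined above translate directly into code. The following is a minimal sketch operating on flat vectors of the perturbation field; the struct `cons_err_t` and the function `conservation_errors` are illustrative names and not part of the library.

```cpp
#include <cassert>
#include <cmath>
#include <numeric>
#include <vector>

struct cons_err_t { double err1, err2; }; // both expressed in per cent

// err1: relative change of the domain sum of theta',
// err2: relative change of the domain sum of (theta')^2.
cons_err_t conservation_errors(const std::vector<double> &tht,
                               const std::vector<double> &tht0)
{
  auto sum = [](const std::vector<double> &v) {
    return std::accumulate(v.begin(), v.end(), 0.0);
  };
  auto sum_sq = [](const std::vector<double> &v) {
    return std::inner_product(v.begin(), v.end(), v.begin(), 0.0);
  };
  return { 100 * (sum(tht) - sum(tht0)) / sum(tht0),
           100 * (sum_sq(tht) - sum_sq(tht0)) / sum_sq(tht0) };
}
```

A field that is smoothed while keeping its sum, for instance {0, 0.5, 0.5, 0} becoming {0.25, 0.25, 0.25, 0.25}, gives err1 = 0 and a negative err2, which is the signature of dissipation that the text associates with the ILES property.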

Maximal spurious extrema of the θ field after 800 time steps for various values of the convergence threshold prs_tol.

| prs_tol | $10^{-5}$ | $10^{-7}$ | $10^{-9}$ |
|---|---|---|---|
| δθmax | $3\times10^{-4}$ | $8\times10^{-6}$ | $1\times10^{-7}$ |
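The bound on δθmax stated in the text can be checked against the tabulated values with simple arithmetic. The check below assumes that prs_tol is expressed in s$^{-1}$, so that its product with the simulated time is dimensionless like the normalised extrema:

$$800\,\Delta t = 800 \times 0.75\ \mathrm{s} = 600\ \mathrm{s},$$

$$\begin{aligned}
10^{-5} \times 600 &= 6\times10^{-3} \;\ge\; 3\times10^{-4},\\
10^{-7} \times 600 &= 6\times10^{-5} \;\ge\; 8\times10^{-6},\\
10^{-9} \times 600 &= 6\times10^{-7} \;\ge\; 1\times10^{-7}.
\end{aligned}$$

In all three cases the measured extrema stay below the bound, by a factor of roughly 6 to 20.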

Remarks

In this paper the first release of libmpdata++ was introduced. Versatility of the user interface as well as the correctness of the implementation were illustrated with a series of examples with increasing degree of physical, mathematical and programming complexity. Starting from elementary advection in the Cartesian domain, through passive advection on the sphere, through slab- and axis-symmetric water drop spreading under gravity, to buoyant convection in an incompressible Boussinesq fluid, the accompanying discussions included code snippets, description of the user interface and comparison with previously published benchmarks.

Our priority in the development of libmpdata++ is researcher productivity. In the case of scientific software such as libmpdata++, the researchers are both users and developers of the library. The adherence to the principle of separation of concerns and the employment of programming techniques that promote code conciseness (e.g. the current release consists of fewer than 10k lines of code) contribute to the developers' productivity. The user productivity is amplified by ensuring that the release of the library is accompanied by example-rich documentation. Both the users and developers benefit from the free/libre open-source software release of the library.

Our experience with the current version of libmpdata++ indicates that the embraced object-oriented techniques and modular design of the library generally do not come as a trade-off for performance. On small grids, however, there is a noticeable overhead compared to the original Fortran implementation. For example, in serial runs, up to five times longer execution times were measured for the 3-D revolving-sphere tests discussed in Sect. ($59^3$ grid). The relative performance improves with increasing grid size, reaching execution times on a par with the original Fortran implementation on the $(6\times59)^3$ grid. On the other hand, the separation of concerns obtained with the object-oriented design of the library allowed equipping the code with the multi-threading mechanism without any substantial changes in the numerics code. Notably, for all 2-D and 3-D examples presented in the paper, a minimum of fivefold speed-up is obtained when executing on six threads. The library is in active development and improvements in performance are expected. Furthermore, equipping the library with distributed-memory parallelisation is planned for a subsequent release.

Index of options

The following three tables provide a reference of the libmpdata++ options documented in the article.

Fields of compile-time parameter structures.

| Parameter | Available in (and inheriting classes) | Possible values | Short description |
|---|---|---|---|
| opts | mpdata | combinations of abs, dfl, fct, iga, khn, nug, tot, pfc, npa | MPDATA algorithm options (see the table of MPDATA options). Options can be combined with the bitwise OR operator. |
| real_t | mpdata | float, double | Floating point format used. |
| n_dims | mpdata | integer constant | Dimensionality of the solved problem. |
| n_eqns | mpdata | integer constant | Number of advected variables (number of the solved equations). |
| rhs_scheme | mpdata_rhs | euler_a, euler_b, trapez | Source/sink term integration scheme. |
| hint_norhs | mpdata_rhs | integer constant interpreted as a bit field indexed by equation number | Flag for equations with no source/sink terms (to avoid summation of zeros when calculating source terms). |
| vip_i, vip_j, vip_k | mpdata_rhs_vip | integer constant | Indices of advected variables representing velocity or momentum components in up to three dimensions. |
| vip_den | mpdata_rhs_vip | integer constant | Optional index of density-like advected variable by which the above-defined momenta are divided to obtain velocity. |
| prs_scheme | mpdata_rhs_vip_prs | solvers::mr, solvers::cr | Elliptic pressure solver algorithm type (minimal-residual or conjugate-residual). |
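Two entries of this table, opts and hint_norhs, are integer constants used as bit fields. The sketch below shows one common way such flags can be defined, combined and queried at compile time. The namespace `opts_sketch`, the bit positions and the helper names are assumptions for illustration; the actual values used by the library are not reproduced here.

```cpp
#include <cassert>

namespace opts_sketch // hypothetical; bit positions are illustrative only
{
  using opts_t = unsigned long;

  // A value with only bit number b set.
  constexpr opts_t bit(int b) { return opts_t(1) << b; }

  // A few of the option names from the table, each occupying one bit.
  enum : opts_t { fct = bit(0), abs = bit(1), tot = bit(2), iga = bit(3) };

  // True if any bit of `flag` is present in `set`.
  constexpr bool isset(opts_t set, opts_t flag) { return (set & flag) != 0; }

  // Bit field indexed by equation number: a set bit marks an equation
  // without source/sink terms.
  constexpr bool has_no_rhs(opts_t hint_norhs, int eqn)
  {
    return isset(hint_norhs, bit(eqn));
  }
}
```

For example, `opts_sketch::iga | opts_sketch::fct` yields a constant for which `isset(..., fct)` is true and `isset(..., tot)` is false, and `has_no_rhs(opts_sketch::bit(2), 2)` flags the equation with index 2 as having no forcings.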

Fields of run-time parameter structures.

| Parameter | Available in (and inheriting classes) | Possible values | Short description |
|---|---|---|---|
| n_iters | mpdata | integer constant | Number of corrective iterations performed within the MPDATA algorithm. One iteration results in a donor-cell scheme. Two (the default) or more iterations result in MPDATA scheme. |
| grid_size | mpdata | array of integer constants | Number of grid points in each dimension. |
| dt | mpdata_rhs | floating-point constant | Time step. |
| vip_eps | mpdata_rhs_vip | floating-point constant | Cut-off value for preventing divisions by zero when calculating velocity field from momenta (for simulations in which the advected variables represent momenta and not velocity). |
| di, dj, dk | mpdata_rhs_vip | floating-point constant | Grid spacing. |
| prs_tol | mpdata_rhs_vip_prs | floating-point constant | Tolerance of the elliptic pressure solver. |
| outfreq | common to all output handlers | integer number | Output interval (in number of time steps). The default value is set to 1, resulting in output performed in every time step. |
| outdir | common to all output handlers | string | Directory where output files are saved. |
| outvars | common to all output handlers | map associating equation indices with pairs of strings representing names and units of advected variables | List of variables to include in the output files. Mandatory for simulations with more than one advected field. |

Options of MPDATA defined through the compile-time parameter opts (see Listings –).

| Option | Default | Short description |
|---|---|---|
| abs | | Using absolute values in “pseudo-velocity” formulation. (One of the two possible options for handling variable-sign signals.) |
| dfl | | Augmenting the “pseudo-velocity” formulæ with a term proportional to flow divergence. (To be used with divergent flows only.) |
| fct | ✓ | Non-oscillatory option of MPDATA. |
| iga | ✓ | Linear limit of MPDATA algorithm at infinite constant background. (One of the two possible options for handling variable-sign signals.) |
| khn | | Employing Kahan summation algorithm in donor-cell calls of MPDATA algorithm. |
| nug | | Accounting for non-constant density of the fluid and/or coordinate transformation. |
| tot | | Accounting for third-order terms in “pseudo-velocity” formulæ. |
| pfc | | Protecting from divisions by zero in ψ-fraction factors (as the last term in Eq. ()) by conditionally assigning zeros to all grid points for which the denominator equals zero. The default is to augment the denominator with a small positive number ϵ (∼$10^{-7}$ for single precision and ∼$10^{-16}$ for double precision). The default behaviour requires the signal to be non-negative unless iga or abs is selected. |
| npa | | Evaluating [u]+ as (u+\|u\|)/2 instead of max(u,0) (and analogously for [u]-; see Eq. ()). |
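Two of the options above, khn and npa, refer to small numerical techniques that are easy to demonstrate in isolation. The sketch below contrasts naive and Kahan-compensated summation in single precision, and shows the two equivalent ways of evaluating the positive part [u]+. The function names are illustrative and the code is independent of the library.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Plain left-to-right summation in single precision.
float sum_naive(const std::vector<float> &x)
{
  float s = 0;
  for (float v : x) s += v;
  return s;
}

// Kahan (compensated) summation: c carries the rounding error of each
// addition and feeds it back into the next term.
float sum_kahan(const std::vector<float> &x)
{
  float sum = 0, c = 0;
  for (float v : x)
  {
    const float y = v - c;   // term corrected by the accumulated error
    const float t = sum + y; // low-order bits of y may be lost here
    c = (t - sum) - y;       // recover what was lost
    sum = t;
  }
  return sum;
}

// Positive part [u]+ written as max(u, 0) ...
double pos_part_max(double u) { return std::max(u, 0.0); }

// ... and as (u + |u|) / 2, the form selected by the npa option.
double pos_part_abs(double u) { return 0.5 * (u + std::fabs(u)); }
```

Summing one value of 1 followed by a million values of $10^{-8}$ shows the difference: each small term is below the single-precision resolution near 1, so the naive sum does not register them, while the compensated sum accumulates them to approximately 1.01.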

Code availability

The library is released under the GNU General Public License v3.0. The 1.0 release of the library accompanying this publication is available for download as an electronic supplement to the paper and tagged as “1.0.1” at the project repository; see project website for a list of pointers to relevant resources: http://libmpdataxx.igf.fuw.edu.pl/.

All example programs needed to generate plots and error norms discussed in the paper are shipped with libmpdata++ and are located in the “paper_2015_GMD” folder of the release tarball and the public code repository. To allow automatic regression testing, reference data in the form of both model output (e.g. hdf5 files) and calculated error norms (text files) are stored in “refdata” subfolders. Execution of test programs and verification of the output against reference data is automated using CMake/CTest and is a part of the continuous-integration workflow used in development of the library. It takes ca. 15 min to execute all the discussed example programs on commodity hardware (e.g. a multi-core laptop or a virtual machine in a cloud-computing system).

The Supplement related to this article is available online at doi:10.5194/gmd-8-1005-2015-supplement.

Acknowledgements

Personal reviews from Christian Kühnlein and Willem Deconinck as well as the reviews from Christian Jacobs and Douglas Jacobsen helped to improve the presentation. Development of libmpdata++ was funded by Poland's National Science Centre (Narodowe Centrum Nauki) (decisions no. 2011/01/N/ST10/01483 and 2012/06/M/ST10/00434). P. K. Smolarkiewicz acknowledges support by funding received from the European Research Council under the European Union's Seventh Framework Programme (FP7/2012/ERC grant agreement no. 320375). Part of the work was carried out during visits of A. Jaruga to the National Center for Atmospheric Research (NCAR) in Boulder, Colorado, USA, and to the European Centre for Medium-Range Weather Forecasts (ECMWF), Reading, UK. Part of the work was carried out during visits of D. Jarecka to NCAR funded by the Polish Ministry of Science and Higher Education (project no. 1119/MOB/13/2014/0). NCAR is operated by the University Corporation for Atmospheric Research. Figures were generated using gnuplot, ParaView and matplotlib. The authors acknowledge computational time granted by the Center for Cooperative Work on Computational Science, University of Hyogo, Japan. Development of libmpdata++ continuously benefits from the computational services offered by Travis at their continuous-integration platform.

Edited by: H. Weller

References

Arabas, S., Jarecka, D., Jaruga, A., and Fijałkowski, M.: Formula translation in Blitz++, NumPy and modern Fortran: a case study of the language choice tradeoffs, Sci. Prog., 22, 201–222, 10.3233/SPR-140379, 2014.

Arakawa, A. and Lamb, V. R.: Computational design of the basic dynamical process of the UCLA general circulation model, in: General Circulation Models of the Atmosphere, vol. 17 of Methods in Computational Physics: Advances in Research and Applications, Elsevier, 173–265, 10.1016/B978-0-12-460817-7.50009-4, 1977.

Bangerth, W. and Heister, T.: What makes computational open source software libraries successful?, Comput. Sci. Discov., 6, 015010, 10.1088/1749-4699/6/1/015010, 2013.

Charbonneau, P. and Smolarkiewicz, P.: Modeling the solar dynamo, Science, 340, 42–43, 10.1126/science.1235954, 2013.

Cotter, C. S., Smolarkiewicz, P. K., and Szczyrba, I. N.: A viscoelastic fluid model for brain injuries, Int. J. Numer. Meth. Fl., 40, 303–311, 10.1002/fld.287, 2002.

Frei, C.: Dynamics of a two-dimensional ribbon of shallow water on an f-plane, Tellus A, 45, 44–53, 10.1034/j.1600-0870.1993.00004.x, 1993.

Grabowski, W. and Smolarkiewicz, P.: A multiscale anelastic model for meteorological research, Mon. Weather Rev., 130, 939–955, 10.1175/1520-0493(2002)130<0939:AMAMFM>2.0.CO;2, 2002.

Hyman, J., Smolarkiewicz, P., and Winter, C.: Heterogeneities of flow in stochastically generated porous media, Phys. Rev. E, 86, 056701, 10.1103/PhysRevE.86.056701, 2012.

Jarecka, D., Jaruga, A., and Smolarkiewicz, P.: A spreading drop of shallow water, J. Comput. Phys., 289, 53–61, 10.1016/j.jcp.2015.02.003, 2015.

Kahan, W.: Pracniques: further remarks on reducing truncation errors, Comm. ACM, 8, p. 40, 10.1145/363707.363723, 1965.

Kühnlein, C., Smolarkiewicz, P., and Dörnbrack, A.: Modelling atmospheric flows with adaptive moving meshes, J. Comput. Phys., 231, 2741–2763, 10.1016/j.jcp.2011.12.012, 2012.

Margolin, L. and Smolarkiewicz, P.: Antidiffusive velocities for multipass donor cell advection, SIAM J. Sci. Comput., 20, 907–929, 10.1137/S106482759324700X, 1998.

Maurin, K.: Analysis Part II: Integration, Distributions, Holomorphic Functions, Tensor and Harmonic Analysis, Reidel, 1980.

Molenkamp, C.: Accuracy of finite-difference methods applied to the advection equation, J. Appl. Meteorol., 7, 160–167, 10.1175/1520-0450(1968)007<0160:AOFDMA>2.0.CO;2, 1968.

Ortiz, P. and Smolarkiewicz, P. K.: Coupling the dynamics of boundary layers and evolutionary dunes, Phys. Rev. E, 79, 041307, 10.1103/PhysRevE.79.041307, 2009.

Press, W., Teukolsky, S., Vetterling, W., and Flannery, B.: Numerical recipes. The art of scientific computing, 3rd edn., Cambridge University Press, 2007.

Prusa, J., Smolarkiewicz, P., and Wyszogrodzki, A.: EULAG, a computational model for multiscale flows, Comput. Fluids, 37, 1193–1207, 10.1016/j.compfluid.2007.12.001, 2008.

Randall, D.: Lectures on numerical modelling of the atmosphere, available at: http://kiwi.atmos.colostate.edu/group/dave/at604pdf/Chapter_11.pdf (last access: October 2014), 2013.

Schär, C. and Smith, R. B.: Shallow-water flow past isolated topography. I – Vorticity production and wake formation, J. Atmos. Sci., 50, 1373–1412, 10.1175/1520-0469(1993)050<1373:SWFPIT>2.0.CO;2, 1993.

Schär, C. and Smolarkiewicz, P.: A synchronous and iterative flux-correction formalism for coupled transport equations, J. Comput. Phys., 128, 101–120, 10.1006/jcph.1996.0198, 1996.

Smolarkiewicz, P.: A simple positive definite advection scheme with small implicit diffusion, Mon. Weather Rev., 111, 479–486, 10.1175/1520-0493(1983)111<0479:ASPDAS>2.0.CO;2, 1983.

Smolarkiewicz, P.: A fully multidimensional positive definite advection transport algorithm with small implicit diffusion, J. Comput. Phys., 54, 325–362, 10.1016/0021-9991(84)90121-9, 1984.

Smolarkiewicz, P.: Multidimensional positive definite advection transport algorithm: an overview, Int. J. Numer. Meth. Fl., 50, 1123–1144, 10.1002/fld.1071, 2006.

Smolarkiewicz, P. and Clark, T.: The multidimensional positive definite advection transport algorithm – further development and applications, J. Comput. Phys., 67, 396–438, 10.1016/0021-9991(86)90270-6, 1986.

Smolarkiewicz, P. and Grabowski, W.: The multidimensional positive definite advection transport algorithm: Nonoscillatory option, J. Comput. Phys., 86, 355–375, 10.1016/0021-9991(90)90105-A, 1990.

Smolarkiewicz, P. and Margolin, L.: Variational elliptic solver for atmospheric applications, Tech. Rep. LA-12712-MS, Los Alamos National Lab., 10.2172/10130964, 1994.

Smolarkiewicz, P. and Margolin, L.: MPDATA: a finite-difference solver for geophysical flows, J. Comput. Phys., 140, 459–480, 10.1006/jcph.1998.5901, 1998.

Smolarkiewicz, P. and Pudykiewicz, J.: A class of semi-Lagrangian approximations for fluids, J. Atmos. Sci., 49, 2082–2096, 10.1175/1520-0469(1992)049<2082:ACOSLA>2.0.CO;2, 1992.

Smolarkiewicz, P. and Rasch, P.: Monotone advection on the sphere: an Eulerian versus semi-Lagrangian approach, J. Atmos. Sci., 48, 793–810, 10.1175/1520-0469(1991)048<0793:MAOTSA>2.0.CO;2, 1991.

Smolarkiewicz, P. and Szmelter, J.: MPDATA: an edge-based unstructured-grid formulation, J. Comput. Phys., 206, 624–649, 10.1016/j.jcp.2004.12.021, 2005.

Smolarkiewicz, P. and Szmelter, J.: Iterated upwind schemes for gas dynamics, J. Comput. Phys., 228, 33–54, 10.1016/j.jcp.2008.08.008, 2009.

Smolarkiewicz, P. and Szmelter, J.: A nonhydrostatic unstructured-mesh soundproof model for simulation of internal gravity waves, Acta Geophys., 59, 1109–1134, 10.2478/s11600-011-0043-z, 2011.

Smolarkiewicz, P. K., Grubišić, V., and Margolin, L. G.: On forward-in-time differencing for fluids: stopping criteria for iterative solutions of anelastic pressure equations, Mon. Weather Rev., 125, 647–654, 10.1175/1520-0493(1997)125<0647:OFITDF>2.0.CO;2, 1997.

Smolarkiewicz, P., Kühnlein, C., and Wedi, N.: A consistent framework for discrete integrations of soundproof and compressible PDEs of atmospheric dynamics, J. Comput. Phys., 263, 185–205, 10.1016/j.jcp.2014.01.031, 2014.

Szmelter, J. and Smolarkiewicz, P.: An edge-based unstructured mesh discretisation in geospherical framework, J. Comput. Phys., 229, 4980–4995, 10.1016/j.jcp.2010.03.017, 2010.

Veldhuizen, T.: Blitz++ user's guide. A C++ class library for scientific computing, Tech. rep., available at: http://blitz.sf.net/resources/blitz-0.9.pdf (last access: October 2014), 2006.

Williamson, D. and Rasch, P.: Two-dimensional semi-Lagrangian transport with shape-preserving interpolation, Mon. Weather Rev., 117, 102–129, 10.1175/1520-0493(1989)117<0102:TDSLTW>2.0.CO;2, 1989.

Wilson, G., Aruliah, D. A., Brown, C. T., Chue Hong, N. P., Davis, M., Guy, R. T., Haddock, S. H. D., Huff, K., Mitchell, I. M., Plumbley, M., Waugh, B., White, E. P., and Wilson, P.: Best practices for scientific computing, PLoS Biol., 12, e1001745, 10.1371/journal.pbio.1001745, 2014.

</app></app-group></back> </article>