Articles | Volume 15, issue 15
Model description paper
12 Aug 2022
Model description paper |  | 12 Aug 2022

Large-eddy simulations with ClimateMachine v0.2.0: a new open-source code for atmospheric simulations on GPUs and CPUs

Akshay Sridhar, Yassine Tissaoui, Simone Marras, Zhaoyi Shen, Charles Kawczynski, Simon Byrne, Kiran Pamnany, Maciej Waruszewski, Thomas H. Gibson, Jeremy E. Kozdon, Valentin Churavy, Lucas C. Wilcox, Francis X. Giraldo, and Tapio Schneider

We introduce ClimateMachine, a new open-source atmosphere modeling framework which uses the Julia language and is designed to be scalable on central processing units (CPUs) and graphics processing units (GPUs). ClimateMachine uses a common framework both for coarser-resolution global simulations and for high-resolution, limited-area large-eddy simulations (LESs). Here, we demonstrate the LES configuration of the atmosphere model in canonical benchmark cases and atmospheric flows using a total energy-conserving nodal discontinuous Galerkin (DG) discretization of the governing equations. Resolution dependence, conservation characteristics, and scaling metrics are examined in comparison with existing LES codes. They demonstrate the utility of ClimateMachine as a modeling tool for limited-area LES flow configurations.

1 Introduction

Hybrid computer architectures and the need to exploit the power of graphics processing units (GPUs) are increasingly driving developments in atmosphere and climate modeling (e.g., Schalkwijk et al.2012; Palmer2014; Schalkwijk et al.2015; Marras et al.2015; Abdi et al.2017b, a; Fuhrer et al.2018; Schär et al.2020). The sheer computing power available on modern hardware architectures presents opportunities to accelerate atmosphere and climate modeling. However, exploiting this computing power requires re-coding atmosphere and climate models to an extent not seen in decades, and portable performance and scaling across different platforms remain difficult to achieve (Fuhrer et al.2014; Balaji2021).

In this paper, we introduce ClimateMachine, a new open-source atmosphere model written in the Julia programming language (Bezanson et al.2017) using the Julia Message Passing Interface (MPI) to provide a computational framework that is portable across CPU and GPU architectures. The use of Julia aims to increase accessibility and utility of ClimateMachine as a simulation tool. Julia, developed with the aim of bridging the gap between productivity (e.g., Python) and performance (e.g., Fortran) languages (Bezanson et al.2018), is a strong candidate for an application such as atmospheric large-eddy simulations (LESs). Features such as package composability, GPU programming tools, and interactive programming interfaces contribute to its usability in scientific computing. Compilation time, incompatibilities between Julia MPI programs and interactive read–eval–print loops (REPLs), and the discovery of code issues during runtime due to Julia's dynamic typing are general drawbacks for new users of this language. Despite these drawbacks, a recent assessment by Lin and McIntosh-Smith (2021) further supports the suitability of Julia as a high-performance computing language. The atmospheric model presented here is designed to be usable across a range of physical process scales from large-eddy simulations (LESs) with meter-scale resolution to global circulation models (GCMs) with horizontal resolutions of tens of kilometers, as in a few other recent models (Dipankar et al.2015). We focus on the LES configuration of ClimateMachine in this paper.

Since the pioneering work on turbulence in stratified flows by Lilly (1962), on horizontal subgrid-scale dissipation in GCMs by Smagorinsky (1963), and the subsequent extension of such models to LESs (Lilly1966), several models have been developed to improve the ability of LESs to model atmospheric turbulence: from the extensive work by Deardorff in the 1970s and 1980s (Deardorff1970, 1974, 1976, 1980), Moeng in the 1980s and beyond (Moeng1984; Moeng and Wyngaard1988; Sullivan et al.1994; Moeng et al.2003), to Stevens, Teixeira, Mellado, and others in the last 2 decades (Stevens et al.2003, 2005; Savic-Jovcic and Stevens2008; Matheou et al.2011; Pressel et al.2015; Matheou2016; Matheou and Teixeira2019; Mellado2017; Mellado et al.2018). LES results in canonical flows are sensitive to the fine details of the equations used to represent the flow dynamics, the viscous dissipation, the thermodynamics, and the numerical methods used to solve them (Ghosal1996; Chow and Moin2003; Kurowski et al.2014), especially in the case of cloud simulations (Stevens et al.2005; Siebesma et al.2003; Schalkwijk et al.2012, 2015; Schneider et al.2019; Pressel et al.2015, 2017).

One distinguishing aspect of the ClimateMachine LES is that it uses a nodal discontinuous Galerkin (DG) formulation to approximate the Navier–Stokes equations for compressible flow (Giraldo et al.2002; Hesthaven and Warburton2008a; Giraldo and Restelli2008; Kopriva2009; Kelly and Giraldo2012; Giraldo2020). The DG method is a spectral-element generalization of finite-volume methods. It lends itself well to modern high-performance computing architectures because its communication overhead is low, enabling scaling on many-core processors including GPUs (Abdi et al.2017b). Another important consideration within ClimateMachine is the use of total energy of moist air as a prognostic variable, ensuring energetic consistency of the simulations. We demonstrate that the ClimateMachine LES can be successfully used to simulate canonical LES benchmarks, including simulations of flows over mountains and different cloud and boundary layer regimes (e.g., Straka et al.1993; Schär et al.2002; Stevens et al.2005).

In what follows, we describe the conceptual and numerical foundations and governing equations of ClimateMachine and demonstrate the model in a set of standard two- and three-dimensional benchmark simulations. Section 2 begins by highlighting the governing equations. Their numerical approximation through the DG representation is described in Sect. 3. Section 4 presents subgrid-scale models used in the LES to represent under-resolved flow physics, with results from key benchmarks presented in Sect. 5. Conservation properties are examined in Sect. 6, and performance on CPU and GPU hardware is described in Sects. 7 and 8, respectively. Section 9 contains closing remarks. Additional details about the model, boundary conditions, statistical definitions, and computer hardware are summarized in the Appendices.

2 Governing equations

2.1 Working fluid

The working fluid of the atmosphere model is moist, potentially cloudy air, considered to be an ideal mixture of dry air, water vapor, and condensed water (liquid and ice) in clouds. Dry air and water vapor are taken to be ideal gases. The specific volume of the cloud condensate is neglected relative to that of the gas phases (it is a factor of 103 less than that of the gas phases). All gas phases are assumed to have the same temperature and are advected with the same velocity u=(u,v,w)T. Cloud condensate is assumed to sediment relative to gaseous phases slowly enough to be in thermal equilibrium with the surrounding fluid.

The density of the moist air is denoted by ρ. We use the following notation for the mass fractions of the moist air mixture (mass of a constituent divided by the total mass of the working fluid).

  • qd: dry air mass fraction

  • qv: water vapor specific humidity

  • ql: liquid water specific humidity

  • qi: ice specific humidity

  • qc=ql+qi: condensate specific humidity

  • qt=qv+qc: total specific humidity

Because this enumerates all constituents of the working fluid, we have qt+qd=1. In Earth's atmosphere, the water vapor specific humidity qv dominates the total specific humidity qt and is usually 𝒪(10−2) or smaller; the condensate specific humidity is typically 𝒪(10−4). Hence, water is a trace constituent of the atmosphere, and only a small fraction of atmospheric water is in condensed phases. The working fluid pressure is the sum of the partial pressures of dry air and water vapor such that p=ρ(Rdqd+Rvqv)T, where Rd is the specific gas constant of dry air, and Rv is the specific gas constant of water vapor.

2.2 Mass balance

Moist air mass satisfies the conservation equation:

(1) ρ t + ( ρ u ) = ρ S ^ q t .

Moist air mass is not exactly conserved where precipitation forms, sublimates, or evaporates, where water diffuses, or where condensate sediments relative to the gas phases (Bott2008; Romps2008). The right-hand side involves the local source and/or sink of water mass S^qt owing to such nonconservative processes, which we take into account although it is small because water is a trace constituent of the atmosphere.

2.3 Total water balance

Total water satisfies the balance equation:

(2) ( ρ q t ) t + ( ρ q t u ) = ρ S q t - ( ρ d q t ) + ( ρ q c w c k ^ ) ρ S ^ q t .

Here, the source–sink Sqt arises from evaporation or sublimation of precipitation and formation of precipitation. Diffusive fluxes of moisture are captured by dqt. The effective sedimentation velocity of cloud condensate wc is defined such that

(3) q c w c = q l w l + q i w i ,

with wl and wi defined to be positive downward (k^ being the upward-pointing unit vector). The right-hand side ρS^qt of the total water balance equation is the same as the right-hand side of the mass balance in Eq. (1).

2.4 Momentum balance

The coordinate-independent form of the conservation law for momentum is

(4) ( ρ u ) t + ρ u u + ( p - p r ) I 3 = - ( ρ - ρ r ) Φ - 2 Ω × ρ u - ( ρ τ ) - d q t ρ u + q c w c k ^ ρ u + ρ F u ,

where I3 is the rank-3 identity matrix, Φ is the effective gravitational potential including centrifugal accelerations, τ is a viscous and/or subgrid-scale (SGS) momentum flux tensor, and Fu (typically with Fuu<0 so that Fu represents a momentum sink) is any other drag force per unit mass that may be applied, for example, at the lower boundary. The term involving the planetary angular velocity Ω accounts for Coriolis forces. To improve numerical stability, we have factored out a reference state with a pressure pr(z) and density ρr(z) that depend only on altitude z and are in hydrostatic balance so that they satisfy


The tensor involving the diffusive flux dqt of water on the right-hand side of Eq. (4) represents the momentum flux carried by water that is diffusing; this term is usually very small, but we take it into account.

2.5 Energy balance

The specification of a thermodynamic or energy conservation equation closes the equations of motion for the working fluid. We use the total specific energy, etot, as the prognostic variable. Total energy is conserved in reversible moist processes such as phase transitions of water.

Total energy satisfies the conservation law (Romps2008; Bott2008):

(5) ( ρ e tot ) t + ( ρ e tot + p ) u = - ( ρ F R ) - [ ρ ( J + D ) ] + ρ Q + ρ W c k ^ - ( u ρ τ ) - j { v , l , i } ( I j + Φ ) ρ C ( q j q p ) - M ,

where the total specific energy etot is defined by

(6) e tot = 1 2 u 2 + Φ + I .

The constituents (dry air and moisture components) here are assumed to be moving with the same velocity u (that is, we neglect, as is common, the diffusive and sedimentation fluxes of water in the kinetic energy). The constituents are also assumed to be in thermal equilibrium at the same temperature T so that the specific internal energy of moist air is the weighted sum of the specific energies of the constituents dry air (Id), water vapor (Iv), liquid water (Il), and ice (Ii):

(7) I ( T , q ) = ( 1 - q t ) I d ( T ) + q v I v ( T ) + q l I l ( T ) + q i I i ( T ) ,



Here, cvk for k{d, v, l, i} represents isochoric specific heat capacities for the appropriate species denoted by k; they are taken to be constant. The reference specific internal energy Iv,0 is the difference in specific internal energy between vapor and liquid at the arbitrary reference temperature T0; Ii,0 is the difference in specific internal energy between ice and liquid at T0 (Romps2008). The reference internal energies are related to specific latent heats of vaporization and fusion, Lv,0 and Lf,0, at the reference temperature T0 through


The values of the thermodynamic constants we use are listed in Table 1.

Table 1Thermodynamic constants in CLIMAParameters.

Download Print Version | Download XLSX

Furthermore, the flux FR is the radiative energy flux per unit mass; J is the conductive energy flux per unit mass, and D is the specific enthalpy flux associated with the diffusive flux of water:

(11) D = ( e v tot + R v T ) d q v + e l tot d q l + e i tot d q i .

The flux uρτ is the energy flux associated with the viscous and/or SGS turbulent momentum flux; Q is any internal energy source (e.g., external diabatic heating). The flux

(12) W c = q l e l tot w l + q i e i tot w i

represents the downward energy flux due to sedimenting condensate.

The terms involving ρC(qjqp) (j{ v, l, i }) represent the loss of internal and potential energy of moist air masses owing to precipitation formation; the kinetic energy loss is neglected, consistent with the neglect of the source and/or sink associated with precipitation formation in the momentum balance in Eq. (4). Additional energy sinks involve the energy loss owing to heat transfer from the working fluid to precipitation as it falls through air and possibly melts at the freezing level (Raymond2013); the associated energy sources and sinks are generally provided by a microphysics parameterization and are subsumed in the term M.

2.6 Equation of state

Pressure p is calculated from the ideal gas law:

(13) p = ρ R m T ,

where Rm is the gas “constant” of moist air,

(14) R m ( q ) = R d ( 1 - q t ) + R v q v = R d 1 + ( ε dv - 1 ) q t - ε dv q c ,

with the ratio of the gas constants of water vapor and of dry air εdv=Rv/Rd.

2.7 Saturation adjustment

Gibbs' phase rule states that in thermodynamic equilibrium, the temperature T and liquid and ice specific humidities ql and qi can be obtained from the three thermodynamic state variables density ρ, total water specific humidity qt, and internal energy I. Thus, the above equations suffice to completely specify the thermodynamic state of the working fluid given ρ, qt, and I, with the latter obtained from the total energy via its definition in Eq. (6).

Obtaining the temperature and condensate specific humidities from the state variables ρ, qt, and I is the problem of finding the root T of

(15) I * ( T ; ρ , q t ) - I = 0 ,

where I*(T;ρ,qt) is the internal energy at equilibrium, when the air is either unsaturated and there is no condensate (qv=qt), or water vapor is in saturation and the saturation excess qtqv is apportioned, according to temperature T, among the condensed phases ql and qi. We solve this nonlinear “saturation adjustment” problem by Newton iterations with analytical gradients (see Tao et al.1989; Pressel et al.2015). To obtain the saturation vapor pressure and derived functions needed in this calculation, we assume all isochoric heat capacities to be constant (i.e., we assume the gases to be calorically perfect); with this assumption, the Clausius–Clapeyron can be integrated analytically, resulting in a closed-form expression for the saturation vapor pressure (Romps2008).

This procedure allows the use of total moisture qt as the sole prognostic variable but confines the system to the assumption of equilibrium thermodynamics. Alternatively, using explicit tracers for the condensate specific humidities ql and qi allows non-equilibrium thermodynamics to be considered and mixed-phase processes to be explicitly modeled.

3 Discretization of the governing equations

3.1 Space discretization

The governing equations are discretized in space via a nodal DG approximation. To describe the DG procedure, we recast the Eqs. (1)–(5) in divergence form as

(16) Y t = - ( F 1 + F 2 ) + S ( Y ) ,

where Y=[ρ,ρu,ρetot,ρqt]T is an abstract vector of state variables, F1 contains the fluxes not involving gradients of state variables and functions thereof, F2 contains the fluxes involving gradients of state variables (e.g., diffusive fluxes), and 𝓢(Y) contains the sources.

The DG solution of Eq. (16) is approximated on the finite-dimensional counterpart Ωh of the flow domain Ω, which consists of NΩe non-overlapping hexahedral elements Ωe such that


where a superscript h indicates the discrete analog of a continuous quantity. By virtue of tensor-product operations allowed on hexahedral elements and the ability to rely on inexact quadrature when elements of order greater than 3 are utilized, high-order Galerkin methods are particularly attractive for operation-intensive solutions (Kelly and Giraldo2012). Within each element, the finite-dimensional approximation of Y(x,t) is given by the expansion

(17) Y e h ( x , t ) = α = 1 ( N + 1 ) 3 ψ α e ( x ) Y α e ( t ) ,

where (N+1)3 is the number of collocation points within the three-dimensional element of order N, and ψαe represents the interpolation polynomials evaluated at local point α inside element e.

From now on, the subscript or superscript e is omitted with the understanding that all operations are executed element-wise unless otherwise stated. Furthermore, the physical elements in the x=(x,y,z) space are mapped to a reference element ξ=(ξ,η,ζ). The three-dimensional basis functions ψα result from the one-dimensional functions Li(ξ), Lj(η), and Lk(ζ) as the tensor product:


Each function L is a one-dimensional (1D) Lagrange polynomial defined on the 1D reference element [-1,1]. The Lagrange function evaluated at points i along the ξ direction within the element is


where ξi represents the N+1 co-located interpolation points along ξ. The polynomials Lj and Lk in the other two directions η and ζ are built in the same way. The N+1 interpolation points may be chosen in a variety of ways (Deville et al.2002; Karniadakis and Sherwin1999); here we choose Legendre–Gauss–Lobatto (LGL) points (Giraldo and Restelli2008). The Kronecker δ property of the Lagrange polynomials is such that


in 1D, which, in three dimensions (3D), translates to

(18) Ψ α ( ξ a , η b , ζ c ) = δ i a δ j b δ k c .

This allows us to reduce the operation count as follows. We construct the space and time derivatives as


By virtue of the 3D Kronecker δ property, the spatial derivatives of the basis functions appearing here are given by


Using this property reduces the operation count significantly since we only require 3N operations instead of the N3 operations otherwise needed to compute the derivatives at a given node (Abdi et al.2017b).

The operators defined on the reference elements are mapped onto the physical space by means of the transformation

(24) ψ = J - 1 ^ ψ ,

where =(x,y,z)T and ^=(ξ,η,ζ)T, and


is the inverse Jacobian of the transformation from physical space to the reference element.

The DG approximation of the differential Eq. (16) is constructed by multiplying, within each element, the equation by the test function ψα and then integrating over the element volume Ωe such that

(25) Ω e ψ α t Y + F 1 ( Y ) + F 2 ( Y , Y ) d Ω e = Ω e ψ α S ( Y ) d Ω e ,

where ψα within each element belongs to the function space of square integrable piecewise polynomials of order N (i.e., ψL2). By definition, these functions are discontinuous across element boundaries; differentiability is not globally required but only within each element (Hesthaven and Warburton2008b). Integrating the divergence term by parts yields

(26) Ω e Ψ α t Y d Ω e + Γ e Ψ α n F 1 * ( Y ) d Γ e - Ω e Ψ α F 1 ( Y ) d Ω e - Ω e Ψ α F 2 ( Y , Y ) d Ω e = Ω e Ψ α S ( Y ) d Ω e ,

where Ωe and Γe are the volume and boundary of each element, respectively, n is the outward-facing unit vector orthogonal to each element face, and F1* is a numerical flux. The imposition of the numerical fluxes across element boundaries is the numerical mechanism that promotes continuity of the discontinuous solution across the elements. The numerical fluxes are calculated as the approximate solution to a Riemann problem across two neighboring elements. ClimateMachine currently implements the Rusanov (1961), Roe (1981), and Harten–Lax–van Leer contact (HLLC) (Toro et al.1994; Harten1983) numerical fluxes. The Rusanov flux, for instance, is constructed as

(27) n F 1 * ( Y ) = n 2 F 1 ( Y - ) + F 1 ( Y + ) + n λ Γ e Y - - Y + ,

where Y is the state at the internal interface of element e, Y+ is the state at the external interface of e, and λΓe(Y-,Y+) is an estimate of the maximum flow speed (e.g., the maximum eigenvalue of the Jacobian of the flux F1 with respect to the state variables, which is the speed of sound).

Because the second-order derivatives in ∇⋅F2 cannot be directly built with the weak variational formulation if a discontinuous function space is used (Bassi and Rebay1997), an auxiliary variable Ỹ is introduced such that


which can then be discretized via DG as

(30) Ω e Ψ α Y d Ω e Γ e Ψ α n μ Y ̃ * - μ Y ̃ d Γ e + Ω e Ψ α μ Y ̃ d Ω e .

Here, Ỹ* is approximated via centered flux like in Bassi and Rebay (1997). We also refer to Abdi et al. (2017b) for more details.

For algorithmic efficiency, inexact quadrature is used to calculate the integrals above. By virtue of inexact integration and of Eqs. (17), (19), and (24), the variational DG equations yield the semi-discrete matrix problem:

(31) d Y i e d t = - j , i T F j e + S i e + w i s | J | i s w i e | J | i e n i s F e - F * i ,

where wis represents interpolation weights. The algebraic details to obtain this expression can be found in Giraldo and Restelli (2008), where the s superscript indicates a value that is defined on the element boundary surface. The system (Eq. 31) is integrated on each element with respect to time.

In order to achieve good parallel scaling it is necessary to overlap communication and computation to the fullest extent possible. With DG (and all element-based Galerkin methods) this can be naturally achieved by splitting Eq. (31) into terms that arise from the approximation of volume integrals and surface integrals. All volume contributions can be calculated independently of element-to-element communication regardless of the order of the spatial approximation, as can surface integrals that are not on elements which share boundaries across MPI ranks. Thus, in the code, we start with Message Passing Interface (MPI) communication, do all volume calculations and surface calculations for elements not on boundaries shared across ranks, and then apply surface calculations for elements on the rank boundaries after communication operations have been completed. This approach makes DG naturally effective with respect to parallel computing as previously shown by, e.g., Müller et al. (2018) on CPUs and Abdi et al. (2017b) on GPUs. At high order, element-based Galerkin methods such as DG require fewer neighboring degrees of freedom than high-order finite-difference and finite-volume discretizations.

3.2 Time discretization

ClimateMachine provides a suite of time integrators consisting of explicit Runge–Kunge methods as well as low-storage (Carpenter and Kennedy1994; Niegemann et al.2012), strong stability-preserving (Shu and Osher1988), and additive Runge–Kutta (ARK) implicit–explicit (IMEX) methods (Giraldo et al.2013; Kennedy and Carpenter2019).

The benchmarks presented in this paper with isotropic grid spacing are run using the fourth-order 14-stage method of Niegemann et al. (2012), which has a large explicit time-stepping stability region. One of the benchmarks, however, uses a highly anisotropic grid, which benefits from the use of a 1D IMEX approximation; there we use a variant of the horizontally explicit, vertically implicit (HEVI) schemes by Bao et al. (2015). While 3D IMEX is also an option, its performance in terms of time to solution is ultimately limited by the availability of scalable 3D implicit solver algorithms.

4 Subgrid-scale models

The governing equations are resolved with the discretizations presented in Sect. 3. This leaves unresolved but dynamically significant scales on the computational grid that must be modeled with three-dimensional SGS models. We note that, while filtering over the grid volume is typically a formal requirement for LES variables, no explicit filter kernel is applied to the equations here. The resolution of the prognostic variables on a discretized grid constitutes an implicit filtering operation, with an equivalent length scale given by the grid resolution. In general, SGS fluxes are modeled as diffusive fluxes, which capture down-gradient transport of conservable scalar quantities assuming that mixing lengths are small compared with the scales over which the gradients of the scalars vary. We address the physical form of the diffusive flux components in Eqs. (1)–(5), following which we describe standard models of subgrid-scale turbulence available for use in ClimateMachine.

The diffusive turbulent stress tensor τ is represented in terms of the symmetric rate of strain tensor S such that

(32) S ( u ) = 1 2 u + u T ,


(33) τ = - ( 2 ν t S ) .

Here, νt is a turbulent viscosity tensor whose components are typically orders of magnitude larger than the molecular viscosity and are a function of the velocity gradient tensor.

The diffusive flux dqt of total water specific humidity in Eq. (2) is modeled as

(34) d q t = - ( D t q t ) ,

where 𝓓t is a turbulent diffusivity vector. The turbulent diffusivity 𝓓t is related to the turbulent viscosity tensor νt via the turbulent Prandtl number such that

(35) D t = diag ( ν t ) Pr t ,

where Prt takes a typical value of Prt=1/3.

The unresolved flux of total enthalpy htot results in a diffusive subgrid flux term of the form

(36) J + D = - ( D t h tot ) ,

where J is the thermal diffusion flux analogous to the molecular conductive heat flux, and D is the energy flux carried by water vapor, defined in Eq. (11). For energetic consistency, we use the same turbulent diffusivity 𝓓t for moist enthalpy and water.

4.1 Smagorinsky–Lilly model

The turbulent eddy viscosity νt in the model by Lilly (1962) and Smagorinsky (1963) (SL henceforth) is defined by means of the magnitude of the rate of the strain tensor S, whose components are Sij=12uixj+ujxi, according to

(37) ν t = ( C s Δ ) 2 2 S i j S i j

for i,j=1,2,3; Cs is a constant Smagorinsky coefficient usually within the range 0.12<Cs<0.21, and Δ is the LES filter width. Inside each hexahedral element of order N and side lengths Lx,y,z along the x,y, and z directions, the effective grid resolution is Δ(x,y,z)=Lx,y,z/N, which is the average distance between two consecutive nodal points. We use an isotropic eddy viscosity tensor νt in LESs, with its components defined by Eq. (37). Similar eddy viscosity models can be found in other recent models of compressible atmospheric flows (e.g., Jähn et al.2015; Shi et al.2018).

4.2 Vreman eddy viscosity model

The SGS model developed by Vreman (2004) is of interest because of its robustness across flow regimes and because it has low dissipation near wall boundaries and in transitional flows. Its computational complexity is similar to the classical SL model. While the Vreman model is extensively used in engineering LESs, it is uncommon in atmospheric flows, for which a constant coefficient SL or the one-equation turbulent kinetic energy (TKE) model by Deardorff (1970, 1980) is the most common choice (see, e.g., Stevens et al.2005).

The turbulent eddy viscosity of this model depends on first-order derivatives of velocities and is given by

(38a) ν t = 2.5 C s 2 B β u i , j u i , j ,



Here, summation over repeated indices i,j{1,2,3} is implied, Cs is the constant Smagorinsky coefficient, and ui represents the components of the resolved-scale velocity vector so that ui,j is the velocity gradient tensor. The mixing lengths Δm can be determined as the grid spacing in the direction implied by subscript m; in this paper ΔmLx,y,z/N. Near-surface corrections to the characteristic length (e.g., Mason and Callen1986) have not been applied to either subgrid-scale model in the present work.

Richardson correction in stable regions of the atmosphere. To account for the atmospheric stability and, effectively, reduce turbulence generation to zero in stably stratified atmospheres, we multiply the eddy viscosity by the correction factor

(39) f b = 1 R i 0 , max ( 0 , 1 - R i / P r t ) 1 / 4 R i > 0 ,



Here, Prt=1/3 is a constant turbulent Prandtl number, and θv is the virtual potential temperature for a given specific humidity qt and specific liquid water content ql (see, e.g., Deardorff1980). This temperature is computed consistently with the moist phase partitioning (saturation adjustment) in the system and is therefore a reasonable measure of the effect of buoyancy on flow stratification. This is applied to both turbulence closures discussed in this paper.

4.3 Numerical stability

When high-order Galerkin methods are used to solve nonlinear advection-dominated problems, spurious Gibbs oscillations may affect the solution and need to be addressed. ClimateMachine provides a set of spectral filters, cut-off filters, and artificial diffusion methods to remove these oscillations. While filters may be effective, we found that stabilizing the LES solution by means of the SGS eddy viscosity alone is effective and robust; this is in agreement with results shown by Marras et al. (2015), Marras and Giraldo (2015), and Reddy et al. (2022) in the case of continuous Galerkin methods. Other methods of stabilizing numerical solutions are discussed by Light and Durran (2016) and Yu et al. (2015). This approach stems from the idea that under-resolved scales are responsible for the numerical oscillations observed in solutions through the introduction of unphysical artifacts in properties of the solution variables. Detailed analyses of the interactions of subgrid-scale models and filtering techniques with DG numerics in the context of atmospheric flows will be presented in a forthcoming paper.

5 Numerical experiments and discussion

We first demonstrate the convergence of the numerical solution to the Euler equations with an isentropic vortex advection problem. Following this, we demonstrate results from the ClimateMachine using standard benchmark problems including (1) dry rising thermal bubble in a neutrally stratified atmosphere, (2) dry density current, (3) hydrostatic and nonhydrostatic mountain-triggered linear gravity waves, (4) the Barbados Oceanographic and Meteorological Experiment (BOMEX), and (5) decaying Taylor–Green vortex in a triply periodic domain.

5.1 Advection of an isentropic vortex

To demonstrate convergence of the numerical solutions, we consider the two-dimensional dry isentropic vortex advection problem with fourth-order polynomials, consistent with the benchmark cases shown in the sections that follow. We consider the pure advection of a vortex in a domain with edge lengths L=0.1 m, with a vortex radius R=0.005 m. The initial velocity profile is given by

(40) u = ( u 0 cos α + δ u , u 0 sin α + δ v ) ,

with a prescribed translational velocity u0=150 m s−1, translation angle α=π/4, and a vortex speed vs=50 m s−1 so that the velocity perturbations are given by

(41) δ u = - v s x y R exp ( r 2 R ) 2 ,

(42) δ v = v s x x R exp ( r 2 R ) 2 .

The perturbations are computed with an offset coordinate given by


where x and y represent Cartesian coordinates, to ensure that the resulting functions are periodic. The initial temperature profile is given by

(45) T = T ( 1 - κ d v s 2 2 ρ p exp ( - r R ) 2 ) ,

with a reference temperature of T=300 K, reference pressure p=105 Pa, and the corresponding density ρ computed from the ideal gas law. The simulation time is given by tend=L/(10u0). We consider fourth-order polynomials, with increasing refinement given by the addition of elements in the domain discretization, to generate the Euclidean distance between the initial state and the time-stepped solution state at tend=10-4 s, as shown in Fig. 1. The results demonstrate consistent convergence across three numerical flux methods for element numbers ranging from 5–320. The convergence rate approaches 5 for the fourth-order polynomials considered in this study.

Figure 1Euclidean distance between initial and final solution vectors of the prognostic variables generated using Rusanov (octagons), Roe (squares), and Harten–Lax–van Leer contact (HLLC) numerical fluxes. Final solutions are evaluated at tend=10-4 s based on a translational velocity of 150 m s−1. For each numerical flux method, we consider Ne=[5,10,20,40,80,160,320] elements with fourth-order polynomials. The dashed line represents Ne-5 convergence.


Tests presented in Sect. 5.25.5 are executed in a 3D domain even when the problem is two-dimensional, in which case we have effectively zero tendencies in the third dimension; this setup is identified as 2.5D in what follows.

5.2 2.5D rising thermal bubble in a neutrally stratified atmosphere

A neutrally stratified atmosphere with uniform background potential temperature θ0=300 K is perturbed by a circular bubble of warmer air. The hydrostatic background pressure decreases with z as

(46) p = p 0 1 - g c pd θ 0 z c pd / R d

in a domain Ω=[0,10000]×[-,]×[0,10000] m3. The perturbation is as defined in Ahmad and Lindeman (2007),

(47) Δ θ = θ c 1.0 - r r 0 if r r 0 = 2000 m ,

where r=(x-xc)2+(z-zc)2, (xc,zc)=(5000,2000) m, and θc=2 K. The initial velocity field is zero everywhere. Periodic boundary conditions are used along y, and solid walls with impenetrable, free-slip boundary conditions are used in the x and z directions. Detailed information on boundary conditions for all test cases is provided in Appendix A. Five runs are performed at effective uniform resolutions Δx=Δz=250, 125, 62.5, and 31.25, and 15.625 m, with polynomial order N=4. Potential temperature θ and the two velocity components u and w are plotted at t=1000 s in Fig. 2 for the grid resolution of 15.625 m, which represents a reference solution for comparison with solutions at coarser resolutions. The value of the maximum potential temperature perturbation Δθmax and of the horizontal and vertical velocity components agree with the 125 m resolution results shown by Ahmad and Lindeman (2007). The grid dependence of the solution is shown in Fig. 3, where potential temperature is plotted for Δx=Δz=31.25, 62.5, 125, and 250 m. While the solution is visibly more dissipative at coarser resolutions, the bubble's leading edge position (and hence propagation speed) is not sensitive to the grid resolution.

The SL and Vreman closures are used to model diffusive fluxes in this problem. The solutions show no discernible differences, and only the SL solution is shown. A visual comparison of the two becomes more meaningful when shear triggers mixing, which is shown for the density current test in Sect. 5.3. Although DG inherits an implicit numerical diffusion as an effect of the numerical flux calculation across elements, under-resolved advection-dominated problems still require a dissipation or filtering mechanism to preserve the solution's stability. In the case of Marras et al. (2015), a dynamically adaptive SGS model was used, whereas a Boyd–Vandeven filter (Boyd1996; Vandeven1991) was used by Giraldo and Restelli (2008).

Figure 22.5D rising thermal bubble with effective resolution Δx=Δz=15.625 m and N=4. Panels depict (a) potential temperature θ, (b) horizontal velocity u, and (c) vertical velocity w at t=1000 s in Ω=10×10 km2.


Figure 32.5D rising thermal bubble solution with N=4 at decreasing effective resolution. Grid convergence of potential temperature θ at four different resolutions to be compared against the 15.625 m resolution results shown in Fig. 2. (a) Δx=Δz=31.25 m, (b) 62.5 m, (c) 125 m, and (d) 250 m. Although the solution is visibly more dissipated at coarser resolution, the bubble's leading edge (and hence propagation speed) is not affected.


5.3 2.5D density current in a neutrally stratified atmosphere

The density current problem by Straka et al. (1993) is used to test the LES framework in a flow with Kelvin–Helmholtz instabilities. As for the rising thermal bubble, the background initial state is in hydrostatic equilibrium at uniform potential temperature θ0=300 K. A perturbation of θ centered on (xc,zc)=(0,3000) m and with radii (rx,rz)=(4000,2000) m is given by the function

(48) Δ θ = θ c 2 1 + cos ( π c r ) if r 1 ,

where θc=-15 K and r=(x-xc)/rx2+(z-zc)/rz2 in the domain Ω=[0,25600]×[-,]×[0,6400] m3. Periodic boundary conditions are used along y; impenetrable free-slip conditions are imposed in x and z. The flow is initially stationary.

To reach solution grid convergence, this test is classically executed with a constant kinematic viscosity ν=75 m2 s−1. Increasingly finer structures are resolved when the resolution increases (Marras et al.2012, 2015). A measure of solution fidelity is the front position, which we compare against other models in Appendix D for different resolutions. The ClimateMachine results show quantitative agreement with respect to the frontal location from a range of models with varying spatial discretizations at resolutions ranging from 12.5 to 100 m. This demonstrates the scope for capturing small-scale flow features with the numerics described in Sect. 3.

The structure of potential temperature at the final time t=900 s is shown in Fig. 4 for the Vreman and SL solutions. When Rusanov is the chosen numerical flux (Fig. 4a and b), the solutions are very similar although Vreman is visibly less dissipative for a prescribed value of the Smagorinsky coefficient Cs=0.18, with finer scales of motion apparent in the contours of potential temperature. Further quantitative analysis of the dissipative properties of the SGS models is reported in Sect. 5.6. Despite Vreman being less dissipative than SL, the Roe (Fig. 4c) and HLLC (Fig. 4d) fluxes contribute to additional numerical diffusion when compared to Rusanov fluxes. Detailed analysis of the interaction between numerical fluxes and subgrid-scale models will be presented in future articles.

Figure 42.5D density current. Potential temperature θ (K) at t=900 s computed with an effective DG resolution of Δx=Δz=12.5 m with domain extents shown in meters. (a) Rusanov numerical flux with SL SGS. (b) Rusanov numerical flux with Vreman SGS. (c) Roe numerical flux with Vreman SGS. (d) HLLC numerical flux with Vreman SGS. The color scale ranges from θ=285 to 300 K. A shared color bar for all plots is shown in panel (a) for clarity.


5.4 Passive transport over warped grids

To verify the correct behavior of the DG implementation in the presence of topographic features, the simple passive advection test described by Schär et al. (2002) is used. The conservation law for the diffusive transport of a passive tracer χ is

(49) ( ρ χ ) t + ( ρ u χ ) = - ( ρ d χ ) ,

which is approximated via DG in the same way as Eq. (16). For a scalar tracer variable χ, we model diffusive fluxes dχ such that

(50) d χ = - ( δ χ D t χ ) ,

where δχ relates the ratio of turbulent diffusivity of the tracer to that of the energy and moisture variables. For this test case, however, tracer diffusivity is set to zero to assess the stability and transport properties when using a warped grid. The volume grid in ClimateMachine is built by stacking elements above the surface and warping them around the terrain profile. To reduce the element distortion across the domain, a linear grid damping function is used such that a topography-conforming surface of nodal points near the domain's bottom surface decays to a horizontal plane at higher altitudes (Gal-Chen and Somerville1975).

The initial scalar field χ is described by an elliptical perturbation centered on (xc,zc)=(25,9) km with radii (rx,rz)=(25,3) km such that

(51) χ = χ 0 cos 2 π r 2 , for  r 1 , 0 otherwise  ,

where χ0=1 and r=(x-xc)/rx2+(z-zc)/rz2 in the domain Ω=[0,150000]×[-,]×[0,30000] m3. An effective uniform grid resolution Δx=Δz=500 m is used for this test. The initial velocity profile is given by

(52) u ( z ) = u 0 1 , for  z z 2 sin π 2 z - z 1 z 2 - z 1 , for  z 1 z < z 2 , 0 , for  z < z 1 ,

where u0=10 m s−1, z1=4 km, and z2=5 km.

The topography is defined by the function

(53) z sfc ( x ) = h 0 cos 2 π ( x - x 0 ) 2 a cos 2 π ( x - x 0 ) λ for  | x - x 0 | a 0 , for  | x - x 0 | > a ,

where h0=3 km, a=25 km, λ=8 km, and x0=75 km.

The contours of χ in Fig. 5 show minimal distortion in spite of the warped elements directly above the topographical feature, indicating that the DG transformation metrics from physical to logical space in the presence of topography do not adversely affect the solution. Deviation from the initial profile of tracer magnitudes is between 4 % and +2 % when the tracer is above the topographical feature, with maximum deviation amplitudes of 5 % and +3 % at the end of the test, showing a favorable comparison with the hybrid and SLEVE coordinate results presented in Schär et al. (2002).

Figure 5Solution of the passive transport of scalar χ at three different time instances, with contours of tracer quantity χ (from 0, outermost contour, to 1, innermost contour) overlaid on a representation of nodal points on the underlying mesh. Tracer advection is driven by a prescribed velocity profile from left to right. As it crosses the deformed grid above the mountain ridge, only a minimal distortion of tracer contours is observed, which is completely recovered back to a smooth solution downwind of the ridge.


5.5 Mountain-triggered gravity waves

To assess the correct implementation of a Rayleigh sponge layer to attenuate fast, upward-propagating gravity waves before they reach the top of the domain, two steady-state mountain-triggered gravity wave problems suggested by Smith (1980) are solved. The sponge layer is described in Appendix A2. These problems consist of a flow that moves eastward with uniform horizontal velocity u=(u,0,0) m s−1 in a doubly periodic domain. The flow impinges against a mountain of height hm and base length a centered at xc as

(54) z sfc ( x ) = h m a 2 ( x - x c ) 2 + a 2 .

The background state is in hydrostatic balance with Brunt–Väisälä frequency 𝒩 such that


for a given surface potential temperature θsfc=Tsfc. The hydrostatically balanced pressure is

(55) p = p sfc 1 + g 2 c pd θ sfc N 2 exp ( - z N 2 g ) - 1 c pd / R d ,

which yields, by means of the ideal gas law, the background density:

(56) ρ = p sfc θ R d p p sfc c vd / c pd .

These tests are affected by spurious oscillations that appear approximately 5000 s into the simulation. In the absence of shear, because of the free-slip bottom and top boundaries, the SGS models are unable to introduce sufficient diffusion to remove the Gibbs modes so that an exponential filter (Hesthaven and Warburton2008b) of order 64 was applied on the velocity field to remove spurious modes. The filter assumes the form of

(57) σ ( η ) = e - α η s ,

where s is the filter order, η is a function of the polynomial order, and α=-log(εM) is a parameter that controls the smallest value of the filter function for machine precision εM. In double precision, α≈36. The filter in this form is applied to perturbations of the prognostic variables from the balanced background state.

5.5.1 Linear hydrostatic

The linear hydrostatic case proposed by Smith (1979) consists of a neutrally stratified isothermal atmosphere with θ=θsfc=250 K. The background atmosphere is isothermal with temperature T0, resulting in a Brunt–Väisälä frequency of


The flow moves in a periodic channel along the x direction with velocity u=(20ms-1,0,0) over a mountain with hm=1 m and a=10 000 m. A Rayleigh absorbing layer is added at zs=25 km with relaxation coefficient α=0.5 s−1, power γ=2, and domain top ztop=30 km (see Appendix A2 for details). The domain extends from 0 to 240 km in the horizontal direction. The steady-state solution at t=15 000 s is shown in Fig. 6a. It is consistent with the DG results shown by Giraldo and Restelli (2008).

5.5.2 Linear nonhydrostatic

The linear nonhydrostatic mountain waves are forced by a flow of uniform horizontal velocity u=(10ms-1,0,0) over a mountain with hm=1 m and a=1000 m. The domain extends from 0 to 144 km in the horizontal direction and is 30 km high.

The steady-state solution at t=18 000 s is shown in Fig. 6b. It is consistent with results shown by Giraldo and Restelli (2008).

Figure 6Vertical velocity w for two linear mountain wave tests. Velocity contours are shown in the range from -4×10-3 m s−1 (blue) to 4×10-3 m s−1 (red). (a) Hydrostatic test at t=15 000 s, with an effective horizontal resolution Δx=500 m and effective vertical resolution Δz=240 m with N=4. (b) Nonhydrostatic test at t=18 000 s, with an effective horizontal resolution Δx=200 m and effective vertical resolution Δz=160 m with N=4.


5.6 Decaying Taylor–Green vortex

The decaying Taylor–Green vortex (TGV) is a classical test to estimate the dissipative properties of turbulence models in the absence of solid boundaries. The gravity-free flow is initialized in a triply periodic cube of dimensions [-π,π]3. The solenoidal initial velocity field u0=(u0,v0,w0) is defined as


with initial pressure

(61) p 0 = p + ρ 0 U 0 2 16 ( 2 + cos ( k z ) ) ( cos ( 2 k x ) + cos ( 2 k y ) ) ,

where k is the wavenumber, U0=100 m s−1, ρ0=1.178 kg m−3, and p=101 325 Pa. Fourth-order polynomials are used for all simulations considered in this section.

We first consider the volume-averaged kinetic energy, which provides insight into the dissipation characteristics of the flow with respect to nondimensionalized time t*=kU0t. In integral form, the kinetic energy can be written as

(62) E k = 1 2 | u | 2 = 1 2 Ω h Ω h u u d Ω h ,

where 〈⋅〉 denotes a volumetric average over the volume Ωh. If the flow is inviscid, the kinetic energy should be conserved. This is only valid if the numerics or SGS models do not introduce numerical dissipation or if all flow scales are well resolved. As such, the time series of kinetic energy is a metric that shows the point along the simulation at which the solution becomes under-resolved. The kinetic energy dissipation rate is the second quantity of interest, which allows us to quantify the rate of decay of kinetic energy over time. This is defined as

(63) ϵ = - d E k d t .

A third quantity of interest for this analysis is enstrophy, which is defined as the square of the vorticity norm:

(64) ω 2 = | | × u | | 2 .

The enstrophy of a fully resolved flow should go to infinity if the flow is inviscid. Therefore, enstrophy can be used as a criterion to estimate the effect of numerical dissipation.

By means of a three-dimensional fast Fourier transform (FFT) of the velocity field, the kinetic energy spectrum is calculated as

(65) E ( k ) = 0 2 π 0 π 0 K A ( k x , υ , ζ ) k 2 sin ( ζ ) d k d υ d ζ ,

where K=2π/L, L is the characteristic length, A is a three-dimensional array of Fourier mode amplitudes, υ=kz/k, ζ=tan-1(ky/kx), and k=kx2+ky2+kz2. The TGV flow is simulated using both the SL and Vreman models on grids with 323, 643, 1283, and 1923 points. Figure 7 shows a 3D visualization of the flow at two different nondimensional times using zero Q-criterion isosurfaces, which identify balance between rotation and shear in the flow (Hunt et al.1988). As the flow evolves, the flow generates smaller and smaller-scale vortices. Eventually, the flow becomes under-resolved, making it impossible to conserve kinetic energy. As the flow continues to evolve, an instability occurs, which causes the disintegration of the vortex sheet. After this point, the TGV's dynamics are controlled by the interaction of small-scale vortical structures formed by vortex stretching.

Figure 7Taylor–Green vortex. Isosurfaces of zero Q-criterion (these identify surfaces where the vorticity norm is identical to the strain rate magnitude) on a 1923 grid, corresponding to 48 elements of order 4 at (a) t*=4 and (b) t*=60. Plots are shown for the Vreman solution. The isosurfaces of the Q-criterion are colored by dimensionless kinetic energy.


Results for the coarse-resolution simulations are presented in Fig. 8, and those for the fine-resolution simulations are shown in Fig. 9. Figure 8a shows that the kinetic energy changes with time for the 323 resolution simulations are distinctly different from their higher-resolution counterparts in Fig. 9a. The severe under-resolution of the flow seems to generate much larger amounts of dissipation early on. This is further demonstrated when comparing Figs. 8b and 9b. Beyond 643 resolution, the peaks in dissipation are larger as the resolution increases, since smaller vortices can be resolved before the instability eventually happens. The 323 simulations (Fig. 8b) have larger peaks than even the 1923 (Fig. 9b) simulations, demonstrating that the under-resolution of the vortex structures leads to different, more dissipative early-flow behavior.

Figure 9b shows that the 1283 and 1923 simulations using both SGS closures are characterized by a peak in the kinetic energy dissipation at t*=9. Brachet et al. (1983) and Brachet (1991) demonstrate this result for their direct numerical simulations (DNSs) of the Taylor–Green vortex for a Reynolds number Re=U0/kν3000. Examining Fig. 8b, the 643 simulations suggest a dissipation peak time of t*=8, and the 323 simulations suggest a dissipation peak time of t*=6. The underprediction of the time at which the peak occurs is due to an inability to resolve the vortices that appear early on in the flow's evolution at extremely coarse resolutions. This leads to the early appearance of the instability which causes the dissipation peak. Furthermore, we see that the kinetic energy decay occurs sooner for the lower-resolution simulations as a result of increased dissipation from the SGS models.

Figure 8Evolution of volumetrically averaged (a) kinetic energy represented on a semi-logarithmic scale, (b) kinetic energy dissipation, and (c) enstrophy, each computed on 323 and 643 grids with the Vreman and SL SGS models.


Figure 9Evolution of volumetrically averaged quantities in the numerical solution to the Taylor–Green vortex problem computed on 1283 and 1923 grids with the Vreman and SL SGS models.


Figure 9c also shows that the flow's enstrophy behaves as expected, with peak values at t*=9 coinciding with peaks in kinetic energy. The higher-resolution simulations are able to reach a higher enstrophy than the lower-resolution simulations as they are naturally able to resolve more vortical motion. On the other hand, the choice of SL or Vreman models seems to have very little impact on the ability to resolve more small-scale eddies for low resolutions. However, the SL SGS scheme leads to higher enstrophy than the Vreman SGS scheme, increasingly so as resolution increases.

Figure 10 shows the kinetic energy spectra obtained for the higher-resolution simulations used for this test. All simulations present peaks in their respective spectra at k=2 to 4, which persist even as the flow evolves over time. These peaks are explained by Drikakis et al. (2007) as being imprinted on the spectra by the initial velocity field. Furthermore, as the flow evolves, all spectra show a slope close to the theoretical k-5/3 of homogeneous turbulence and eventually decay towards a k−3 slope at higher wavenumbers. This behavior is consistent with the DNS results of Brachet et al. (1983), who showed, using DNS, that this transition occurs around k=60.

Figure 10Kinetic energy spectra obtained using (a) 1283 points and SL, (b) 1283 points and Vreman, (c) 1923 points and SL, and (d) 1923 points and Vreman.


5.7 Barbados Oceanographic and Meteorological Experiment (BOMEX)

BOMEX features a shallow-cumulus-topped boundary layer as described in Holland and Rasmusson (1973). The setup of this test follows Siebesma et al. (2003). The initial profiles are characterized by a well-mixed sub-cloud layer below 500 m, a cumulus layer between 500 and 1500 m, an inversion layer up to 2000 m, and a free troposphere above. Large-scale forcing includes prescribed large-scale subsidence, horizontal advective drying, radiative cooling, and pressure gradient effects. Sensible and latent heat fluxes at the surface are prescribed to SHF=n(ρJsfc)=9.5 W m−2 and LHF=n(ρDsfc)=147.2 W m−2. Additional detail on the application of boundary conditions is presented in Appendix A. The domain, Ω=6400×6400×3000 m3, is doubly periodic in the x and y directions. A Rayleigh sponge layer (see Appendix A2 for details) is applied along the z direction to damp upward-propagating gravity waves. On the bottom surface, a momentum drag forcing is applied (see Appendix A1). The effective horizontal and vertical resolutions are Δx=Δy=100 m and Δz=40 m, respectively. The simulation time is 6 h.

Figure 11 shows the vertical profiles of the domain mean thermodynamic and turbulence properties over the last hour of the simulations. Figure 12 shows the time series of liquid water paths (LWPs), cloud cover, and turbulence kinetic energy. The time-averaged results of the vertical profiles of θl,ql, cloud fraction, and qv=qt-ql during the last hour are in good agreement with the same quantities presented by Siebesma et al. (2003). The SGS model does not have much effect on the simulation characteristics, except that the Vreman SGS model produces a stronger peak in the variance of vertical velocity, ww, near the cloud top. Although the difference is mild, it is possibly due to the low dissipation nature of the Vreman's model. Excluding the first hour of flow spin-up, the results compare well with PyCLES (Pressel et al.2015) and fall within the ensemble range shown in Fig. 2 of Siebesma et al. (2003). PyCLES solves the anelastic conservation equations with entropy as the prognostic thermodynamic variable using weighted essentially non-oscillatory (WENO) numerics. Differences in the initial conditions (as seen in the initial LWP in Fig. 12) are likely contributing to the initial mismatch in horizontally averaged properties of moisture and turbulence. Details on the computation of horizontally averaged profiles can be found in Appendix B.

A large-domain simulation of BOMEX with effective horizontal resolution Δx=Δy=50 m and vertical resolution Δz=20 m in a 30×30×3.6 km3 domain was executed using 16 GPUs on the Google Cloud Platform. The simulation was executed using 1D IMEX time integration (1D implicit in the vertical direction and 2D explicit in the horizontal direction) at maximum horizontal advective Courant number C=0.9. A visualization of instantaneous shallow cumulus structures is shown in Fig. 13.

Figure 11BOMEX. Profile of the mean state of liquid potential temperature, total specific humidity, cloud fraction, liquid water specific humidity, and variance of the vertical velocity fluctuations averaged along the last hour of the simulation. The solutions with PyCLES and ClimateMachine were calculated with effective grid resolution Δx=Δy=100 m and Δz=40 m. Results calculated with Vreman (blue lines) and SL SGS (orange lines) models in ClimateMachine are compared against results of PyCLES (grey lines) in its paired SGS modality using the SL model.


Figure 12BOMEX. From left to right, time series of horizontally averaged LWP, cloud cover, and turbulence kinetic energy diagnosed from ClimateMachine using Vreman and SL. These results are consistent with ensemble results presented in Fig. 2 of the intercomparison study by Siebesma et al. (2003). Line colors are as in Fig. 11.


Figure 13BOMEX. Instantaneous visualization of the shallow cumulus structures on a 30km×30km×3.6 km domain. The white shading of the volume rendering of the cloud corresponds to a maximum value of ql=2.5×10-4 kg kg−1; a maximum qt=1.8×10-2 kg kg−1 is shown on the bottom surface in light yellow shading. The simulation was executed using 16 GPUs on the Google Cloud Platform.


6 Verification of conservation property of prognostic variables

We respectively define the time-dependent normalized total mass and energy changes as

(66) Δ M ( t ) = Ω ρ ( t ) - ρ ( t 0 ) d Ω Ω ρ ( t 0 ) d Ω


(67) Δ ρ e tot ( t ) = Ω ρ e tot ( t ) - ρ e tot ( t 0 ) d Ω Ω ρ e tot ( t 0 ) d Ω ,

where t0 indicates the initial time and Ω is the full domain. Figure 14 shows ΔM(t) and Δρetot(t) for a 1 h simulation of a moist rising thermal bubble. The SL eddy viscosity model was used to represent under-resolved diffusive fluxes. This simulation was run using free-slip boundary conditions with adiabatic walls. We note that the loss of energy and mass in the system is contained to 𝒪(10−15): that is, numerical roundoff error. This result highlights a key benefit of the general formulation of the prognostic conservation equations in flux form, which guarantees conservation properties up to source or sink contributions.

Figure 14Time evolution of the mass and total energy loss (relative change when compared against initial conditions) for a moist thermal bubble simulation. Blue line: relative change in energy; orange line: relative change in mass.


7 CPU strong scaling

Demonstration of favorable scaling capabilities across multiple hardware types is critical to the utility of ClimateMachine as a competitive tool for large-eddy simulations. Toward this, we first examine strong scaling on CPU architectures. The rising thermal bubble problem described in Sect. 5.2 is used as the test problem, with its domain extents modified to form an 8.192 km3 cube, with an effective nodal resolution of 32 m to ensure that CPU memory on a single rank is maximally loaded. For tests with multiple MPI ranks, each rank resides on a unique node to ensure communication overhead is appropriately represented; in practice, one would expect to use multiple ranks per node. Scaling across multiple threads is not assessed in the present work. Figure 15 shows the speed-up in time to solution for 10 time integration steps of the test problem with Nranks{1,2,4,8,16,32} for both dry and moist simulations. We exclude checkpoint, diagnostic, and periodic runtime output steps from time-to-solution measurements. In both dry and moist simulations, we see a speed-up of approximately 19.7 when using 32 ranks compared with the corresponding single-rank simulation.

A single-rank GPU run of the test problem on a 6.144 km3 domain with 32 m effective resolution (restricted by GPU memory capacity) has a wall-clock time for 10 integration steps of 314 s. The wall-clock time for a 32-rank CPU run was 449 s for a problem 2.37 times larger.

This provides an estimate for a comparison between CPU and GPU hardware performance. However, the balance between memory bandwidth limits and computing operation limits guides the maximum scaling possible on the GPU hardware relative to its CPU counterpart, so this cannot be interpreted as a direct comparison across hardware types. Based on the present results, we conclude that it is more feasible to pursue strong scaling improvements on CPU hardware than on GPU hardware. Further optimization and exploration of scaling in ClimateMachine are ongoing work. Additional details on the hardware used for scaling tests can be found in Appendix C.

Figure 15Speed-up of the time integration (solver) step relative to the time to solution for a single-rank simulation of the rising thermal bubble problem in an 8.192 km3 domain with a uniform effective resolution of 32 m (with fourth-order polynomials). Blue circles: dry simulation; orange squares: moist simulation.


8 GPU weak scaling

To test the multi-GPU scalability of ClimateMachine, we first execute a BOMEX setup that is sufficiently large to saturate one GPU. The single-GPU execution represents the baseline from which we calculate the average time per time step denoted by t1. We then expand the domain size to match an increase in the number of GPUs and measure the average time per time step. Our scaling is then obtained as the ratio t1/tn, with tn being the average time per time step obtained with n GPUs. The results are obtained using up to 16 NVIDIA Tesla V100 GPUs running Julia version 1.4.2, CUDA 10.0, and CUDA-aware OpenMPI 4.0.3. Figure 16 shows excellent weak scaling for up to 16 GPUs on Google Cloud Platform resources. Over 95 % weak scaling was achieved with 1D IMEX time integration and over 98 % for the simulation with explicit time stepping. This is an encouraging result and supports the ability to prototype smaller problem setups and deploy larger simulations in the ClimateMachine limited-area configuration at identical resolutions without significantly compromising the time to solution.

Figure 16BOMEX weak scaling using 1D IMEX versus fully explicit time integration.


9 Conclusions

This paper introduced and assessed the LES configuration of ClimateMachine, a new Julia-language simulation framework designed for parallel CPU and GPU architectures. Notable features of this LES framework are the following:

  • conservative flux form model equations for mass, momentum, total energy, and total moisture to ensure global conservation of dynamical variables of interest (up to nonconservative source or sink processes);

  • discontinuous Galerkin discretization with element-wise evaluation of the approximations to volume and interface integrals, resulting in reduced time to solution due to MPI operations;

  • application of model equations to the solution of benchmark problems in typical LES codes, including atmospheric flows in the shallow cumulus regime (BOMEX); and

  • demonstration of strong scaling on CPUs with up to 32 MPI ranks (speed-up of 19.7 in time to solution) and weak scaling with up to 16 GPUs (95 %–98 %) in both dry and moist simulation configurations.

Appendix A: Boundary conditions

A1 Solid walls and wall fluxes

A1.1 Momentum

Rigid surfaces are considered impenetrable such that the wall-normal component of the velocity vanishes at rigid boundaries by imposing un=0. The viscous sublayer is not explicitly resolved, and a momentum sink is applied to model the effect of wall-shear stresses. While the wall-normal advective momentum flux vanishes, the wall-normal viscous or SGS momentum flux, also known as the bulk surface stress (units of Pa),

(A1) n ( ρ τ ) = - n 2 ρ ν t S ( u p ) ,

is not necessarily negligible. Here, S(up)=(up+upT)/2 is the strain rate tensor of the near-surface wall-parallel velocity, up. We note that, throughout Appendix A, n refers to the inward-pointing normal vector at domain boundaries, distinct from the prior definition of the element-interface normal vector in Sect. 3.

In the case of free-slip conditions at a solid surface (indicated by subscript “sfc”), there is no viscous or SGS momentum transfer between the atmosphere and the surface, such that


Because the momentum flux tensor depends linearly on velocity derivatives, this amounts to homogeneous Neumann boundary conditions on velocity components parallel to the surface. On the other hand, viscous drag is imposed by the classical aerodynamic drag law,

(A2) n ( ρ τ ) = - ρ C d , int u p , int u p , int ,

where the quantities with subscript int are evaluated at an interior point xint adjacent to the surface. The drag coefficient,


can depend parametrically on state variables Y(x,t) and on the position xint of the interior point relative to the surface. In the present implementation, the plane of interior points relevant to boundary flux evaluation is interpreted as the first layer of interior nodes in the surface-adjacent elements. The drag law boundary condition amounts to inhomogeneous Neumann boundary conditions on velocity components parallel to the surface.

A1.2 Specific humidity

As for momentum, the advective specific humidity fluxes normal to a rigid surface vanish, but the diffusive or SGS specific humidity fluxes normal to the surface may not vanish. Normal components of SGS fluxes of condensate, ql, are generally set to zero at boundaries,

(A3) n ( ρ d q l ) | sfc = 0 ,

implying homogeneous Neumann boundary conditions (nql=0) on the condensate specific humidities. The total SGS specific humidity flux then reduces to the vapor flux at the surface. SGS turbulent deposition of condensate (fog) on the surface can in principle occur; representing this would require nonzero condensate fluxes at the surface. With the assumption of zero condensate boundary fluxes, we have

(A4) n ( ρ d q t ) | sfc = n ( ρ d q v ) | sfc ,

where evaporation (measured in kg m−2 s−1), or condensation if negative, is given by

(A5) E = n ( ρ d q v ) | sfc = - n ( ρ D t q t ) | sfc .

Evaporation can be zero (water impermeable) or it can be given as a function of Y at the surface according to

(A6) E = n ( ρ d q v ) | sfc = E ( Y ; x sfc , t ) ,

which, numerically, translates into an inhomogeneous Neumann boundary condition on the vapor specific humidity.

A1.3 Energy

As for momentum and humidity, the advective energy fluxes normal to a rigid surface vanish, but the diffusive or SGS flux of total enthalpy, htot, normal to the surface (units of W m−2),

(A7) n ρ ( J + D ) = - n ( ρ D t h tot ) ,

may not vanish. Because the kinetic energy contribution to the total enthalpy flux near a surface is usually 3–4 orders of magnitude smaller than the thermal and potential energy components, it is generally neglected so that the total enthalpy flux J+D reduces to a flux of moist static energy MSE=h+Φ. The surface can be insulating, in which case the SGS transfer of total enthalpy between the atmosphere and the surface is zero:

(A8) n ρ ( J + D ) sfc = 0 ,

which, from a numerical point of view, translates to a homogeneous Neumann condition on the total enthalpy (nhtot=0) or, by neglecting kinetic energy, on MSE such that (nMSE=0). With known MSE, the moist static energy flux (MSEF) at the surface is given by

(A9) n ρ ( J + D ) sfc = MSEF ( Y ; x sfc , t ) = - ρ C h , int u p , int ( MSE int - MSE sfc ) .

with Ch,int the thermal exchange coefficient.

The value of ρn(J+D) can also be assigned by the summation of given LHF and SHF, as done in the case of BOMEX described in Sect. 5.

A2 Non-reflecting top boundary

To prevent the reflection of fast, upward-propagating gravity waves at the top boundary, a Rayleigh damping sponge is added to the right-hand side of the momentum equation (see Sect. 5). The damping in the momentum equations takes the form

(A10) ρ u t = - τ s ρ ( u - u relax ) ,

where urelax is a specified background velocity to which the flow is relaxed within the absorbing layer, with a characteristic relaxation coefficient τs. Of the many alternative options known for τs (e.g., Durran and Klemp1983), the default in ClimateMachine is

(A11) τ s = α sin γ 1 2 z - z s z top - z s  for  z > z s ,

where the absorbing sponge layer starts at z=zs, γ is a positive even power typically set to 2, and α>0 is a relaxation coefficient typically of order 𝒪(1 s−1).

A3 Numerical implementation

For the boundary velocity corresponding to the impenetrable wall condition, we use the following reflecting condition:


The no-slip condition follows from Eq. (A12) by setting all components of ubc=0. Boundary conditions on a scalar χ are similarly specified as follows:

(A14) χ + = 2 χ bc - χ - .

Nonzero mass flux boundary conditions can be imposed at penetrable or free surfaces by applying the transmissive boundary condition:

(A15) u + = u - .

Diffusive fluxes are applied by a direct specification of the wall-normal fluxes, and over-specified boundary conditions are avoided by using only the interior () gradients and first-order fluxes.

Appendix B: Statistics

Since the flow is compressible, we use density-weighted Favre averages following Canuto (1997) when computing horizontally averaged statistics. For a scalar ϕ, the density-weighted average ϕ¯ at a given height level z is defined by

(B1) ϕ ¯ = ρ ϕ ρ ,

where 〈⋅〉 denotes a horizontal mean. All calculations of horizontal statistics are done on the DG nodal mesh to avoid introducing interpolation errors (Yamaguchi and Feingold2012), with metric terms accounted for in the descriptions of diagnostic variables. The density-weighted vertical eddy flux for a variable ϕ is defined by

(B2) w ϕ = ρ w ϕ ρ ,


(B3) ϕ = ϕ - ϕ ¯

denotes the deviation from the density-weighted mean. The variance can then be defined as

(B4) ϕ 2 = ρ ϕ 2 ρ .
Appendix C: Hardware

This section summarizes the hardware characteristics for the primary computing resources used in tests throughout this paper. This is particularly relevant to the data presented in Sect. 7. Compute nodes for the CPU tests were 14-core Intel Xeon (2.4 GHz), with a maximum memory capacity of 1.5 TB. GPU nodes on this cluster were of 14-core Intel Broadwell (2.4 GHz) type with 28 cores per node and 256 GB of memory per node. Compute nodes on the Google Cloud Platform leverage Tesla V100 GPUs available for general-purpose use.

Appendix D: Supplementary results

This section provides additional information on the comparison of the density current benchmark in ClimateMachine with existing literature references in Table D1.

(Giraldo and Restelli2008)(Giraldo and Restelli2008)(Marras et al.2015)(Marras et al.2015)(Marras et al.2015)(Marras et al.2015)(Marras et al.2015)(Marras et al.2015)(Marras et al.2015)(Marras et al.2013)(Marras et al.2013)(Marras et al.2013)(Marras et al.2013)(Ahmad and Lindeman2007)(Straka et al.1993)

Table D1Summary of frontal locations for the density current test case from existing literature. Results tabulated are of the front location at t=900 s. The results are reported for the following models: ClimateMachine with SL and Vreman, finite-element model (FEM) variational multiscale stabilization (VMS), f-wave, filtered spectral elements (SEs), filtered discontinuous Galerkin (DG), and the piecewise parabolic method (PPM). NUMA is the Nonhydrostatic Unified Model of the Atmosphere.

Download Print Version | Download XLSX

Code availability

ClimateMachine is an open-source framework that is maintained on GitHub: (last access: 20 June 2022). Documentation for installing and running ClimateMachine is available at (last access: 20 June 2022). The version used in this paper is v0.2.0, which can be downloaded from (Climate Modeling Alliance2020) or (last access: 20 June 2022). The parameters included in Table 1 are available in the accompanying CLIMAParameters.jl Julia package. The code is released under the Apache License, version 2.0

Data availability

The test cases presented in the paper are available in the following directory locations in the source code.

  • Sect. 5.1: /test/Numerics/DGMethods/Euler/isentropicvortex.jl

  • Sect. 5.2: /tutorials/Atmos/risingbubble.jl

  • Sect. 5.3: /tutorials/Atmos/densitycurrent.jl

  • Sect. 5.4: /experiments/AtmosLES/schar_scalar_advection.jl

  • Sect. 5.5: /experiments/AtmosLES/agnesi_hs_lin.jl and  /experiments/AtmosLES/agnesi_nh_lin.jl

  • Sect. 5.6: /experiments/AtmosLES/taylor_green.jl

  • Sect. 5.7: /experiments/AtmosLES/bomex_model.jl and  /experiments/AtmosLES/bomex_les.jl

Configuration instructions and methods for resolution, numerical flux schemes, boundary conditions, and turbulence model specification can be found in the code documentation at (Climate Modeling Alliance2020). Instructions for installing and launching simulations can be accessed at (last access: 20 June 2022).

Author contributions

AS contributed to analysis, methodology, software, and writing – review and editing. YT contributed to analysis, visualization, software, and writing – review and editing. SM contributed to conceptualization, methodology, software, and writing – original draft preparation, review, and editing. ZS contributed to software, analysis, visualization, and writing – review and editing. CK developed software. SB developed software. KP developed software and contributed to the analysis. MW was responsible for methodology and software. THG contributed to methodology, software, and writing – review and editing. JEK contributed to conceptualization, methodology, and software. VC developed software. LCW contributed to conceptualization, methodology, and software. FXG contributed to conceptualization, methodology, software, and writing – review and editing. TS contributed to conceptualization, methodology, software, project administration, and writing – original draft preparation, review, and editing.

Competing interests

At least one of the co-authors is a member of the editorial board of Geoscientific Model Development. The peer-review process was guided by an independent editor, and the authors have no other competing interests to declare.


Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


The computations presented here were conducted at the Resnick High-Performance Computing Center, a facility supported by the Resnick Sustainability Institute at the California Institute of Technology (formerly known as the Central HPC Cluster, with partial support by a grant from the Gordon and Betty Moore Foundation), and on the Google Cloud Platform with in-kind support by Google. We thank the Google team for their assistance with operations on the Google Cloud Platform.

Financial support

This research was made possible by the generosity of Eric and Wendy Schmidt by recommendation of the Schmidt Futures program and by the Paul G. Allen Family Foundation, Charles Trimble, the Audi Environmental Foundation, the Heising-Simons Foundation, and the National Science Foundation (grants AGS-1835860 and AGS-1835881). Additionally, Valentin Churavy was supported by the Defense Advanced Research Projects Agency (DARPA, agreement HR0011-20-9-0016) and by the NSF (grant OAC-1835443). Part of this research was carried out at the Jet Propulsion Laboratory, California Institute of Technology, under a contract with the National Aeronautics and Space Administration.

Review statement

This paper was edited by Travis O'Brien and reviewed by two anonymous referees.


Abdi, D. S., Giraldo, F. X., Constantinescu, E., Lester III, C., Wilcox, L., and Warburton, T.: Acceleration of the Implicit-Explicit Non-Hydrostatic Unified Model of the Atmosphere (NUMA) on Manycore Processors, Int. J. High Perform. C., 33, 242–267,, 2017a. a

Abdi, D. S., Wilcox, L. C., Warburton, T. C., and Giraldo, F. X.: A GPU-accelerated continuous and discontinuous Galerkin non-hydrostatic atmospheric model, Int. J. High Perform. C., 33, 81–109,, 2017b. a, b, c, d, e

Ahmad, N. and Lindeman, J.: Euler solutions using flux-based wave decomposition, Int. J. Numer. Meth. Fl., 54, 47–72,, 2007. a, b, c

Balaji, V.: Climbing down Charney's ladder: machine learning and the post-Dennard era of computational climate science, Philos. T. Roy. Soc. A, 379, 20200085,, 2021. a

Bao, L., Klöfkorn, R., and Nair, R. D.: Horizontally Explicit and Vertically Implicit (HEVI) Time Discretization Scheme for a Discontinuous Galerkin Nonhydrostatic Model, Mon. Weather Rev., 143, 972–990,, 2015. a

Bassi, F. and Rebay, S.: A high-order discontinuous Galerkin finite element method solution of the 2d Euler equations, J. Comput. Phys., 138, 251–285,, 1997. a, b

Bezanson, J., Edelman, A., Karpinski, S., and Shah, V. B.: Julia: A Fresh Approach to Numerical Computing, SIAM Rev., 59, 65–98,, 2017. a

Bezanson, J., Chen, J., Chung, B., Karpinski, S., Shah, V. B., Vitek, J., and Zoubritzky, L.: Julia: Dynamism and Performance Reconciled by Design, Proc. ACM Program. Lang., 2, 1–23,, 2018. a

Bott, A.: Theoretical considerations on the mass and energy consistent treatment of precipitation in cloudy atmospheres, Atmos. Res., 89, 252–269, 2008. a, b

Boyd, J. P.: The erfc-log filter and the asymptotics of the Euler and Vandeven sequence accelerations, edited by: Ilin, A. V. and Scott, L. R., Proceedings of the Third International Conference on Spectral and High Order Methods, Houston Journal of Mathematics, 267–276, 1996. a

Brachet, M. E.: Direct simulation of three-dimensional turbulence in the Taylor-Green vortex, Fluid Dyn. Res., 8, 1–8,, 1991. a

Brachet, M. E., Meiron, D. I., Orszag, A., Nickel, B. G., Morf, R. H., and Frisch, U.: Small-scale structure of the Taylor-Green vortex, J. Fluid Mech., 130, 411–452,, 1983. a, b

Canuto, V. M.: Compressible turbulence, Astrophys. J., 482, 827–851,, 1997. a

Carpenter, M. H. and Kennedy, C. A.: Fourth-order 2N-storage Runge-Kutta schemes, Tech. Rep. NASA TM-109112, National Aeronautics and Space Administration, Langley Research Center, Hampton, VA, 1994. a

Chow, F. K. and Moin, P.: A further study of numerical errors in large-eddy simulations, J. Comput. Phys., 184, 366–380,, 2003. a

Climate Modeling Alliance: ClimateMachine.jl (0.2.0), Zenodo [code],, 2020. a, b

Deardorff, J. W.: A numerical study of three-dimensional turbulent channel flow at large Reynolds numbers, J. Fluid Mech., 41, 452–480,, 1970. a, b

Deardorff, J. W.: Three-dimensional numerical study of the height and mean structure of a heated planetary boundary layer, Bound. Lay. Meteorol., 7, 81–106,, 1974. a

Deardorff, J. W.: Usefulness of liquid-water potential temperature in a shallow-cloud model, J. Appl. Meteorol., 15, 98–102,<0098:UOLWPT>2.0.CO;2, 1976. a

Deardorff, J. W.: Stratocumulus-capped mixed layers derived from a three-dimensional model, Bound. Lay. Meteorol., 18, 495–527,, 1980. a, b, c

Deville, M. O., Fischer, P. F., and Mund, E. H.: High-order methods for incompressible fluid flow, Cambridge University Press,, 2002. a

Dipankar, A., Stevens, B., Heinze, R., Moseley, C., Zängl, G., Giorgetta, M., and Brdar, S.: Large eddy simulation using the general circulation model ICON, J. Adv. Model. Earth Sy., 7, 963–986,, 2015. a

Drikakis, D., Fureby, C., and Youngs, F.: Simulation of transition and turbulence decay in the Taylor–Green vortex, J. Turbulence, 8, 1–12,, 2007. a

Durran, D. and Klemp, J.: A compressible model for the simulation of moist mountain waves, Mon. Weather Rev., 111, 2341–2361,<2341:ACMFTS>2.0.CO;2, 1983. a

Toro, E. F., Spruce, M., and Speares, W.: Restoration of the Contact Surface in the HLL–Riemann Solver, Shock Waves, 4, 25–34,, 1994. a

Fuhrer, O., Osuna, C., Lapillonne, X., Gysi, T., Cumming, B., Bianco, M., Arteaga, A., and Schulthess, T. C.: Towards a performance portable, architecture agnostic implementation strategy for weather and climate models, Supercomput. Front. Inn., 1, 45–62,, 2014. a

Fuhrer, O., Chadha, T., Hoefler, T., Kwasniewski, G., Lapillonne, X., Leutwyler, D., Lüthi, D., Osuna, C., Schär, C., Schulthess, T. C., and Vogt, H.: Near-global climate simulation at 1 km resolution: establishing a performance baseline on 4888 GPUs with COSMO 5.0, Geosci. Model Dev., 11, 1665–1681,, 2018. a

Gal-Chen, T. and Somerville, R.: Numerical solution of the Navier-Stokes equations with topography, J. Comput. Phys., 17, 276–310,, 1975. a

Ghosal, S.: An Analysis of Numerical Errors in Large-Eddy Simulations of Turbulence, J. Comput. Phys., 125, 187–206,, 1996. a

Giraldo, F. X.: An Introduction to Element-based Galerkin Methods on Tensor-Product Bases: Analysis, Algorithms, and Applications, Springer,, 2020. a

Giraldo, F. X. and Restelli, M.: A study of spectral element and discontinuous Galerkin methods for the Navier-Stokes equations in nonhydrostatic mesoscale atmospheric modeling: Equation sets and test cases, J. Comput. Phys., 227, 3849–3877,, 2008. a, b, c, d, e, f, g, h

Giraldo, F. X., Hesthaven, J. S., and Warburton, T.: Nodal high-order discontinuous Galerkin methods for spherical shallow water equations, J. Comput. Phys., 181, 499–525,, 2002. a

Giraldo, F. X., Kelly, J. F., and Constantinescu, E. M.: Implicit-explicit formulations of a three-dimensional nonhydrostatic unified model of the atmosphere (NUMA), SIAM J. Sci. Comput., 35, B1162–B1194,, 2013. a

Harten, A.: High resolution schemes for hyperbolic conservation laws, J. Comput. Phys., 49, 357–393,, 1983. a

Hesthaven, J. and Warburton, T.: Nodal discontinuous Galerkin method, Algorithms, analysis and applications, Springer,, 2008a. a

Hesthaven, J. S. and Warburton, T.: Nodal discontinuous Galerkin methods: algorithms, analysis, and applications, vol. 54, Springer-Verlag New York Inc,, 2008b. a, b

Holland, J. Z. and Rasmusson, E. M.: Measurements of the atmospheric mass, energy, and momentum budgets over a 500-kilometer square of tropical ocean, Mon. Weather Rev, 101, 44–57,<0044:MOTAME>2.3.CO;2, 1973. a

Hunt, J. C. R., Wray, A., and Moin, P.: Eddies, stream, and convergence zones in turbulent flows, Tech. Rep. CTR-S88, Center for Turbulence Research Report CTR-S88, Stanford University, 1988. a

Jähn, M., Knoth, O., König, M., and Vogelsberg, U.: ASAM v2.7: a compressible atmospheric model with a Cartesian cut cell approach, Geosci. Model Dev., 8, 317–340,, 2015. a

Karniadakis, G. and Sherwin, S.: Spectral/hp element methods for CFD, Oxford University Press,, 1999. a

Kelly, J. F. and Giraldo, F. X.: Continuous and discontinuous Galerkin methods for a scalable three-dimensional nonhydrostatic atmospheric model: limited-area mode, J. Comput. Phys., 231, 7988–8008,, 2012. a, b

Kennedy, C. A. and Carpenter, M. H.: Higher-order additive Runge–Kutta schemes for ordinary differential equations, Appl. Numer. Math., 136, 183–205,, 2019. a

Kopriva, D. A.: Implementing spectral methods for partial differential equations: Algorithms for scientists and engineers, Springer Science & Business Media,, 2009. a

Kurowski, M., Grabowski, W. W., and Smolarkiewicz, P. K.: Anelastic and compressible simulation of moist deep convection, J. Atmos. Sci., 71, 3767–3787,, 2014. a

Light, D. and Durran, D.: Preserving Nonnegativity in Discontinuous Galerkin Approximations to Scalar Transport via Truncation and Mass Aware Rescaling (TMAR), Mon. Weather Rev., 144, 4771–4786,, 2016. a

Lilly, D. K.: On the numerical simulation of buoyant convection, Tellus, 14, 148–172,, 1962. a, b

Lilly, D. K.: On the application of the eddy viscosity concept in the inertial sub-range of turbulence, NCAR manuscript, 123,, 1966. a

Lin, W.-C. and McIntosh-Smith, S.: Comparing Julia to Performance Portable Parallel Programming Models for HPC, in: 2021 International Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS), 94–105,, 2021. a

Marras, S. and Giraldo, F. X.: A parameter-free dynamic alternative to hyper-viscosity for coupled transport equations: application to the simulation of 3D squall lines using spectral elements, J. Comput. Phys., 283, 360–373,, 2015. a

Marras, S., Kelly, J. F., Giraldo, F. X., and Vázquez, M.: Variational multiscale stabilization of high-order spectral elements for the advection-diffusion equation, J. Comput. Phys., 231, 7187–7213, 2012. a

Marras, S., Moragues, M., Vázquez, M., Jorba, O., and Houzeaux, G.: A Variational Multiscale Stabilized Finite Element Method for the Solution of the Euler Equations of Nonhydrostatic Stratified Flows, J. Comput. Phys., 236, 380–407, 2013. a, b, c, d

Marras, S., Kelly, J. F., Moragues, M., Müller, A., Kopera, M. A., Vázquez, M., Giraldo, F. X., Houzeaux, G., and Jorba, O.: A Variational Multiscale Stabilized Finite Element Method for the Solution of the Euler Equations of Nonhydrostatic Stratified Flows, Arch. Comput. Methods Eng., 23, 673–722,, 2015. a

Marras, S., Nazarov, M., and Giraldo, F. X.: Stabilized high-order Galerkin methods based on a parameter-free dynamic SGS model for LES, J. Comput. Phys., 301, 77–101,, 2015. a, b, c, d, e, f, g, h, i, j

Mason, P. J. and Callen, N. S.: On the magnitude of the subgrid-scale eddy coefficient in large-eddy simulations of turbulent channel flow, J. Fluid Mech., 162, 439–462,, 1986. a

Matheou, G.: Numerical discretization and subgrid-scale model effects on large-eddy simulations of a stable boundary layer, Q. J. Roy. Meteor. Soc., 142, 3050–3062, 2016. a

Matheou, G. and Teixeira, J.: Sensitivity to Physical and Numerical Aspects of Large-Eddy Simulation of Stratocumulus, Mon. Weather Rev., 147, 2621–2639, 2019. a

Matheou, G., Chung, D., Nuijens, L., Stevens, B., and Teixeira, J.: On the fidelity of large-eddy simulation of shallow precipitating cumulus convection, Mon. Weather. Rev., 139, 2918–2939,, 2011. a

Mellado, J.: Cloud-Top Entrainment in Stratocumulus Clouds, Annu. Rev. Fluid Mech., 49, 145–169, 2017. a

Mellado, J. P., Bretherton, C. S., Stevens, B., and Wyant, M. C.: DNS and LES for Simulating Stratocumulus: Better Together, J. Adv. Model. Earth Sy., 10, 1421–1438,, 2018. a

Moeng, C., McWilliams, J., Rotunno, R., Sullivan, P., and Weil, J.: Investigating 2D modelling of atmospheric convection in the PBL, J. Atmos. Sci., 61, 889–903, 2003. a

Moeng, C. H.: A Large-Eddy simulation model for the study of planetary boundary-layer turbulence, J. Atmos. Sci., 41, 2052–2062, 1984. a

Moeng, C. H. and Wyngaard, J. C.: Spectral analysis of large-eddy simulations of the convective boundary layer, J. Atmos. Sci., 45, 3573–3587, 1988. a

Müller, A., Kopera, M., Marras, S., Wilcox, L., Isaac, T., and Giraldo, F.: Strong scaling for numerical weather prediction at petascale with the atmospheric model NUMA, Int. J. High Perform. C., 33, 411–426,, 2018. a

Niegemann, J., Diehl, R., and Busch, K.: Efficient low-storage Runge–Kutta schemes with optimized stability regions, J. Comput. Phys., 231, 364–372,, 2012. a, b

Palmer, T.: Climate forecasting: build high-resolution global climate models, Nature, 515, 338–339,, 2014. a

Pressel, K. G., Kaul, C. M., Schneider, T., Tan, Z., and Mishra, S.: Large-eddy simulation in an anelastic framework with closed water and entropy balances, J. Adv. Model. Earth Sy., 7, 1425–1456,, 2015. a, b, c, d

Pressel, K. G., Mishra, S., Schneider, T., Kaul, C. M., and Tan, Z.: Numerics and Subgrid-Scale Modeling in Large Eddy Simulations of Stratocumulus Clouds, J. Adv. Model. Earth Sy., 9, 1342–1365,, 2017. a

Raymond, D. J.: Sources and sinks of entropy in the atmosphere, J. Adv. Model. Earth Sy., 5, 755–763, 2013. a

Reddy, S., Tissaoui, Y., De Bragan ça Alves, F., Marras, S., and Giraldo, F.: Comparison of Sub-Grid Scale Models for Large-Eddy Simulation Using a High-Order Spectral Element Approximation of the Compressible Navier-Stokes Equations at Low Mach Number, J. Comput. Phys.,, in review, 2022. a

Roe, P.: Approximate Riemann Solvers, Parameter Vectors, and Difference Schemes, J. Comput. Phys., 43, 357–372,, 1981. a

Romps, D. M.: The dry-entropy budget of a moist atmosphere, J. Atmos. Sci., 65, 3779–3799,, 2008. a, b, c, d

Rusanov, V.: Calculation of Interaction of Non–Steady Shock Waves with obstacles, USSR Comp. Math. Math., 1, 267–279,, 1961. a

Savic-Jovcic, V. and Stevens, B.: The structure and mesoscale organization of precipitating stratocumulus, J. Atmos. Sci., 65, 1587–1605,, 2008. a

Schalkwijk, J., Griffith, E., Post, H., and Jonker, H. J. J.: High performance simulations of turbulent clouds on a desktop PC: Exploiting the GPU, B. Am. Meteorol. Soc., 93, 307–314,, 2012. a, b

Schalkwijk, J., Jonker, H., Siebesma, A., and Bosveld, F.: A year-long Large-Eddy Simulation of the weather over Cabauw: an overview, Mon. Weather Rev., 143, 828–844,, 2015. a, b

Schär, C., Leuenberger, D., Fuhrer, O., Luthic, D., and Girard, C.: A new terrain-following vertical coordinate formulation for atmospheric prediction models, Mon. Weather Rev., 130, 2459–2480,<2459:ANTFVC>2.0.CO;2, 2002. a, b, c

Schär, C., Fuhrer, O., Arteaga, A., Ban, N., Charpilloz, C., Di Girolamo, S., Hentgen, L., Hoefler, T., Lapillonne, X., Leutwyler, D., Osterried, K., Panosetti, D., Rüdisühli, S., Schlemmer, L., Schulthess, T. C., Sprenger, M., Ubbiali, S., and Wernli, H.: Kilometer-Scale Climate Models: Prospects and Challenges, B. Am. Meteorol. Soc., 101, E567–E587,, 2020. a

Schneider, T., Kaul, C., and Pressel, K.: Possible climate transitions from breakup of stratocumulus decks under greenhouse warming, Nat. Geosci., 12, 163–167,, 2019. a

Shi, X., Chow, F. K., Street, R. L., and Bryan, G. H.: An Evaluation of LES Turbulence Models for Scalar Mixing in the Stratocumulus-Capped Boundary Layer, J. Atmos. Sci., 75, 1499–1507,, 2018. a

Shu, C.-W. and Osher, S.: Efficient implementation of essentially non-oscillatory shock-capturing schemes, J. Comput. Phys., 77, 439–471,, 1988. a

Siebesma, A. P., Bretherton, C. S., Brown, A., Chlond, A., Cuxart, J., Duynkerke, P. G., Jiang, H., Khairoutdinov, M., Lewellen, D., Moeng, C.-H., Sanchez, E., Stevens, B., and Stevens, D. E.: A large eddy simulation intercomparison study of shallow cumulus convection, J. Atmos. Sci., 60, 1201–1219,<1201:ALESIS>2.0.CO;2, 2003. a, b, c, d, e

Smagorinsky, J.: General Circulation Experiments with the Primitive Equations: I. The basic experiement, Mon. Weather Rev., 91, 99–164,<0099:GCEWTP>2.3.CO;2, 1963. a, b

Smith, R.: Linear theory of stratified hydrostatic flow past an isolated mountain, Tellus, 32, 348–364,, 1980. a

Smith, R. B.: The influence of mountains on the atmosphere, Adv. Geophys., 21, 87–230,, 1979. a

Stevens, B., Lenschow, D. H., Vali, G., Gerber, H., Bandy, A., Blomquist, B., Brenguier, J.-L., Bretherton, C. S., Burnet, F., Campos, T., Chai, S., Faloona, I., Friesen, D., Haimov, S., Laursen, K., Lilly, D. K., Loehrer, S. M., Malinowski, S. P., Morley, B., Petters, M. D., Rogers, D. C., Russell, L., Savic-Jovcic, V., Snider, J. R., Straub, D., Szumowski, M. J., Takagi, H., Thornton, D. C., Tschudi, M., Twohy, C., Wetzel, M., and van Zanten, M. C.: Dynamics and chemistry of marine stratocumulus–DYCOMS-II, B. Am. Meteorol. Soc., 84, 579–593,, 2003.  a

Stevens, B., Moeng, C.-H., Ackerman, A. S., Bretherton, C. S., Chlond, A., de Roode, S., Edwards, J., Golaz, J.-C., Jiang, H., Khairoutdinov, M., Kirkpatrick, M. O., Lewellen, D. C., Lock, A., Müller, F., Stevens, D. E., Whelan, E., and Zhu, P.: Evaluation of Large-Eddy Simulations via Observations of Nocturnal Marine Stratocumulus, Mon. Weather Rev., 133, 1443–1462,, 2005. a, b, c, d

Straka, J., Wilhelmson, R., Wicker, L., Anderson, J., and Droegemeier, K.: Numerical solution of a nonlinear density current: a benchmark solution and comparisons, Int. J. Numer. Meth. Fl., 17, 1–22,, 1993. a, b, c

Sullivan, P., McWilliams, J., and Moeng, C.: A subgrid-scale model for large-eddy simulation of planetary boundary-layer flows, Bound.-Lay. Meteorol., 71, 247–276,, 1994. a

Tao, W.-K., Simpson, J., and McCumber, M.: An Ice-Water Saturation Adjustment, Mon. Weather Rev., 117, 231–235, 1989. a

Vandeven, H.: Family of spectral filters for discontinuous problems, J. Sci. Comput., 6, 159–192,, 1991. a

Vreman, A.: An eddy-viscosity subgrid-scale model for turbulent shear flow: algebraic theory and applications, Phys. Fluid., 16, 3670–3681,, 2004. a

Yamaguchi, T. and Feingold, G.: Technical note: Large-eddy simulation of cloudy boundary layer with the Advanced Research WRF model, J. Adv. Model. Earth Sy., 4, M09003,, 2012. a

Yu, M. L., Giraldo, F. X., Peng, M., and Wang, Z. J.: Localized Artificial Viscosity Stabilization of Discontinuous Galerkin Methods for Nonhydrostatic Mesoscale Atmospheric Modeling, Mon. Weather Rev., 143, 4823–4845,, 2015. a

Short summary
ClimateMachine is a new open-source Julia-language atmospheric modeling code. We describe its limited-area configuration and the model equations, and we demonstrate applicability through benchmark problems, including atmospheric flow in the shallow cumulus regime. We show that the discontinuous Galerkin numerics and model equations allow global conservation of key variables (up to sources and sinks). We assess CPU strong scaling and GPU weak scaling to show its suitability for large simulations.