PortUrb: a performance portable, high-order, moist atmospheric large eddy simulation model with variable-friction immersed boundaries

Norman, Matthew; Gopalakrishnan Meena, Muralikrishnan; Gottiparthi, Kalyan; Koukpaizan, Nicholson; Nichols, Stephen

doi:10.5194/gmd-18-9605-2025

Articles | Volume 18, issue 23

Model description paper

04 Dec 2025

Abstract

This paper introduces “portUrb”: a moist, compressible, non-hydrostatic atmospheric Large Eddy Simulation model that aims for portability, performance, accuracy, simplicity, readability, robustness, extensibility, and ensemble capabilities. Additionally, there is an emphasis on free-slip immersed boundaries with surface friction to account for urban building geometries. The model is coded in portable C++ with high-order Weighted Essentially Non-Oscillatory (WENO) numerics, and this study investigates its behavior under atmospheric boundary layer, supercell, and urban scenarios. PortUrb matches experimental observations and model comparisons closely under several test cases in mean and turbulent statistics. It also provides physically realizable flow through complex building geometries from a portion of Manhattan without needing to pre-process or smooth the building geometry.

Received: 10 Mar 2025 – Discussion started: 31 Mar 2025 – Revised: 22 Oct 2025 – Accepted: 17 Nov 2025 – Published: 04 Dec 2025

This manuscript has been authored by UT-Battelle, LLC, under contract DE-AC05-00OR22725 with the US Department of Energy (DOE). The US government retains and the publisher, by accepting the article for publication, acknowledges that the US government retains a nonexclusive, paid-up, irrevocable, worldwide license to publish or reproduce the published form of this manuscript, or allow others to do so, for US government purposes. DOE will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan, last access: 1 December 2025).

1 Introduction

Large Eddy Simulation (LES) has been an effective tool in simulating turbulent flows in complex scenarios for decades (Mason, 1994; Lesieur and Metais, 1996; Stoll et al., 2020). This study focuses on what would, atmospherically speaking, often be called “microscale” flow (Liu et al., 2012; Zhang et al., 2016), resolving grid spacings in the range [1,100] meters. This resolution is important for understanding very fine-scaled boundary layer flows with applications to wind energy (Mehta et al., 2014), urban flow (Letzel et al., 2008; Xie and Castro, 2009), and vegetation canopy flows (Shaw and Schumann, 1992; Dupont and Brunet, 2008) among other potential applications. It is a challenging scale to simulate due to the coupled effects of large-scale forcing such as incoming weather patterns as well as extremely small-scale forcing such as plant morphology and surface and building material roughness with complex geometries.

There are a number of effective models in this type of regime in the literature, including the Weather Research and Forecasting (WRF) model's LES configuration (Skamarock et al., 2019), the Parallelized Large-Eddy Simulation Model (PALM) (Maronga et al., 2015), the Dutch Atmospheric Large-Eddy Simulation (DALES) (Heus et al., 2010), the System for Atmospheric Modeling (SAM) (Khairoutdinov and Kogan, 2000; Lyngaas et al., 2022), the Cloud Model 1 (CM1) (Bryan and Fritsch, 2002), the Regional Atmospheric Modeling System (RAMS) (Cotton et al., 2003), the UCLA LES model (Stevens et al., 1998), the Advanced Regional Prediction System (ARPS) (Droegemeier, 2003), the Eulerian-Lagrangian (EULAG) model (Prusa et al., 2008), the MicroHH model (Van Heerwaarden et al., 2017), the PyCLES model (Pressel et al., 2015), the Simulator fOr Wind Farm Applications (SOWFA) (Fleming et al., 2014), the FastEddy model (Sauer and Muñoz-Esparza, 2020; Muñoz-Esparza et al., 2020, 2022), OpenFOAM (Jasak et al., 2007), the UK Met Office NERC Cloud (MONC) model (Brown et al., 2015), and the Energy Research and Forecasting (ERF) model (Almgren et al., 2023).

Existing models cover a wide range of capabilities and applications including wildfires, urban flow, canopy flow, complex terrain flow, shallow and deep convection, complex boundary layers, wind farms, and pollutant dispersion. They span many formulations, including Eulerian and Lagrangian formulations, different thermodynamic formulations, moisture inclusion approaches, acoustic representations, immersed boundary approaches, numerical discretizations, gridding and variable staggering practices, and turbulent mixing approaches. Many different coding practices are also represented. Languages include Fortran, C++, and Python. Some are CPU-threaded with directives (Bronevetsky and De Supinski, 2007). Some are accelerated for use on Graphics Processing Units (GPUs) with CUDA (Cook, 2012), directives (Chandrasekaran and Juckeland, 2017; Bronevetsky and De Supinski, 2007), or C++ portability layers (Trott et al., 2021; Norman et al., 2023a; Zhang et al., 2019). Each comes with custom extensions such as advanced microphysics or radiation schemes and various parameterizations for added applications (wind turbines, urban roughness, fire, aerosol chemistry, etc.).

The goal of this study is to introduce and investigate the properties of a very high-order-accurate moist atmospheric LES model. The model was implemented with an emphasis on simplicity, readability, portability, and extensibility. The desire is to create a model that: (1) runs portably and efficiently (in all parts of the code) on CPUs and accelerators such as GPUs in a manner extensible to future hardware by using the Kokkos C++ portability library; (2) emphasizes readability in all portions of the code for clarity and reliability (clear looping, clear array indexing, clear operations, etc.); (3) has a simple and unambiguous coupler state that is easily augmented with new modules; (4) has embedded boundaries that are simple, resilient, and able to impose roughness on tangential flow; (5) runs stably for any configuration; and (6) has an easy-to-use ensemble capability to enable running many different configurations simultaneously in a single executable. With these features in mind, the ultimate goal of this code is to create a rapid prototyping environment for creating surrogate models of different microscale atmospheric processes. The simplicity, portable GPU acceleration, and ensemble capabilities enable rapid prototyping. The additive representation of complex physics through modular processes (all tied to the coupler state) enables enough realism to create physically meaningful surrogate models.

The pathway to producing the model investigated in this study, called “portUrb” (Portable Urban model), is mainly two-fold. First, a series of numerical discretization studies (Norman et al., 2011; Norman, 2014; Norman and Larkin, 2020; Norman, 2021; Norman et al., 2023b) has led to the current choice of discretization: a collocated, upwind (Riemann solver-based), high-order-accurate Finite-Volume method using Weighted Essentially Non-Oscillatory (WENO) limiting (Liu et al., 1994; Feng et al., 2012) and a Strong Stability Preserving Runge–Kutta (SSPRK) discretization in time (Gottlieb et al., 2009). Second, progressive additions to a mini-application called “miniWeather” (Norman, 2020, https://github.com/mrnorman/miniWeather, last access: 1 December 2025), intended for hands-on education in High Performance Computing (HPC) programming, have eventually led to portUrb, which provides realistic simulations while retaining much of the simplicity.

We wish to summarize the motivations for numerical discretizations chosen in this study. In doing so, we want to emphasize that discretizations form a very complex landscape with many tradeoffs, and we do not necessarily cast this study's choices as superior in any objective way. The use of a collocated grid simplifies the grid indexing and code layout, but it also introduces instabilities that would otherwise be ameliorated by different grid staggering choices. Therefore, upwind fluxes are used at cell edges in order to introduce dissipation that naturally scales with the accuracy of the reconstructions (Norman et al., 2023b). It was found in Norman et al. (2023b) that this can be performed in an inexpensive manner by separating acoustic and advective concerns in the upwind calculations. WENO limiting, while certainly expensive, has an advantage on GPUs in the sense that the extra computations required are largely performed on-chip with memory that has already been fetched from Dynamic Random Access Memory (DRAM), reducing the relative cost on GPUs. It also reduces oscillations near immersed boundaries and sharp hydrometeor fronts in moist convective simulations.

While fully-discrete Arbitrary DERivatives in space and time (ADER) time discretizations using Differential Transforms (DTs) have been investigated in the past (Norman and Finkel, 2012; Norman and Larkin, 2020; Norman, 2013a), this particular approach to ADER discretization is most economically applied in a fully dimensionally split manner. It has been found through experimentation that less dissipative results are obtained by using multi-stage integrators with dimensional splitting inside each stage. Therefore, a Runge–Kutta method is used here: specifically an SSPRK method that maintains the non-oscillatory properties of the underlying spatial operator. Finally, high-order reconstruction is desired for the same reason that WENO limiting is desired. On GPUs, the extra computational cost is ameliorated by the significant compute throughput of GPU devices, and most of the computations are performed on data that is already on-chip. While it increases overall runtime, it does so significantly less than the factor by which computations are increased.

The present paper focuses on evaluating model behavior in test cases. The mathematical formulation and numerical discretizations are provided in Sect. 2; numerical experiments are described, performed, and discussed in Sect. 3; and conclusions are drawn with a discussion of future work in Sect. 4. The code used for this study is available in the Supplement.

2 Mathematical Formulation and Numerical Discretization

2.1 Moist, Compressible Large Eddy Simulation Equations

The atmospheric Large Eddy Simulation (LES) formulation is based on the moist, compressible, filtered Navier-Stokes equations of gas dynamics with non-hydrostatic buoyancy-driven motions. The gas is treated as a sum of ideal gases, dry air and water vapor – with immersed hydrometeors that contribute to mass but not to pressure. The sub-grid-scale dissipation is assumed to be dominated by eddies, and this model uses a common eddy viscosity formulation (Lilly, 1966, 1967). The equations are cast in Cartesian geometry as follows:

\begin{matrix} (1) & \begin{aligned} \frac{\partial}{\partial t} [\begin{array}{c} ρ \\ ρ u \\ ρ v \\ ρ w \\ ρ θ \\ ρ q_{v} \\ ρ K \\ ρ q_{ℓ} \\ ρ q_{P} \end{array}] + \frac{\partial}{\partial x} [\begin{array}{c} ρ u \\ ρ u u + p^{'} + τ_{11} \\ ρ u v + τ_{21} \\ ρ u w + τ_{31} \\ ρ u θ + τ_{θ 1} \\ ρ u q_{v} + τ_{v 1} \\ ρ u K + τ_{K 1} \\ ρ u q_{ℓ} + τ_{ℓ 1} \\ ρ u q_{P} + τ_{P 1} \end{array}] \\ + \frac{\partial}{\partial y} [\begin{array}{c} ρ v \\ ρ v u + τ_{12} \\ ρ v v + p^{'} + τ_{22} \\ ρ v w + τ_{32} \\ ρ v θ + τ_{θ 2} \\ ρ v q_{v} + τ_{v 2} \\ ρ v K + τ_{K 2} \\ ρ v q_{ℓ} + τ_{ℓ 2} \\ ρ v q_{P} + τ_{P 2} \end{array}] + \frac{\partial}{\partial z} [\begin{array}{c} ρ w \\ ρ w u + τ_{13} \\ ρ w v + τ_{23} \\ ρ w w + p^{'} + τ_{33} \\ ρ w θ + τ_{θ 3} \\ ρ w q_{v} + τ_{v 3} \\ ρ w K + τ_{K 3} \\ ρ w q_{ℓ} + τ_{ℓ 3} \\ ρ w q_{P} + τ_{P 3} \end{array}] \\ = [\begin{array}{c} 0 \\ 0 \\ 0 \\ - ρ^{'} g \\ 0 \\ 0 \\ K_{S} + K_{D} + K_{B} \\ 0 \\ 0 \end{array}] \end{aligned} \\ (2) & \begin{aligned} p = (ρ_{d} R_{d} + ρ_{v} R_{v}) T = ρ R^{⋆} T = C_{0} {(ρ θ)}^{γ}; \\ \frac{θ}{T} = \frac{R^{⋆}}{R_{d}} {(\frac{p_{0}}{p})}^{R_{d} / c_{p}}; \\ R^{⋆} = \frac{ρ_{d}}{ρ} R_{d} + \frac{ρ_{v}}{ρ} R_{v} \end{aligned} \end{matrix}

where ρ is total density; u, v, and w are wind velocities in the x, y, and z directions, respectively; θ is a form of virtual potential temperature defined by Eq. (2); q_v is the wet water vapor mixing ratio (such that ρ_v=ρq_v is the density of water vapor); K is unresolved, sub-grid-scale specific Turbulence Kinetic Energy (TKE); q_ℓ is a tracer quantity that contributes to mass but not to pressure; q_𝒫 is a passive tracer that is mass-weighted but contributes to neither mass nor pressure (e.g., mass-weighted number concentration for two-moment microphysics); p is total ideal gas pressure defined by Eq. (2), a sum of dry and water vapor pressures; $C_{0} = {(R_{d} {(p_{0})}^{- R_{d} / c_{p}})}^{γ}$ is a constant of proportionality; $γ = c_{p} / c_{v}$ is the ratio of specific heats of dry air; c_p is specific heat of dry air at constant pressure; c_v is specific heat of dry air at constant volume; p₀ is reference pressure; R_d is the dry air ideal gas constant; R_v is the water vapor ideal gas constant; K_S is TKE shear production; K_D is TKE dissipation; K_B is the TKE buoyancy source/sink; $τ_{i j} \forall i, j \in \{1, 2, 3\}$ is the unresolved eddy flux of wind momenta; $τ_{θ j} \forall j \in \{1, 2, 3\}$ is eddy flux of potential temperature; τ_vj is eddy flux of water vapor; τ_Kj is eddy flux of TKE (i.e., the “turbulence transport” terms for TKE); τ_ℓj is the eddy flux of tracers that contribute to mass but not to pressure (e.g., hydrometeors); τ_𝒫j is the eddy flux of passive mass-weighted tracers contributing to neither mass nor pressure; and g is acceleration due to gravity. The total density, ρ, is defined as the sum of all mass-contributing densities: $ρ = ρ_{d} + ρ_{v} + \sum_{ℓ} ρ_{ℓ}$ , where ρ_d is the density of dry air. The quantities in Eq. (1) are filtered quantities.
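As an illustration of Eq. (2), the following minimal sketch evaluates pressure from ρθ and checks that it agrees with the ideal gas form for a hydrometeor-free parcel. The numerical constants and the function names are assumptions chosen for illustration and are not taken from the portUrb source; note that c_v is set to c_p − R_d so that the two forms of Eq. (2) agree.

```python
# Assumed dry-air and water-vapor constants (typical textbook values, not from the paper)
R_D = 287.0          # dry air gas constant, J kg^-1 K^-1
R_V = 461.0          # water vapor gas constant, J kg^-1 K^-1
C_P = 1004.0         # specific heat of dry air at constant pressure
C_V = C_P - R_D      # specific heat of dry air at constant volume
P0 = 1.0e5           # reference pressure, Pa
GAMMA = C_P / C_V
C0 = (R_D * P0 ** (-R_D / C_P)) ** GAMMA   # constant of proportionality in Eq. (2)


def pressure_from_rho_theta(rho_theta):
    """p = C0 (rho*theta)^gamma, the last form of Eq. (2)."""
    return C0 * rho_theta ** GAMMA


def rho_theta_from_state(rho_d, rho_v, temp):
    """Return (rho*theta, p) from dry density, vapor density, and temperature.

    Hydrometeors are assumed absent here, so rho = rho_d + rho_v.
    """
    rho = rho_d + rho_v
    p = (rho_d * R_D + rho_v * R_V) * temp            # ideal gas sum
    r_star = (rho_d * R_D + rho_v * R_V) / rho        # R* in Eq. (2)
    theta = temp * (r_star / R_D) * (P0 / p) ** (R_D / C_P)
    return rho * theta, p
```

Because the moisture dependence is absorbed into θ, the pressure can be recovered from the single prognostic quantity ρθ without knowing the vapor content separately.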

The perturbation pressure, $p^{'} = p - p_{H}$ , and perturbation density, $ρ^{'} = ρ - ρ_{H}$ , are deviations from a dominant hydrostatic balance denoted by:

\begin{matrix} (3) & \frac{d p_{H}}{d z} = - ρ_{H} g \end{matrix}

which can be defined arbitrarily. In these simulations, the balance is obtained with either a vertical potential temperature profile or a combination of a vertical temperature profile and a vertical water vapor dry mixing ratio profile. The purpose of removing hydrostasis is to better resolve the perturbations, which form the primary means of forcing for buoyancy-driven atmospheric flow. Poor resolution of hydrostasis results in spurious vertical velocities, which can render the solution physically unrealizable.
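To make the role of the hydrostatic background concrete, the sketch below builds a dry profile with constant potential temperature that satisfies Eq. (3) analytically. The constants, the default θ₀, and the function name are illustrative assumptions; the model itself may construct its balance differently.

```python
def hydrostatic_dry_neutral(z, theta0=300.0, g=9.81,
                            r_d=287.0, c_p=1004.0, p0=1.0e5):
    """Return (p_H, rho_H) at height z for constant potential temperature theta0.

    Uses the Exner function pi = 1 - g z / (c_p theta0), which follows from
    integrating dp/dz = -rho g for a dry, neutrally stratified column.
    """
    exner = 1.0 - g * z / (c_p * theta0)
    p_h = p0 * exner ** (c_p / r_d)
    temp = theta0 * exner
    rho_h = p_h / (r_d * temp)
    return p_h, rho_h


def perturbations(p, rho, z):
    """Deviations p' and rho' from the hydrostatic background at height z."""
    p_h, rho_h = hydrostatic_dry_neutral(z)
    return p - p_h, rho - rho_h
```

Only the perturbations enter the momentum equations in Eq. (1), so the large, exactly balanced background terms never have to cancel numerically.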

Equation set (1) will also be expressed in places using a generic vector form of conservation laws:

\begin{matrix} (4) & \frac{\partial q}{\partial t} + \frac{\partial f}{\partial x} + \frac{\partial g}{\partial y} + \frac{\partial h}{\partial z} = s \end{matrix}

2.1.1 Eddy Fluxes

In eddy viscosity models, particularly for the atmosphere, the molecular viscosity is so small as to be practically negligible in its direct dissipative influence. However, it leads to a cascade of eddies (only some of which are resolved) that eventually leads to dissipation at the smallest scales. The basis of eddy viscosity models is to break the flow into resolved scales and unresolved perturbations. In this study, an “implicitly filtered” eddy viscosity model is used, meaning cell-averaged quantities from the Finite-Volume space-time discretization are considered to be resolved and all perturbations below the grid scale are considered unresolved.

Using averaging rules, the only quantities that remain after integration are products of resolved winds, ${\overline{u}}_{i} {\overline{u}}_{j}$ , and averages of unresolved perturbation products such as $\overline{u_{i}^{'} u_{j}^{'}}$ , or correlations. We employ the commonly used gradient diffusion assumption, assuming isotropically scaled eddies in the three spatial directions. The eddy viscosity, K_m, is considered to be a function of unresolved TKE and a stability-corrected length scale, giving an eddy flux of:

\begin{matrix} (5) & \begin{aligned} τ_{i j} = - ρ (K_{m} + ν) (\frac{\partial u_{i}}{\partial x_{j}} + \frac{\partial u_{j}}{\partial x_{i}} - \frac{2}{3} \frac{\partial u_{k}}{\partial x_{k}} δ_{i j}); \\ τ_{θ j} = - ρ (\frac{K_{m}}{\Pr_{T}} + \frac{ν}{\Pr}) \frac{\partial θ}{\partial x_{j}} \end{aligned} \\ (6) & \begin{aligned} τ_{K j} = - 2 ρ (K_{m} + ν) (\frac{\partial K}{\partial x_{j}}); \\ τ_{ℓ j} = - 2 ρ (\frac{K_{m}}{\Pr_{T}} + \frac{ν}{\Pr}) (\frac{\partial ρ_{ℓ}}{\partial x_{j}}); K_{m} = 0.1 L \sqrt{K} \end{aligned} \\ (7) & \begin{aligned} \Pr_{T} = \frac{Δ}{Δ + 2 L}; L = \min (\frac{0.76 \sqrt{K}}{N + ϵ}, Δ); \\ Δ = {(Δ x Δ y Δ z)}^{1 / 3}; \\ N = \begin{cases} \sqrt{(g / θ) (\partial θ / \partial z)} & \text{if } \partial θ / \partial z > 0 \\ 0 & \text{otherwise} \end{cases} \end{aligned} \end{matrix}

where $ϵ = 10^{- 20}$ is a small number to avoid division by zero; N is the Brunt–Väisälä frequency, a measure of atmospheric stability; Pr_T is the turbulent Prandtl number; Pr is the Prandtl number; ν is kinematic viscosity; δ_ij is the Kronecker delta; u₁, u₂, and u₃ are the velocity components in the x, y, and z directions (i.e., u, v, and w); and x₁, x₂, and x₃ are the x, y, and z directions. In Eq. (5), the repeated index k is summed over the three spatial directions (Einstein summation is implied). For the derivatives, second-order-accurate finite differences valid at the appropriate cell faces for each individual flux are used.
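The length scale and eddy viscosity definitions in Eqs. (6) and (7) can be sketched as follows. This is an illustrative reading of those formulas only; the function names and the default value of g are assumptions, and the turbulent Prandtl number is deliberately left out.

```python
import math


def brunt_vaisala(theta, dtheta_dz, g=9.81):
    """Brunt-Vaisala frequency N; zero for neutral or unstable stratification."""
    return math.sqrt((g / theta) * dtheta_dz) if dtheta_dz > 0.0 else 0.0


def length_scale(tke, n, delta, eps=1.0e-20):
    """Stability-corrected length scale L = min(0.76 sqrt(K) / (N + eps), delta)."""
    return min(0.76 * math.sqrt(tke) / (n + eps), delta)


def eddy_viscosity(tke, theta, dtheta_dz, dx, dy, dz):
    """Return (K_m, L, delta) with K_m = 0.1 L sqrt(K)."""
    delta = (dx * dy * dz) ** (1.0 / 3.0)
    n = brunt_vaisala(theta, dtheta_dz)
    ell = length_scale(tke, n, delta)
    return 0.1 * ell * math.sqrt(tke), ell, delta
```

In neutral or unstable conditions N vanishes and the length scale reverts to the grid scale Δ, whereas stable stratification shortens L and therefore reduces the mixing.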

2.1.2 TKE Sources and Sinks

The TKE evolution equation contains advection and turbulent transport (diffusion) on the left-hand-side of Eq. (1). The remaining processes resolved here are the sources and sinks that are, in general, not cast in conservation form. They are defined as:

\begin{matrix} (8) & \begin{aligned} K_{S} = \sum_{i, j \in \{1, 2, 3\}} \frac{ρ K_{m}}{2} (\frac{\partial u_{i}}{\partial x_{j}} + \frac{\partial u_{j}}{\partial x_{i}}) (\frac{\partial u_{i}}{\partial x_{j}} + \frac{\partial u_{j}}{\partial x_{i}}); \\ K_{D} = - ρ (0.19 + 0.51 L Δ^{- 1}) Δ^{- 1} K^{3 / 2}; \\ K_{B} = - \frac{ρ g K_{m}}{θ (\Pr)} \frac{\partial θ}{\partial z} \end{aligned} \end{matrix}

2.1.3 Surface Fluxes

Surface fluxes at the model surface and at immersed boundaries are enforced with Monin–Obukhov similarity theory (Monin and Obukhov, 1954). For all surface cells, and for all cells adjacent to an immersed cell, surface fluxes in each direction are applied via the following flux terms in the same manner as the eddy fluxes in Eq. (1):

\begin{matrix} (9) & \begin{aligned} \frac{\partial}{\partial t} [\begin{array}{c} u \\ v \\ w \\ T \end{array}] + \frac{\partial}{\partial x} [\begin{array}{c} 0 \\ τ_{21}^{⋆} \\ τ_{31}^{⋆} \\ τ_{T 1}^{⋆} \end{array}] + \frac{\partial}{\partial y} [\begin{array}{c} τ_{12}^{⋆} \\ 0 \\ τ_{32}^{⋆} \\ τ_{T 2}^{⋆} \end{array}] \\ + \frac{\partial}{\partial z} [\begin{array}{c} τ_{13}^{⋆} \\ τ_{23}^{⋆} \\ 0 \\ τ_{T 3}^{⋆} \end{array}] = [\begin{array}{c} 0 \\ 0 \\ 0 \\ 0 \end{array}] \end{aligned} \\ (10) & \begin{aligned} τ_{21}^{⋆} = - \frac{κ^{2} v \sqrt{v^{2} + w^{2}}}{ξ^{2} Δ x}; τ_{31}^{⋆} = - \frac{κ^{2} w \sqrt{v^{2} + w^{2}}}{ξ^{2} Δ x}; \\ τ_{T 1}^{⋆} = - \frac{κ^{2} (T - T_{I}) \sqrt{v^{2} + w^{2}}}{ξ^{2} Δ x} \end{aligned} \\ (11) & \begin{aligned} τ_{12}^{⋆} = - \frac{κ^{2} u \sqrt{u^{2} + w^{2}}}{η^{2} Δ y}; τ_{32}^{⋆} = - \frac{κ^{2} w \sqrt{u^{2} + w^{2}}}{η^{2} Δ y}; \\ τ_{T 2}^{⋆} = - \frac{κ^{2} (T - T_{I}) \sqrt{u^{2} + w^{2}}}{η^{2} Δ y} \end{aligned} \\ (12) & \begin{aligned} τ_{13}^{⋆} = - \frac{κ^{2} u \sqrt{u^{2} + v^{2}}}{ζ^{2} Δ z}; τ_{23}^{⋆} = - \frac{κ^{2} v \sqrt{u^{2} + v^{2}}}{ζ^{2} Δ z}; \\ τ_{T 3}^{⋆} = - \frac{κ^{2} (T - T_{I}) \sqrt{u^{2} + v^{2}}}{ζ^{2} Δ z} \end{aligned} \\ (13) & \begin{aligned} ξ = \ln (\frac{\frac{Δ x}{2} + z_{0}}{z_{0}}); η = \ln (\frac{\frac{Δ y}{2} + z_{0}}{z_{0}}); \\ ζ = \ln (\frac{\frac{Δ z}{2} + z_{0}}{z_{0}}) \end{aligned} \end{matrix}

where κ is the von Kármán constant, z₀ is the roughness length, and T_I is the temperature of the immersed cell. If a temperature is omitted, no forcing toward temperature is applied. One can also implement a heat flux, $\overline{u^{'} θ^{'}}$ (and analogously for the y and z directions), which is added as a flux from the material surface. Fluxes are applied to wind velocity and to temperature because this is implemented as a separate module in portUrb and operates directly on the coupler state. Since immersed material is thermally forced toward the hydrostatic background potential temperature, the absolute temperature, T, of immersed material will generally change with height for stratified flows when no direct temperature is specified.
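The dependence of the surface fluxes on roughness length can be illustrated with a minimal sketch of Eqs. (12) and (13) for a horizontal surface. The value of the von Kármán constant and the function names are assumptions made for illustration.

```python
import math

KAPPA = 0.4  # von Karman constant (assumed value)


def log_factor(spacing, z0):
    """ln((spacing/2 + z0) / z0), as in Eq. (13)."""
    return math.log((0.5 * spacing + z0) / z0)


def surface_fluxes_z(u, v, dz, z0, temp=None, temp_immersed=None):
    """Fluxes tau*_13, tau*_23, tau*_T3 of Eq. (12) for a surface normal to z.

    If either temperature is omitted, no thermal forcing is applied.
    """
    zeta = log_factor(dz, z0)
    speed = math.hypot(u, v)                       # tangential wind speed
    coef = -KAPPA ** 2 * speed / (zeta ** 2 * dz)
    tau_13 = coef * u
    tau_23 = coef * v
    if temp is None or temp_immersed is None:
        tau_t3 = 0.0
    else:
        tau_t3 = coef * (temp - temp_immersed)
    return tau_13, tau_23, tau_t3
```

For a fixed grid spacing, a larger z₀ gives a smaller logarithmic factor and hence a stronger drag on the tangential wind, while a very small z₀ approaches free-slip behavior.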

2.2 Dimensionally Split Finite-Volume Discretization

Using the vector form of the LES equations in Eq. (4), we split the equations such that each dimension is handled separately, and the vector form equations become, in general:

\begin{matrix} (14) & \frac{\partial q}{\partial t} + \frac{\partial f}{\partial x} = s \end{matrix}

This is integrated in space (semi-discretized) over domain-spanning cells of equal spacing, $Ω_{i} = [x_{i - 1 / 2}, x_{i + 1 / 2}]$ , where $x_{i \pm 1 / 2} = x_{i} \pm Δ x / 2$ . The divergence theorem is applied, and an upwind Riemann solver is used to give the form:

\begin{matrix} (15) & \frac{\partial {\overline{q}}_{i}}{\partial t} + \frac{\begin{matrix} R ({\tilde{q}}_{i} (x_{i + 1 / 2}), {\tilde{q}}_{i + 1} (x_{i + 1 / 2})) \\ - R ({\tilde{q}}_{i - 1} (x_{i - 1 / 2}), {\tilde{q}}_{i} (x_{i - 1 / 2})) \end{matrix}}{Δ x} = s ({\overline{q}}_{i}) \end{matrix}

where ${\tilde{q}}_{i} (x)$ is a reconstruction of the variation of q within the cell of domain Ω_i, and $R (q^{-}, q^{+})$ is a Riemann solver that returns a flux vector, f(q), from multi-valued states at cell edges using the upwind state based on locally frozen characteristics (the flux Jacobian diagonalization: $\partial f / \partial q = L^{- 1} Λ L$ ). The state $(q^{-} + q^{+}) / 2$ is used to compute the diagonalization for upwinding. For simplicity, a constant acoustic speed of 350 m s⁻¹ is assumed, and supersonic flow is not supported in the Riemann solver. This is the same approach used in Norman et al. (2023b).
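The structure of Eq. (15) can be seen in a deliberately simplified setting. The sketch below applies the same flux-difference form to scalar linear advection on a periodic grid, with a piecewise-constant reconstruction swapped in for the WENO reconstruction and a scalar upwind choice swapped in for the characteristics-based Riemann solver. All names are hypothetical.

```python
def upwind_flux(q_minus, q_plus, speed):
    """Scalar Riemann solver: take the state on the upwind side of the edge."""
    return speed * q_minus if speed >= 0.0 else speed * q_plus


def fv_tendency(qbar, speed, dx):
    """Return d(qbar_i)/dt for scalar advection on a periodic 1-D grid.

    flux[i] lives at the edge x_{i-1/2}; its left state comes from cell i-1
    and its right state from cell i (piecewise-constant reconstruction).
    """
    n = len(qbar)
    flux = [upwind_flux(qbar[i - 1], qbar[i], speed) for i in range(n)]
    return [-(flux[(i + 1) % n] - flux[i]) / dx for i in range(n)]
```

Because each edge flux is added to one cell and subtracted from its neighbor, the tendencies sum to zero on a periodic domain, which is the discrete conservation property of the Finite-Volume form.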

While the gravity and eddy viscosity source terms can be integrated with high-order discretizations, the hyperbolic dynamics use constant and uniform gravity, and the diffusive SGS dynamics do not benefit substantially from high-order accuracy. For the reconstruction, ${\tilde{q}}_{i} (x)$ , eleventh-order Weighted Essentially Non-Oscillatory (WENO) limiting (Liu et al., 1994) is used along with weight mapping (Feng et al., 2012). When a cell is within ten cells of an immersed boundary in any direction, weight mapping is omitted to reduce oscillations because immersed boundaries essentially represent permanent contact discontinuities.

2.3 Strong Stability Preserving Runge-Kutta Time Discretization

The optimal, three-stage, third-order accurate Strong Stability Preserving Runge–Kutta (SSPRK3) time discretization is used to integrate the remaining temporal Ordinary Differential Equation resulting from the semi-discretization in Sect. 2.2. The authors experimented with different dimensional splitting approaches, and it was found that the least numerically diffused results came from splitting the dimensions fully independently within each stage of the SSPRK3 time stepping. While this limited the maximum stable CFL value to roughly 0.7, it resulted in less numerical dissipation than using dimensional splitting techniques outside the SSPRK3 time stepping procedure. This approach with SSPRK3 also produced less numerical diffusion than attempts at using an ADER time discretization with Differential Transforms (Norman and Finkel, 2012; Norman, 2013a, 2021) with dimensional splitting outside the ADER computations. This approach to dimensional splitting within each RK stage has also been shown to reduce errors at singularities on non-orthogonal grids such as the cubed-sphere (Katta et al., 2015).
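For reference, the three-stage, third-order SSPRK method can be written in its standard convex-combination (Shu–Osher) form as in the sketch below. The callable `rhs` is a hypothetical stand-in for the full spatial tendency evaluated at a given stage state; how portUrb organizes its per-dimension tendencies inside that call is not reproduced here.

```python
def ssprk3_step(q, dt, rhs):
    """Advance q by one time step dt using three-stage, third-order SSPRK.

    Each stage is a convex combination of forward Euler updates, which is
    what preserves the non-oscillatory property of the spatial operator.
    """
    q1 = q + dt * rhs(q)
    q2 = 0.75 * q + 0.25 * (q1 + dt * rhs(q1))
    return q / 3.0 + (2.0 / 3.0) * (q2 + dt * rhs(q2))
```

For the linear test problem dq/dt = −q, one step reproduces the Taylor expansion of exp(−Δt) through third order, which provides a quick correctness check.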

2.4 Free-Slip Immersed Boundaries with Surface Friction

2.4.1 Modifying Variables on the Grid

Simplified immersed boundaries are implemented to represent solid surfaces embedded in the flow. These only represent the proportion of immersed material in a cell along with the roughness length of that immersed material as well as the temperature and/or heat flux. They are implemented similarly to how the solid wall boundaries are implemented at the bottom and top of the domain. Each cell will have an immersed “proportion”, ${\overline{σ}}_{i, j, k} \in [0, 1]$ , which is the proportion of the cell that is immersed material. Before each Runge-Kutta stage, immersed boundaries are nudged toward hydrostatic values at rest according to the fifth power of the immersed proportion and a timescale, $T_{i, j, k} \in [1, \infty)$ :

\begin{matrix} (16) & \frac{d {\overline{q}}_{i, j, k}}{d t} = \frac{{({\overline{σ}}_{i, j, k})}^{5}}{T_{i, j, k}} ({\overline{q}}_{H, k} - {\overline{q}}_{i, j, k}) \end{matrix}

where ${\overline{q}}_{H, k} = {[{\overline{ρ}}_{H, k}, 0, 0, 0, {\overline{ρ θ}}_{H, k}, 0, 0, 0]}^{⊤}$ and $T_{i, j, k} = 1$ means the cell is immediately set to its target value. $T_{i, j, k}$ is, in general, the time scale in terms of number of time steps for adjustment to a hydrostatic state at rest. The fifth power is obtained from numerical experiments, matching low resolution to high resolution solutions for a coarsely resolved sphere in Sect. 3.5.

The timescale, $T_{i, j, k}$ , is 8, 4, and 2 if the cell is immersed and is within one, two, or three cells, respectively, in all directions (including corners) of a non-immersed cell. All other immersed cells have a timescale of 1, meaning they are immediately set between every stage to the target immersed values. Essentially, this approach “softens” the immersed discontinuity slightly when the flow switches from non-immersed to immersed to reduce Runge oscillations in the presence of high-order reconstruction. This approach also ensures that all immersed boundaries get close to their targets each time step and that internal regions of larger immersed structures like buildings do not allow any flow through at all.
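The nudging of Eq. (16) and the timescale rule can be sketched for a single scalar quantity as follows. The sketch assumes that time is measured in units of the nudging interval, so that a timescale of 1 sets a fully immersed cell directly to its target, and it assumes the distance to the nearest non-immersed cell has already been computed. Both function names are hypothetical.

```python
def timescale_for(cells_from_fluid):
    """Timescale T for an immersed cell given its distance (in cells) to fluid.

    One, two, and three cells away map to 8, 4, and 2; deeper cells use 1.
    """
    return {1: 8.0, 2: 4.0, 3: 2.0}.get(cells_from_fluid, 1.0)


def nudge_toward_rest(q, q_hydro, sigma, timescale):
    """One discrete application of Eq. (16) to a scalar state value q.

    sigma is the immersed proportion in [0, 1]; the fifth power makes the
    forcing negligible for mostly fluid cells.
    """
    weight = sigma ** 5 / timescale
    return q + weight * (q_hydro - q)
```

With this form, a half-immersed cell receives only about 3 % of the forcing of a fully immersed cell, while cells deep inside a structure are reset to their targets at every application.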

In general, when the user ingests building or terrain geometry, it can be quite sharp in its features. The use of WENO limiting, the avoidance of WENO weight mapping near immersed material, the use of an upwind Riemann solver, and the slight softening of the immersed forcing time scale near the outside of structures appear to be enough to avoid visible Runge oscillations while simultaneously striving for a large amount of resolved turbulence and boundary resolution per degree of freedom.

2.4.2 Special Treatment for Tangential Winds and Pressure

In a given direction, the normal wind velocity boundary condition is zero inside immersed boundaries. The tangential winds, however, need to be free-slip so that surface friction can later be applied to impose a roughness length on the tangential flow. To enable this, for transverse velocities relative to a given direction (recall the dynamics are implemented in a dimensionally split fashion), the moment an immersed proportion greater than zero is encountered, the value of the last non-immersed cell before an immersed boundary is replicated throughout the rest of that direction in a stencil. This, in effect, implements a zero-derivative Neumann boundary for tangential wind velocities to create a free-slip solid wall boundary. This way, normal velocity is forced toward zero, but tangential velocity is not. This implementation is specific to a dimensionally split approach.
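The stencil replication described above can be sketched for a one-dimensional stencil as follows. The sketch assumes the center cell of the stencil is itself non-immersed and walks outward from it in both directions; the function name and argument layout are hypothetical.

```python
def free_slip_stencil(values, sigma, center):
    """Return a copy of a 1-D stencil with zero-gradient fill past immersed cells.

    values: tangential velocity (or pressure) in each stencil cell
    sigma:  immersed proportion of each stencil cell
    center: index of the cell being reconstructed (assumed non-immersed)
    """
    out = list(values)
    for step in (1, -1):
        last_fluid = out[center]
        blocked = False
        i = center + step
        while 0 <= i < len(out):
            if sigma[i] > 0.0:
                blocked = True            # first immersed cell in this direction
            if blocked:
                out[i] = last_fluid       # replicate to the end of the stencil
            else:
                last_fluid = out[i]
            i += step
    return out
```

For example, with values `[1, 2, 3, 4, 5]`, an immersed cell at index 3, and the center at index 2, the stencil becomes `[1, 2, 3, 3, 3]`, so the reconstruction sees no tangential gradient across the wall.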

In this model, during the spatial semi-discretization of the dynamics, pressure is treated as its own separate reconstructed variable. Pressure is given a zero-derivative boundary condition with the same treatment used for the two tangential wind velocities in a given direction.

2.4.3 Modifications to LES Closure and Surface Friction

Many of the modules in the model need to be changed to exhibit physical behavior in the presence of immersed boundaries, where mass and thermodynamics variables are set to hydrostatic values, and wind velocities and tracers are set to zero. The LES closure, for instance, cannot mix tangential zero velocity values in immersed cells with adjacent non-immersed cells, or it would essentially imply a surface friction term without consideration of roughness length or Monin-Obukhov similarity theory, which is in the territory of the surface friction scheme, not the LES closure. Therefore, the LES SGS closure sets SGS interface fluxes, τ, to zero whenever one of the two neighboring cells is immersed.

The surface friction scheme is modified to no longer only work at the surface but to impose fluxes on the tangential wind and temperature at any interface where one of the two adjacent cells is immersed. For already immersed cells, this has little to no effect because the cell is already being forced toward immersed target values (hydrostatic flow at rest) in the dynamical core. For non-immersed cells adjacent to an immersed cell, this imposes friction and thermal forcing according to immersed temperature, roughness length, or heat flux. Those three values along with the immersed proportion are defined at every cell in the domain so that one can customize the roughness, temperature, and heat flux of any immersed material in the domain. If one desires no-slip immersed boundaries, the immersed roughness can be set to a large value. If one desires fully free-slip, it can be set to a small value.

2.4.4 Modifications to the WENO Reconstruction Approach

WENO reconstruction performs reconstruction over multiple sub-stencils within an overall stencil and uses the weighted sum of reconstructions. The higher the Total Variation (a measure of oscillation) of a sub-stencil, the lower its weight. Particularly for normal velocity in a given direction, the zero value in boundary ghost cells leads to all stencils containing those ghost cells having a permanent discontinuity. Thus, WENO limiting weights the stencil without ghost cells quite highly and essentially ignores the boundary conditions.

It is not immediately clear how this leads to artifacts in the solution, but two options are the most likely culprits. First, when this behavior occurs, boundary cells end up using fully one-sided reconstructions most of the time, which have lower stability than reconstructions containing cells on both sides of the reconstructed cell (Norman, 2021). Second, when the edge flux is zero, but the reconstruction has no notion of the boundary conditions, this can lead to one cell face with large mass flux and the other cell face with zero mass flux. This can lead to creating higher pressures and, in turn, solution artifacts in cells adjacent to boundaries.

To avoid both of these possibilities, we alter the Total Variation (TV) of the stencil that does not contain the boundary ghost cells to have the maximum TV of the other stencils. This way, the stencil without ghost cells cannot consume the majority of the weighted sum of stencils, and at least one of the other stencils containing ghost cell information will have influence on the final WENO polynomial, injecting boundary information to the final reconstruction and ensuring the reconstruction is not fully one-sided most of the time. It also allows the WENO weighting procedure to generally choose the stencils with lowest TV – an essential component of non-oscillatory reconstruction.
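The effect of this Total Variation adjustment on the weights can be sketched as follows. The inverse-square weighting used here is a simplified stand-in for the actual WENO weight formula (ideal weights and weight mapping are ignored), and the function names are hypothetical.

```python
def weno_weights(tv, eps=1.0e-20):
    """Normalized weights that decrease with each sub-stencil's TV (simplified)."""
    raw = [1.0 / (t + eps) ** 2 for t in tv]
    total = sum(raw)
    return [r / total for r in raw]


def adjust_boundary_tv(tv, ghost_free):
    """Set the TV of the ghost-free sub-stencil to the maximum TV of the others."""
    others = [t for k, t in enumerate(tv) if k != ghost_free]
    adjusted = list(tv)
    adjusted[ghost_free] = max(others)
    return adjusted
```

With TV values of `[4.0, 1.0, 1e-6]` and the last sub-stencil free of ghost cells, the unadjusted weights place nearly all of the weight on that sub-stencil; after adjustment its weight can no longer exceed that of any sub-stencil containing boundary information.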

2.5 Variable Vertical Grid Spacing

The vertical grid in portUrb can be arbitrarily set by providing a set of monotonically increasing vertical interface levels. This is supported simply in the numerics by use of a metric transformation from the physical vertical coordinate, z, to an equally spaced reference coordinate: ζ(z). For convenience, the ζ coordinate is simply set to be the vertical interface index such that the grid spacing in ζ coordinates is always one. Before reconstruction, the cell average, $\overline{q}_{k}$, is multiplied by the physical grid spacing, Δz_k, to transform into ζ coordinates (recalling that Δζ≡1). Then, after reconstruction in ζ coordinates, the cell-edge values are multiplied by the inverse of a high-order reconstruction of $dz/dζ$ at the cell edge in question to transform back into physical vertical coordinates.

The portUrb code has several convenience functions to generate a stretched vertical grid. There is a function to generate equal grid spacing. There is a function that generates exponentially increasing vertical grid spacing based on an input of the vertical domain extent and the desired number of vertical levels. The routine fits the constant multiple by which grid spacing increases between successive levels to those parameters. There is also a function that takes an input of a lower level grid spacing and a higher level grid spacing along with the vertical domain extent and the domain over which the grid spacing transitions between the two using smooth polynomial interpolation. This routine determines the number of vertical grid cells rather than having that value specified by the user.
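As an illustration of the exponentially stretched option, the sketch below fits a constant growth ratio between successive grid spacings so that a given number of levels spans the vertical domain. It is not the portUrb routine: the function name is hypothetical, and the lowest-level spacing `dz0` is an assumed extra input that the text above does not list.

```python
def geometric_interfaces(ztop, nz, dz0, iterations=200):
    """Return (ratio, interfaces) for nz cells with spacings dz0 * ratio**k.

    The ratio is fitted by bisection so that the spacings sum to ztop.
    dz0 (spacing of the lowest cell) is an assumed input for this sketch.
    """
    if dz0 * nz >= ztop:
        raise ValueError("dz0 * nz must be smaller than ztop for a stretched grid")

    def total_height(ratio):
        return dz0 * sum(ratio ** k for k in range(nz))

    lo, hi = 1.0, 2.0
    while total_height(hi) < ztop:  # widen the bracket until it contains the root
        hi *= 2.0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if total_height(mid) < ztop:
            lo = mid
        else:
            hi = mid
    ratio = 0.5 * (lo + hi)

    interfaces = [0.0]
    for k in range(nz):
        interfaces.append(interfaces[-1] + dz0 * ratio ** k)
    return ratio, interfaces


ratio, z_interfaces = geometric_interfaces(ztop=1000.0, nz=20, dz0=10.0)
print("growth ratio:", ratio)
print("top interface:", z_interfaces[-1])
```

The returned interface list is monotonically increasing, which is the only requirement the text places on a user-supplied vertical grid.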

2.6 Boundary Conditions

There are four different boundary conditions that can be specified in the dynamical core and SGS TKE closure routines: periodic, solid wall, open, and precursor forcing. The precursor boundary forcing is implemented by declaring one coupler object to be the concurrent precursor simulation. That simulation then saves the state in the domain's “ghost cells” for use by the forced simulation for each of the Runge-Kutta stages. Then, the saved ghost cells from the precursor simulation are copied into the forced simulation's ghost cells at each Runge-Kutta stage if the velocity normal to the boundary is flowing into the domain. An open boundary is used if the velocity normal to the boundary is flowing out of the domain.
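The inflow/outflow switch for precursor forcing can be sketched as follows. This is a schematic under stated assumptions rather than the portUrb implementation: the function and argument names are hypothetical, and the open boundary is assumed here to be a simple zero-gradient copy of the nearest interior value.

```python
def fill_boundary_ghost(interior_value, precursor_ghost_value,
                        normal_velocity, outward_sign):
    """Choose the ghost-cell value at one boundary location for one RK stage.

    outward_sign is +1 at the upper end of an axis and -1 at the lower end,
    so normal_velocity * outward_sign < 0 means flow into the domain.
    """
    inflow = normal_velocity * outward_sign < 0.0
    if inflow:
        # Inflow: use the state saved from the precursor simulation.
        return precursor_ghost_value
    # Outflow: open boundary (assumed zero-gradient extrapolation in this sketch).
    return interior_value


# West boundary (outward_sign = -1) with eastward wind: inflow.
west = fill_boundary_ghost(interior_value=8.0, precursor_ghost_value=10.0,
                           normal_velocity=5.0, outward_sign=-1)
# East boundary (outward_sign = +1) with eastward wind: outflow.
east = fill_boundary_ghost(interior_value=8.0, precursor_ghost_value=10.0,
                           normal_velocity=5.0, outward_sign=+1)
print(west, east)
```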

2.7 Model Construction

The model is constructed with a “coupler” at the core. The coupled state is defined to be dry air density, wind velocities in all three directions, absolute temperature, and tracer densities. For each tracer, it is defined whether the tracer should remain positive, whether it contributes to mass, and whether it is diffused with the SGS closure. The only required tracers are water vapor density (for all simulations) and mass-weighted TKE (if the LES closure module is used). All tracers must be specified as mass-weighted or total density. This coupled state has the benefit of being non-ambiguous (whereas definitions of potential temperature and mixing ratios can be varied). A disadvantage computationally is that most parameterizations and dynamical cores use mixing ratios and potential temperature, meaning a conversion is needed before and after every module, typically. This cost is deemed acceptable to have a non-ambiguous coupler state.

A “module” is a self-contained action that changes the coupled state or produces output, and all model actions are implemented as modules. The coupler state is passed to a module along with the desired time step. Modules may sub-step to maintain stability as needed, but in the experiments in this study, the dynamical core imposes the smallest time step, and all modules are run at the dynamical core time step for now. The initial state, therefore, is implemented based on the coupler's state rather than the dynamical core state defined in Eq. (1).

2.8 Ensembles

Ensembles within a single executable are flexibly supported with a core module called an “Ensembler”, a class that manages ensembles. One can implement different “dimensions” of ensembles, specify how many tasks (relative to the smallest ensemble member) a given member needs, split the overall MPI communicator into sub-communicators (one for each ensemble member), and redirect the output of each member to its own file. This way, each ensemble member can do any arbitrary operation, read from any individual file, and output to any arbitrary file. With this approach, one can submit capability-class jobs (that is, jobs that use 20 % or more of the machine at a time) on supercomputers such as DOE Leadership Computing Facilities like the Oak Ridge Leadership Computing Facility or the Argonne Leadership Computing Facility. This capability is used in several of the numerical experiments in Sect. 3.
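The rank bookkeeping behind such a communicator split can be sketched in plain Python. The names below are hypothetical and do not reflect the Ensembler interface; in an MPI program, the per-rank "color" computed here is the kind of value that would be passed to `MPI_Comm_split` to form one sub-communicator per ensemble member.

```python
def assign_member_colors(task_multipliers, base_tasks):
    """Map each global rank to an ensemble member index (its split "color").

    task_multipliers[m] is how many times more tasks member m needs than the
    smallest member; base_tasks is the task count of the smallest member.
    """
    colors = []
    for member, multiplier in enumerate(task_multipliers):
        colors.extend([member] * (multiplier * base_tasks))
    return colors


def output_file_for(member):
    """Hypothetical per-member output file name, for illustration only."""
    return f"member_{member:04d}.out"


# Three members needing 1x, 2x, and 4x the tasks of the smallest member,
# where the smallest member uses 2 tasks: 14 ranks in total.
colors = assign_member_colors([1, 2, 4], base_tasks=2)
print(colors)
print(sorted({output_file_for(c) for c in colors}))
```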

2.9 Portable C++ Approach

The model is coded in a portable C++ library called Yet Another Kernel Launcher (YAKL) (Norman et al., 2023a), which is based on the Kokkos portable C++ library (Trott et al., 2021). The core idea of these libraries is to wrap kernel code (essentially the code inside a set of loops over the grid cells being operated on) in a class object. That object is then passed to a launcher that launches the code in parallel on a given architecture. For Nvidia, AMD, and Intel GPUs, the CUDA, HIP, and SYCL specifications are used to launch the code, respectively. Other backends are available as well, such as OpenMP threading for multi-core CPUs. All modules in portUrb that operate on a set of grid cells are run on whatever “device” is available on a given machine (typically a GPU). Care has been taken to ensure all kernels run efficiently by streaming memory effectively from DRAM (ensuring higher GPU memory bandwidth is realized) (Norman, 2013b) and keeping kernel resource usage low enough to use the entire GPU effectively (a concept known as “occupancy”) (Shobaki et al., 2020).

3 Numerical Experiments

The initial states for all experiments are implemented with 9-point Gauss-Legendre-Lobatto (GLL) quadrature in all directions within each cell. All experiments use a CFL value of 0.6 assuming a maximum wave speed of 450 m s⁻¹ (roughly 350 m s⁻¹ acoustic speed at the surface with up to 100 m s⁻¹ wind speeds). All experiments use equal grid spacings in all directions. A uniform initial TKE of 10⁻⁶ is specified to seed the unresolved TKE. This initial value appears to have little effect on the solution, as it quickly evolves into an equilibrium state driven largely by pressure gradient or geostrophic forcing, shear production at immersed interfaces, and by buoyancy in some simulations.
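For a sense of scale, the time step implied by these settings can be estimated with the usual relation Δt = CFL · Δx / c_max. This is a back-of-the-envelope sketch: the function name is hypothetical, and how portUrb combines the three directions when computing its time step is not stated here.

```python
def stable_time_step(dx, cfl=0.6, max_wave_speed=450.0):
    """Time step in seconds for grid spacing dx (m), assuming dt = cfl * dx / c_max."""
    return cfl * dx / max_wave_speed


# 10 m grid spacing, as in the neutral boundary layer experiment below.
dt = stable_time_step(10.0)
print(f"dt = {dt:.5f} s")
```

With 10 m grid spacing this gives a time step of roughly 0.013 s, so a ten-hour simulation requires on the order of a few million time steps.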

3.1 Neutral Atmospheric Boundary Layer

3.1.1 Specification

This experiment simulates a buoyantly neutral atmospheric boundary layer seeded by initial temperature perturbations and maintained by geostrophic forcing (described in Sect. 3.1.2). A domain of $4 \times 4 \times 1$ km is used to simulate for ten model hours with 10 m grid spacing. This test case uses an initial potential temperature profile of:

$$
\theta_{\mathrm{initial}}(z) =
\begin{cases}
300 & \text{if } z < 500\ \mathrm{m} \\
300 + 0.08\,(z - 500) & \text{if } 500\ \mathrm{m} \leq z < 650\ \mathrm{m} \\
300 + 0.08\,(150) + 0.003\,(z - 650) & \text{if } z \geq 650\ \mathrm{m}
\end{cases}
\tag{17}
$$

This creates a neutral environment in the lowest 500 m, a strong stable inversion between 500 and 650 m, and a weaker inversion above 650 m. An initial wind velocity of u=10 m s⁻¹ and $v = w = 0$ is specified. From this, an initial surface pressure of 10⁵ Pa together with hydrostatic balance and the equation of state defines a hydrostatically balanced initial state that provides the density. Finally, cell-wise uniform random perturbations in the range $[-0.25, 0.25]$ K are added to the temperature field. A surface roughness length of 0.1 m is used. Geostrophic forcing is used with u_G=10 m s⁻¹, v_G=0 m s⁻¹, and ϕ_G≈43.289° (the exact value is the solution to $2 Ω \sin ϕ_{G} = 10^{-4}$ s⁻¹) as described in Sect. 3.1.2.
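The piecewise profile of Eq. (17) translates directly into a small function. The sketch below is illustrative only: the function name is hypothetical, and it evaluates the profile at cell centres, whereas the model initializes cell averages with GLL quadrature.

```python
def theta_initial(z):
    """Initial potential temperature (K) at height z (m), following Eq. (17)."""
    if z < 500.0:
        return 300.0                                   # neutral layer
    if z < 650.0:
        return 300.0 + 0.08 * (z - 500.0)              # strong inversion
    return 300.0 + 0.08 * 150.0 + 0.003 * (z - 650.0)  # weaker inversion aloft


# Cell-centre heights for a 1 km deep column with 10 m grid spacing.
z_centres = [10.0 * (k + 0.5) for k in range(100)]
theta_column = [theta_initial(z) for z in z_centres]
print(theta_column[0], theta_column[-1])
```

The profile is continuous at both breakpoints: it equals 300 K at 500 m and 312 K at 650 m.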

This experiment uses no surface heat flux, and the surface temperature is set to 300 K. Periodic boundary conditions are used in the horizontal directions, and no-slip solid walls are used in the vertical direction. Surface fluxes are only applied at the bottom boundary, and a sponge layer is implemented in the top 10 % of the model with a forcing scaling of z³ to force the model variables to zero for everything except horizontal velocities, which are forced to the horizontal mean values at each respective vertical level.

3.1.2 Geostrophic Forcing

The model is forced to an equilibrium state with geostrophic forcing, which imposes the following extra source term to the horizontal wind velocities:

$$
\frac{\partial}{\partial t}
\begin{bmatrix} \overline{u}_{i,j,k} \\ \overline{v}_{i,j,k} \end{bmatrix}
=
\begin{bmatrix}
2 \Omega \sin \phi_{G} \left( \overline{v}_{k,\mathrm{horiz}} - v_{G} \right) \\
- 2 \Omega \sin \phi_{G} \left( \overline{u}_{k,\mathrm{horiz}} - u_{G} \right)
\end{bmatrix}
\tag{18}
$$

In Eq. (18), $\overline{u}_{k, horiz}$ and $\overline{v}_{k, horiz}$ are the averages of u and v over the global horizontal domain at the vertical level, k. In continuous form, as it would be applied to equation set (1), Eq. (18) becomes:

$$
\frac{\partial}{\partial t}
\begin{bmatrix} \rho u \\ \rho v \end{bmatrix}
=
\begin{bmatrix}
2 \rho \Omega \sin \phi_{G} \left( \langle v \rangle_{\mathrm{horiz}} - v_{G} \right) \\
- 2 \rho \Omega \sin \phi_{G} \left( \langle u \rangle_{\mathrm{horiz}} - u_{G} \right)
\end{bmatrix},
\tag{19}
$$

where 〈u〉_horiz and 〈v〉_horiz (both functions of z and time) are the horizontal averages of $u (x, y, z, t)$ and $v (x, y, z, t)$ , respectively.
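A minimal sketch of the source term in Eq. (18) follows, together with the latitude implied by $2 Ω \sin ϕ_{G} = 10^{-4}$ s⁻¹. The function names are hypothetical, and the Earth rotation rate Ω = 7.292 × 10⁻⁵ s⁻¹ is an assumed standard value that is not stated in the text.

```python
import math

OMEGA = 7.292e-5  # Earth's rotation rate in s^-1 (assumed standard value)


def geostrophic_latitude_deg(coriolis=1.0e-4):
    """Latitude (degrees) solving 2 * OMEGA * sin(phi_G) = coriolis."""
    return math.degrees(math.asin(coriolis / (2.0 * OMEGA)))


def geostrophic_tendency(u_mean_k, v_mean_k, u_g, v_g, coriolis=1.0e-4):
    """Return (du/dt, dv/dt) from Eq. (18) for one vertical level k.

    u_mean_k and v_mean_k are the horizontal-mean winds at that level.
    """
    du_dt = coriolis * (v_mean_k - v_g)
    dv_dt = -coriolis * (u_mean_k - u_g)
    return du_dt, dv_dt


phi_g = geostrophic_latitude_deg()
# A level whose mean wind is slower than geostrophic in u and slightly positive in v.
du_dt, dv_dt = geostrophic_tendency(u_mean_k=8.0, v_mean_k=1.0, u_g=10.0, v_g=0.0)
print(f"phi_G = {phi_g:.3f} degrees")
print(f"du/dt = {du_dt:.1e} m s^-2, dv/dt = {dv_dt:.1e} m s^-2")
```

Because the tendency depends only on the horizontal-mean wind at each level, every cell at a given level receives the same forcing, and the forcing vanishes when the mean wind equals the geostrophic wind.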

3.1.3 Results

Figure 1 plots the wind magnitude for the neutral atmospheric boundary layer test case at ten model hours. The scale and angle of turbulent fluctuations as well as behavior near the inversion match that of Fig. 1 of Sauer and Muñoz-Esparza (2020). The supergeostrophic wind speed at the inversion is present as well (Pedersen et al., 2014). Figure 2 plots vertical profiles of the domain mean u and v velocities as a function of height at 8 and 10 model hours. Also plotted are the vertical domain mean profiles from Fig. 2 of Sauer and Muñoz-Esparza (2020) as comparison points. Both models show the supergeostrophic wind at the inversion and increase in wind speed from 8 to 10 model hours in both horizontal wind directions. However, the u-direction wind speed in portUrb is lower aloft and higher at the surface. The v-velocity is quite similar between portUrb and FastEddy at t=10 h, but portUrb has a smaller difference between 8 and 10 h of simulation. The potential temperature profile is also similar, showing a slight growth in the inversion height and smoothing of the initial inversion temperature discontinuity. The second potential temperature discontinuity at z=650 m is less diffused in portUrb than it is in FastEddy. The magnitudes of u-velocity and v-velocity are slightly lower in this study, possibly due to the use of a larger grid spacing, but the structure remains similar.

https://gmd.copernicus.org/articles/18/9605/2025/gmd-18-9605-2025-f01

Figure 1. Plots of wind magnitude in m s⁻¹ for the neutral ABL test case at ten model hours.
