HOPE: an arbitrary-order non-oscillatory finite-volume shallow water dynamical core with automatic differentiation

Zhou, Lilong; Xue, Wei; Shen, Xueshun

doi:10.5194/gmd-18-8175-2025

Articles | Volume 18, issue 21

https://doi.org/10.5194/gmd-18-8175-2025

Articles | Volume 18, issue 21

Model description paper

05 Nov 2025

Model description paper |

| 05 Nov 2025

HOPE: an arbitrary-order non-oscillatory finite-volume shallow water dynamical core with automatic differentiation

Lilong Zhou, Wei Xue, and Xueshun Shen

Abstract

This study presents the High Order Prediction Environment (HOPE), an automatically differentiable, non-oscillatory finite-volume dynamical core for shallow water equations on the cubed-sphere grid. HOPE integrates five key features: (1) arbitrary high-order accuracy through genuine two-dimensional reconstruction schemes; (2) essential non-oscillation via adaptive polynomial order reduction in discontinuous regions; (3) exact mass conservation inherited from finite-volume discretization; (4) automatically differentiable and (5) GPU-native scalability through PyTorch-based implementation. Another innovation is the development of a two-way coupled ghost cell interpolation method. This approach incorporates information from adjacent panels on both sides of the boundary, thereby overcoming the integration instability inherent in one-sided ghost cell interpolation approaches when using high-order reconstruction. Leveraging the linear operator nature of this interpolation scheme, we optimized its computation: information exchange across the panel boundary is achieved through a single matrix-vector multiplication instead of iterative coupling, without accuracy loss. Numerical experiments demonstrate the capabilities of HOPE: The 11th-order scheme reduces errors to near double-precision round-off levels in steady-state geostrophic flow tests on coarse grids. Maintenance of Rossby–Haurwitz waves over 100 simulation days without crashing. A cylindrical dam-break test case confirms the genuinely two-dimensional WENO scheme exhibits significantly better isotropy compared to dimension-by-dimension approaches. Moreover, normalized conservation errors in total energy, total potential enstrophy, and total zonal angular momentum significantly reduce with increasing order of the reconstruction scheme. Two implementations are developed: a Fortran version for convergence analysis and a PyTorch version leveraging automatic differentiation and GPU acceleration. The PyTorch implementation maps reconstruction and quadrature operation to 2D convolution and Einstein summation respectively, achieving about 2× speedup on single NVIDIA RTX3090 GPU versus Dual Intel E5-2699v4 CPUs execution. This design enables seamless coupling with neural network parameterizations, positioning HOPE as a foundational tool for next-generation differentiable atmosphere models.

Download & links

How to cite.

Received: 23 Apr 2025 – Discussion started: 27 May 2025 – Revised: 17 Sep 2025 – Accepted: 22 Sep 2025 – Published: 05 Nov 2025

1 Introduction

Recent years have witnessed a surge in research integrating numerical weather prediction (NWP) with artificial intelligence (AI) techniques. A prominent advancement in this domain is the hybrid modeling paradigm, which synergizes the complementary strengths of both approaches. This framework implements numerical dynamical cores within AI software platforms such as TensorFlow or PyTorch, thereby enabling seamless integration of AI models into the numerical solution process for atmospheric dynamical partial differential equations (PDEs). Unlike the fully surrogated methods, such as Pangu-Weather (Bi et al., 2022), FengWu (Chen et al., 2023), GraphCast (Lam et al., 2023), NowcastNet (Zhang et al., 2023), hybrid model integrates traditional PDE-based dynamical cores with neural network (NN)-based physical parameterizations. The auto-differentiable nature of the dynamical core enables training losses to propagate through the entire model during backpropagation, allowing the NN-based parameterization module to access more comprehensive residual information. NeuralGCM (Kochkov et al., 2024) exemplifies this hybrid approach by combining a spectral numerical dynamical core with NN-based physical parameterizations. The governing equation-based dynamical core imposes rigorous physical constraints within the framework, effectively mitigating the blurriness characteristic of purely data-driven models. Furthermore, NeuralGCM demonstrates superior power spectra performance compared to conventional data-driven meteorological models. While the implementation of a spectral dynamical core in NeuralGCM theoretically enables infinite-order accuracy, the global nature of spectral expansion restricts the method's scalability. Furthermore, in contrast to finite-volume algorithms which inherently ensure strict mass conservation, achieving strict mass conservation with NeuralGCM's spectral dynamical core requires supplementary modifications.

To address these shortcomings, we present the High Order Prediction Environment (HOPE) dynamical core with following contributions:

A new-generation shallow-water model architecture integrating:
- i.
  Arbitrary high-order accuracy (up to 13th-order verified) via tensor product polynomial (TPP).
- ii.
  A finite-volume scheme requiring only information from a local stencil surrounding each cell to perform state updates, enabling massively parallel scalability.
- iii.
  Inherent mass conservation from finite-volume discretization.
- iv.
  A WENO (Weighted Essentially Non-Oscillatory) based, adaptive polynomial order reduction for essential non-oscillation.
A novel two-way coupled ghost cell interpolation scheme achieving:
- i.
  Arbitrary odd-order convergence through central stencil interpolation.
- ii.
  Single sparse matrix-vector operation replacing iterative procedures (Eq. A.12).
- iii.
  Overcome numerical instability beyond 7th-order accuracy.
PyTorch-based high performance differentiable implementation featuring:
- i.
  GPU acceleration through convolution/einsum operator in PyTorch, 2× speedup on single RTX3090 GPU vs. Dual Intel Xeon 2699v4 CPUs.
- ii.
  Automatic ghost cell interpolation matrix generation via native auto-differentiation.
- iii.
  Seamless integration with NN modules for hybrid modeling.

In the following part of the introduction, we introduce the relevant work on constructing the HOPE model, and from this, we elaborate on the challenges and motivations for establishing the algorithm of the dynamical core. High-order accuracy is an extremely appealing trait for the design of a dynamical core, particularly in high-resolution atmospheric simulations. A dynamical core model with high-order accuracy produces significantly less simulation error in smooth regions compared to a low-order model. Furthermore, even when the resolution is equivalent or coarser, a high-order model is capable of resolving finer details than a low-order one.

A high-order finite volume model was developed on cubed sphere, named MCORE (Ullrich et al., 2010; Ullrich and Jablonowski, 2012). High-order reconstruction requires information from cells external to panel boundaries (commonly termed ghost cells). Due to coordinate discontinuities across the six panels of the cubed-sphere grid, MCORE implements an interpolation scheme for ghost cells based on one-side information. This approach employs a two-dimensional reconstruction stencil to interpolate prognostic variables onto Gaussian quadrature points within each cell, followed by integration to obtain cell-averaged values. The authors assert that MCORE's convergence rate can theoretically be of arbitrary order. However, during the design of the ghost cell interpolation for HOPE, we initially attempted to use a one-sided reconstruction stencil similar to MCORE. While stable integration was achieved with the 3rd-, 5th-, and 7th-order schemes, the model became unstable when schemes of 9th-order or higher were used. In other words, for HOPE, overcoming the 7th-order accuracy limitation necessitated the development of a new ghost cell interpolation scheme.

Therefore, we designed a bilateral interpolation algorithm. This algorithm employs an iterative procedure that incorporates information from both adjacent panels of the cubed-sphere grid simultaneously. This enabled stable model integration even with higher-order schemes. Though not detailed in the paper, our testing confirmed stable integration even at 13th-order accuracy.

In this article, we devise the reconstruction based on tensor product polynomial (TPP). When the stencil width is k, our method achieves kth order accuracy, surpassing MCORE by one order of accuracy with the same stencil width. In addition, we have developed a new class of ghost interpolation schemes that abandon the use of one-sided stencils and instead adopt central stencils. This new approach enables the scheme to overcome the non-physical oscillations arising from interpolation at panel boundaries. Our method allows for arbitrary order of accuracy while the field is smooth enough, and we have verified this by testing up to the 11th order.

Nevertheless, higher-order reconstruction does not invariably yield superior simulation outcomes, as elucidated by analyzing the properties of the Taylor series remainder term. The accuracy of approximating a function via a Taylor series requires two essential conditions: (1) the existence of higher-order derivatives of the function at the expansion point, and (2) the convergence of the series within the relevant domain. When the field exhibits poor continuity – where higher-order derivatives either do not exist or lead to increasing residuals with series order – employing higher-order approximations can introduce significant errors. Therefore, for reconstruction schemes based on polynomial functions, high-order accuracy should only be adopted when the field is sufficiently smooth. Conversely, for discontinuous or poorly continuous fields, reducing the reconstruction order is necessary to maintain numerical stability and effectiveness.

The Weighted Essentially Non-Oscillatory (WENO) scheme is an adaptive limiter widely employed in computational fluid dynamics (CFD) to address this challenge. Originally developed for one-dimensional problems (Liu et al., 1994), WENO was later extended to two dimensions by Shi et al. (2002) using two distinct approaches: a genuinely two-dimensional (WENO2D) scheme and a dimension-by-dimension reconstruction. In this work, we implement WENO2D scheme to enforce the non-oscillatory property. This approach effectively suppresses non-physical oscillations near sharp discontinuities while preserving high-order accuracy in smooth regions.

The remainder of this paper is organized as follows: Sect. 2 details the governing equations on the cubed-sphere grid. Section 3 presents the numerical methods, including reconstruction schemes, panel boundary treatment method, and temporal marching scheme. Section 4 focuses on HOPE's high-performance implementation leveraging PyTorch's built-in operators for GPU acceleration. The adoption of PyTorch simultaneously enables automatic differentiation capabilities through its computational graph construction. Section 5 validates model performance through standard test cases, followed by conclusions and future directions in Sect. 6.

2 Governing equation on cubed sphere

The cubed-sphere grid partitions the spherical domain into six panels, each with a structured and rectangular computational space. This configuration facilitates high-order spatial reconstruction and efficient massive-thread parallelism (see Fig. 1). Early work on solving the primitive equations on the cubed-sphere grid dates back to Sadourny (1972). In recent decades, the cubed-sphere grid has been widely adopted in high-order-accuracy atmospheric models. For instance, Chen and Xiao (2008) developed a shallow water model using the multi-moment constrained finite volume method on the cubed sphere, achieving 3rd–4th order accuracy. Ullrich et al. (2010) designed a high-order finite volume dynamical core based on this grid, Nair et al. (2005a, b) implemented a discontinuous Galerkin model on the cubed sphere.

https://gmd.copernicus.org/articles/18/8175/2025/gmd-18-8175-2025-f01

Figure 1Cubed sphere grid. (a) Physical space; (b) computational space. Six panels are identified by indices from 1 to 6.

Download

In this study, we also employ the equiangular cubed-sphere grid. Although the mesh is non-orthogonal, the computational space can still be treated as a rectangular grid by adopting a generalized curvilinear coordinate system. In this section, we present the shallow water equations in generalized curvilinear coordinates and discuss specialized treatments for topography.

Shallow water equation set on gnomonic equiangular cubed sphere grid is written as

\begin{matrix} (1) & \{\begin{cases} \frac{\partial \sqrt{G} ϕ}{\partial t} + \frac{\partial \sqrt{G} ϕ u}{\partial x} + \frac{\partial \sqrt{G} ϕ v}{\partial y} = 0 \\ \frac{\partial \sqrt{G} ϕ u}{\partial t} + \frac{\partial \sqrt{G} (ϕ u u + \frac{1}{2} G^{11} ϕ^{2})}{\partial x} + \frac{\partial \sqrt{G} (ϕ u v + \frac{1}{2} G^{12} ϕ^{2})}{\partial y} = ψ_{M}^{1} + ψ_{C}^{1} + ψ_{B}^{1} \\ \frac{\partial \sqrt{G} ϕ v}{\partial t} + \frac{\partial \sqrt{G} (ϕ u v + \frac{1}{2} G^{21} ϕ^{2})}{\partial x} + \frac{\partial \sqrt{G} (ϕ v v + \frac{1}{2} G^{22} ϕ^{2})}{\partial y} = ψ_{M}^{2} + ψ_{C}^{2} + ψ_{B}^{2} \end{cases} . \end{matrix}

The gnomonic equiangular coordinates are represented by (x, y, p), where $(x, y) \in [- \frac{π}{4}, \frac{π}{4}]$ are local equiangular coordinate of each panel and p=1, 2, 3, …, n_p is panel index as shown in Fig. 1b; n_p=6 is the number of panels. ϕ=gh is geopotential, h is fluid thickness, u and v are the contravariant wind components in the x and y directions, respectively, g is gravity acceleration. ψ_M, ψ_C, ψ_B are the metric term, Coriolis term and bottom topography influence term

\begin{matrix} (2) & \begin{aligned} ψ_{M} = & (\begin{array}{c} ψ_{M}^{1} \\ ψ_{M}^{2} \end{array}) = \frac{2 \sqrt{G}}{δ^{2}} \\ (\begin{array}{l} - X Y^{2} ϕ u u + Y (1 + Y^{2}) ϕ u v \\ X (1 + X^{2}) ϕ u v - X^{2} Y ϕ v v \end{array}) \end{aligned} \\ (3) & \begin{aligned} ψ_{C} = & - \sqrt{G} \sqrt{G} f k \times ϕ u = \sqrt{G} f (\begin{array}{cc} - G^{12} & G^{11} \\ - G^{22} & G^{12} \end{array}) \\ (\begin{array}{c} \sqrt{G} ϕ u \\ \sqrt{G} ϕ v \end{array}) \end{aligned} \\ (4) & ψ_{B} = - \sqrt{G} ϕ G^{i j} \frac{\partial ϕ_{s}}{\partial x^{j}} = - \sqrt{G} ϕ (\begin{array}{c} G^{11} \frac{\partial ϕ_{s}}{\partial x} + G^{12} \frac{\partial ϕ_{s}}{\partial y} \\ G^{21} \frac{\partial ϕ_{s}}{\partial x} + G^{22} \frac{\partial ϕ_{s}}{\partial y} \end{array}) \end{matrix}

where $X = \tan x, Y = \tan y, δ = \sqrt{1 + X^{2} + Y^{2}}$ , f=2Ωsin θ is Coriolis parameter, ϕ_s=gh_s is surface geopotential, and h_s is surface height.

\begin{matrix} (5) & \sin θ = \{\begin{cases} Y / δ, & p \in {1, 2, 3, 4} \\ 1 / δ, & p = 5 \\ - 1 / δ, & p = 6 \end{cases} \end{matrix}

The contravariant metric on cubed-sphere is

\begin{matrix} (6) & G^{i j} = \frac{δ^{2}}{r^{2} (1 + X^{2}) (1 + X^{2})} (\begin{array}{ll} 1 + Y^{2} & X Y \\ X Y & 1 + X^{2} \end{array}) . \end{matrix}

The covariant metric

\begin{matrix} (7) & G_{i j} = \frac{r^{2} (1 + X^{2}) (1 + Y^{2})}{δ^{4}} (\begin{array}{ll} 1 + X^{2} & - X Y \\ - X Y & 1 + Y^{2} \end{array}) \end{matrix}

and the metric determinant is given by

\begin{matrix} (8) & \sqrt{G} = \sqrt{\det (G_{i j})} = \frac{r^{2} (1 + X^{2}) (1 + Y^{2})}{δ^{3}} . \end{matrix}

r is radius of earth.

The contravariant wind vector $V = (u, v)$ can be convert to wind vector on spherical LAT/LON coordinate $V_{s} = (u_{s}, v_{s})$ by the following formula

\begin{matrix} (9) & (\begin{array}{c} u_{s} \\ v_{s} \end{array}) = J (\begin{array}{c} u \\ v \end{array}) \end{matrix}

where J is a 2×2 conversion matrix, the expressions are different in each panel

\begin{matrix} (10) & \begin{aligned} J = r (\begin{array}{ll} \cos θ \frac{\partial λ}{\partial x} & \cos θ \frac{\partial λ}{\partial y} \\ \frac{\partial θ}{\partial x} & \frac{\partial θ}{\partial y} \end{array}) = \\ \{\begin{cases} r (\begin{array}{ll} \cos θ & 0 \\ - \sin θ \cos θ \tan λ_{p} & \cos λ_{p} \cos^{2} θ + \frac{\sin^{2} θ}{\cos λ_{p}} \end{array}), p \in {1, 2, 3, 4} \\ r (\begin{array}{ll} \cos λ \sin θ Γ_{1} & \sin λ \sin θ Γ_{2} \\ - \sin λ \sin^{2} θ Γ_{1} & \cos λ \sin^{2} θ Γ_{2} \end{array}), p = 5 \\ r (\begin{array}{ll} - \cos λ \sin θ Γ_{1} & \sin λ \sin θ Γ_{2} \\ \sin λ \sin^{2} θ Γ_{1} & \cos λ \sin^{2} θ Γ_{2} \end{array}), p = 6 \end{cases} \end{aligned} \\ (11) & λ_{p} = λ - \frac{π}{2} (p - 1), Γ_{1} = 1 + \frac{\sin^{2} λ}{\tan^{2} θ}, Γ_{2} = 1 + \frac{\cos^{2} λ}{\tan^{2} θ} \end{matrix}

where λ, θ are longitude and latitude. The relation between J and G_ij is

\begin{matrix} (12) & G_{i j} = J^{T} J . \end{matrix}

To discretize and solve the equation system, we first perform reconstruction on the prognostic variables to obtain their values at the cell interfaces. These reconstructed values are then used within a Riemann solver to compute the numerical fluxes. During the numerical experiments, we observed that reconstructing $\sqrt{G} ϕ$ directly leads to non-physical oscillations. This occurs because topography may induce discontinuities in the variable ϕ, while high-order reconstruction fundamentally requires smoothness of the field.

To address this, inspired by the approach mentioned by Ii and Xiao (2010), we instead reconstruct $\sqrt{G} ϕ_{t}$ , where $ϕ_{t} = ϕ + ϕ_{s}$ is total geopotential. The detailed formulation of this reconstruction method is presented in Sect. 3. Crucially, $\sqrt{G} ϕ_{t}$ is used exclusively during the reconstruction step. The prognostic variable remains $\sqrt{G} ϕ$ to ensure exact mass conservation.

The momentum equations need to be modified as follow

\begin{matrix} (13) & \{\begin{cases} \frac{\partial \sqrt{G} ϕ}{\partial t} + \frac{\partial \sqrt{G} ϕ u}{\partial x} + \frac{\partial \sqrt{G} ϕ v}{\partial y} = 0 \\ \frac{\partial \sqrt{G} ϕ u}{\partial t} + \frac{\partial \sqrt{G} (ϕ u u + \frac{1}{2} G^{11} ϕ_{t}^{2})}{\partial x} + \frac{\partial \sqrt{G} (ϕ u v + \frac{1}{2} G^{12} ϕ_{t}^{2})}{\partial y} = ψ_{M}^{1} + ψ_{C}^{1} + ψ_{B}^{1} \\ \frac{\partial \sqrt{G} ϕ v}{\partial t} + \frac{\partial \sqrt{G} (ϕ u v + \frac{1}{2} G^{21} ϕ_{t}^{2})}{\partial x} + \frac{\partial \sqrt{G} (ϕ v v + \frac{1}{2} G^{22} ϕ_{t}^{2})}{\partial y} = ψ_{M}^{2} + ψ_{C}^{2} + ψ_{B}^{2} \end{cases} \end{matrix}

and the bottom topography influence term is now expressed as

\begin{matrix} (14) & ψ_{B} = \sqrt{G} ϕ_{s} G^{i j} \frac{\partial ϕ_{t}}{\partial x^{j}} = \sqrt{G} ϕ_{s} (\begin{array}{l} G^{11} \frac{\partial ϕ_{t}}{\partial x} + G^{12} \frac{\partial ϕ_{t}}{\partial y} \\ G^{21} \frac{\partial ϕ_{t}}{\partial x} + G^{22} \frac{\partial ϕ_{t}}{\partial y} \end{array}) . \end{matrix}

The reconstruction variables are ( $\sqrt{G} ϕ_{t}$ , $\sqrt{G} ϕ u$ , $\sqrt{G} ϕ v$ ).

We write the governing equation set to vector form

\begin{matrix} (15) & \frac{\partial q}{\partial t} + \frac{\partial F (q)}{\partial x} + \frac{\partial G (q)}{\partial y} = S (q) \\ (16) & \begin{aligned} q = & [\begin{array}{l} \sqrt{G} ϕ \\ \sqrt{G} ϕ u \\ \sqrt{G} ϕ v \end{array}], F = [\begin{array}{l} \sqrt{G} ϕ u \\ \sqrt{G} (ϕ u u + \frac{1}{2} G^{11} ϕ_{t}^{2}) \\ \sqrt{G} (ϕ u v + \frac{1}{2} G^{21} ϕ_{t}^{2}) \end{array}], \\ G = [\begin{array}{l} \sqrt{G} ϕ v \\ \sqrt{G} (ϕ u v + \frac{1}{2} G^{12} ϕ_{t}^{2}) \\ \sqrt{G} (ϕ v v + \frac{1}{2} G^{22} ϕ_{t}^{2}) \end{array}] \\ S = [\begin{array}{l} 0 \\ ψ_{M}^{1} + ψ_{C}^{1} + ψ_{B}^{1} \\ ψ_{M}^{2} + ψ_{C}^{2} + ψ_{B}^{2} \end{array}] . \end{aligned} \end{matrix}

https://gmd.copernicus.org/articles/18/8175/2025/gmd-18-8175-2025-f02

Figure 2(a) Adjacent area of panels 1, 4 and 5. (b) 5×5 reconstruction stencil nearby panel corner is represented by shade. The cell contains red dot is the target cell on panel 4; the magenta points are overlapped GQPs shared by panel 1 and panel 5; red solid lines are boundary of panel 4, red dash lines are extension line of panel 4 boundary line. A and C are points on dash line, B is the upper right corner point of panel 4.

Download

3 Numerical discretization

The finite volume method computes the temporal tendency of cell-averaged quantities by evaluating the net flux across cell interfaces. The interfacial flux is obtained through Gaussian quadrature, where the field values at quadrature points are reconstructed spatially and then processed by a Riemann solver to determine the flux magnitude.

In this section, we present two distinct spatial reconstruction approaches: (1) a two-dimensional tensor product polynomial (TPP) method, and (2) a two-dimensional weighted essentially non-oscillatory (WENO2D) scheme based on tensor product polynomials. Each reconstruction yields two potential values at every Gaussian quadrature point (GQP). These values are then resolved into a single flux value using the Low Mach number Approximate Riemann Solver (LMARS) (Chen et al., 2013) or AUSM+-up (Liou, 2006; Ullrich et al., 2010). Even with an approximate Riemann solver like LMARS, the scheme preserves high-order because it combines high-order reconstructions from both sides of the cell interface to determine the flux. Finally, the total flux across each cell edge is computed by applying linear Gaussian quadrature integration along the interface.

According to the finite volume scheme, average Eq. (15) on cell i, j, we have

\begin{array}{l} (17) & \frac{\partial {\overline{q}}_{i, j}}{\partial t} + \frac{{\overline{F}}_{i + \frac{1}{2}, j} - {\overline{F}}_{i - \frac{1}{2}, j}}{Δ x} + \frac{{\overline{G}}_{i, j + \frac{1}{2}} - {\overline{G}}_{i, j - \frac{1}{2}}}{Δ y} = {\overline{S}}_{i, j} \\ (18) & \frac{\partial {\overline{q}}_{i, j}}{\partial t} = \frac{1}{Δ x Δ y} \frac{\partial}{\partial t} \iint_{Ω_{i, j}} q d x d y, {\overline{S}}_{i, j} = \frac{1}{Δ x Δ y} \iint_{Ω_{i, j}} S d x d y \\ (19) & {\overline{F}}_{i - \frac{1}{2}, j} = \frac{1}{Δ y} \int_{e_{i - \frac{1}{2}}} F d y, {\overline{F}}_{i + \frac{1}{2}, j} = \frac{1}{Δ y} \int_{e_{i + \frac{1}{2}}} F d y \\ (20) & {\overline{G}}_{i, j - \frac{1}{2}} = \frac{1}{Δ x} \int_{e_{j - \frac{1}{2}}} G d x, {\overline{G}}_{i, j + \frac{1}{2}} = \frac{1}{Δ x} \int_{e_{j + \frac{1}{2}}} G d x \end{array}

where Ω_i,j represents the region overlapped by cell (i, j), e_i−½, e_i+½, e_j−½, e_j+½ are left, right, bottom, top edges of cell (i, j).

The physical interpretation of equation Eq. (17) is that the average tendency of prognostic field q within cell (i, j) is governed by the average net flux and average source. In this study, we calculate these averages using Gaussian quadrature, the function points within each cell are illustrated in Fig. 3, the EQPs are share by adjacent cells, and CQPs are exclusive for each cell.

https://gmd.copernicus.org/articles/18/8175/2025/gmd-18-8175-2025-f03

Figure 3Function points on cell. Red points are edge quadrature points (EQP) or called flux points, green points are inner cell quadrature points (CQP).

Download

Average on edge by 1D scheme:

\begin{matrix} (21) & {\overline{F}}_{i + \frac{1}{2}, j} \approx \sum_{ξ = 1}^{m_{e}} w_{ξ} F_{ξ} = w \vec{F} \end{matrix}

where m_e is the number of quadrature points on each edge, $w = (w_{1}, w_{2}, \dots, w_{m_{e}})$ is the 1D Gaussian quadrature coefficient vector. $\vec{F} = (F_{1}, F_{2}, \dots, F_{m_{e}})^{T}$ is the vector of flux, the elements of $\vec{F}$ represent the flux on EQPs.

https://gmd.copernicus.org/articles/18/8175/2025/gmd-18-8175-2025-f04

Figure 4Reconstruction coordinate and polynomial terms on stencils. (a) Local reconstruction coordinate (the red points denote cell centers); (b) 2nd degree polynomial stencil; (c) TPP3 stencil; (d) TPP5 stencil.

Download

Average in cell by 2D scheme:

\begin{matrix} (22) & {\overline{S}}_{i, j} \approx \sum_{ξ = 1}^{m_{c}} W_{ξ} S_{ξ} = W \vec{S} \end{matrix}

where m_c is the number of quadrature points on each cell, $W = (W_{1}, W_{2}, \dots, W_{m_{c}})$ is the 2D Gaussian quadrature coefficient matrix, $\vec{S} = (S_{1}, S_{2}, \dots, S_{m_{c}})^{T}$ is the vector of source term, the elements of S represent the source value on GQPs, superscript T stands for transpose matrix.

HOPE employs an equiangular cubed-sphere grid, where each panel undergoes uniform angular discretization into n_c×n_c cells. In the computational space (equiangular coordinates), each cell spans an angular interval of $\frac{π}{2 n_{c}}$ , therefore

\begin{matrix} (23) & Δ x = Δ y = \frac{π}{2 n_{c}} . \end{matrix}

This uniformity ensures that all cells are geometrically identical in the computational space, thereby avoiding the need for cell-specific treatment during reconstruction studies. In the following part of this section, we set a new computational space for reconstruction process. The local coordinate system ( $\hat{x}$ , $\hat{y}$ ) is established such that within each reconstruction stencil, the origin (0, 0) is located at the stencil center, the central cell spans [−0.5, 0.5] in both $\hat{x}$ and $\hat{y}$ directions, as shown in Fig. 4a. All of the cells have the same size in $\hat{x}$ , $\hat{y}$ directions:

\begin{matrix} (24) & Δ \hat{x} = Δ \hat{y} = 1 . \end{matrix}

On the cubed-sphere grid, a fixed reconstruction scheme yields consistent stencils across all cells. This structural homogeneity renders the reconstruction operation computationally equivalent to two-dimensional convolution, thereby enabling efficient GPU acceleration through PyTorch's built-in conv2d function.

3.1 Tensor product polynomial (TPP) reconstruction

HOPE employs genuinely two-dimensional reconstruction, simultaneously incorporating information in both spatial dimensions to minimize dimensional splitting errors. For computational efficiency, reconstruction algorithms using square stencils are computationally equivalent to convolution operations. This equivalence allows efficient implementation via PyTorch's conv2d function for acceleration.

To construct genuinely 2D reconstructions, the functional form of the reconstruction basis must be selected. A bivariate polynomial of degree d contains $\frac{(d + 1) (d + 2)}{2}$ terms. As illustrated in Fig. 4b, the 6 terms of a bivariate quadratic polynomial (d=2) are insufficient to cover a square stencil. To address this, we adopt Tensor Product Polynomials (TPP) as basis functions. We denote a TPP function containing n×n terms as TPPn. Determining the coefficients of TPPn requires information from a n×n block of cells. When using a TPP reconstruction stencil of size n×n, HOPE achieves fifth-order accuracy when simulating smooth flow fields. We therefore designate a TPP reconstruction stencil of size n×n as an nth order TPP stencil, the 3rd and 5th order TPP stencils are shown in Fig. 4c and d.

A TPPn polynomial is expressed as

\begin{matrix} (25) & p (\hat{x}, \hat{y}) = \sum_{\hat{j} = 1}^{n} \sum_{\hat{i} = 1}^{n} a_{k} {\hat{x}}^{\hat{i} - 1} {\hat{y}}^{\hat{j} - 1} = \sum_{k = 1}^{N} a_{k} c_{k} (\hat{x}, \hat{y}) \end{matrix}

where n is width of stencil. a_k is the coefficient of each term, the term index $k = \hat{i} + n (\hat{j} - 1)$ , and $c_{k} (\hat{x}, \hat{y}) = {\hat{x}}^{α} {\hat{y}}^{β}$ , $α = k - int (\frac{k - 1}{n}) n - 1$ , $β = int (\frac{k - 1}{n})$ , “int” is equivalent to Fortran's intrinsic function int( ) that truncates to integer values. N=n² is the cell number in stencil and also the term number of the TPP. We define column vectors $c (\hat{x}, \hat{y}) = {c_{k} (\hat{x}, \hat{y}) | k = 1, 2, 3, \dots, N}$ and $a = {a_{k} | k = 1, 2, 3, \dots, N}$ , the point value on ( $\hat{x}$ , $\hat{y}$ ) can be written as

\begin{matrix} (26) & p (\hat{x}, \hat{y}) = c (\hat{x}, \hat{y}) \cdot a . \end{matrix}

The volume integration average (VIA) of prognostic field q on cell Ω_i,j is represented by

\begin{matrix} (27) & {\overline{q}}_{i, j} = \frac{1}{Δ {\hat{x}}_{i, j} Δ {\hat{y}}_{i, j}} \int \int_{Ω_{i, j}} p (\hat{x}, \hat{y}) d \hat{x} d \hat{y} . \end{matrix}

$Δ {\hat{x}}_{i, j}$ , $Δ {\hat{y}}_{i, j}$ are length of edges $\hat{x}$ , $\hat{y}$ of cell Ω_i,j in computational space. The VIA value ${\overline{q}}_{i}$ on each cell is predicted by time integration, we wish to determine the coefficient vector a by these VIA values. HOPE employs an equiangular cubed-sphere grid, wherein each cell in computational space can be considered a perfectly identical square, according to Eq. (24), we may assume without loss of generality that $Δ {\hat{x}}_{i, j} = Δ {\hat{y}}_{i, j} = 1$ , and Eq. (27) becomes

\begin{matrix} (28) & {\overline{q}}_{i, j} = \int \int_{Ω_{i, j}} p (\hat{x}, \hat{y}) d \hat{x} d \hat{y} = \int \int_{Ω_{i, j}} c \cdot a d \hat{x} d \hat{y} = ψ_{i, j} \cdot a \end{matrix}

where

ψ_{i, j} = \int \int_{Ω_{i, j}} c d \hat{x} d \hat{y} = (\begin{array}{l} \int \int_{Ω_{i, j}} c_{1} d \hat{x} d \hat{y} \\ \int \int_{Ω_{i, j}} c_{2} d \hat{x} d \hat{y} \\ ⋮ \\ \int \int_{Ω_{i, j}} c_{N} d \hat{x} d \hat{y} \end{array}),

combining N cells. We have following linear system

\begin{array}{l} (29) & A a = \overline{q} \\ (30) & A = (\begin{array}{l} ψ_{1}^{T} \\ ψ_{2}^{T} \\ ⋮ \\ ψ_{N}^{T} \end{array}), \overline{q} = (\begin{array}{l} {\overline{q}}_{1} \\ {\overline{q}}_{2} \\ ⋮ \\ {\overline{q}}_{N} \end{array}) \end{array}

and polynomial coefficient a can be obtain by solving Eq. (29).

\begin{matrix} (31) & a = A^{- 1} \overline{q} \end{matrix}

The reconstruction values on M points can be obtained by following formula

\begin{matrix} (32) & p = C a = {CA}^{- 1} \overline{q} = R \overline{q} \end{matrix}

where $p = (\begin{array}{l} p ({\hat{x}}_{1}, {\hat{y}}_{1}) \\ p ({\hat{x}}_{2}, {\hat{y}}_{2}) \\ ⋮ \\ p ({\hat{x}}_{M}, {\hat{y}}_{M}) \end{array}), C = (\begin{array}{l} c_{1}^{T} \\ c_{2}^{T} \\ ⋮ \\ c_{M}^{T} \end{array})$ , $c_{m}^{T} = c^{T} ({\hat{x}}_{m}, {\hat{y}}_{m})$ , $m = 1, 2, \dots, M$ , superscript T stands for transpose matrix, ( ${\hat{x}}_{m}$ , ${\hat{y}}_{m}$ ) represents the mth function point on target cell. The reconstruction matrix

\begin{matrix} (33) & R = {CA}^{- 1} . \end{matrix}

In practical implementation, the reconstruction matrix R needs to be computed only once during model initialization and stored in memory. Crucially, a fundamental advantage of our cubed-sphere grid dynamical core implementation lies in employing a globally shared reconstruction matrix R. This unification signifies that a single instance of R applies identically to all grid cells, thereby significantly reducing memory/VRAM requirements, and enabling straightforward utilization of PyTorch's conv2d for accelerated reconstruction. For example, the TPP reconstruction procedure can be directly formulated as a two-dimensional convolutional operation using R as the convolution kernel.

3.2 Genuine two-dimensional WENO

Weighted Essentially Non-Oscillatory (WENO) represents an adaptive algorithm that dynamically preserves high-order approximation accuracy in smooth flow regions while automatically degenerating to robust low-order reconstruction near discontinuities for effective shock capturing. Shi et al. (2002) mentioned two different approaches for constructing a fifth-order finite volume WENO scheme: the “Genuine 2D” method and the “Dimension by Dimension” method.

For HOPE, within the Genuine 2D framework, nth order accuracy WENO scheme employs a n×n master stencil partitioned into $\frac{(n + 1)^{2}}{4}$ distinct $\frac{(n + 1)}{2} \times \frac{(n + 1)}{2}$ sub-stencils, for example:

a.
WENO3: third-order reconstruction utilizes a 3×3 cell stencil that decomposes into four 2×2 sub-stencils.
b.
WENO5: fifth-order accuracy employs a 5×5 master stencil partitioned into nine distinct 3×3 sub-stencils (complete schematic representations of these decomposition strategies are provided in Figs. 5 and 6).

The scheme's theoretical order of accuracy fundamentally depends on the proper determination of optimal linear weights for the multidimensional stencil combination. These weights, when correctly derived, enable the weighted superposition of sub-stencils to recover full high-order accuracy in smooth solution regions. While Shi et al. (2002) indicated the theoretical possibility of computing these weights through Lagrange interpolation basis analysis, they omitted specific implementation details. In this section, we present the methods for constructing genuine two-dimensional WENO (WENO 2D) schemes using least squares method.

https://gmd.copernicus.org/articles/18/8175/2025/gmd-18-8175-2025-f05

Figure 5Stencils of 3rd order WENO 2D. The high order stencil contains cells No. 1–9, blue ones represent the cells in sub-stencils (1)–(4).

Download

https://gmd.copernicus.org/articles/18/8175/2025/gmd-18-8175-2025-f06

Figure 6Stencils of 5th order WENO 2D. The high order stencil contains cells No. 1–25, blue ones represent the cells in sub-stencils (1)–(9).

Download

We construct WENO 2D based on TPP and square stencil. As mentioned in previous section, a nth order stencil contains N=n² cells, and the full stencil (also called high-order stencil) width is h=n. Decomposing the high-order stencil into $s = {(\frac{n + 1}{2})}^{2}$ sub-stencils, there are s_c=s cells in each sub-stencil (also called low-order stencil), and the sub-stencil width is $l = \frac{n + 1}{2}$ . We define p_H as the high- order reconstruction polynomial, and p_i represents ith sub-stencil reconstruction polynomial, they share the same expression as Eq. (25) with different stencil width and coefficient a. For the reconstruction point ( $\hat{x}$ , $\hat{y}$ ), suppose $p_{H} (\hat{x}, \hat{y})$ is the reconstruction value of high-order stencil, the reconstruction values of sub-stencils are stored in vector $p = (p_{1} (\hat{x}, \hat{y}), p_{2} (\hat{x}, \hat{y}), \dots, p_{s} (\hat{x}, \hat{y}))^{T}$ . The intention of constructing the optimal linear weights is to determine the unique weights $γ = (γ_{1}, γ_{2}, \dots, γ_{s})$ , such that

\begin{matrix} (34) & p_{H} = R_{H} \overline{q} = γ \cdot p \end{matrix}

where the elements of vector $\overline{q} = (q_{1}, q_{2}, \dots, q_{N})^{T}$ represent VIA of each cell in high-order stencil. $R_{H} = (r_{H_{j}})$ , $j = 1, 2, \dots, N$ is the reconstruction matrix of high-order stencil.

It should be noted that Eq. (34) appears overdetermined at first glance. However, subsequent analysis demonstrates that the solution obtained via the least squares method satisfies Eq. (34) exactly. Specifically, in the case of square stencils, the rank of the system defined by Eq. (34) becomes s, resulting in a unique solution for the linear system. This finding aligns with observations presented in Hu and Shu (1999) regarding their research on Triangular Meshes.

The computation of γ requires the integration of both high-order and low-order reconstruction matrices into a unified linear system. For each sub-stencil i we define the reconstruction matrix R_i=(r_ik), $k = 1, 2, \dots, s_{c}$ (computed via Eq. 33). and $R_{L_{i}} = (r_{L_{i j}})$ , $j = 1, 2, \dots, N$ is the extension matrix of R_i. The matrix relationship is expressed as

\begin{matrix} (35) & {(R_{i})}_{1 \times s_{c}} (E)_{s_{c} \times N} = {(R_{L_{i}})}_{1 \times N} \end{matrix}

where the subscripts denote matrix dimensions. The correspondence matrix E=(e_ij), $i = 1, 2, \dots, s_{c}$ ; $j = 1, 2, \dots, N$ encodes the cell relationships between stencils: when the ith cell in low-order stencil is the same as the jth cell in high order stencil, e_ij=1, otherwise, e_ij=0.

Substitute Eq. (32) into Eq. (34), yield

\begin{matrix} (36) & R_{H} \overline{q} = \sum_{i = 1}^{s} R_{L_{i}} γ_{i} \overline{q} . \end{matrix}

We set $R_{L} = (R_{L_{1}}, R_{L_{2}}, \dots, R_{L_{s}})^{T}$ , Eq. (36) becomes

\begin{matrix} (37) & R_{L} γ = R_{H} . \end{matrix}

The unknown optimal weights vector γ can be determined by following least square procedure

\begin{matrix} (38) & γ = {(R_{L}^{T} R_{L})}^{- 1} R_{L}^{T} R_{H} . \end{matrix}

However, the elements of γ could be negative, which would cause unstable. A split technique mentioned by Shi et al. (2002) was adopted to solve this problem. The optimal weights can be split into two parts:

\begin{matrix} (39) & {\tilde{γ}}^{+} = \frac{θ |γ| + γ}{2}, {\tilde{γ}}^{-} = γ^{+} - γ \end{matrix}

where the constant θ=3. For keeping the sum of weights to 1, ${\tilde{γ}}^{\pm}$ and new value of γ^± can be rescaled as:

\begin{matrix} (40) & σ^{\pm} = \sum_{i = 1}^{s} {\tilde{γ}}_{i}^{\pm} \end{matrix}

and

\begin{matrix} (41) & γ_{i}^{\pm} = \frac{{\tilde{γ}}_{i}^{\pm}}{σ^{\pm}} i = 1, 2, \dots, s \end{matrix}

where ${\tilde{γ}}_{i}^{\pm}$ is the ith element of ${\tilde{γ}}^{\pm}$ , $γ_{i}^{\pm}$ is the ith element of γ^±.

The WENO scheme adaptively assigns nonlinear weights ω_i, ( $i = 1, 2, \dots, s$ ) to each candidate stencil to suppress unphysical oscillations during high-order reconstruction. Essentially, it gives greater weight to stencils identified as smooth over the local cell, while suppressing the influence of those containing discontinuities by assigning them smaller weights. Several nonlinear weighting schemes have been developed to meet these criteria, including WENO-JS (Jiang and Shu, 1996), WENO-Z (Borges et al., 2008), WENO-Z+ (Acker et al., 2016), WENO-Z+M (Luo and Wu, 2021), among others.

In this work, we employ the WENO-Z formulation as our baseline scheme. While most existing WENO schemes were originally developed for one-dimensional problems, we propose a two-dimensional extension of WENO-Z through modification of τ, a crucial coefficient that governs the scheme's higher-order accuracy properties.

For stencil i the nonlinear weight is given as

\begin{array}{l} (42) & ω_{i}^{\pm} = \frac{α_{i}^{\pm}}{\sum_{i = 1}^{s} α_{i}^{\pm}} \\ (43) & α_{i}^{\pm} = γ_{i}^{\pm} [1 + {(\frac{τ}{β_{i} + ε})}^{2}] \\ (44) & τ = \frac{2}{(s - 1) s} \sum_{η = 1}^{s - 1} \sum_{ψ = η + 1}^{s} |β_{ψ} - β_{η}| \end{array}

where $ε = 10^{- 14}$ is introduced to prevent division by zero. The smooth indicators β_i quantify the smoothness of reconstruction functions within the target cell. We employ a formulation analogous to that described in Zhu and Shu (2019).

As mentioned in Eq. (24), all cells participating in reconstruction within HOPE's computational space can be treated as identical unit squares with $Δ \hat{x} = Δ \hat{y} = 1$ . Thus, the smooth indicator for sub-stencil i is expressed as:

\begin{matrix} (45) & β_{i} = \sum_{ζ = 1}^{l} \int \int_{Ω} {[\frac{\partial^{ζ}}{\partial {\hat{x}}^{ζ_{1}} \partial {\hat{y}}^{ζ_{2}}} p_{i} (\hat{x}, \hat{y})]}^{2} d \hat{x} d \hat{y} \end{matrix}

where $ζ_{1} + ζ_{2} = ζ$ and ζ>0, ζ₁, $ζ_{2} \in [0, n]$ , and l is the sub-stencil width.

The reconstruction value on point ( $\hat{x}$ , $\hat{y}$ ) is expressed by:

\begin{matrix} (46) & q (\hat{x}, \hat{y}) = \sum_{i = 1}^{s} (σ^{+} ω_{i}^{+} - σ^{-} ω_{i}^{-}) p_{i} (\hat{x}, \hat{y}) . \end{matrix}

3.3 Treatment of the panel boundaries

The cubed sphere grid comprises 12 panel boundaries, and the flux across the interface between any two panels must be computed at the quadrature points situated on the edges of the boundary cells, as depicted in Fig. 7a. However, a challenge arises because the coordinates across these panel boundaries are discontinuous. Given that the TPP reconstruction necessitates a square stencil, the values of the cells outside the domain (referred to as ghost cells) must be computed through interpolation within the adjacent panel, as illustrated in Fig. 7b. While Ullrich et al. (2010) proposed a one-sided interpolation scheme, our testing with the HOPE model revealed that using a similar one-sided ghost cell interpolation approach around panel boundaries resulted in instability when scheme exceeded 7th order of accuracy. To address this limitation, we redesigned the ghost cell interpolation scheme to incorporate information from both panels adjacent to the boundary. This modified approach ensures stable integration even for very high-order schemes, as validated in tests up to 13th-order accuracy.

https://gmd.copernicus.org/articles/18/8175/2025/gmd-18-8175-2025-f07

Figure 7Points and cells close to panel boundary. (a) Flux points (red points) on the interface between Panel 1 and Panel 4, the flux across each panel at these points are determined by Riemann solver, which merges the reconstruction outcomes from both panels into a single flux value; (b) ghost cells (shaded cells) out of Panel 4 boundary, with green points representing the GQPs in these cells; (c) cells requirement for 5th order ghost cell interpolation stencil, red points represent the GQPs located in the ghost cell on Panel 4, the blue shaded region represents the TPP reconstruction stencil (on Panel 1) to interpolate these red GQPs.

Download

3.3.1 Ghost cell interpolation

To achieve arbitrary high-order accuracy, we propose a ghost cell interpolation scheme that incorporates information from both sides of the panel boundary. Since the ghost cell values are inherently unknown prior to interpolation, our approach involves an initial estimation through an iterative process. Specifically, the method iteratively performs ghost cell interpolation until the increments of the cell values converge to within a specified tolerance.

Through mathematical analysis (detailed in the Appendix), we demonstrate that this iterative process can be expressed as a linear mapping, thereby eliminating the need for actual iterations. However, direct computation of the mapping matrix requires inversion of a large matrix, which poses significant computational and memory challenges. To address this, we implement the iterative interpolation process using PyTorch and leverage its automatic differentiation capability to efficiently obtain the interpolation matrix.

The complete methodology, as derived in the Appendix, proceeds as follows:

Initialization: all ghost cell values are initialized to zero (denoted as g⁽⁰⁾=0, where the superscript indicates the iteration number).
Interpolation: the Gaussian quadrature points (GQPs) in the ghost cells are interpolated using the Tensor Product Polynomial (TPP) stencil. For instance, considering two adjacent panels (Fig. 7a), any out-domain cell in Panel 4 (shaded cell in Fig. 7b) contains GQPs that physically reside in Panel 1. These GQPs are interpolated using the TPP stencil shown in Fig. 7c, which incorporates relevant ghost cells from Panel 1.
Update and convergence check: after interpolating all GQPs, the ghost cell values are updated via Gaussian quadrature (Eq. 22), yielding g⁽¹⁾. The L₂-norm residual $r^{(k)} = {∥g^{(k + 1)} - g^{(k)}∥}_{2}$ is then computed. Steps 2–3 repeat until r^(k)<ϵ, where $ϵ = 1 \times 10^{- 14}$ for double precision and $ϵ = 1 \times 10^{- 5}$ for single precision. In practice, convergence typically occurs within 10 iterations, so we fix the iteration count at 10 for consistency.

This process establishes a linear mapping $G : q \to g$ from known cell values to ghost cell values. As proven in Eq (A12), the mapping's linearity implies that $G = \frac{\partial g}{\partial q}$ forms a matrix, which we efficiently compute using PyTorch's autograd functionality. This approach avoids explicit matrix inversion while maintaining numerical precision.

It is important to note that overlapping GQPs occur at the corner positions of the cubed-sphere grid, as illustrated by the magenta points in Fig. 2b. These points lie on the interface shared by adjacent panels (e.g., Panel 1 and Panel 5). Consequently, during ghost value interpolation, two distinct interpolated values are obtained at these overlapping points – one from each adjoining panel. The final interpolated value is computed as the average of these two values. Since the interpolation performed on each individual panel is high-order, the approximation order is preserved when taking this average.

𝒢 is a sparse matrix containing many zero entries. To avoid unnecessary memory costs, we adopt the Compressed Sparse Row (CSR) format for storing 𝒢. Furthermore, the size of 𝒢 is extremely large-making direct application of “torch.autograd.functional.jacobian” to generate 𝒢 computationally infeasible. Our implementation for generating ghost cell interpolation matrix achieves significant acceleration and substantially reduces VRAM demand compared to PyTorch's native “torch.autograd.functional.jacobian” function. The key optimizations are:

Parallel Multi-Row Computation: utilizing “torch.vmap” to encapsulate “torch.func.vjp”, enabling simultaneous computation of multiple matrix rows.
CSR Compression and Incremental Disk Storage:
- a.
  Employing Compressed Sparse Row (CSR) format for matrix representation.
- b.
  Implementing incremental disk storage, where computed data batches are immediately written to disk after processing, avoiding prolonged VRAM retention.
Tunable Batch Processing: adjusting the number of rows processed per iteration maximizes GPU utilization while respecting VRAM constraints (e.g., 24 GB on NVIDIA RTX 3090).

It should be note that the model grid does not change during simulation, the ghost interpolation matrix 𝒢 needs to be calculated only once in initialization progress.

3.3.2 Fields conversion between panels

Due to the differing coordinate systems across panels, field variables must be appropriately transformed when transferring information between adjacent panels. To illustrate this process, we consider the interface between Panel 1 and Panel 4, as depicted in Figs. 2a and 7a. Although flux points are shared between the two panels, their coordinate representations are discontinuous across the interface.

To ensure consistency, two key transformations are required:

Metric reset for mass variables: the mass-related prognostic quantities must be recomputed in the target panel's coordinate system to maintain metric consistency.
Wind vector transformation: velocity components (or other vector quantities) must be converted from the source panel's local coordinate frame to that of the target panel.

This coordinate conversion ensures proper continuity and physical consistency when interpolating or exchanging data across panel boundaries.

Suppose $q_{1} = [(\sqrt{G} ϕ)_{1}, (\sqrt{G} ϕ u)_{1}, (\sqrt{G} ϕ v)_{1}]^{T}$ and $q_{4} = [(\sqrt{G} ϕ)_{4}, (\sqrt{G} ϕ u)_{4}, (\sqrt{G} ϕ v)_{4}]^{T}$ represent the fields on panel 1 and 4. The mass field conversion from panel 4 to panel 1 is expressed by

\begin{matrix} (47) & (\sqrt{G} ϕ)_{4}^{1} = \frac{{\sqrt{G}}_{4}}{{\sqrt{G}}_{1}} (\sqrt{G} ϕ)_{1} \end{matrix}

the subscript represents the target panel and the superscript stands for source panel.

The transformation of momentum vectors between panels is performed in two sequential steps to maintain proper tensor consistency. The contravariant momentum components in Panel 1 are first projected onto the global spherical coordinate system using the transformation matrix J, as defined in Eq. (10). The resulting spherical momentum components are then transformed into the contravariant representation specific to Panel 4, ensuring compatibility with the target panel's local coordinate system.

\begin{array}{l} (48) & [\begin{array}{c} {(\sqrt{G} ϕ u_{s})}_{1} \\ {(\sqrt{G} ϕ v_{s})}_{1} \end{array}] = J_{1} [\begin{array}{c} (\sqrt{G} ϕ u)_{1} \\ (\sqrt{G} ϕ v)_{1} \end{array}] \\ (49) & [\begin{array}{c} (\sqrt{G} ϕ u)_{4} \\ (\sqrt{G} ϕ v)_{4} \end{array}] = J_{4}^{- 1} \frac{{\sqrt{G}}_{4}}{{\sqrt{G}}_{1}} [\begin{array}{c} {(\sqrt{G} ϕ u_{s})}_{1} \\ {(\sqrt{G} ϕ v_{s})}_{1} \end{array}] \end{array}

where J₁ is the J matrix on panel 1, $J_{4}^{- 1}$ is the inverse matrix of J on panel 4. (u_s, v_s) are zonal wind and meridional wind, (u, v) are contravariant wind components. Since the vector conversion is linear process, Eqs. (48) and (49) can be merged into following equation

\begin{matrix} (50) & [\begin{array}{c} (\sqrt{G} ϕ u)_{4} \\ (\sqrt{G} ϕ v)_{4} \end{array}] = C_{1, 4} [\begin{array}{c} (\sqrt{G} ϕ u)_{1} \\ (\sqrt{G} ϕ v)_{1} \end{array}] \end{matrix}

where matrix $C_{1, 4} = \frac{{\sqrt{G}}_{4}}{{\sqrt{G}}_{1}} J_{4}^{- 1} J_{1}$ , the subscript stands for converting vector form panel 1 to panel 4.

The mass and vector are also need to be converted on GQPs in the same manner.

3.4 Riemann solver

Following spatial reconstruction, discontinuous solutions arise on either side of each flux point location. Since the majority of atmospheric flow speeds correspond to small Mach numbers, we adopt the Low Mach number Approximate Riemann Solver (Chen et al., 2013) and AUSM⁺-up (Liou, 2006; Ullrich et al., 2010) as Riemann solvers to determine the flux at the edge quadrature points (EQPs).

3.4.1 Low Mach number approximate Riemann solver (LMARS)

Spatial reconstruction gives the left and right state vector,

\begin{matrix} (51) & q_{L} = [\begin{array}{l} (\sqrt{G} ϕ)_{L} \\ (\sqrt{G} ϕ u)_{L} \\ (\sqrt{G} ϕ v)_{L} \end{array}], q_{R} = [\begin{array}{l} (\sqrt{G} ϕ)_{R} \\ (\sqrt{G} ϕ u)_{R} \\ (\sqrt{G} ϕ v)_{R} \end{array}] . \end{matrix}

First of all, we convert the contravariant wind u to physical speed u^⊥ that is perpendicular to the cell edge.

\begin{matrix} (52) & u^{⊥} = \frac{u}{{\sqrt{G}}^{i i}}, i = \{\begin{array}{cc} 1, & x direction \\ 2, & y direction \end{array} \end{matrix}

For example, in x direction, $u^{⊥} = \frac{u}{{\sqrt{G}}^{11}}$ , and there is no summation over i in Eq. (52).

The wind speed m^* and geopotential ϕ are calculated by

\begin{array}{l} (53) & m^{*} = \frac{1}{2} (u_{L}^{⊥} + u_{R}^{⊥} - \frac{ϕ_{R} - ϕ_{L}}{c}) \\ (54) & ϕ = \frac{1}{2} [ϕ_{L} + ϕ_{R} - c (u_{R}^{⊥} - u_{L}^{⊥})] \\ (55) & c = \frac{c_{L} + c_{R}}{2} \\ (56) & c_{L} = {\sqrt{ϕ}}_{L}, c_{R} = {\sqrt{ϕ}}_{R} . \end{array}

c is the phase speed of external gravity wave, the subscript “L”, “R” represent the left and right side of cell edge.

Once m^* is determined, we convert it back to contravariant speed by

\begin{matrix} (57) & m = m^{*} {\sqrt{G}}^{i i} . \end{matrix}

We define the pressure-driven flux as

\begin{matrix} (58) & P = \frac{1}{2} \sqrt{G} ϕ_{t}^{2} . \end{matrix}

The flux across the cell edge is then given by

\begin{array}{l} (59) & F = \frac{1}{2} m [(q_{L} + q_{R}) - sign (m) (q_{R} - q_{L})] + P \\ (60) & P = (\begin{array}{l} 0 \\ G^{1 i} P \\ G^{2 i} P \end{array}), i = \{\begin{array}{cc} 1, & x direction \\ 2, & y direction \end{array} . \end{array}

For calculation of G (the flux in y direction) the method is similar.

3.4.2 Advection upstream splitting method for all speeds (AUSM⁺-up)

The differences between AUSM⁺-up and LMARS lie in the method of determining the wind speed m^* and pressure-driven flux P. In AUSM⁺-up

\begin{matrix} (61) & m^{*} = c M \end{matrix}

where c denotes the gravity phase speed defined in Eq. (55). Mach number M is expressed as

\begin{matrix} (62) & \begin{aligned} M = & M_{(4)}^{+} (M_{L}) + M_{(4)}^{-} (M_{R}) - K_{p} \max (1 - σ {\overline{M}}^{2}, 0) \\ \frac{P_{R} - P_{L}}{c^{2} ϕ} \end{aligned} \end{matrix}

where $M_{L} = \frac{u_{L}^{⊥}}{c}$ , $M_{R} = \frac{u_{R}^{⊥}}{c}$ , ${\overline{M}}^{2} = \frac{(u_{L}^{⊥})^{2} + (u_{R}^{⊥})^{2}}{2 c^{2}}$ , and

\begin{array}{l} (63) & M_{(4)}^{\pm} (M) = \{\begin{cases} \frac{1}{2} (M \pm |M|), & |M| \geq 1 \\ M_{(2)}^{\pm} (M) [1 \mp 16 β M_{(2)}^{\mp} (M)], & |M| < 1 \end{cases} \\ (64) & M_{(2)}^{\pm} (M) = \pm \frac{1}{4} (M \pm 1)^{2} . \end{array}

The pressure-driven flux is expressed as

\begin{matrix} (65) & \begin{aligned} P = & P_{(5)}^{+} (M_{L}) P_{L} + P_{(5)}^{-} (M_{R}) P_{R} + - 2 K_{u} P_{(5)}^{+} (M_{L}) \\ P_{(5)}^{-} (M_{R}) ϕ c (u_{R}^{⊥} - u_{L}^{⊥}) \end{aligned} \end{matrix}

where $P_{L} = \frac{1}{2} ϕ_{L}^{2}$ , $P_{R} = \frac{1}{2} ϕ_{R}^{2}$ , and

\begin{matrix} (66) & \begin{aligned} P_{(5)}^{\pm} = \\ \{\begin{cases} \frac{1}{2} (1 \pm sign (M)), & |M| \geq 1 \\ M_{(2)}^{\pm} (M) [(\pm 2 - M) \mp 16 α M M_{(2)}^{\mp} (M)], & |M| < 1 \end{cases} \end{aligned} \end{matrix}

The mathematical meaning of sign(M) (returning −1, 0, or 1 based on the sign of M) is standard. The coefficients take the values: σ=1, $α = \frac{3}{16}$ , $β = \frac{1}{8}$ , $K_{p} = \frac{1}{4}$ , $K_{u} = \frac{3}{4}$ .

Once m^* and P are computed, the flux across the cell edge can be calculated using Eqs. (57) to (60).

3.5 Temporal integration

We use the explicit Runge–Kutta (RK) as time marching scheme, Wicker and Skamarock (2002) described a 3rd order RK with three stages (achieves third-order accuracy exclusively when applied to linear problems). For the prognostic fields q, the integration step from time slot n to n+1:

\begin{array}{l} (67) & q^{*} = q^{n} + \frac{Δ t}{3} (\frac{\partial q^{n}}{\partial t}) \\ (68) & q^{* *} = q^{*} + \frac{Δ t}{2} (\frac{\partial q^{*}}{\partial t}) \\ (69) & q^{n + 1} = q^{n} + Δ t (\frac{\partial q^{* *}}{\partial t}) \end{array}

where Δt is the time step, and temporal tendency terms $\frac{\partial q}{\partial t}$ can be obtain by Eqs. (15) and (16). In our numerical experiments, the choice of different time marching schemes influenced only the integration stability; it had no significant impact on the simulation norm errors, non-oscillatory property, or conservation property.

4 High performance implementation and automatic differentiation

The spatial operator and temporal integration of HOPE can be easily implemented using PyTorch. Specifically, the spatial reconstruction given by Eq. (32) is implemented as a convolution operation, while the Gaussian quadrature can be efficiently expressed as a matrix-vector multiplication. Leveraging PyTorch's highly optimized built-in functions for both convolution and quadrature operations ensures superior performance on GPUs.

Furthermore, PyTorch's role as a versatile AI development platform provides automatic differentiation capabilities across the entire computation graph. This streamlines implementation and enables efficient gradient computation for all components.

For the reconstruction implementation. Suppose the cubed sphere grid comprises n_c cells in x-direction within each panel, including ghost cells. The panel number is n_p, and the shallow water equation involves n_v prognostic variables per cell, we write the cell state tensor q with the shape (n_v, n_p, 1, n_c, n_c). The TPP reconstruction weight tensor R has shape (n_poc, 1, s_w, s_w), where n_poc is the number of points required to be interpolated within each cell (including EQP and CQP), s_w denotes the stencil width (same as the stencil width represented by n in Sec. 3.1). The reconstruction can be executed by a simple command (pseudo-code):

\begin{matrix} (70) & \begin{aligned} q_{rec} = & torch . nn . functional . conv 2 d (q \cdot view (n_{v} n_{p}, 1, n_{c}, n_{c}), \\ R) view (n_{v}, n_{p}, n_{poc}, n_{c}, n_{c}) \end{aligned} \end{matrix}

where the shape of q_rec is (n_v, n_p, n_poc, n_c, n_c).

We exclusively demonstrate the flux computation procedure at cell edges as an illustrative example, where Gaussian quadrature is employed to obtain edge-averaged fluxes. The analogous procedure applies to source term integration at CQPs. The edge state tensor q_e, corresponding to the EQPs along each cell edge, is subsequently expressed as:

\begin{matrix} (71) & q_{e} = q_{rec} (\dots, pes : pee, :, :) \end{matrix}

where subscript “e” represents edges on cell including L(left), R(right), B(bottom), T(top). “pes, pee” are start and end point indices on edge e. The shape of q_e (including q_L, q_R, q_B, q_T) is (n_v, n_p, n_poe, n_c, n_c). n_poe denotes the number of edge quadrature points (EQPs). This value is computed as $n_{poe} = pee - pes$ in PyTorch implementations, whereas in Fortran it is calculated as $n_{poe} = pee - pes + 1$ , reflecting the difference in array indexing conventions between the two languages.

After spatial reconstruction, the resulting data is utilized in the Riemann solver for EQPs and for source term computation on CQPs. Subsequently, integration is performed on both EQPs and CQPs to calculate the net flux and the cell-averaged source term tendency. The cell-edge flux tensor F with dimensions (n_v, n_p, n_poe, n_c, n_c) is obtained after the Riemann solver.

There is a dimensionality mismatch between the flux tensor and weight tensor during using matrix multiplication. For the Gaussian quadrature implementation, consider an edge Gaussian quadrature weight tensor g_e with shape (n_poe), if an edge flux tensor $\tilde{F}$ has shape (n_v, n_p, n_c, n_c, n_poe), the Gaussian quadrature can be expressed by:

\begin{matrix} (72) & F_{g} = torch . matmul (\tilde{F}, g_{e}) \end{matrix}

where the shape of $F_{g} (n_{v}, n_{p}, n_{c}, n_{c})$ is the average flux on edge. In this operation, n_poe must occupy the last dimension of $\tilde{F}$ , to permit “torch.matmul” execution. We note, however, that in the flux tensor F obtained from the Riemann solver, n_poe corresponds to the third dimension, direct matrix multiplication is therefore not feasible.

To address this dimensionality issue, two methods are available. The first method involves rearranging the n_poc dimension to the last position using the “torch.tensor.permute” operation in PyTorch, this allows Gaussian integrations to be naturally implemented through the “torch.matmul” operation. The second method, which avoids the need for the “permute” operation, maintains the original dimension sequence. Instead, Gaussian integrations are performed using the “torch.einsum” function. This function sums the product of the elements of the input arrays along dimensions specified using a notation based on the Einstein summation convention.

\begin{matrix} (73) & F_{g} = torch . einsum (^{'} v n p i j, p \to v n i j^{'}, F, g_{e}) \end{matrix}

We have conducted performance tests comparing the two methods, and the results indicate that the second method is approximately 5% faster than the first. Specifically, the first method took 649 seconds, while the second method took 616 s. The test was set as a one-day steady state geostrophic flow (with details provided in Sect. 5.2) simulation at a resolution of C900 ( $Δ x = Δ y = 0.1$ °), using 3rd order accuracy reconstruction stencil. The time step was 30 s, and the default data type used was “torch.float32” (single precision).

The Riemann solver implementation on flux points is way easier, only needs to call “torch.sign” for Eq. (59), while all other operations can be executed using basic arithmetic: addition, subtraction, multiplication, and division. During a Runge–Kutta sub-step, there are no dependencies, and neither “for” loops nor “if” statements are required in the HOPE kernel code. This algorithm fully leverages the capabilities of the GPU.

5 Numerical experiments

The HOPE dynamical core is evaluated using the standard test cases (Test 1, 2, 5, and 6) for the spherical shallow water model as described in Williamson et al. (1992), along with the perturbed jet flow case proposed by Galewsky et al. (2004). Additionally, a dam-break experiment is designed to demonstrate the HOPE model's capability in capturing shock waves.

In our experiments, the grid resolutions are denoted by the count of cells along one dimension on each panel of the cubed sphere; for instance, C90 signifies that each panel is subdivided into a 90×90 grid, corresponding to a grid interval of $Δ x = Δ y = 1$ °.

We measure the conservation errors by defining the normalized error ϵ_r of the variable η as $ϵ_{r} = \frac{I_{g} (η^{n}) - I_{g} (η^{0})}{I_{g} (η^{0})}$ , where η⁰ and ηⁿ stand for η value at initial time and time slot n, respectively. The global integral is defined as:

\begin{matrix} (74) & I (η) = \sum_{p = 1}^{n_{p}} \sum_{j = 1}^{n_{c}} \sum_{i = 1}^{n_{c}} {\sqrt{G}}_{i, j, p} η_{i, j, p} \end{matrix}

where $η_{i, j, p}$ represents the average value of η in cell (i, j, p).

We use three kinds of norm errors to measure the simulation errors,

\begin{array}{l} (75) & L_{1} = \frac{I [ϕ (i, j, p) - ϕ_{ref} (i, j, p)]}{I [ϕ_{ref} (i, j, p)]} \\ (76) & L_{2} = \sqrt{\frac{I [{(ϕ (i, j, p) - ϕ_{ref} (i, j, p))}^{2}]}{I [ϕ_{ref}^{2} (x, y, p)]}} \\ (77) & L_{\infty} = \frac{\max |ϕ (i, j, p) - ϕ_{ref} (i, j, p)|}{\max |ϕ_{ref} (i, j, p)|} \end{array}

the subscript “ref” represents reference state.

5.1 Cosine bell advection

The Solid Body Rotation Cosine Bell (Case 1 from Williamson et al., 1992) is commonly employed to assess noise generated by panel boundaries, as noted by Chen and Xiao (2008) and Ullrich et al. (2010). The wind field is given by

\begin{array}{l} (78) & u_{s} = u_{0} (\cos θ \cos α + \cos λ \sin θ \sin α) \\ (79) & v_{s} = - u_{0} \sin λ \sin α \end{array}

https://gmd.copernicus.org/articles/18/8175/2025/gmd-18-8175-2025-f08

Figure 8The variation of norm errors during simulation days for the cosine bell advection test case, with direction parameter α=0. The rows represent reconstruction schemes TPP3, TPP5 and TPP7, the columns stand for grid C30, C45, C90 and C180.

Download

where u_s, v_s are zonal wind and meridional wind, earth radius is a=6 371 220 m, basic flow speed $u_{0} = \frac{2 π a}{12 \cdot 86 400}$ m s⁻¹. The initial height is defined as

\begin{matrix} (80) & h (λ, θ) = \{\begin{cases} h_{0} (1 + \cos \frac{π d_{s}}{R}), & r < R \\ 0, & r \geq R \end{cases} \end{matrix}

where λ, θ are longitude and latitude. The basic height h₀=1000 m. The great circle distance between (λ, θ) and the initial center point of cosine bell (λ_c, $θ_{c}) = (3 π / 2, 0)$ is expressed by $d_{s} = a \cdot \arccos [\sin θ_{c} \sin θ + \cos θ_{c} \cos θ \cos (λ - λ_{c})]$ . The radius $R = a / 3$ .

Figure 8 presents the norm errors for a 12 d simulation at α=0; results for $α = π / 2$ are identical. The temporal evolution of L₁ and L₂ norm errors does not exhibit a pronounced signature attributable to panel boundaries. In contrast, the L_∞ norm error evolution shows significant sensitivity to panel boundaries, varying considerably with grid resolution and reconstruction order. When using low resolution and low reconstruction order (TPP3 with C30 grid), oscillations induced by panel boundaries are relatively weak. However, as the model resolution or reconstruction order increases, the influence of panel boundaries on the L_∞ norm error manifests as a distinct four-peak pattern, corresponding to the four longitudinally aligned panel boundaries of the cubed-sphere grid.

Figure 9 shows the 12 d simulation norm errors for $α = π / 4$ . In this test configuration, the cosine bell initially moves alone the interface between Panel 1 and Panel 5, and subsequently moves along the interface between Panel 3 and Panel 6. The temporal evolution of L₁ and L₂ norm errors display two gentle peaks, corresponding to the errors generated as the cosine bell crosses these panel interfaces. Similar to Fig. 8, the L_∞ norm error progressively exceeds the L₁ and L₂ norm errors as grid resolution and reconstruction order increase.

https://gmd.copernicus.org/articles/18/8175/2025/gmd-18-8175-2025-f09

Figure 9The variation of norm errors during simulation days for the cosine bell advection test case, with direction parameter $α = π / 4$ . The rows represent reconstruction schemes TPP3, TPP5 and TPP7, the columns stand for grid C30, C45, C90 and C180.

Download

Because the Cosine Bell field lacks infinite continuity, the convergence rate of the norm errors cannot exceed second order in our tests, regardless of the reconstruction order employed. This observation aligns with the key point emphasized in our paper: high-order numerical methods achieve their design accuracy only when the flow field is sufficiently smooth. Discontinuities in the flow field violate the fundamental premise of polynomial reconstruction (as discontinuities impair the continuity of higher derivatives, leading to non-convergence of the Taylor series). This inherent sensitivity to smoothness is precisely the factor causing norm errors to be influenced by cubed-sphere panel boundaries. When using low-order reconstruction schemes at low resolutions, the Tensor Product Polynomial (TPP) reconstruction employs lower-degree polynomials and is consequently less sensitive to the smoothness of the flow field. Conversely, high-order TPP reconstruction requires the flow field to possess higher-order continuity to maintain accuracy; it is thus more sensitive to discontinuities. Insufficiently smooth flow fields can introduce numerical oscillations with high-order schemes. Therefore, while TPP5 and TPP7 yield lower L_∞ norm error magnitudes than TPP3, they exhibit more pronounced oscillations caused by the cubed-sphere panel boundaries.

5.2 Steady state geostrophic flow

Steady state geostrophic flow is the 2nd case in Williamson et al. (1992), it provided an analytical solution for spherical shallow water equations, it was widely used in accuracy test for shallow water models. The analytical solution is a steady state, which means the initial filed is the exact solution. The initial wind field replicates the formulation given in Eqs. (78) and (79), while the initial geopotential is expressed as

\begin{matrix} (81) & ϕ = ϕ_{0} - (a Ω u_{0} + \frac{u_{0}^{2}}{2}) (- \cos λ \cos θ \sin α + \sin θ \cos α)^{2} \end{matrix}

https://gmd.copernicus.org/articles/18/8175/2025/gmd-18-8175-2025-f10

Figure 10Numerical errors (simulation result minus exact solution) of geopotential for steady state flow with Riemann solvers (a) LMARS and (b) AUSM⁺-up. The reconstruction scheme is TPP5, and the model resolution is C90.

Download

where $Ω = 7.292 \times 10^{- 5}$ s⁻¹ is the earth rotation angular velocity, basic geopotential ϕ₀=29 400 m² s⁻², α=0 denotes the rotation angle transcribed between the physical north pole and the center of the northern panel on the cubed-sphere grid, and gravity acceleration g=9.80616 m s⁻². The conversion between the spherical wind (u_s, v_s) and contravariant wind is given by Eq. (9).

We simulated the steady state geostrophic flow over one period (12 d) to test the norm errors and corresponds convergence rate. Since the norm error becomes too small to express by double precision number, all of the experiments were based on the quadruple precision version of HOPE. Time steps were set to Δt=600, 400, 200, 100, 50 s for C30, C45, C90, C180 and C360, respectively.

As shown in Fig. 10, errors near the panel boundaries of the cubed-sphere grid are significantly higher than those in the central regions, confirming the presence of grid imprinting. Furthermore, we implemented the AUSM-up+ Riemann solver (consistent with the scheme described in Ullrich et al., 2010) as an alternative to LMARS. While computationally more complex, AUSM⁺-up substantially reduces simulation errors. Comparative analysis of Fig. 10a and b demonstrates that the maximum absolute error decreases from 8.792×10⁵ (LMARS) to 2.413×10⁵ (AUSM⁺-up), while convergence rates remain unchanged.

Performance benchmarks using HOPE's Fortran implementation on a C90 grid show that simulating 12 d with a 200 s integration time step requires 49.4 s for LMARS versus 57.34 s for AUSM⁺-up. This demonstrates that Riemann solver selection critically impacts simulation outcomes, consistent with the discussions in Ullrich et al. (2010).

In Table 1, we present the geopotential simulation errors and convergence rate of different order accuracy schemes at various resolutions. It is evident that HOPE is capable of achieving the designed accuracies in all tests. When the resolution exceeds C180, the errors obtained from the TPP7, TPP9 and TPP11 schemes have surpassed the limits expressible by double-precision numbers. This demonstrates HOPE's excellent error convergence for simulating smooth flow fields. It should be noted that high-order accuracy schemes do consume more computational resources. HOPE has proven the feasibility of ultra-high-order accuracy finite volume methods on cubed sphere grids. However, in simulating the real atmosphere, a balance between computational efficiency and error must be considered. We believe that 3rd or 5th order accuracy schemes will be more practical for subsequent developments in baroclinic atmosphere model.

Table 1Norm errors and convergence rates of steady state geostrophic flow at day 12, with LMARS as Riemann Solver.

Download Print Version | Download XLSX

At lower resolutions, the simulation error of WENO3 is significantly higher than that of TPP3. However, as the resolution increases, the error of WENO3 progressively approaches that of TPP3. Comparing WENO5 and TPP5 results reveals a marginal increase in norm errors for WENO5, while maintaining 5th-order convergence rates. This confirms WENO5's capability to preserve high accuracy when simulating smooth flows.

It should be noted that HOPE achieves extremely small errors in simulating smooth flow fields even on very coarse resolutions. These errors can be so minute that they fall below the 16 significant digits representable in double precision. Under these conditions, conducting precision tests using double precision alone fails to accurately capture the true convergence rate. To obtain correct error measurements and convergence rate, we must employ FP128 (real (16) in Fortran). However, PyTorch's underlying architecture is built on NVIDIA CUDA, which currently supports only up to FP64 (double precision). Consequently, the PyTorch implementation cannot provide correct simulation errors when utilizing ultra-high-order schemes.

https://gmd.copernicus.org/articles/18/8175/2025/gmd-18-8175-2025-f11

Figure 11TPP5 (with LMARS) simulation result of the isolated mountain wave on C90 grid. The rows stand for variables: geopotential, zonal wind, meridional wind and relative vorticity, respectively. The columns represent simulation day 5, 10, 15. Geopotential contour from 5050 to 5950 m with interval 50 m. Zonal wind contour from −30 to 50 m s⁻¹ with interval 10 m s−1. Meridional wind contour from −30 to 30 m s⁻¹ with interval 10 m s⁻¹. Relative vorticity contour from $- 3 \times 10^{- 5}$ to $4 \times 10^{- 5}$ s⁻¹ with interval $1 \times 10^{- 5}$ s⁻¹.

Download

5.3 Zonal flow over an isolated mountain

Zonal flow over an isolated mountain is the 5th case mentioned in Williamson et al. (1992), this case was usually be implemented to test the topography influence in shallow water models. The initial condition is defined by Eqs. (79)–(81), but h₀=5960 m, ϕ₀=h₀g, u₀=20 m s⁻¹. The mountain height is expressed as

\begin{matrix} (82) & h_{s} = h_{s_{0}} (1 - \frac{d_{s}}{R}) \end{matrix}

where $h_{s_{0}} = 2000$ m; $R = \frac{π}{9}$ ; $d_{s} = \sqrt{\min [R^{2}, (λ - λ_{c})^{2} + (θ - θ_{c})^{2}]}$ . $λ_{c} = \frac{3 π}{2} θ_{c} = \frac{π}{6}$ are the center longitude and latitude of the mountain, respectively.

HOPE is able to deal with the bottom topography correctly, as shown in Fig. 11, all of the simulation result is consistent with prior researches such as (Nair et al., 2005a; Ullrich et al., 2010; Chen and Xiao, 2008) and so on. Furthermore, as discussed in Bao et al. (2014), some high order Discontinuous Galerkin (DG) method exhibit non-physical oscillation during simulating the over mountain flow, the additional viscosity operators are necessary to alleviate this issue. However, HOPE does not require any explicit viscosity operator to suppress vorticity oscillations, the vorticity fields are smooth all the time as illustrated in Fig. 11j–l. We have tested other schemes as well, including TPP3, TPP7, WENO3, and WENO5, all of the schemes are able to achieve similar simulation results.

In the 15 d simulation of zonal flow over an isolated mountain the total energy exhibited a gradual increase over the integration time, while the total potential enstrophy showed gradual dissipation as the simulation progressed. The AUSM⁺-up scheme demonstrated stronger energy dissipation compared to the LMARS scheme, as illustrated in Fig. 12.

https://gmd.copernicus.org/articles/18/8175/2025/gmd-18-8175-2025-f12

Figure 12Time series of normalized conservation errors for the zonal flow over isolated mountain simulation on the C90 grid over days 0 to 15. (a) Normalized total energy error. (b) Normalized total potential enstrophy error. (c) Normalized total zonal angular momentum error.

Download

5.4 Rossby–Haurwitz wave with 4 waves

Rossby–Haurwitz (RH) wave is the 6th test case introduced by Williamson et al. (1992), the RH waves are analytic solution of the spherical nonlinear barotropic vorticity equation, the reference solution is the zonal advection of RH wave without pattern changing, the angular phase speed is given by

\begin{matrix} (83) & c = \frac{R (R + 3) ω - 2 Ω}{(R + 1) (R + 2)} \end{matrix}

where R=4 is the zonal wavenumber, $ω = 7.848 \times 10^{- 6}$ s⁻¹; the earth rotation angular speed $Ω = 7.292 \times 10^{- 5}$ s⁻¹. Therefore, we have the period T≈29.52 d. The initial condition expressed as

\begin{matrix} (84) & ϕ = ϕ_{0} + a^{2} [A (θ) + B (θ) \cos R λ + C (θ) \cos 2 R λ] \\ (85) & u = a ω \cos θ + a K \cos^{R - 1} θ (R \sin^{2} θ - \cos^{2} θ) \cos R λ \\ (86) & v = - a K R \cos^{R - 1} θ \sin θ \sin R λ \\ (87) & \begin{aligned} A (θ) = & \frac{ω}{2} (2 Ω + ω) \cos^{2} θ + \frac{1}{4} K^{2} \cos^{2 R} θ \\ [(R + 1) \cos^{2} θ + 2 R^{2} - R - 2 - 2 R^{2} \cos^{- 2} θ] \end{aligned} \\ (88) & \begin{aligned} B (θ) & = \frac{2 (Ω + ω) K}{(R + 1) (R + 2)} \cos^{R} θ [R^{2} + 2 R + 2 \\ - (R + 1)^{2} \cos^{2} θ] \end{aligned} \\ (89) & C (θ) = \frac{1}{4} K^{2} \cos^{2 R} θ [(R + 1) \cos^{2} θ - R - 2] \end{matrix}

where λ, θ are longitude and latitude, K=ω, ϕ₀=gh₀, h₀=8000 m, and a=6 371 220 m is the earth radius.

https://gmd.copernicus.org/articles/18/8175/2025/gmd-18-8175-2025-f13

Figure 13Geopotential of Rossby–Haurwitz wave simulated by TPP3 scheme. The rows represent grid C45, C90, C180 and C360, the columns stand for simulation day 14, 30, 60, 90. Contours from 8100 to 10 500 m with interval 200 m.

Download

According to the study by Thuburn and Li (2000), the Rossby–Haurwitz (RH) wave with wavenumber 4 is inherently dynamically unstable and prone to collapse. This instability can be triggered by minute perturbations, such as those arising from grid structure (breaking initial symmetry), initial condition imperfections, or numerical errors (e.g., truncation or roundoff). Similar conclusions have been verified in subsequent research. In tests conducted by Zhou et al. (2020), the TRiSK framework based on the SCVT grid could only sustain the RH wave pattern for 25 d without collapse. In contrast, Li et al. (2020) successfully maintained the RH wave pattern for 89 d using a similar algorithm on a latitude-longitude grid. Ullrich et al. (2010) developed the high-order accuracy finite volume model based on a cubed-sphere grid, which was able to sustain the RH wave for up to 90 d. In the most of our experiments, the ability of HOPE to maintain the Rossby–Haurwitz (RH) wave significantly improved with increased order of accuracy and grid resolution. All of the simulation results are based on LMARS in this section.

In the TPP3 simulation, we found that the duration for which the RH wave is maintained increases with higher grid resolution, as exhibit in Fig. 13. When the grid resolution is low (C45, C90), an obvious dissipation phenomenon can be observed. When the resolution reaches C180, the dissipation is significantly reduced, but the waveform has completely collapsed by day 90. When the resolution reaches C360, the simulation results are further improved, with dissipation further reduced, and the RH wave waveform can still barely be maintained on day 90.

A 100 d simulation of the Rossby–Haurwitz wave was conducted using a C90 grid (1° resolution). The total energy simulated with the TPP3, TPP5, TPP7, and TPP9 schemes underwent dissipation to varying degrees. By day 100, the normalized total energy errors reached $- 1.49 \times 10^{- 3}$ , $- 1.33 \times 10^{- 5}$ , $- 1.71 \times 10^{- 6}$ , $- 4.20 \times 10^{- 7}$ , respectively, indicating significantly stronger dissipation for the TPP3 scheme compared to the other higher-order schemes (Fig. 14a). Figure 14b presents a scaled view of the energy evolution for TPP5, TPP7, and TPP9, clearly demonstrating that increasing the reconstruction order progressively reduces energy dissipation. Furthermore, following the RH wave collapse, a significant drop in total energy was observed for the TPP5 scheme (after approximately 90 d) and the TPP7 scheme (after approximately 95 d).

https://gmd.copernicus.org/articles/18/8175/2025/gmd-18-8175-2025-f14

Figure 14Time series of normalized conservation errors for the Rossby–Haurwitz wave simulation on the C90 grid over days 0 to 100, with LMARS scheme as Riemann solver. (a) Normalized total energy error for TPP3, TPP5, TPP7 and TPP9. (b) The total energy normalized error for TPP5, TPP7 and TPP9. (c) Normalized potential enstrophy error for TPP3, TPP5, TPP7 and TPP9. (d) Normalized total zonal angular momentum error for TPP3, TPP5, TPP7 and TPP9.

Download

Analysis of the normalized total potential enstrophy error (Fig. 14c) and the normalized zonal angular momentum error (Fig. 14d) over time yields conclusions consistent with those for total energy. Specifically, the TPP3 scheme exhibited substantially higher dissipation than the higher-order schemes, confirming that employing higher-order reconstruction schemes effectively minimizes dissipation. Notably, significant dissipation surges occurred in these quantities following the RH wave collapse.

In Fig. 15, we compare the impact of order-of-accuracy on the simulation capability of RH waves by fixing the resolution. By comparing row by row, it can be observed that when the accuracy reaches 5th order or higher, the dissipation is significantly reduced. Both the TPP5 and TPP7 simulations show signs of waveform distortion on day 90, and the waveform completely collapses by day 100. However, when using TPP9 for the simulation, the waveform is well maintained even until day 100.

https://gmd.copernicus.org/articles/18/8175/2025/gmd-18-8175-2025-f15

Figure 15Geopotential of Rossby–Haurwitz wave on C90 grid, the rows represent the spatial reconstruction scheme with TPP3, TPP5, TPP7 and TPP9 the columns stand for simulation day 30, 60, 90 and 100. Contours from 8100 to 10 500 m with interval 200 m.

Download

Figure 16 presents the simulation results on the 80th day for different resolutions and reconstruction schemes. The dissipation decreases as the resolution and reconstruction order improve. At the C45 resolution, both the TPP3 and TPP5 simulations exhibit significant dissipation. Although the TPP7 simulation shows a notable improvement in dissipation, the waveform is severely distorted. The TPP9 scheme produces the best simulation results. As the resolution increases, the simulation performance also improves significantly. When using the C360 resolution, all TPP schemes yield good simulation results.

https://gmd.copernicus.org/articles/18/8175/2025/gmd-18-8175-2025-f16

Figure 16Geopotential of Rossby–Haurwitz wave at simulation day 80. The rows represent spatial reconstruction with TPP3, TPP5, TPP7 and TPP9. The columns stand for grid C45, C90, C180 and C360. Contours from 8100 to 10 500 m with interval 200 m.

Download

Significant differences were observed between the 2D WENO scheme and the TPP schemes in this test. Regardless of the specific WENO order employed (3, 5, 7, or 9), all WENO variants maintained the Rossby–Haurwitz (RH) wave pattern for a shorter duration compared to their TPP counterparts of equivalent order. We infer that the nonlinear processes inherent within the WENO scheme introduce asymmetries that disrupt the computational stencil symmetry, leading to a premature collapse of the RH wave.

5.5 Perturbed jet flow

The perturbed jet flow was introduced by Galewsky et al. (2004), this experiment was desired to test the model ability of simulating the fast and slow motion. the initial field is defined as

\begin{array}{l} (90) & u (θ) = \{\begin{cases} \frac{u_{\max}}{e_{n}} e^{\frac{1}{(θ - θ_{0}) (θ - θ_{1})}}, & θ \in (θ_{0}, θ_{1}) \\ 0, & otherwise \end{cases} \\ (91) & ϕ (λ, θ) = ϕ_{0} + ϕ^{'} (λ, θ) - \int_{- \frac{π}{2}}^{θ} a u (θ^{'}) [f + \frac{\tan θ^{'}}{a} u (θ^{'})] d θ^{'} \\ (92) & ϕ^{'} (λ, θ) = g \hat{h} \cos θ e^{- {(\frac{λ}{α})}^{2} - {(\frac{θ_{2} - θ}{β})}^{2}}, λ \in (- π, π) \end{array}

where λ, θ represents longitude and latitude, a=6 371 220 m is radius of earth, u_max=80 m s⁻¹, $θ_{0} = \frac{π}{7}$ , $θ_{1} = \frac{5 π}{14}$ , $θ_{2} = \frac{π}{4}$ , $e_{n} = e^{\frac{- 4}{(θ_{1} - θ_{0})^{2}}}$ , $α = \frac{1}{3}$ , $β = \frac{1}{15}$ , and $\hat{h} = 120$ m. We adopt LMARS as Riemann solver in all of the simulation in this section.

As mentioned in Chen and Xiao (2008), the perturbed jet flow experiment poses a particular challenge for the cubed-sphere grid model. Firstly, the jet stream is located at 45° N, which is very close to the boundaries of panel 5 of the cubed-sphere grid, resulting in a large geopotential gradient in the ghost interpolation region, which leads to larger interpolation error. Furthermore, the location of the geopotential perturbation ϕ^′ coincides with the boundary between panel 1 and panel 5, which also leads to greater numerical computation errors.

Figure 17 displays the HOPE simulation outcomes at day 6 for varying levels of reconstruction order and resolutions. The four rows correspond to the TPP5, TPP7, TPP9 and TPP11 schemes in terms of reconstruction order. The three columns, meanwhile, represent the resolutions of C45, C90, and C180, respectively. Upon comparing the different columns, it is evident that the perturbed jet flow test case converges as the resolution increases. Figure 17a, d, g, and j illustrate that, with an increase in reconstruction order, the vorticity field patterns become increasingly similar to the high-resolution results shown in the second and third columns of Fig. 17. Notably, HOPE enhances the simulation results by utilizing both higher reconstruction order and higher resolution.

https://gmd.copernicus.org/articles/18/8175/2025/gmd-18-8175-2025-f17

Figure 17Relative vorticity of perturbed jet flow. (a)–(c) represent the results of TPP5 scheme with resolutions C45, C90, C180. (d)–(f) represent the results of TPP7 scheme with resolutions C45, C90, C180. (g)–(i) represent the results of TPP9 scheme with resolutions C45, C90, C180. (j)–(l) represent the results of TPP11 scheme with resolutions C45, C90, C180.

Download

https://gmd.copernicus.org/articles/18/8175/2025/gmd-18-8175-2025-f18

Figure 18Geopotential of dam-break test case on C90 grid at 2nd hour. (a) Initial condition, (b) WENO 1D, (c) WENO 2D. The horizontal resolution for both schemes is C90. Shaded and contour from 3.2×10⁴ to 6×10⁴ m, with contour interval 10³ m.

Download

5.6 Dam-break shock wave

In this section we introduce a dam-break case for testing the capability of HOPE to capture the shock wave and comparing the difference between 1D and 2D WENO schemes. The initial condition is configured as a cylinder with a geopotential of 30 000 m² s⁻², as shown in Fig. 18a. The geopotential is given by

\begin{matrix} (93) & ϕ (d_{s} (λ, θ)) = \{\begin{cases} 2 ϕ_{0}, & r < r_{c} \\ ϕ_{0}, & otherwise \end{cases} \end{matrix}

where $d_{s} = \sqrt{(λ - λ_{c})^{2} + (θ - θ_{c})^{2}} λ_{c} = π$ , θ_c=0, $r_{c} = \frac{π}{9}$ , ϕ₀=30 000 m² s⁻², and the earth rotation angular speed Ω=0. We adopt LMARS as Riemann solver in all of the simulation in this section.

In this experiment, we compare WENO5 (WENO scheme with reconstruction width 5) on both 1D and 2D schemes, the WENO-Z (Borges et al., 2008) is adopted as WENO 1D scheme, and WENO 2D scheme is consist with Sect. 3.2. Due to the initial condition being a cylinder, the resulting shock wave should maintain a circular feature. In the simulation results of WENO 1D, numerous radial textures appear (Fig. 18b). The simulation results using the WENO 2D scheme exhibit a smoother circular shape, Fig. 18c. This outcome arises because the 1D reconstruction scheme suffers from dimension split error, whereas the fitting function in the 2D reconstruction scheme incorporates cross terms. Therefore, when simulating fluid fields characterized by isotropic features, the 1D scheme lacks the capability to accurately represent diagonal directional features. Conversely, the 2D scheme correctly captures the inherent isotropic characteristics.

6 Conclusions

This paper presents HOPE, an innovative finite-volume model capable of achieving arbitrary odd-order convergence rate. Through comprehensive numerical experiments, we demonstrate that HOPE exhibits excellent convergence properties when applied to smooth flow fields, with simulation errors decreasing rapidly as the order of accuracy increases.

The model's performance has been rigorously evaluated across several benchmark cases:

In Rossby–Haurwitz wave simulations, HOPE demonstrates superior waveform preservation capabilities that scale with both spatial resolution and accuracy order.
For perturbed jet flow scenarios, the model successfully resolves both fast and slow dynamical features, with significant improvements in solution quality observed at higher orders and finer resolutions.
Mountain wave simulations confirm HOPE's ability to accurately represent orographically-forced gravity waves.
In the dam break test case featuring cylindrical shock fronts, the two-dimensional WENO reconstruction scheme proves more effective than dimension-split approaches in maintaining circular symmetry.

In the case of steady geostrophic flow, Both WENO3 and WENO5 achieve the expected 3rd-order and 5th-order convergence rates, respectively. However, the computed norm errors for WENO schemes are marginally larger than those obtained with the TPP3 and TPP5 schemes. This observation confirms that the 2D WENO scheme preserves the designed convergence rate in smooth flow regions. Concurrently, in the Dam-Break Shock Wave case, the 2D WENO scheme demonstrates its robust capability for handling discontinuous flow fields. These combined results align perfectly with the primary motivation for introducing the WENO scheme: its adaptive oscillation suppression capability. Specifically, the scheme preserves the high convergence rate in sufficiently smooth regions while automatically reducing the reconstruction order near discontinuities to effectively suppress the development and propagation of non-physical oscillations.

A key innovation of HOPE lies in its computational architecture. The algorithm is specifically designed to harness GPU acceleration through (1) implementation of spatial reconstructions as convolutional operations, and (2) formulation of integration steps as matrix-vector products. These design choices leverage computational patterns widely adopted in machine learning frameworks. By developing HOPE within PyTorch, we inherit automatic differentiation capabilities, enabling straightforward coupling with neural network systems.

This integration facilitates the development of hybrid prediction models that combine a high-order, high-performance dynamical core, and Neural network-based physical parameterizations. Current research efforts have successfully extended this algorithmic framework to a two-dimensional baroclinic model (X–Z dimensions).

Future work will focus on developing a global, fully compressible baroclinic model using the HOPE algorithm, further demonstrating its versatility and advantages for modeling complex atmospheric dynamics. The model's unique combination of physical conservation, computational efficiency, and machine learning compatibility positions it as a powerful tool for next-generation atmospheric modeling.

Appendix A

In this appendix, we introduce a novel boundary ghost cell interpolation scheme for cubed sphere, which is able to support HOPE to reach the accuracy over 11th order or even higher.

There are two types of cells, in-domain and out-domain (also named ghost cell, as show in Fig. 7b), we define the set of in-domain cell values $q_{d \times 1} = (q_{1}, q_{2}, \dots, q_{d})^{T}$ , the set of out-domain cell values $g_{h \times 1} = (g_{1}, g_{2}, \dots, g_{d})^{T}$ , and the set of Gaussian quadrature point values (green points in Fig. 3) in out-domain cells is define as $v_{p \times 1} = (v_{1}, v_{2}, \dots, v_{p})$ . To identify the shape of the arrays, we denote the array shape using subscripts (this convention will be followed throughout the subsequent text). The purpose of ghost cell interpolation is using the known cell value q to interpolate the unknown g.

Define a new set includes the values of domain cell values and ghost cell values

\begin{matrix} (A1) & {\tilde{q}}_{(d + h) \times 1} = q ⋃ g = {(q_{1}, q_{2}, \dots, q_{d}, g_{1}, g_{2}, \dots, g_{h})}^{T} \end{matrix}

Similar to the describe in section 0, we can use a TPP to reconstruct the ghost quadrature points

\begin{matrix} (A2) & v_{p \times 1} = A_{p \times (d + h)} {\tilde{q}}_{(d + h) \times 1} \end{matrix}

where $A_{p \times (d + h)}$ is the interpolation matrix that can be obtain by the similar method to Eq. (29). The ghost cell values are calculated by Gaussian quadrature

\begin{matrix} (A3) & g_{h \times 1} = B_{h \times p} v_{p \times 1} \end{matrix}

where B_h×p is the Gaussian quadrature matrix.

${\tilde{q}}_{(d + h) \times 1}$ can be decomposed as the linear q_d×1 and v_p×1

\begin{matrix} (A4) & \begin{aligned} {\tilde{q}}_{(d + h) \times 1} & = (\begin{array}{ll} I_{d \times d} & 0 \\ 0 & B_{h \times p} \end{array}) (\begin{array}{c} q_{d \times 1} \\ v_{p \times 1} \end{array}) \\ = {\tilde{B}}_{(d + h) \times (d + p)} {\overline{q}}_{(d + p) \times 1} \end{aligned} \end{matrix}

where I_d×d is an identity matrix, and

\begin{array}{l} (A5) & {\tilde{B}}_{(d + h) \times (d + p)} = (\begin{array}{ll} I_{d \times d} & 0 \\ 0 & B_{h \times p} \end{array}) \\ (A6) & {\overline{q}}_{(d + p) \times 1} = (\begin{array}{c} q_{d \times 1} \\ v_{p \times 1} \end{array}) . \end{array}

Substitute Eq. (30) into Eq. (26), we have

\begin{matrix} (A7) & \begin{aligned} v_{p \times 1} & = A_{p \times (d + h)} {\tilde{B}}_{(d + h) \times (d + p)} {\overline{q}}_{(d + p) \times 1} \\ = {\tilde{A}}_{p \times (d + p)} {\overline{q}}_{(d + p) \times 1} = {\tilde{A}}_{p \times (d + p)} (\begin{array}{c} q_{d \times 1} \\ v_{p \times 1} \end{array}) . \end{aligned} \end{matrix}

We found that matrix ${\tilde{A}}_{p \times (d + p)}$ can be decomposed into two parts

\begin{matrix} (A8) & {\tilde{A}}_{p \times (d + p)} = (\begin{array}{cc} {\overline{A}}_{p \times d} & C_{p \times p} \end{array}) . \end{matrix}

Such that

\begin{matrix} (A9) & v_{p \times 1} = {\overline{A}}_{p \times d} q_{d \times 1} + C_{p \times p} v_{p \times 1} . \end{matrix}

Therefore

\begin{matrix} (A10) & (I_{p \times p} - C_{p \times p}) v_{p \times 1} = {\overline{A}}_{p \times d} q_{d \times 1} . \end{matrix}

We set $D_{p \times p} = I_{p \times p} - C_{p \times p}$ , then v_p×1 can be determined by

\begin{matrix} (A11) & v_{p \times 1} = D_{p \times p}^{- 1} {\overline{A}}_{p \times d} q_{d \times 1} . \end{matrix}

Substitute Eq. (A11) into Eq. (A3), we establish the relationship between ghost cell values and in-domain cell values

\begin{matrix} (A12) & \begin{aligned} g_{h \times 1} & = B_{h \times p} v_{p \times 1} = B_{h \times p} D_{p \times p}^{- 1} {\overline{A}}_{p \times d} q_{d \times 1} \\ = G_{h \times d} q_{d \times 1} \end{aligned} \end{matrix}

where $G_{h \times d} = B_{h \times p} D_{p \times p}^{- 1} {\overline{A}}_{p \times d}$ . It is clear that Eq. (A.12) is linear, and only rely on the mesh and Gaussian quadrature scheme. Therefore, we need to compute the projection matrix 𝒢_h×d only once for a given mesh and accuracy, this matrix can be computed by a preprocessing system and save it to the hard disk.

Code and data availability

The core data underpinning our findings are outputs produced by the HOPE code (including ghost interpolation weights and simulation results). With this code, the data can be accessed and regenerated. The digital object identifier (DOI) for the HOPE shallow water model is https://doi.org/10.5281/zenodo.16635583 (Zhou, 2025).

Author contributions

LZ developed the algorithm, implemented the code, conducted numerical experiments, and wrote the manuscript. WX provided expert guidance on software architecture design and high-performance computing implementations. XS provided expert guidance on manuscript writing.

Competing interests

The contact author has declared that none of the authors has any competing interests.

Disclaimer

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.

Acknowledgements

The authors wish to express their profound respect for the rigorous scientific approach and dedication to pure science demonstrated by the anonymous reviewers. Their numerous valuable suggestions and critiques have significantly enhanced the manuscript's linguistic clarity, argumentative rigor, and experimental completeness.

Financial support

This work is partially supported by the National Natural Science Foundation of China (grant no. U2242210).

Review statement

This paper was edited by Yongze Song and reviewed by two anonymous referees.

References

Acker, F., de R. Borges, R. B., and Costa, B.: An improved WENO-Z scheme, J. Comput. Phys., 313, 726–753, https://doi.org/10.1016/j.jcp.2016.01.038, 2016.

Bao, L., Nair, R. D., and Tufo, H. M.: A Mass and Momentum Flux-Form High-Order Discontinuous Galerkin Shallow Water Model on the Cubed-Sphere, J. Comput. Phys., 271, 224–243, https://doi.org/10.1016/j.jcp.2013.11.033, 2014.

Bi, K., Xie, L., Zhang, H., Chen, X., Gu, X., and Tian, Q.: Pangu-Weather: A 3D High-Resolution System for Fast and Accurate Global Weather Forecast, arXiv [preprint], arXiv:2211.02556v1 [physics.ao-ph], https://doi.org/10.48550/arXiv.2211.02556, 2022.

Borges, R., Carmona, M., Costa, B., and Don, W. S.: An improved weighted essentially non-oscillatory scheme for hyperbolic conservation laws, J. Comput. Phys., 227, 3191–3211, https://doi.org/10.1016/j.jcp.2007.11.038, 2008.

Chen, C. and Xiao, F.: Shallow Water Model on Cubed-Sphere by Multi-Moment Finite Volume Method, J. Comput. Phys., 227, 5019–5044, https://doi.org/10.1016/j.jcp.2008.01.033, 2008.

Chen, K., Han, T., Chen, X., Ma, L., Gong, J., Zhang, T., Yang, X., Bai, L., Ling, F., Su, R., Ci, Y., and Ouyang, W.: FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead, arXiv [prepint, arXiv:2304.02948v1 [cs.AI], https://doi.org/10.48550/arXiv.2304.02948, 2023.

Chen, X., Andronova, N., Van Leer, B., Penner, J. E., Boyd, J. P., Jablonowski, C., and Lin, S.-J.: A Control-Volume Model of the Compressible Euler Equations with a Vertical Lagrangian Coordinate, Mon. Weather Rev., 141, 2526–2544, https://doi.org/10.1175/mwr-d-12-00129.1, 2013.

Galewsky, J., Scott, R. K., and Polvani, L. M.: An Initial-Value Problem for Testing Numerical Models of the Global Shallow-Water Equations, Tellus A, 56, 429–440, 2004.

Hu, C. and Shu, C.-W.: Weighted Essentially Non-oscillatory Schemes on Triangular Meshes, J. Comput. Phys., 150, 97–127, 1999.

Ii, S. and Xiao, F.: A Global Shallow Water Model Using High Order Multi-Moment Constrained Finite Volume Method and Icosahedral Grid, J. Comput. Phys., 229, 1774–1796, https://doi.org/10.1016/j.jcp.2009.11.008, 2010.

Jiang, G.-S. and Shu, C.-W.: Efficient Implementation of Weighted ENO Schemes, J. Comput. Phys., 126, 202–228, https://doi.org/10.1006/jcph.1996.0130, 1996.

Kochkov, D., Yuval, J., Langmore, I., Norgaard, P., Smith, J., Mooers, G., Lottes, J., Rasp, S., Duben, P., Klower, M., Hatfield, S., Battaglia, P., Sanchez-Gonzalez, A., Willson, M., Brenner, M. P., and Hoyer, S.: Neural General Circulation Models for Weather and Climate, Nature, 632, 1060–1066, 2024.

Lam, R., Sanchez-Gonzalez, A., Willson, M., Wirnsberger, P., Fortunato, M., Alet, F., Ravuri, S., Ewalds, T., Eaton-Rosen, Z., Hu, W., Merose, A., Hoyer, S., Holland, G., Vinyals, O., Stott, J., Pritzel, A., Mohamed, S., and Battaglia, P.: GraphCast: Learning Skillful Medium-Range Global Weather Forecasting, Science, 382, 1416–1421, 2023.

Li, J., Wang, B., and Dong, L.: Analysis of and Solution to the Polar Numerical Noise Within the Shallow-Water Model on the Latitude–Longitude Grid, J. Adv. Model. Earth Syst., 12, https://doi.org/10.1029/2020ms002047, 2020.

Liou, M.-S.: A Sequel to AUSM, Part II: AUSM+-up for All Speeds, J. Comput. Phys., 214, 137–170, https://doi.org/10.1016/j.jcp.2005.09.020, 2006.

Liu, X.-D., Osher, S., and Chan, T.: Weighted Essentially Non-oscillatory Schemes, J. Comput. Phys., 115, 200–212, 1994.

Luo, X. and Wu, S.-P.: An improved WENO-Z+ scheme for solving hyperbolic conservation laws, J. Computat. Phys., 445, https://doi.org/10.1016/j.jcp.2021.110608, 2021.

Nair, R. D., Thomas, S. J., and Loft, R. D.: A Discontinuous Galerkin Global Shallow Water Model, Mon. Weather Rev., 133, 876–888, 2005a.

Nair, R. D., Thomas, S. J., and Loft, R. D.: A Discontinuous Galerkin Transport Scheme on the Cubed Sphere, Mon. Weather Rev., 133, 814–828, 2005b.

Sadourny, R.: Conservative Finite-Difference Approximations of the Primitive Equations on Quasi-Uniform Spherical Grids, Mon. Weather Rev., 100, 136–144, 1972.

Shi, J., Hu, C., and Shu, C.-W.: A Technique of Treating Negative Weights in WENO Schemes, J. Comput. Phys., 175, 108–127, https://doi.org/10.1006/jcph.2001.6892, 2002.

Thuburn, J. and Li, Y.: Numerical Simulations of Rossby–Haurwitz Waves, Tellus A, 52, 181–189, https://doi.org/10.1034/j.1600-0870.2000.00107.x, 2000.

Ullrich, P. A. and Jablonowski, C.: MCore: A Non-hydrostatic Atmospheric Dynamical Core Utilizing High-order Finite-Volume Methods, Journal of Computational Physics, 231, 5078-5108, 10.1016/j.jcp.2012.04.024, 2012.

Ullrich, P. A., Jablonowski, C., and van Leer, B.: High-Order Finite-Volume Methods for the Shallow-Water Equations on the Sphere, J. Comput. Phys., 229, 6104–6134, https://doi.org/10.1016/j.jcp.2010.04.044, 2010.

Wicker, L. J. and Skamarock, W. C.: Time-Splitting Methods for Elastic Models Using Forward Time Schemes, Mon. Weather Rev., 130, 2088–2097, 2002.

Williamson, D. L., Drake, J. B., Hack, J. J., Jakob, R., and Swarztrauber, P. N.: A Standard Test Set for Numerical Approximations to the Shallow Water Equations in Spherical Geometry, J. Comput. Phys., 102, 211–224, 1992.

Zhang, Y., Long, M., Chen, K., Xing, L., Jin, R., Jordan, M. I., and Wang, J.: Skilful Nowcasting of Extreme Precipitation with NowcastNet, Nature, 619, 526–532, https://doi.org/10.1038/s41586-023-06184-4, 2023.

Zhou, L.: High Order Prediction Environment, Zenodo [code], https://doi.org/10.5281/zenodo.16635583, 2025.

Zhou, L., Feng, J., Hua, L., and Zhong, L.: Extending Square Conservation to Arbitrarily Structured C-grids with Shallow Water Equations, Geosci. Model Dev., 13, 581–595, https://doi.org/10.5194/gmd-13-581-2020, 2020.

Zhu, J. and Shu, C.-W.: A New Type of Multi-resolution WENO Schemes with Increasingly Higher Order of Accuracy on Triangular Meshes, J. Comput. Phys., 392, 19–33, https://doi.org/10.1016/j.jcp.2019.04.027, 2019.

Articles

Short summary

This study develops a novel physics-based weather prediction model using artificial intelligence development platform, achieving high accuracy while maintaining strict physical conservation laws. Our algorithms are optimized for modern super computers, enabling efficient large-scale weather simulations. A key innovation is the model's inherent differentiable nature, allowing seamless integration with AI systems to enhance predictive capabilities through machine learning techniques.

HOPE: an arbitrary-order non-oscillatory finite-volume shallow water dynamical core with automatic differentiation

3.1 Tensor product polynomial (TPP) reconstruction

3.2 Genuine two-dimensional WENO

3.3 Treatment of the panel boundaries

3.3.1 Ghost cell interpolation

3.3.2 Fields conversion between panels

3.4 Riemann solver

3.4.1 Low Mach number approximate Riemann solver (LMARS)

3.4.2 Advection upstream splitting method for all speeds (AUSM+-up)

3.5 Temporal integration

5.1 Cosine bell advection

5.2 Steady state geostrophic flow

5.3 Zonal flow over an isolated mountain

5.4 Rossby–Haurwitz wave with 4 waves

5.5 Perturbed jet flow

5.6 Dam-break shock wave

3.4.2 Advection upstream splitting method for all speeds (AUSM⁺-up)