The Simulation EnviRonment for Geomorphology, Hydrodynamics, and Ecohydrology in Integrated form (SERGHEI) is a multi-dimensional, multi-domain, and multi-physics model framework for environmental and landscape simulation, designed with an outlook towards Earth system modelling. At the core of SERGHEI's innovation is its performance-portable high-performance parallel-computing (HPC) implementation, built from scratch on the Kokkos portability layer, allowing SERGHEI to be deployed, in a performance-portable fashion, on graphics processing unit (GPU)-based heterogeneous systems. In this work, we explore combinations of MPI and Kokkos using OpenMP and CUDA backends. We introduce the SERGHEI model framework and present in detail its first operational module for solving the shallow-water equations (SERGHEI-SWE) and its HPC implementation. This module is designed to be applicable to hydrological and environmental problems, including flooding and runoff generation. Its applicability is demonstrated by testing several well-known benchmarks and large-scale problems, for which SERGHEI-SWE achieves excellent results across the different types of shallow-water problems. Finally, SERGHEI-SWE scalability and performance portability are demonstrated and evaluated on several TOP500 HPC systems, with very good scaling on over 20 000 CPUs and up to 256 state-of-the-art GPUs.

The upcoming exascale high-performance parallel-computing (HPC) systems will enable physics-based geoscientific modelling with unprecedented detail

Hydrological models, as with many other HPC applications, currently face challenges in exploiting available and future HPC systems. These
challenges arise not only from the intrinsic difficulty of maintaining complex codes over long periods of time, but also because HPC and its
hardware are undergoing a major paradigm shift

Different strategies are currently being developed to cope with this grand challenge. One strategy is to offload the architecture-dependent
parallelisation tasks to the compiler – see, for example,

This paper introduces the Kokkos-based computational (eco)hydrology framework SERGHEI (Simulation EnviRonment for Geomorphology, Hydrodynamics, and
Ecohydrology in Integrated form) and its surface hydrology module SERGHEI-SWE. The primary aim of SERGHEI's implementation is scalability and
performance portability. In order to achieve this, SERGHEI is written in C++ and based from scratch on the Kokkos abstraction. Kokkos currently
supports CUDA, OpenMP, HIP, SYCL, and Pthreads as backends. We chose Kokkos over other alternatives because the Kokkos project is actively engaged in
securing the sustainability of its programming model, fostering the partial inclusion of its abstractions into the ISO C++ standard

SERGHEI-SWE enables the simulation of surface hydrodynamics of overland flow and streamflow seamlessly and across scales. Historically, hydrological
models featuring surface flow have relied on kinematic or zero-inertia (diffusive) approximations due to their apparent simplicity

Overview of openly available SWE solvers.

SERGHEI-SWE distinguishes itself from other HPC SWE solvers through a number of key novelties. Firstly, SERGHEI-SWE is open-sourced under a
permissive BSD license. While there are indeed many GPU-enabled SWE codes, several of these are research codes that are not openly available – for
example,

SERGHEI-SWE has been developed by harnessing the past 15 years' worth of numerical advances in the solution of SWE, ranging from fundamental numerical
formulations

SERGHEI is envisioned as a modular simulation framework around a physically based hydrodynamic core, which allows a variety of
water-driven and water-limited processes to be represented in a flexible manner. In this sense, SERGHEI is based on the idea of water fluxes as a connecting thread
among various components and processes within the Earth system

A conceptual framework of SERGHEI.

In this section we provide an overview of the underlying mathematical model and the numerical schemes implemented in SERGHEI-SWE. The implementation relies on well-established numerical schemes, and consequently we limit ourselves to a minimal presentation.

SERGHEI-SWE is based on the resolution of the two-dimensional (2D) shallow-water equations that can be expressed in a compact differential
conservative form as
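The equation itself is elided here; in conventional SWE notation the compact conservative form reads as follows (a reconstruction with standard symbols, so details may differ from the original):

```latex
\frac{\partial \mathbf{U}}{\partial t}
+ \frac{\partial \mathbf{F}}{\partial x}
+ \frac{\partial \mathbf{G}}{\partial y} = \mathbf{S},
\qquad
\mathbf{U} = \begin{pmatrix} h \\ hu \\ hv \end{pmatrix},\quad
\mathbf{F} = \begin{pmatrix} hu \\ hu^{2} + \tfrac{1}{2} g h^{2} \\ huv \end{pmatrix},\quad
\mathbf{G} = \begin{pmatrix} hv \\ huv \\ hv^{2} + \tfrac{1}{2} g h^{2} \end{pmatrix}
```

where h is the water depth, (u, v) are the depth-averaged velocity components, g is the gravitational acceleration, and S gathers the bed-slope and friction source terms.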

Here,

SERGHEI-SWE uses a first-order accurate upwind finite-volume scheme with a forward Euler time integration to solve the system of
Eq. (
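Under these choices, the update of cell (i, j) on a Cartesian grid takes the familiar form (a generic statement consistent with the scheme described, not copied from the elided equation):

```latex
\mathbf{U}_{i,j}^{\,n+1} = \mathbf{U}_{i,j}^{\,n}
- \frac{\Delta t}{\Delta x}\left(\mathbf{F}^{*}_{i+\frac{1}{2},j} - \mathbf{F}^{*}_{i-\frac{1}{2},j}\right)
- \frac{\Delta t}{\Delta y}\left(\mathbf{G}^{*}_{i,j+\frac{1}{2}} - \mathbf{G}^{*}_{i,j-\frac{1}{2}}\right)
```

with F* and G* the upwind numerical fluxes at the cell edges, into which the discretised source terms can be incorporated to preserve well-balancing.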

Well-balancing and water depth positivity are ensured by solving numerical fluxes at each cell edge

Although the wave speed values are formally defined at the interfaces, the corresponding cell values are used instead for the CFL condition. As
pointed out in

It is relevant to acknowledge that second (and higher)-order schemes for SWE are available

In this section we describe the key ingredients of the HPC implementation of SERGHEI. Conceptually, this requires, firstly, handling parallelism inside a computational device (multicore CPU or GPU) with shared memory, together with the related portability concerns and the corresponding backends (i.e. OpenMP, CUDA, HIP, etc.). At a higher level of parallelism, distributing computations across many devices requires domain decomposition and constitutes a distributed-memory problem, implemented via MPI. The complete implementation of SERGHEI encompasses both: parallel computations are distributed across many subdomains, each of which is mapped onto a computational device. Here we start the discussion from the higher level of domain decomposition and highlight that the novelty of SERGHEI lies in combining multiple levels of parallelism with the performance-portable shared-memory approach via Kokkos.

Domain decomposition and indexing in SERGHEI: a subdomain consists of physical cells (white) and halo cells (grey). SERGHEI uses two sets of indices: an index for physical cells

Data exchange between subdomains in SERGHEI: in the global surface domain, subdomains overlap with each other through their halo cells

The surface domain is a two-dimensional plane, discretised by a Cartesian grid with a total cell number of

Besides the global cell index that ranges from 0 to
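Although the exact index conventions are elided here, the dual indexing (physical-cell indices versus indices into a storage array that includes halo cells) can be sketched as follows; the row-major layout, the one-cell halo ring, and the function name are assumptions for illustration, not SERGHEI's actual code.

```cpp
#include <cassert>

// Hypothetical sketch: map a physical-cell index p (0 .. nx*ny-1, halo
// cells excluded) to the index of the same cell in a storage array that
// adds a one-cell halo ring around the nx-by-ny physical block
// (row-major layout assumed).
int physicalToStorage(int p, int nx) {
  const int i = p % nx;                 // column within the physical block
  const int j = p / nx;                 // row within the physical block
  return (j + 1) * (nx + 2) + (i + 1);  // shift by one row/column of halos
}
```

Keeping both index sets avoids branching on halo cells inside the compute kernels while still allowing a compact loop over physical cells.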

The underlying methods for data exchange between subdomains are centred on the subdomains rather than on the interfaces. Data are exchanged through
non-blocking MPI-based send-and-receive calls that aggregate data in the halo cells across the subdomains. Note that, by default, SERGHEI
implicitly assumes that the MPI library is GPU aware, allowing direct GPU-to-GPU communication provided that the MPI library supports this
feature. Figure
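The aggregation of data into halo cells can be illustrated with a minimal in-process sketch (all names and the one-dimensional layout are assumptions; in SERGHEI the transfers happen through non-blocking MPI send-and-receive calls between ranks rather than direct assignment):

```cpp
#include <cassert>
#include <vector>

// Hypothetical sketch of halo data movement between two neighbouring
// subdomains. Each subdomain stores nx physical cells plus one halo cell
// on each side; in SERGHEI the copies below would be non-blocking
// MPI_Isend/MPI_Irecv calls between ranks (GPU-to-GPU if MPI is GPU aware).
struct Subdomain {
  int nx;                 // number of physical cells
  std::vector<double> h;  // layout: [halo | physical ... | halo]
  explicit Subdomain(int n) : nx(n), h(n + 2, 0.0) {}
};

void exchangeHalos(Subdomain& left, Subdomain& right) {
  right.h[0]          = left.h[left.nx];  // left's last physical cell -> right halo
  left.h[left.nx + 1] = right.h[1];       // right's first physical cell -> left halo
}
```

After the exchange, each subdomain can compute fluxes at its boundary edges using only local memory, which is what decouples the subdomains within a time step.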

Intra-device parallelism is achieved per subdomain through the Kokkos framework, which allows the user to choose between shared-memory parallelism
and GPU backends for further acceleration. SERGHEI's implementation makes use of the Kokkos concept of

For a CUDA backend, the use of unified memory (

Similar definitions can be constructed for integer arrays. These arrays describe spatially distributed fields, such as conserved variables, model
parameters, and forcing data. Deriving these arrays from

Conceptually, the SERGHEI-SWE solver consists of two computationally intensive kernels: (i) cell-spanning and (ii) edge-spanning kernels. The update
of the conserved variables following Eq. (

Conserved variable update in standard C++.
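The listing itself is not reproduced above; a hypothetical reconstruction of such a cell-spanning update in plain C++ might look as follows (the struct layout, names, and the positivity clamp are illustrative assumptions):

```cpp
#include <vector>

// Hypothetical sketch of the cell-spanning update kernel: the conserved
// variables U = (h, hu, hv) of every cell are advanced with a forward
// Euler step using pre-accumulated flux/source contributions dU.
struct State {
  std::vector<double> h, hu, hv;     // conserved variables per cell
  std::vector<double> dh, dhu, dhv;  // accumulated flux/source terms
};

void updateCells(State& s, double dt, int nCells) {
  for (int i = 0; i < nCells; ++i) {
    s.h[i]  += dt * s.dh[i];
    s.hu[i] += dt * s.dhu[i];
    s.hv[i] += dt * s.dhv[i];
    if (s.h[i] <= 0.0) {  // enforce water depth positivity on drying cells
      s.h[i] = 0.0;
      s.hu[i] = 0.0;
      s.hv[i] = 0.0;
    }
  }
}
```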

In order to achieve the desired portability, we replace the standard

Conserved variable update using Kokkos.
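A sketch of the Kokkos-style rewrite (hypothetical; SERGHEI's actual kernel differs): with Kokkos, the raw loop body becomes a lambda handed to `Kokkos::parallel_for`, and the backend (OpenMP, CUDA, ...) is selected at compile time. So that this sketch compiles without a Kokkos installation, `parallel_for` is mimicked here by a serial stand-in with the same calling shape.

```cpp
#include <vector>

// Serial stand-in mimicking the shape of Kokkos::parallel_for; with Kokkos
// the call would read
//   Kokkos::parallel_for(nCells, KOKKOS_LAMBDA(const int i) { ... });
template <typename Body>
void parallel_for(int n, Body&& body) {
  for (int i = 0; i < n; ++i) body(i);
}

// Cell-spanning update expressed as a per-cell lambda: the same body can
// run on CPU threads or GPU, which is the essence of the portability.
void updateCells(std::vector<double>& h, const std::vector<double>& dh,
                 double dt, int nCells) {
  parallel_for(nCells, [&](const int i) {
    h[i] += dt * dh[i];
    if (h[i] < 0.0) h[i] = 0.0;  // enforce water depth positivity
  });
}
```

With real Kokkos, the lambda would capture `Kokkos::View` objects by value (via `KOKKOS_LAMBDA`) rather than `std::vector` by reference, so that device execution is valid.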

Edge-spanning loops are conceptually necessary to compute numerical fluxes (Eq.

Flux computations.
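To make the edge-spanning pattern concrete, here is a hypothetical one-dimensional sketch in which a simple Rusanov (local Lax–Friedrichs) flux stands in for the actual upwind solver; only the mass equation is shown, and all names are illustrative.

```cpp
#include <cmath>
#include <vector>

// Hypothetical edge-spanning loop (1D, mass equation only): each interior
// edge between cells e and e+1 computes one numerical flux and scatters it
// into the accumulators of both neighbouring cells.
const double g = 9.81;  // gravitational acceleration (m/s^2)

void computeMassFluxes(const std::vector<double>& h,
                       const std::vector<double>& hu,
                       std::vector<double>& dh, double dx) {
  const int nCells = static_cast<int>(h.size());
  for (int e = 0; e < nCells - 1; ++e) {
    const double uL = h[e]     > 0.0 ? hu[e]     / h[e]     : 0.0;
    const double uR = h[e + 1] > 0.0 ? hu[e + 1] / h[e + 1] : 0.0;
    // Rusanov wave speed estimate (stand-in for the upwind solver)
    const double a = std::max(std::fabs(uL) + std::sqrt(g * h[e]),
                              std::fabs(uR) + std::sqrt(g * h[e + 1]));
    const double flux =
        0.5 * (hu[e] + hu[e + 1]) - 0.5 * a * (h[e + 1] - h[e]);
    dh[e]     -= flux / dx;  // mass leaves cell e
    dh[e + 1] += flux / dx;  // mass enters cell e+1
  }
}
```

On a lake-at-rest state (uniform depth, zero velocity) every flux vanishes, consistent with the C property. On GPU, such scatter updates require atomics or edge colouring, since neighbouring edges write to the same cell accumulator.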

Lake at rest solution for emerged bump. SERGHEI-SWE satisfies the C property.

In this section we report evidence supporting the claim that SERGHEI-SWE is an accurate, robust, and efficient shallow-water solver. The formal accuracy testing strategy is based on several well-known benchmark cases with well-defined reference solutions. Herein, for brevity, we focus only on the results of these tests, while providing a minimal presentation of the set-ups. We refer the interested reader to the original publications (and to the many instances in which these tests have been used) for further details on the geometries, parametrisations, and forcing.

We purposely report an extensive testing exercise to show the wide applicability of SERGHEI across hydraulic and hydrological problems, drawing on a wide range of the available benchmark tests. Analytical, experimental, and field-scale tests are included. The analytical cases are aimed at showing formal convergence and accuracy. The experimental cases are meant as validation of the capabilities of the model to reach physically meaningful solutions under a variety of conditions. The field-scale tests showcase the applicability of the solver to real problems, and allow for strenuous computational tasks to show performance, efficiency, and parallel scaling. All solutions reported here were computed using double-precision arithmetic.

We test SERGHEI's capability to capture moving equilibria in a number of steady-flow test cases compiled in

Analytical steady flows: summary of

These tests feature a smooth bump in a one-dimensional, frictionless domain which can be used to validate the C property, well-balancing, and the
shock-capturing ability of the numerical solver

To show well-balancing under steady flow, we computed two transcritical flows based on the analytical benchmark of a one-dimensional flume with
varying geometry proposed by

Analytical steady flows: flumes. SERGHEI-SWE captures moving equilibria solutions for two transcritical steady flows. Note that the solution is stable (no oscillations) and well-balanced (discharge remains constant along the flume).

Analytical dam break:

Figure

We verify SERGHEI-SWE's capability to capture transient flow based on analytical dam breaks
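For reference, Ritter's classical solution for a frictionless dam break over a dry bed (dam at x = 0, upstream depth h0, release at t = 0) gives, inside the rarefaction fan $-\sqrt{g h_0} \le x/t \le 2\sqrt{g h_0}$:

```latex
h(x,t) = \frac{1}{9g}\left(2\sqrt{g h_0} - \frac{x}{t}\right)^{2},
\qquad
u(x,t) = \frac{2}{3}\left(\frac{x}{t} + \sqrt{g h_0}\right)
```

This is the analytical profile referred to as Ritter's solution in the figure captions.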

Dam break on dry bed without friction: model predictions for different number of grid cells. SERGHEI-SWE converges to the analytical solution (Ritter's solution) as the grid is refined.

A classical frictionless dam break over a wet bed is reported in Sect.

We present transient two-dimensional test cases with moving wet–dry fronts that consider the periodical movement of water in a parabolic bowl,
so-called “oscillations” that have been studied by

The well-established test case by

Planar surface in a paraboloid: snapshots of water depth by the model compared to the analytical solution (contour lines). Period

Snapshots of the simulation are plotted in Fig.

The simulated discharge hydrograph at the outlet is compared against the analytical solution in Fig.

Simulated and analytical discharge for the analytical case of rainfall in a flume.

For both cases the flume (including the upstream wider reservoir) was discretised at a 5

Simulated and experimental steady water surface in the obstacle region of the G3 flume for the centreline profile

The steady-flow case had a discharge of 2.5

Simulated and experimental transient water depths at three gauge points (

The dam break case is triggered by a sudden opening of the gate followed by a wave advancing along the dry flume. Results for this case at three gauge
points are shown in Fig.

Simulated and experimental results of unsteady flow over an island for gauges G9

The dimensions of the experimental flume are 2

Simulated hydrographs compared to experimental data from

Figure

Simulated velocities are compared to experimental velocities at the 62 gauged locations in Fig.

Comparison of simulated (line) and experimental (circles) steady velocities in the Thiès field case.

Water surface elevations

Geolocated relative WSE error

HPC systems in which SERGHEI-SWE has been tested.

JSC: Jülich Supercomputing Centre; FZJ: Forschungszentrum Jülich; OLCF: Oak Ridge Leadership Computing Facility; ORNL: Oak Ridge National Laboratory; NERSC: National Energy Research Scientific Computing Center; LBNL: Lawrence Berkeley National Laboratory

The Malpasset dam break event

In this section we report an investigation of the computational performance and parallel scaling of SERGHEI-SWE for selected test cases. To
demonstrate performance portability, we show performance metrics for both OpenMP and CUDA backends enabled by Kokkos, computed on CPU and
GPU architectures, respectively. For that, hybrid MPI-OpenMP and MPI-CUDA implementations are used, with one MPI task per node for
MPI-OpenMP and one MPI task per GPU for MPI-CUDA. Most of the runs were performed on JUWELS at JSC (Jülich Supercomputing
Centre). Additional HPC systems were also used for some cases. Properties of all systems are shown in Table

It is important to highlight that no performance tuning or optimisation has been carried out for these tests and that no system-specific porting efforts were done. All runs relied entirely on Kokkos for portability. The code was simply compiled with the available software stacks in the HPC systems and executed. All results reported here were computed using double-precision arithmetic.

The commonly used Malpasset dam break test (introduced in Sect.

Scaling for the Malpasset case (

Strong scaling behaviour for a circular dam break test case with two resolutions.

Runtime

This is a simple analytical verification test in the shallow-water literature, which generalises the 1D dam break solution. We purposely select this
case (instead of one of the many verification problems) for its convenience for scaling studies. Firstly, resolution can be increased at
will. Additionally, the square domain allows for trivial domain decomposition, which together with the fully wet domain and the radially symmetric
flow field minimises load-balancing issues. Essentially, it allows for a very clean scalability test with minimal interference from the problem
topology, which facilitates scalability and performance analysis (in contrast to the limitations of the Malpasset domain discussed in
Sect.

We generated three computational grids, with

For the 552-million-cells grid, only two runs were computed with 128 and 160 GPUs (corresponding to 32 and 40 nodes in JUWELS-booster,
respectively). Runtime for these was 95.4 and 84.7

Runtime

To demonstrate scaling under production conditions of real scenarios, we use an idealised rainfall runoff simulation over the Lower Triangle region in
the East River Watershed (Colorado, USA)

For practical purposes, two configurations have been used for this test: a short rainfall of

In this paper we present the SERGHEI framework and, in particular, the SERGHEI-SWE module. SERGHEI-SWE implements a 2D fully dynamic shallow-water solver, harnessing state-of-the-art numerics and leveraging Kokkos to facilitate portability across architectures. We show through empirical evidence with a large set of well-established benchmarks that SERGHEI-SWE is accurate, numerically stable, and robust. Importantly, we show that SERGHEI-SWE's parallel scaling is very good on CPU-based HPC systems, consumer-grade GPUs, and GPU-based HPC systems. Consequently, we claim that SERGHEI is indeed performance portable and approaching exascale readiness. These features make SERGHEI-SWE a plausible community code for shallow-water modelling across a wide range of applications requiring large-scale, very-high-resolution simulations.

Exploiting ever better and more highly resolved geospatial information (DEMs, land use, vector data of structures) prompts the need for high-resolution solvers. At the same time, the push towards the study of multiscale systems and integrated management warrants increasingly larger domains. Together, these trends result in larger computational problems, motivating the need for exascale-ready shallow-water solvers. Additionally, HPC technology is ever more available, not only via (inter)national research facilities but also through cloud-computing services. It is arguably timely to enable such an HPC-ready solver.

HPC allows for not only large simulations but also large ensembles of simulations, allowing uncertainty issues to be addressed and enabling scenario analysis for engineering problems, parameter space exploration, and hypothesis testing. Furthermore, although the benefits of high resolution may be marginal for runoff hydrograph estimations, they allow the local dynamics to be better resolved in the domain. Flow paths, transit times, wetting–drying dynamics, and connectivity play important roles in transport and ecohydrological processes. For these purposes, enabling very-high-resolution simulations will prove to be highly beneficial. We also envision that, provided with sufficient computational resources, SERGHEI-SWE could be used for operational flood forecasting and probabilistic flash-flood modelling. Altogether, this strongly paves the way for the uptake of shallow-water solvers by the broader ESM community and its coupling to Earth system models, as well as their many applications, from process and system understanding to hydrometeorological risk and impact assessment. We also envision that, for users not requiring HPC capabilities, the benefit of SERGHEI-SWE is access to a transparent, open-sourced, performance-portable software that allows workstation GPUs to be exploited efficiently.

As additional SERGHEI modules become operational, the HPC capabilities will further enable simulations that are unfeasible with the current generation of available solvers. For example, with a fully operational transport and morphology module, it will be possible to run decade-long morphological simulations relevant for river management applications; to better capture sediment connectivity and sediment cascades across the landscape, a relevant topic for erosion and catchment management; or to perform catchment-scale hydro-biogeochemical simulations with unprecedented high spatial resolutions for better understanding of ecohydrological and biogeochemical processes.

Finally, SERGHEI is conceptualised and designed with extendibility and software interoperability in mind, with design choices made to facilitate
foreseeable future developments on a wide range of topics, such as

numerics, e.g. the discontinuous Galerkin discretisation strategies

interfaces to mature geochemistry engines, e.g. CrunchFlow

vegetation models with varying degrees of complexity, for example, Ecosys

This appendix contains an extended set of relevant test cases that are commonly used as validation cases in the literature. It complements and extends
the verification evidence in Sect.

Using the same set-up as in Sect.

Lake-at-rest solution for an immersed bump.

Analytical steady flows over a bump. SERGHEI-SWE captures moving equilibria solutions for transcritical flow with a shock (top left), fully subcritical flow (top right), and transcritical flow without a shock (bottom)

Analytical steady flows: flumes. SERGHEI-SWE captures moving equilibria solutions for a subcritical

To further show that SERGHEI-SWE is well-balanced, we computed three steady flows over a bump. We include a transcritical flow with a shock wave, a
fully subcritical flow, and a transcritical flow without a shock, as shown in Fig.

We also include two additional cases from

Additionally, MacDonald-type solutions can be constructed for frictionless flumes to study the bed slope source term implementation in isolation.
We present a frictionless test case with SERGHEI-SWE that is not part of the SWASHES benchmark compilation. We discretise the bed elevation of the
flume as

Using

Analytical steady flows: flumes. SERGHEI-SWE captures moving equilibrium solution for frictionless test case, with a stable and well-balanced solution.

The

Analytical steady flows: summary of

The dam break on a wet-bed-without-friction test case is configured by setting water depths in the domain as

Dam break on wet bed without friction: model predictions for different number of grid cells. SERGHEI-SWE converges to the analytical solution (Stoker's solution) as the grid is refined.

Analytical dam break:

Radially symmetrical paraboloid: snapshots of water depth by the model compared to the analytical solution (contour lines). Period

Using the same computational domain and bed topography as the case in Sect.

Simulated and experimental results for the laboratory-scale tsunami case at gauges G1

Simulated (black lines) and experimental (red points) transient water depths at seven gauge points (

Simulated (lines) and experimental (points) water depth profiles at

Simulated hydrographs compared to experimental data from

A

The computational domain was discretised with a 380

A laboratory-scale experiment of a dam break over an idealised urban area was reported by

SERGHEI is available through GitLab, at

A repository containing test cases is available at

Additional convenient pre- and post-processing tools are also available at

DCV contributed to conceptualisation, investigation, software development, model validation, visualisation, and writing. MMH contributed to conceptualisation, methodology design, software development, formal analysis, model validation, and writing. MRN contributed to software development. IÖX contributed to formal analysis, software development, model validation, visualisation, and writing.

The contact author has declared that none of the authors has any competing interests.

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

The authors gratefully acknowledge the Earth System Modelling Project (ESM) for supporting this work by providing computing time on the ESM partition of the JUWELS supercomputer at the Jülich Supercomputing Centre (JSC) through the compute time project Runoff Generation and Surface Hydrodynamics across Scales with the SERGHEI model (RUGSHAS), project no. 22686. This work used resources of the National Energy Research Scientific Computing Center (NERSC), a US Department of Energy, Office of Science, user facility operated under contract no. DE-AC02-05CH11231. This research was also supported by the US Air Force Numerical Weather Modelling programme and used resources of the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, which is a US Department of Energy (DOE) Office of Science User Facility.

The article processing charges for this open-access publication were covered by the Forschungszentrum Jülich.

This paper was edited by Charles Onyutha and reviewed by Reinhard Hinkelmann and Kenichiro Kobayashi.