<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing with OASIS Tables v3.0 20080202//EN" "https://jats.nlm.nih.gov/nlm-dtd/publishing/3.0/journalpub-oasis3.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:oasis="http://docs.oasis-open.org/ns/oasis-exchange/table" xml:lang="en" dtd-version="3.0" article-type="research-article">
  <front>
    <journal-meta><journal-id journal-id-type="publisher">GMD</journal-id><journal-title-group>
    <journal-title>Geoscientific Model Development</journal-title>
    <abbrev-journal-title abbrev-type="publisher">GMD</abbrev-journal-title><abbrev-journal-title abbrev-type="nlm-ta">Geosci. Model Dev.</abbrev-journal-title>
  </journal-title-group><issn pub-type="epub">1991-9603</issn><publisher>
    <publisher-name>Copernicus Publications</publisher-name>
    <publisher-loc>Göttingen, Germany</publisher-loc>
  </publisher></journal-meta>
    <article-meta>
      <article-id pub-id-type="doi">10.5194/gmd-19-5623-2026</article-id><title-group><article-title>CaMa-Flood-GPU: a GPU-based hydrodynamic model implementation for scalable global simulations</article-title><alt-title>CaMa-Flood-GPU: GPU hydrodynamic model</alt-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author" corresp="no" rid="aff1">
          <name><surname>Kang</surname><given-names>Shengyu</given-names></name>
          
        </contrib>
        <contrib contrib-type="author" corresp="yes" rid="aff1">
          <name><surname>Yin</surname><given-names>Jiabo</given-names></name>
          <email>jboyn@whu.edu.cn</email>
        <ext-link>https://orcid.org/0000-0002-2305-8729</ext-link></contrib>
        <contrib contrib-type="author" corresp="yes" rid="aff2">
          <name><surname>Yamazaki</surname><given-names>Dai</given-names></name>
          <email>yamadai@iis.u-tokyo.ac.jp</email>
        </contrib>
        <aff id="aff1"><label>1</label><institution>State Key Laboratory of Water Resources Engineering and Management, Wuhan University, Wuhan, P.R. China</institution>
        </aff>
        <aff id="aff2"><label>2</label><institution>Institute of Industrial Science, The University of Tokyo, Tokyo, Japan</institution>
        </aff>
      </contrib-group>
      <author-notes><corresp id="corr1">Jiabo Yin (jboyn@whu.edu.cn) and Dai Yamazaki (yamadai@iis.u-tokyo.ac.jp)</corresp></author-notes><pub-date><day>29</day><month>June</month><year>2026</year></pub-date>
      
      <volume>19</volume>
      <issue>12</issue>
      <fpage>5623</fpage><lpage>5640</lpage>
      <history>
        <date date-type="received"><day>27</day><month>December</month><year>2025</year></date>
           <date date-type="rev-request"><day>5</day><month>February</month><year>2026</year></date>
           <date date-type="rev-recd"><day>10</day><month>June</month><year>2026</year></date>
           <date date-type="accepted"><day>21</day><month>June</month><year>2026</year></date>
      </history>
      <permissions>
        <copyright-statement>Copyright: © 2026 Shengyu Kang et al.</copyright-statement>
        <copyright-year>2026</copyright-year>
      <license license-type="open-access"><license-p>This work is licensed under the Creative Commons Attribution 4.0 International License. To view a copy of this licence, visit <ext-link ext-link-type="uri" xlink:href="https://creativecommons.org/licenses/by/4.0/">https://creativecommons.org/licenses/by/4.0/</ext-link></license-p></license></permissions><self-uri xlink:href="https://gmd.copernicus.org/articles/19/5623/2026/gmd-19-5623-2026.html">This article is available from https://gmd.copernicus.org/articles/19/5623/2026/gmd-19-5623-2026.html</self-uri><self-uri xlink:href="https://gmd.copernicus.org/articles/19/5623/2026/gmd-19-5623-2026.pdf">The full text article is available as a PDF file from https://gmd.copernicus.org/articles/19/5623/2026/gmd-19-5623-2026.pdf</self-uri>
      <abstract><title>Abstract</title>

      <p id="d2e107">Floods are among the costliest natural hazards, demanding scalable models to simulate river and floodplain dynamics at a global scale. The Catchment-based Macro-scale Floodplain (CaMa-Flood) model is a leading system for this purpose, but its CPU-based implementation is computationally demanding. This paper introduces CaMa-Flood-GPU, a fundamental refactoring of the model optimized for Graphics Processing Unit (GPU) architectures. We systematically reinterpreted its core algorithms – including river routing on irregular networks, runoff interpolation, and water depth diagnosis – into highly parallel, GPU-native operations. Key challenges were addressed by implementing scatter-add for flux updates, sparse matrix multiplication for runoff mapping, and branchless kernels for floodplain dynamics, all while preserving the original model's physical fidelity. Implemented in Python with Triton kernels and PyTorch, CaMa-Flood-GPU achieves multi-GPU scalability through optimized communication patterns that minimize synchronization overhead. The software adopts a modular structure with optional components (e.g., bifurcation routing, adaptive time stepping) and flexible data interfaces. Benchmarks demonstrate an order-of-magnitude speedup over a 192-core CPU baseline and near-linear scaling on multiple GPUs, with negligible numerical differences from the original model. This performance leap reduces simulation times for high-resolution global runs from days to hours, enabling larger ensembles and rapid scenario analysis. By providing a reproducible and efficient tool, CaMa-Flood-GPU lowers the barrier for adopting GPU acceleration in large-scale hydrology. The released implementation provides a reproducible reference for future method development.</p>
  </abstract>
    
<funding-group>
<award-group id="gs1">
<funding-source>National Natural Science Foundation of China</funding-source>
<award-id>T2522026</award-id>
<award-id>52441902</award-id>
<award-id>W2521014</award-id>
<award-id>52361145864</award-id>
<award-id>W2421111</award-id>
</award-group>
<award-group id="gs2">
<funding-source>Natural Science Foundation of Hubei Province</funding-source>
<award-id>2024AFA055</award-id>
</award-group>
</funding-group>
</article-meta>
  </front>
<body>
      

      
<sec id="Ch1.S1" sec-type="intro">
  <label>1</label><title>Introduction</title>
      <p id="d2e121">Floods are among the costliest and most widespread natural hazards, motivating the development of scalable hydrodynamic models for river routing and floodplain dynamics at continental to global scales <xref ref-type="bibr" rid="bib1.bibx16 bib1.bibx4 bib1.bibx6" id="paren.1"/>. While detailed channel processes can be simulated with one-dimensional models (e.g., HEC-RAS), capturing floodplain dynamics requires two-dimensional (2D) models that solve the shallow-water equations on high-resolution grids <xref ref-type="bibr" rid="bib1.bibx30" id="paren.2"/>. However, applying such 2D models to large river basins is computationally prohibitive for many applications. To improve efficiency, simplified models such as LISFLOOD-FP adopt the local inertial formulation, which neglects the convective-acceleration term in the shallow-water momentum equation while retaining gravity and bed-friction terms; this preserves the propagation of flood waves at a fraction of the cost of the full Saint-Venant equations <xref ref-type="bibr" rid="bib1.bibx2 bib1.bibx5 bib1.bibx24" id="paren.3"/>. An even more scalable alternative is the sub-grid inundation approach: instead of solving the 2D shallow-water equations explicitly, the inundated area and depth within each computational unit are diagnosed from a pre-computed sub-grid topographic profile, given the unit's total water storage <xref ref-type="bibr" rid="bib1.bibx31" id="paren.4"/>. The Catchment-based Macro-scale Floodplain model (CaMa-Flood) is a leading example of this approach, enabling efficient yet physically based global flood simulations <xref ref-type="bibr" rid="bib1.bibx32" id="paren.5"/>. CaMa-Flood introduced a novel vectorized unit-catchment discretization of the river network, a departure from traditional uniform grids <xref ref-type="bibr" rid="bib1.bibx32" id="paren.6"/>. 
Among available global routing models, CaMa-Flood provides an established catchment-based representation of channel storage, floodplain storage and river-network routing. We selected CaMa-Flood as the baseline model for our GPU implementation; it has been adopted as the offline river-routing layer in land-surface and global hydrological models, with benchmark comparisons reporting measurable gains in discharge reproducibility relative to the native routing schemes <xref ref-type="bibr" rid="bib1.bibx37 bib1.bibx10" id="paren.7"/>.</p>
      <p id="d2e146">In the CaMa-Flood scheme, the world's river basins are divided into numerous irregularly shaped sub-basins (catchments), each treated as a computational unit. The water balance for each catchment <inline-formula><mml:math id="M1" display="inline"><mml:mi>c</mml:mi></mml:math></inline-formula> is the core of the model:

          <disp-formula id="Ch1.E1" content-type="numbered"><label>1</label><mml:math id="M2" display="block"><mml:mrow><mml:msubsup><mml:mi>S</mml:mi><mml:mi>c</mml:mi><mml:mrow><mml:mi>t</mml:mi><mml:mo>+</mml:mo><mml:mi mathvariant="normal">Δ</mml:mi><mml:mi>t</mml:mi></mml:mrow></mml:msubsup><mml:mo>=</mml:mo><mml:msubsup><mml:mi>S</mml:mi><mml:mi>c</mml:mi><mml:mi>t</mml:mi></mml:msubsup><mml:mo>+</mml:mo><mml:msubsup><mml:mi mathvariant="bold-italic">R</mml:mi><mml:mi>c</mml:mi><mml:mi>t</mml:mi></mml:msubsup><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mi mathvariant="normal">Δ</mml:mi><mml:mi>t</mml:mi><mml:mo>+</mml:mo><mml:munder><mml:mo movablelimits="false">∑</mml:mo><mml:mrow><mml:mi>u</mml:mi><mml:mo>∈</mml:mo><mml:mtext>upstream</mml:mtext><mml:mo>(</mml:mo><mml:mi>c</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:munder><mml:msubsup><mml:mi>Q</mml:mi><mml:mrow><mml:mi>u</mml:mi><mml:mo>→</mml:mo><mml:mi>c</mml:mi></mml:mrow><mml:mi>t</mml:mi></mml:msubsup><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mi mathvariant="normal">Δ</mml:mi><mml:mi>t</mml:mi><mml:mo>-</mml:mo><mml:msubsup><mml:mi>Q</mml:mi><mml:mrow><mml:mi>c</mml:mi><mml:mo>→</mml:mo><mml:mi>d</mml:mi></mml:mrow><mml:mi>t</mml:mi></mml:msubsup><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mi mathvariant="normal">Δ</mml:mi><mml:mi>t</mml:mi><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mo>,</mml:mo></mml:mrow></mml:math></disp-formula>

        where <inline-formula><mml:math id="M3" display="inline"><mml:mrow><mml:msubsup><mml:mi>S</mml:mi><mml:mi>c</mml:mi><mml:mi>t</mml:mi></mml:msubsup></mml:mrow></mml:math></inline-formula> is the water storage, <inline-formula><mml:math id="M4" display="inline"><mml:mrow><mml:msubsup><mml:mi mathvariant="bold-italic">R</mml:mi><mml:mi>c</mml:mi><mml:mi>t</mml:mi></mml:msubsup></mml:mrow></mml:math></inline-formula> is the runoff input, <inline-formula><mml:math id="M5" display="inline"><mml:mrow><mml:msubsup><mml:mi>Q</mml:mi><mml:mrow><mml:mi>u</mml:mi><mml:mo>→</mml:mo><mml:mi>c</mml:mi></mml:mrow><mml:mi>t</mml:mi></mml:msubsup></mml:mrow></mml:math></inline-formula> is the inflow from upstream neighbor <inline-formula><mml:math id="M6" display="inline"><mml:mi>u</mml:mi></mml:math></inline-formula>, and <inline-formula><mml:math id="M7" display="inline"><mml:mrow><mml:msubsup><mml:mi>Q</mml:mi><mml:mrow><mml:mi>c</mml:mi><mml:mo>→</mml:mo><mml:mi>d</mml:mi></mml:mrow><mml:mi>t</mml:mi></mml:msubsup></mml:mrow></mml:math></inline-formula> is the outflow to the downstream catchment <inline-formula><mml:math id="M8" display="inline"><mml:mi>d</mml:mi></mml:math></inline-formula>. The outflow <inline-formula><mml:math id="M9" display="inline"><mml:mrow><mml:msub><mml:mi>Q</mml:mi><mml:mrow><mml:mi>c</mml:mi><mml:mo>→</mml:mo><mml:mi>d</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula> is computed using a local inertial approximation of the shallow-water momentum equations. This formulation neglects convective acceleration but preserves gravity and friction effects, leading to a one-dimensional momentum equation for the flow in each river link:

          <disp-formula id="Ch1.E2" content-type="numbered"><label>2</label><mml:math id="M10" display="block"><mml:mrow><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mrow><mml:mo>∂</mml:mo><mml:mi>Q</mml:mi></mml:mrow><mml:mrow><mml:mo>∂</mml:mo><mml:mi>t</mml:mi></mml:mrow></mml:mfrac></mml:mstyle><mml:mo>+</mml:mo><mml:mi>g</mml:mi><mml:mi>A</mml:mi><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mrow><mml:mo>∂</mml:mo><mml:mi>h</mml:mi></mml:mrow><mml:mrow><mml:mo>∂</mml:mo><mml:mi>x</mml:mi></mml:mrow></mml:mfrac></mml:mstyle><mml:mo>+</mml:mo><mml:mi>g</mml:mi><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mrow><mml:msup><mml:mi>n</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup><mml:mi>Q</mml:mi><mml:mo>|</mml:mo><mml:mi>Q</mml:mi><mml:mo>|</mml:mo></mml:mrow><mml:mrow><mml:mi>A</mml:mi><mml:msubsup><mml:mi>R</mml:mi><mml:mi mathvariant="normal">h</mml:mi><mml:mrow><mml:mn mathvariant="normal">4</mml:mn><mml:mo>/</mml:mo><mml:mn mathvariant="normal">3</mml:mn></mml:mrow></mml:msubsup></mml:mrow></mml:mfrac></mml:mstyle><mml:mo>=</mml:mo><mml:mn mathvariant="normal">0</mml:mn><mml:mspace width="0.25em" linebreak="nobreak"/><mml:mo>,</mml:mo></mml:mrow></mml:math></disp-formula>

        where <inline-formula><mml:math id="M11" display="inline"><mml:mi>Q</mml:mi></mml:math></inline-formula> is the discharge, <inline-formula><mml:math id="M12" display="inline"><mml:mi>A</mml:mi></mml:math></inline-formula> is the cross-sectional flow area, <inline-formula><mml:math id="M13" display="inline"><mml:mi>h</mml:mi></mml:math></inline-formula> is the water surface elevation, <inline-formula><mml:math id="M14" display="inline"><mml:mrow><mml:msub><mml:mi>R</mml:mi><mml:mi mathvariant="normal">h</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> is the hydraulic radius, and <inline-formula><mml:math id="M15" display="inline"><mml:mi>n</mml:mi></mml:math></inline-formula> is the Manning roughness coefficient. CaMa-Flood employs an explicit time integration of this equation, where the new outflow is computed based on the water level difference <inline-formula><mml:math id="M16" display="inline"><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>h</mml:mi><mml:mi>c</mml:mi></mml:msub><mml:mo>-</mml:mo><mml:msub><mml:mi>h</mml:mi><mml:mi>d</mml:mi></mml:msub><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> between adjacent catchments. CaMa-Flood is an instance of the catchment-based macro-scale floodplain (CMF) approach <xref ref-type="bibr" rid="bib1.bibx32 bib1.bibx31" id="paren.8"/>, which discretises every river basin into irregular unit-catchments of roughly 5–50 km extracted from MERIT Hydro <xref ref-type="bibr" rid="bib1.bibx34" id="paren.9"/> rather than from a regular grid. Within each unit-catchment, water mass is partitioned into a channel storage and a floodplain storage, exchanged according to the channel bank-full geometry. Two simplifying assumptions are imposed: no significant topographic depressions inside a unit-catchment, and a spatially uniform water surface across its floodplain. 
Under these assumptions, sub-grid topographic profiles, precomputed as the cumulative distribution of high-resolution DEM elevations, diagnose water depth and inundation extent directly from total storage, so no 2D shallow-water solve is needed between neighbouring unit-catchments. Channel routing along the river network follows the local inertial momentum equation <xref ref-type="bibr" rid="bib1.bibx2 bib1.bibx5" id="paren.10"/>, with the outlet pixel of each unit-catchment supplying an absolute reference elevation so that water-surface gradients, and therefore backwater and flow reversal, can be represented even between coarse-resolution units. Recent extensions remove the single downstream flow path constraint and represent channel bifurcations as additional divergent flows driven by the same local inertial equation <xref ref-type="bibr" rid="bib1.bibx33 bib1.bibx20" id="paren.11"/>.</p>
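      <p>The explicit update implied by Eqs. (1) and (2) can be sketched for a single river link as follows. This is a minimal serial illustration rather than the CaMa-Flood source: the semi-implicit treatment of friction (in the style of the local inertial schemes cited above), the function names, and all numerical values are assumptions made for the example. The friction term is written in the dimensionally consistent form g n^2 Q|Q| / (A R_h^(4/3)).</p>
<preformat preformat-type="code"><![CDATA[
```python
G = 9.81  # gravitational acceleration (m s^-2)


def local_inertial_outflow(q_old, h_up, h_down, length, area, r_h, n, dt):
    """Advance the discharge on one river link by one time step.

    Friction is evaluated semi-implicitly (an assumption of this sketch):
    (Q_new - Q_old)/dt + g*A*dh/dx + g*n^2*Q_new*|Q_old| / (A*R_h^(4/3)) = 0
    """
    if area <= 0.0 or r_h <= 0.0:
        return 0.0  # dry link: no flow
    surface_slope = (h_down - h_up) / length  # dh/dx along the link
    friction = G * dt * n**2 * abs(q_old) / (area * r_h ** (4.0 / 3.0))
    return (q_old - G * area * dt * surface_slope) / (1.0 + friction)


def update_storage(s_up, s_down, q, dt):
    """Move the volume q*dt from the upstream to the downstream catchment."""
    return s_up - q * dt, s_down + q * dt


# Hypothetical numbers: 10 km link, 500 m^2 flow area, 0.5 m head difference.
q = local_inertial_outflow(q_old=0.0, h_up=10.5, h_down=10.0, length=10_000.0,
                           area=500.0, r_h=4.0, n=0.03, dt=60.0)
s_up, s_down = update_storage(1.0e7, 2.0e7, q, dt=60.0)
```
]]></preformat>
      <p>Swapping the two water-surface elevations in this sketch yields a negative discharge, which is how flow reversal under backwater conditions appears in the local inertial formulation; the storage update then moves water from the nominal downstream unit back to the upstream one.</p>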
      <p id="d2e507">Over the years, CaMa-Flood has been continually enhanced to improve its realism and applicability. For instance, <xref ref-type="bibr" rid="bib1.bibx33" id="text.12"/> extended it to represent river bifurcations and delta channel networks, and high-resolution datasets like MERIT Hydro <xref ref-type="bibr" rid="bib1.bibx34" id="paren.13"/> have been incorporated to define channel geometry and catchment parameters globally. Some implementations have introduced adaptive time stepping to ensure numerical stability without significantly compromising efficiency. In an adaptive scheme, the model adjusts the time step <inline-formula><mml:math id="M17" display="inline"><mml:mrow><mml:mi mathvariant="normal">Δ</mml:mi><mml:mi>t</mml:mi></mml:mrow></mml:math></inline-formula> based on a Courant–Friedrichs–Lewy (CFL) condition related to the fastest wave speed in the domain <xref ref-type="bibr" rid="bib1.bibx14" id="paren.14"/>. For example, one can require

          <disp-formula id="Ch1.E3" content-type="numbered"><label>3</label><mml:math id="M18" display="block"><mml:mrow><mml:mi mathvariant="normal">Δ</mml:mi><mml:mi>t</mml:mi><mml:mo>&lt;</mml:mo><mml:mi>f</mml:mi><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mrow><mml:msub><mml:mi>L</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:msqrt><mml:mrow><mml:mi>g</mml:mi><mml:msub><mml:mi>h</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:msqrt></mml:mfrac></mml:mstyle><mml:mspace linebreak="nobreak" width="1em"/><mml:mtext>for all river reaches </mml:mtext><mml:mi>i</mml:mi><mml:mo>→</mml:mo><mml:mi>j</mml:mi><mml:mspace linebreak="nobreak" width="0.25em"/><mml:mo>,</mml:mo></mml:mrow></mml:math></disp-formula>

        where <inline-formula><mml:math id="M19" display="inline"><mml:mrow><mml:msub><mml:mi>h</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> is the local water depth, <inline-formula><mml:math id="M20" display="inline"><mml:mrow><mml:msub><mml:mi>L</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula> is the length of the river reach from catchment <inline-formula><mml:math id="M21" display="inline"><mml:mi>i</mml:mi></mml:math></inline-formula> to <inline-formula><mml:math id="M22" display="inline"><mml:mi>j</mml:mi></mml:math></inline-formula>, and <inline-formula><mml:math id="M23" display="inline"><mml:mrow><mml:mi>f</mml:mi><mml:mo>&lt;</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:math></inline-formula> is a safety factor (e.g. 0.7) ensuring stability. In practice, the smallest allowable <inline-formula><mml:math id="M24" display="inline"><mml:mrow><mml:mi mathvariant="normal">Δ</mml:mi><mml:mi>t</mml:mi></mml:mrow></mml:math></inline-formula> from this criterion (across all reaches) is used as the adaptive time step for that update interval. By dynamically limiting the time step based on wave celerity, the model avoids numerical instability even in steep or high-flow segments while maximizing efficiency in calmer regions. Due to its balanced fidelity and efficiency, CaMa-Flood has been widely used in global hydrological studies – from estimating present-day flood hazards to projecting future flood risks under climate change <xref ref-type="bibr" rid="bib1.bibx11 bib1.bibx18" id="paren.15"/>, and it has been coupled with climate and land-surface models to route runoff in Earth system simulations <xref ref-type="bibr" rid="bib1.bibx13 bib1.bibx19 bib1.bibx7" id="paren.16"/>.</p>
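      <p>The time-step selection described by Eq. (3) can be sketched as a minimum over all reaches. The snippet below is an illustrative serial version under stated assumptions: the function names, the minimum-depth guard for dry reaches, and the sample depths and lengths are hypothetical and are not taken from the CaMa-Flood code.</p>
<preformat preformat-type="code"><![CDATA[
```python
import math

G = 9.81  # gravitational acceleration (m s^-2)


def adaptive_time_step(depths, lengths, dt_max, safety=0.7, min_depth=0.01):
    """Return the limiting step of Eq. (3) over all reaches, capped at dt_max."""
    dt = dt_max
    for h, length in zip(depths, lengths):
        h = max(h, min_depth)  # guard against zero depth (assumption)
        dt = min(dt, safety * length / math.sqrt(G * h))
    return dt


def n_substeps(dt_outer, dt_limit):
    """Number of equal sub-steps needed so that each one is <= dt_limit."""
    return max(1, math.ceil(dt_outer / dt_limit))


# Hypothetical example: two reaches, one shallow and short, one deep and long.
dt = adaptive_time_step(depths=[2.0, 10.0], lengths=[5000.0, 8000.0],
                        dt_max=3600.0)
steps = n_substeps(3600.0, dt)
```
]]></preformat>
      <p>In this example the deeper reach is the binding one, because the wave celerity grows with the square root of depth; a single deep reach can therefore set the step for the whole domain.</p>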
      <p id="d2e646">Despite these advances, running CaMa-Flood at very high spatial resolutions or with large ensembles remains computationally challenging on conventional CPUs. The model's irregular domain (hundreds of thousands of catchments with diverse sizes) and the need for small time steps in large rivers mean that global simulations at fine scales (on the order of 2–5 km catchments) can require many hours to days of runtime on a multi-core CPU. This limitation motivates the use of modern high-performance computing architectures, in particular Graphics Processing Units (GPUs), to accelerate global flood modeling. GPUs have become a cornerstone of scientific computing because of their massive parallelism, and they have been successfully applied to hydrodynamic models in recent years. GPU acceleration is increasingly being adopted across Earth-system model components, from atmospheric chemistry kernels in coupled climate-chemistry models <xref ref-type="bibr" rid="bib1.bibx1" id="paren.17"/> to flood hydrodynamics. In the latter, GPU-based modelling has matured along three complementary directions. First, for the full 2D shallow-water equations, multi-GPU and single-GPU codes deliver order-of-magnitude speed-ups for high-resolution flood-inundation studies on regular meshes <xref ref-type="bibr" rid="bib1.bibx22" id="paren.18"/>. Second, for simplified local inertial or kinematic formulations, GPU ports of established CPU floodplain models extend GPU acceleration from urban catchments to continental flood mapping while retaining sub-grid topographic detail <xref ref-type="bibr" rid="bib1.bibx27 bib1.bibx25 bib1.bibx3" id="paren.19"/>. 
Third, for integrated land-surface and routing, GPU acceleration of hydrology models <xref ref-type="bibr" rid="bib1.bibx12" id="paren.20"/> and GPU-resident machine-learning surrogates of routing <xref ref-type="bibr" rid="bib1.bibx36" id="paren.21"/> demonstrate that the throughput unlocked by GPUs makes ensemble and global-domain experiments tractable. Most of the above target single-GPU execution on regular or quasi-regular meshes at sub-continental scale. Global river-routing on irregular unit-catchment networks, the regime in which CaMa-Flood operates and which is mandatory for properly representing bifurcations, deltas and floodplain storage on a global mesh, has so far not been ported to GPU. CaMa-Flood-GPU is, to our knowledge, the first multi-GPU implementation of a global, irregular, bifurcation-aware routing model.</p>
      <p id="d2e665">The remaining gap therefore lies in global-scale river routing models such as CaMa-Flood, whose irregular unit-catchment networks pose challenges that differ fundamentally from those encountered in regional 2D GPU flood models. The gap likely stems from several intrinsic characteristics of large-scale hydrodynamic models that make GPU computation more challenging: (1) In 2D models, the connectivity between computational cells is uniform and limited to adjacent neighbors on a regular grid, whereas in global river models, the river network topology is defined by irregular upstream–downstream relationships that must be handled explicitly <xref ref-type="bibr" rid="bib1.bibx21 bib1.bibx32" id="paren.22"/>. (2) In 2D models, the relationship between water storage and water level within each grid cell is generally linear or prescribed by a simple function, but global river models employ sub-grid floodplain topography, resulting in a nonlinear and spatially variable relationship <xref ref-type="bibr" rid="bib1.bibx32" id="paren.23"/>. (3) While 2D models use uniformly shaped grid cells as computational units, global river models discretize the land surface into irregular catchment-based units. This introduces interpolation across variable areas and greatly increases the total number of computational elements when performing high-resolution global simulations <xref ref-type="bibr" rid="bib1.bibx34" id="paren.24"/>, making efficient parallelization far more demanding.</p>
      <p id="d2e677">In this study, we aim to clarify and overcome the fundamental challenges that have hindered the application of GPU acceleration to global-scale river models. Using CaMa-Flood as a representative example, our objectives are threefold: (1) to identify which aspects of its model structure and algorithms limit efficient GPU computation, (2) to explore and select appropriate GPU libraries and kernel implementations capable of addressing these limitations, and (3) to achieve global river simulations at kilometer-scale (<inline-formula><mml:math id="M25" display="inline"><mml:mrow><mml:mo>∼</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:math></inline-formula> arcmin) resolution within a practical computational cost. To this end, we developed CaMa-Flood-GPU, a GPU-based reimplementation of the CaMa-Flood hydrodynamic core that reformulates the original CPU algorithms through computational science techniques optimized for modern GPU architectures. This work aims to bridge the gap between hydrological modeling and high-performance computing, providing a foundation for scalable, physically consistent global flood simulations at resolutions and runtimes that were previously impractical.</p>
      <p id="d2e690">In the following sections, we detail the design and evaluation of CaMa-Flood-GPU. Section <xref ref-type="sec" rid="Ch1.S2"/> examines the original CaMa-Flood algorithms to identify which aspects of their structure and computation limit efficient GPU acceleration, and then proposes schemes to resolve these bottlenecks through a redesigned hydrodynamic core. Section <xref ref-type="sec" rid="Ch1.S3"/> presents results from a series of tests, including performance benchmarks to evaluate the speed and scalability of the GPU model across different hardware configurations, as well as validation of its numerical correctness against the CPU version. We discuss the speedups observed and analyze the remaining bottlenecks. Section <xref ref-type="sec" rid="Ch1.S4"/> offers conclusions, highlighting the implications of this work and potential future developments. Through this paper, we aim to demonstrate that GPU acceleration can substantially empower global flood modeling, and we provide the tools and references to facilitate broader adoption and further improvements of such approaches.</p>
</sec>
<sec id="Ch1.S2">
  <label>2</label><title>Implementation</title>
      <p id="d2e707">Implementing an efficient GPU version of a large-scale hydrodynamic model requires more than simply rewriting the original Fortran code in Python or replacing explicit loops with vectorized array operations. GPUs operate under a fundamentally different execution model from CPUs – one that favors massive, uniform parallelism and coalesced memory access. Consequently, a direct code translation of CaMa-Flood's CPU algorithms would yield suboptimal performance and potentially lose the model's numerical stability. Instead, we reinterpreted the core computational patterns of CaMa-Flood in terms of GPU-native primitives, systematically identifying and employing appropriate libraries and kernels that align with the model’s hydrodynamic equations and data structures. This design philosophy enables us to integrate computational science insights – particularly those related to memory hierarchy and parallel reduction – into a physically based global river model, allowing high-performance GPU solvers to be applied to global river routing for the first time.</p>

      <fig id="F1" specific-use="star"><label>Figure 1</label><caption><p id="d2e712">Schematic architecture of the CaMa-Flood-GPU system. Input data (terrain, river network, initial conditions) are partitioned across GPUs via domain sharding. The modular <monospace>CaMaFloodModel</monospace> coordinates GPU kernels at each time step, with optional sub-modules for adaptive time stepping, bifurcation flows, and logging. Outputs are collected by a <monospace>StatisticsAggregator</monospace>, combined by a multi-rank reader, and written for final results and post-processing.</p></caption>
        <graphic xlink:href="https://gmd.copernicus.org/articles/19/5623/2026/gmd-19-5623-2026-f01.png"/>

      </fig>

      <p id="d2e727">The implementation of CaMa-Flood-GPU follows a layered, modular architecture to separate concerns and maximize flexibility (Fig. <xref ref-type="fig" rid="F1"/>). At the highest level, a <monospace>CaMaFloodModel</monospace> class controls the simulation, coordinating data flow and computation across modules and GPUs. The model is constructed with a registry of sub-modules, which implement optional features such as bifurcation flows, adaptive time-stepping, and logging/diagnostics. Underneath the model, we define abstract base classes for data handling and computational modules, allowing multiple implementations or extensions. For example, the input data interface is abstracted so that different forcing datasets (binary files, NetCDF files) can be used without changing the core model code. The model and modules together manage the GPU memory for all relevant state variables (such as water storage and discharge) and ensure that each GPU holds only the portion of data needed for its assigned catchments. By organizing the code in this modular way, we facilitate customization (users can enable or disable components via configuration) and make the system easier to maintain or extend (each module focuses on one aspect, e.g., bifurcation handling or time-step adaptation).</p>
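      <p>The registry-plus-base-class pattern described above can be illustrated with a toy example. Everything in this sketch is hypothetical: the class names <monospace>Module</monospace>, <monospace>Logger</monospace> and <monospace>ToyModel</monospace>, the <monospace>step</monospace> signature, and the registry dictionary are assumptions chosen for illustration and do not reproduce the CaMa-Flood-GPU interfaces.</p>
<preformat preformat-type="code"><![CDATA[
```python
from abc import ABC, abstractmethod


class Module(ABC):
    """Base class for an optional model component (hypothetical interface)."""

    @abstractmethod
    def step(self, state, dt):
        """Read or update the shared state for one time step."""


class Logger(Module):
    """Example optional component: records the model time after each step."""

    def __init__(self):
        self.records = []

    def step(self, state, dt):
        self.records.append(state["time"])


class ToyModel:
    """Builds only the modules named in the configuration."""

    def __init__(self, enabled, registry):
        self.modules = [registry[name]() for name in enabled]
        self.state = {"time": 0.0}

    def step(self, dt):
        self.state["time"] += dt
        for module in self.modules:
            module.step(self.state, dt)


REGISTRY = {"logger": Logger}  # maps configuration names to module classes

model = ToyModel(enabled=["logger"], registry=REGISTRY)
model.step(60.0)
model.step(60.0)
bare_model = ToyModel(enabled=[], registry=REGISTRY)  # all options disabled
```
]]></preformat>
      <p>With this layout, adding a feature means registering one more subclass, and disabling it means omitting its name from the configuration; the core stepping loop is unchanged in both cases.</p>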
      <p id="d2e736">The implementation strategy is twofold. First, we focus on overcoming the key performance challenges inherent in porting a global, irregular-network model to a massively parallel GPU architecture. This involves addressing issues of data representation and communication overhead. Second, we aim to build a system that offers flexible customization, allowing users to easily adapt the model for different scientific applications, data sources, and regional focuses. The following sections detail our approach to these two goals.</p>
<sec id="Ch1.S2.SS1">
  <label>2.1</label><title>Challenges</title>
<sec id="Ch1.S2.SS1.SSS1">
  <label>2.1.1</label><title>Irregular network topology</title>
      <p id="d2e753">The core difficulty in adapting CaMa-Flood for GPUs is representing its irregular, directed catchment network (Fig. <xref ref-type="fig" rid="F2"/>a) in a way that maps efficiently to parallel hardware. In the CaMa-Flood river network map, each grid cell, or unit-catchment, has exactly one downstream connection, forming a topology where water flows from many upstream sources toward a single outlet. From a computer science viewpoint, this structure can be encoded as a directed graph in which each node (catchment) has an out-degree of one, except for terminal nodes (outlets). The result is a collection of directed trees rooted at the river mouths. The introduction of bifurcation flows in later versions of CaMa-Flood (Yamazaki et al., 2014b) complicates this topology, however, by allowing some nodes to have an out-degree greater than one, so the network is no longer strictly tree-structured. A key computational challenge in this graph is the need to sum fluxes from a variable number of neighboring catchments. The local inertial model used in CaMa-Flood allows for backwater effects, where flow is not always unidirectional; reverse flow can occur if the water surface gradient changes. This means that at any given time, any neighbor can become an upstream contributor, requiring a flexible summation of fluxes. This <italic>gather</italic> or <italic>reduction</italic> operation over an irregular graph is a classic problem in parallel computing (Fig. <xref ref-type="fig" rid="F3"/>a), as it leads to irregular memory access patterns and potential load imbalance, which are inefficient on GPUs that thrive on structured, uniform workloads.</p>
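      <p>The flux summation described above can be made concrete with a small serial reference. The sketch assumes a hypothetical encoding in which <monospace>downstream[i]</monospace> holds the index of the catchment receiving water from catchment <monospace>i</monospace> (or -1 at a river mouth); the function name and sample values are illustrative only. On a GPU, each loop iteration would correspond to an independent thread, so the two accumulation lines mark where concurrent writes to the same catchment can collide.</p>
<preformat preformat-type="code"><![CDATA[
```python
def storage_change(downstream, outflow, dt):
    """Net storage change per catchment from the link fluxes.

    downstream[i]: index of the catchment below catchment i, or -1 at a mouth.
    outflow[i]:    discharge on the link leaving catchment i; a negative
                   value means reverse flow (water moving upstream).
    """
    delta = [0.0] * len(downstream)
    for src, dst in enumerate(downstream):
        volume = outflow[src] * dt
        delta[src] -= volume      # water leaves (or, if negative, enters) src
        if dst >= 0:
            delta[dst] += volume  # several links may target the same dst
    return delta


# Hypothetical network: 0 -> 2, 1 -> 2, 2 -> 3, 3 -> river mouth.
downstream = [2, 2, 3, -1]
outflow = [5.0, 3.0, -1.0, 4.0]  # link 2 -> 3 is flowing in reverse
delta = storage_change(downstream, outflow, dt=1.0)
```
]]></preformat>
      <p>Catchment 2 in this example receives contributions from three links, including the reversed one, which is the variable fan-in that makes the accumulation irregular. The entries of the result sum to the volume discharged at the river mouth, so mass is conserved inside the network.</p>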

      <fig id="F2" specific-use="star"><label>Figure 2</label><caption><p id="d2e768">Illustration of catchment–grid misalignment. <bold>(a)</bold> Irregular catchment network with river, outlet, basin boundary, and flow direction. <bold>(b)</bold> Regular input grid of runoff cells, where grid cells do not coincide with catchment boundaries.</p></caption>
            <graphic xlink:href="https://gmd.copernicus.org/articles/19/5623/2026/gmd-19-5623-2026-f02.png"/>

          </fig>

      <fig id="F3" specific-use="star"><label>Figure 3</label><caption><p id="d2e785">Illustration of parallel flux summation. <bold>(a)</bold> A schematic river network where numbered squares are unit catchments. The numbers on them represent the flux index, which in traditional methods dictates a sequential processing order (e.g., main flows, then bifurcations). <bold>(b)</bold> For main flows, outflow from each source catchment (top green array) is added in parallel to its downstream target (bottom blue array). <bold>(c)</bold> A similar parallel summation is used for bifurcation flows. Atomic operations are required to prevent race conditions when multiple source catchments flow into the same target, ensuring mass conservation.</p></caption>
            <graphic xlink:href="https://gmd.copernicus.org/articles/19/5623/2026/gmd-19-5623-2026-f03.png"/>

          </fig>

      <p id="d2e804">Fortunately, this is a well-studied problem in computational science, which allows us to reframe the algorithm around established parallel computing solutions. We leverage the highly parallelized and extensively optimized <monospace>scatter_add</monospace> operation available in the Triton language. Instead of a <italic>gather</italic> operation, in which each catchment would need to read from multiple upstream locations, we use a <italic>scatter</italic> approach: a <monospace>scatter_add</monospace> distributes outflows from many source catchments into their downstream destinations in a single parallel pass. As illustrated in Fig. <xref ref-type="fig" rid="F3"/>b–c, the flux calculated for each river reach is atomically subtracted from the upstream catchment's storage and added to the downstream catchment's storage. A potential race condition arises whenever several sources write to the same destination simultaneously, as at confluences and bifurcation receivers. In that case, <monospace>atomic_add</monospace> serializes the individual increments at the hardware level, so that no update is lost or overwritten and mass is conserved without explicit locks or manual synchronization. Thousands of threads can thus update shared variables safely and simultaneously, which greatly improves speed and efficiency compared to sequential processing.
We encode the river topology using index-based data structures. Catchments are renumbered in topological order (upstream to downstream), and their state variables (e.g., water storage) are stored in structure-of-arrays format. This layout ensures that memory access is mostly contiguous and coalesced. Upstream connectivity is captured by compact adjacency lists, which allow for efficient parallel summation using segmented reduction algorithms. This approach regularizes execution over the irregular graph: GPU kernels iterate over contiguous index ranges, while indirect addressing handles the underlying graph connectivity.</p>
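      <p>As a minimal illustration of the scatter update described above, the following pure-Python sketch applies link fluxes to catchment storage sequentially. It is not the Triton kernel used in CaMa-Flood-GPU: the function name <monospace>scatter_add_fluxes</monospace>, the link arrays <monospace>src</monospace> and <monospace>dst</monospace>, and the four-catchment network are assumptions made for this example. On a GPU, each loop iteration would run as a separate thread and each in-place update would be an atomic add.</p>
      <preformat preformat-type="code">
```python
# Sequential reference for the scatter-add flux update (illustrative only).
# On a GPU, each loop iteration would be one thread, and each in-place
# update of a shared entry would be an atomic add.

def scatter_add_fluxes(storage, src, dst, flux, dt):
    """Return updated storage after moving flux[e] * dt along each link e.

    src[e] and dst[e] are the catchment indices at the two ends of link e;
    dst[e] == -1 marks an outlet link that drains out of the domain.
    A negative flux[e] represents reverse (backwater) flow.
    """
    new_storage = list(storage)
    for e in range(len(flux)):
        volume = flux[e] * dt
        new_storage[src[e]] -= volume        # atomic subtract on the GPU
        if dst[e] >= 0:
            new_storage[dst[e]] += volume    # atomic add on the GPU
    return new_storage


# Hypothetical network: catchments 0 and 1 both drain into 2 (a confluence),
# 2 drains into 3, and 3 is the outlet.
storage = [100.0, 80.0, 50.0, 20.0]
src = [0, 1, 2, 3]
dst = [2, 2, 3, -1]
flux = [2.0, 3.0, 4.0, 1.0]      # positive values flow downstream
dt = 10.0

updated = scatter_add_fluxes(storage, src, dst, flux, dt)
outlet_volume = flux[3] * dt
print(updated)
print(sum(storage) - sum(updated), outlet_volume)
```
      </preformat>
      <p>Because every link removes from its source exactly what it adds to its destination, total storage in this sketch changes only by the volume leaving through the outlet, whatever the order in which the links are processed (up to floating-point rounding).</p>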
</sec>
<sec id="Ch1.S2.SS1.SSS2">
  <label>2.1.2</label><title>Runoff interpolation</title>
      <p id="d2e839">Another challenge arising from the irregular network topology is the mismatch between gridded external forcing and our catchment units (Fig. <xref ref-type="fig" rid="F2"/>b). Most runoff products are delivered on latitude-longitude grids that do not align with catchment boundaries. At each time step, the forcing reader supplies runoff from an upstream land-surface model (LSM) to the dataset object. Here <inline-formula><mml:math id="M26" display="inline"><mml:mrow><mml:msub><mml:mi>N</mml:mi><mml:mi>r</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> denotes the number of runoff-generating land grid cells, i.e., the output cells of the upstream LSM when a global grid is used. The resulting runoff vector is passed to <monospace>shard_forcing</monospace>, which maps the runoff-generating cells to the local GPU catchments. The original CaMa-Flood model addresses the mismatch by pre-computing an input matrix that maps runoff grid cells to catchment units. This is accomplished using a dedicated program, which calculates the overlapping area between the input data grid and the river network grid and generates a dense matrix that specifies the contribution of each runoff grid cell to each catchment. While effective, this approach produces a data structure that is not optimized for modern parallel hardware.</p>

      <fig id="F4"><label>Figure 4</label><caption><p id="d2e860">Schematic of the data input and runoff aggregation process. Multiple dataset types (daily binary files, annual NetCDF files, and user-defined datasets) are supported through a common abstract data layer. Runoff inputs are mapped to catchments via a sparse weight matrix, with each GPU holding the portion relevant to its assigned catchments.</p></caption>
            <graphic xlink:href="https://gmd.copernicus.org/articles/19/5623/2026/gmd-19-5623-2026-f04.png"/>

          </fig>

      <p id="d2e869">For CaMa-Flood-GPU, we reinterpret this mapping as a sparse matrix operation, which is significantly more efficient on GPUs. We define a sparse runoff aggregation matrix that maps gridded runoff data to irregular catchments (Fig. <xref ref-type="fig" rid="F4"/>). Let <inline-formula><mml:math id="M27" display="inline"><mml:mrow><mml:mi mathvariant="bold-italic">r</mml:mi><mml:mo>∈</mml:mo><mml:msup><mml:mi mathvariant="double-struck">R</mml:mi><mml:mrow><mml:msub><mml:mi>N</mml:mi><mml:mi>r</mml:mi></mml:msub></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula> be the vector of runoff values for all grid cells at a given time (for instance, if using a global grid, <inline-formula><mml:math id="M28" display="inline"><mml:mrow><mml:msub><mml:mi>N</mml:mi><mml:mi>r</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> would be the number of runoff-generating land grid cells), and let <inline-formula><mml:math id="M29" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="bold-italic">R</mml:mi><mml:mi>c</mml:mi></mml:msub><mml:mo>∈</mml:mo><mml:msup><mml:mi mathvariant="double-struck">R</mml:mi><mml:mrow><mml:msub><mml:mi>N</mml:mi><mml:mi>c</mml:mi></mml:msub></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula> be the vector of total runoff inputs for all <inline-formula><mml:math id="M30" display="inline"><mml:mrow><mml:msub><mml:mi>N</mml:mi><mml:mi>c</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> catchments in the model. We define a sparse matrix <inline-formula><mml:math id="M31" display="inline"><mml:mi mathvariant="bold">M</mml:mi></mml:math></inline-formula> of size <inline-formula><mml:math id="M32" display="inline"><mml:mrow><mml:msub><mml:mi>N</mml:mi><mml:mi>c</mml:mi></mml:msub><mml:mo>×</mml:mo><mml:msub><mml:mi>N</mml:mi><mml:mi>r</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> such that:

              <disp-formula id="Ch1.E4" content-type="numbered"><label>4</label><mml:math id="M33" display="block"><mml:mrow><mml:msub><mml:mi mathvariant="bold-italic">R</mml:mi><mml:mi>c</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mi mathvariant="bold">M</mml:mi><mml:mi mathvariant="bold-italic">r</mml:mi></mml:mrow></mml:math></disp-formula>

            where each element <inline-formula><mml:math id="M34" display="inline"><mml:mrow><mml:msub><mml:mi>M</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula> is the weight with which the runoff in grid cell <inline-formula><mml:math id="M35" display="inline"><mml:mi>j</mml:mi></mml:math></inline-formula> contributes to catchment <inline-formula><mml:math id="M36" display="inline"><mml:mi>i</mml:mi></mml:math></inline-formula>. We compute these weights based on area overlap: if <inline-formula><mml:math id="M37" display="inline"><mml:mrow><mml:msub><mml:mi>A</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula> is the area of catchment <inline-formula><mml:math id="M38" display="inline"><mml:mi>i</mml:mi></mml:math></inline-formula> that lies within grid cell <inline-formula><mml:math id="M39" display="inline"><mml:mi>j</mml:mi></mml:math></inline-formula>, and <inline-formula><mml:math id="M40" display="inline"><mml:mrow><mml:msub><mml:mi>A</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mo>∑</mml:mo><mml:mi>j</mml:mi></mml:msub><mml:msub><mml:mi>A</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula> is the total area of catchment <inline-formula><mml:math id="M41" display="inline"><mml:mi>i</mml:mi></mml:math></inline-formula>, then we set <inline-formula><mml:math id="M42" display="inline"><mml:mrow><mml:msub><mml:mi>M</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mi>A</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>/</mml:mo><mml:msub><mml:mi>A</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> (with <inline-formula><mml:math id="M43" display="inline"><mml:mrow><mml:msub><mml:mi>M</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mn mathvariant="normal">0</mml:mn></mml:mrow></mml:math></inline-formula> if catchment <inline-formula><mml:math id="M44" display="inline"><mml:mi>i</mml:mi></mml:math></inline-formula> has no land area in cell <inline-formula><mml:math id="M45" display="inline"><mml:mi>j</mml:mi></mml:math></inline-formula>). In other words, <inline-formula><mml:math id="M46" display="inline"><mml:mrow><mml:msub><mml:mi>M</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula> is the proportion of catchment <inline-formula><mml:math id="M47" display="inline"><mml:mi>i</mml:mi></mml:math></inline-formula>'s area that falls in cell <inline-formula><mml:math id="M48" display="inline"><mml:mi>j</mml:mi></mml:math></inline-formula>, so that the catchment's runoff <inline-formula><mml:math id="M49" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="bold-italic">R</mml:mi><mml:mi>c</mml:mi></mml:msub><mml:mo>(</mml:mo><mml:mi>i</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> is the area-weighted average runoff over that catchment (assuming uniform runoff within each grid cell). The matrix <inline-formula><mml:math id="M50" display="inline"><mml:mi mathvariant="bold">M</mml:mi></mml:math></inline-formula> is extremely sparse – each catchment overlaps only a few grid cells, and each grid cell contributes to only a few catchments. This sparsity is ideal for GPU acceleration, as sparse matrix-vector multiplication (SpMV) is itself a highly parallelized and extensively optimized operation in parallel computing libraries.</p>
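      <p>To make the weight definition concrete, the following sketch builds the area-overlap weights in compressed sparse row (CSR) form and applies them to a small runoff vector. It is a pure-Python illustration under stated assumptions rather than the preprocessing code of CaMa-Flood-GPU: the helper names <monospace>build_csr_weights</monospace> and <monospace>spmv</monospace>, the overlap areas, and the runoff values are all hypothetical.</p>
      <preformat preformat-type="code">
```python
# Build area-overlap weights M_ij = A_ij / A_i in CSR form and apply them.
# All names and numbers are illustrative assumptions.

def build_csr_weights(overlap_areas, n_catchments):
    """overlap_areas maps (catchment i, grid cell j) to the overlap area A_ij."""
    rows = [[] for _ in range(n_catchments)]
    for (i, j), area in sorted(overlap_areas.items()):
        rows[i].append((j, area))
    indptr, indices, data = [0], [], []
    for entries in rows:
        total = sum(area for _, area in entries)    # A_i = sum over j of A_ij
        for j, area in entries:
            indices.append(j)
            data.append(area / total)               # M_ij = A_ij / A_i
        indptr.append(len(indices))
    return indptr, indices, data


def spmv(indptr, indices, data, r):
    """Compute R_c = M r; on a GPU the rows would be processed in parallel."""
    return [
        sum(data[p] * r[indices[p]] for p in range(indptr[i], indptr[i + 1]))
        for i in range(len(indptr) - 1)
    ]


# Two catchments overlapping three runoff cells.
overlap = {(0, 0): 30.0, (0, 1): 10.0, (1, 1): 5.0, (1, 2): 15.0}
indptr, indices, data = build_csr_weights(overlap, n_catchments=2)

r = [4.0, 8.0, 2.0]              # runoff on the three grid cells
R_c = spmv(indptr, indices, data, r)
print(data)
print(R_c)
```
      </preformat>
      <p>In this example the weights of each row sum to one, so each entry of the result is an area-weighted average of the runoff in the cells that the catchment overlaps.</p>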
      <p id="d2e1188">In practice, before a simulation, we either precompute or read in this matrix <inline-formula><mml:math id="M51" display="inline"><mml:mi mathvariant="bold">M</mml:mi></mml:math></inline-formula>. For multi-GPU runs, <inline-formula><mml:math id="M52" display="inline"><mml:mi mathvariant="bold">M</mml:mi></mml:math></inline-formula> is partitioned per GPU. Each GPU stores only the rows of <inline-formula><mml:math id="M53" display="inline"><mml:mi mathvariant="bold">M</mml:mi></mml:math></inline-formula> corresponding to its assigned catchments and the columns corresponding to the grid cells influencing those catchments. During the simulation, the rank 0 process reads the full runoff vector <inline-formula><mml:math id="M54" display="inline"><mml:mi mathvariant="bold-italic">r</mml:mi></mml:math></inline-formula> and broadcasts it to all other ranks. Each GPU then performs the SpMV operation <inline-formula><mml:math id="M55" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="bold-italic">R</mml:mi><mml:mi>c</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mi mathvariant="bold">M</mml:mi><mml:mi mathvariant="bold-italic">r</mml:mi></mml:mrow></mml:math></inline-formula> using its local sparse matrix to compute its catchment runoff inputs.</p>
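      <p>The row partitioning described above can be sketched as follows, with the ranks emulated sequentially in pure Python. The helper names <monospace>local_block</monospace> and <monospace>local_spmv</monospace>, the small matrix, and the rank-to-catchment assignment are assumptions made for this example; the actual model would use GPU sparse kernels and a collective broadcast instead.</p>
      <preformat preformat-type="code">
```python
# Row-partitioned SpMV across ranks, emulated sequentially (illustrative).

def local_block(indptr, indices, data, owned_rows):
    """Extract the rows owned by one rank and renumber their columns compactly."""
    needed_cols = sorted({indices[p]
                          for i in owned_rows
                          for p in range(indptr[i], indptr[i + 1])})
    col_map = {j: c for c, j in enumerate(needed_cols)}
    l_indptr, l_indices, l_data = [0], [], []
    for i in owned_rows:
        for p in range(indptr[i], indptr[i + 1]):
            l_indices.append(col_map[indices[p]])
            l_data.append(data[p])
        l_indptr.append(len(l_indices))
    return needed_cols, l_indptr, l_indices, l_data


def local_spmv(block, r_full):
    """Multiply a local block by the slice of the broadcast runoff vector it needs."""
    needed_cols, indptr, indices, data = block
    r_local = [r_full[j] for j in needed_cols]
    return [
        sum(data[p] * r_local[indices[p]] for p in range(indptr[i], indptr[i + 1]))
        for i in range(len(indptr) - 1)
    ]


# Global M (3 catchments by 4 cells) in CSR form; each row sums to one.
indptr = [0, 2, 3, 5]
indices = [0, 1, 1, 2, 3]
data = [0.5, 0.5, 1.0, 0.25, 0.75]
r = [2.0, 6.0, 4.0, 8.0]             # runoff vector broadcast to every rank

ownership = {0: [0, 2], 1: [1]}      # hypothetical rank to owned catchments
R_c = [0.0] * 3
for rank, rows in ownership.items():
    block = local_block(indptr, indices, data, rows)
    for i, value in zip(rows, local_spmv(block, r)):
        R_c[i] = value
print(R_c)
```
      </preformat>
      <p>Each emulated rank stores only its own rows and the columns they reference, and the assembled result matches the product with the full matrix.</p>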
      <p id="d2e1236">Beyond this optimized implementation, we provide a flexible interface for runoff inputs to handle various data sources. By abstracting the data loading and mapping procedure in a general <monospace>shard_forcing</monospace> interface, we allow for considerable customization. A user can create a new dataset subclass with a custom <monospace>shard_forcing</monospace> method that defines how to broadcast and map their particular data source onto the catchments. In particular, <monospace>shard_forcing</monospace> is intentionally designed as the hand-off point for online or offline coupling with an LSM: an LSM can pass its per-time-step runoff, as a PyTorch tensor or any compatible array, directly to <monospace>shard_forcing</monospace>, which performs the grid-to-catchment aggregation on-device without a round trip through disk. If, for instance, runoff is already provided as catchment-specific values, <monospace>shard_forcing</monospace> can simply distribute those values to the appropriate GPUs without any grid-to-catchment aggregation. The core model remains unaware of these preprocessing details – as long as the dataset object supplies catchment runoff values <inline-formula><mml:math id="M56" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="bold-italic">R</mml:mi><mml:mi>c</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> for each time step, the model will route them. We have implemented default dataset classes for common formats and provided a template for a <monospace>UserDefinedDataset</monospace> to guide custom implementations, making the system highly extensible.</p>
</sec>
<sec id="Ch1.S2.SS1.SSS3">
  <label>2.1.3</label><title>Diagnosing water depth from storage</title>
      <p id="d2e1277">A core challenge in hydrodynamic modeling is to efficiently diagnose water depth and inundation extent from storage volume without resorting to computationally expensive 2D simulations. CaMa-Flood addresses this by discretizing the river network into unit-catchments and using pre-computed topographic profiles. This method relies on two key assumptions: (1) inundation progresses from the lowest elevations upward without being trapped in local depressions, and (2) the water surface elevation is uniform across the floodplain within a single unit-catchment. Under these assumptions, the relationship between water storage, water level, and inundation extent (including the flooded fraction) is represented by a monotonic, piecewise function derived from high-resolution elevation data, establishing a one-to-one correspondence for each catchment. This function, which acts as a topographic profile for each unit-catchment, is pre-computed and stored. Diagnosing depth from the stored profile preserves CaMa-Flood's sub-grid floodplain treatment while avoiding any explicit two-dimensional floodplain solve inside the GPU routing loop. Channel bifurcations and overbank-routing pathways are instead represented as additional network links whose fluxes are accumulated through the same routing reductions as the main channel. In the original CPU implementation, the continuous storage–depth function is discretized into a lookup table for computational efficiency. The detailed topographic profile, derived from the cumulative distribution of relative elevations within the catchment, is simplified into a series of points. 
These points define a piecewise function that approximates the original complex profile, enabling fast queries during simulation by interpolating between the stored values.</p><boxed-text content-type="algorithm" position="float" id="Ch1.Prog1" specific-use="star"><label>Algorithm 1</label><caption><p id="d2e1281">Parallel diagnosis of water depth from storage on GPU.</p></caption><disp-quote content-type="algorithmic" specific-use="numbering{1}"><list>

    <list-item><label><bold>Require:</bold></label>

      <p id="d2e1291" specific-use="REQUIRE">For each catchment <inline-formula><mml:math id="M57" display="inline"><mml:mi>i</mml:mi></mml:math></inline-formula>: total storage <inline-formula><mml:math id="M58" display="inline"><mml:mrow><mml:msub><mml:mi>S</mml:mi><mml:mi mathvariant="normal">total</mml:mi></mml:msub><mml:mo>[</mml:mo><mml:mi>i</mml:mi><mml:mo>]</mml:mo></mml:mrow></mml:math></inline-formula>, i.e. the sum of river storage, floodplain storage and incoming runoff.</p>
              </list-item>

    <list-item><label><bold>Require:</bold></label>

      <p id="d2e1324" specific-use="REQUIRE">Lookup tables (<inline-formula><mml:math id="M59" display="inline"><mml:mrow><mml:msub><mml:mi>T</mml:mi><mml:mi mathvariant="normal">sto</mml:mi></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>T</mml:mi><mml:mi mathvariant="normal">dep</mml:mi></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>T</mml:mi><mml:mi mathvariant="normal">wdt</mml:mi></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>T</mml:mi><mml:mi mathvariant="normal">grad</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>) flattened into 1D arrays in memory; here <inline-formula><mml:math id="M60" display="inline"><mml:mrow><mml:msub><mml:mi>T</mml:mi><mml:mi mathvariant="normal">sto</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> stores cumulative storage thresholds, <inline-formula><mml:math id="M61" display="inline"><mml:mrow><mml:msub><mml:mi>T</mml:mi><mml:mi mathvariant="normal">dep</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> stores corresponding depths, <inline-formula><mml:math id="M62" display="inline"><mml:mrow><mml:msub><mml:mi>T</mml:mi><mml:mi mathvariant="normal">wdt</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> stores total widths, and <inline-formula><mml:math id="M63" display="inline"><mml:mrow><mml:msub><mml:mi>T</mml:mi><mml:mi mathvariant="normal">grad</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> stores width–depth gradients.</p>
              </list-item>

    <list-item><label><bold>Ensure:</bold></label>

      <p id="d2e1409" specific-use="ENSURE">For each catchment <inline-formula><mml:math id="M64" display="inline"><mml:mi>i</mml:mi></mml:math></inline-formula>: flood depth <inline-formula><mml:math id="M65" display="inline"><mml:mrow><mml:msub><mml:mi>D</mml:mi><mml:mi mathvariant="normal">fld</mml:mi></mml:msub><mml:mo>[</mml:mo><mml:mi>i</mml:mi><mml:mo>]</mml:mo></mml:mrow></mml:math></inline-formula>, the diagnosed floodplain depth returned by the lookup/interpolation.</p>
              </list-item>

    <list-item>

      <p id="d2e1439" specific-use="STATE">{1. Find profile level (<inline-formula><mml:math id="M66" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula>) using a branchless scan (parallel over catchments)}</p>
              </list-item>

    <list-item>

      <p id="d2e1453" specific-use="STATE"><inline-formula><mml:math id="M67" display="inline"><mml:mrow><mml:mi>k</mml:mi><mml:mo>←</mml:mo><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:math></inline-formula></p>
              </list-item>

    <list-item>

      <p id="d2e1472" specific-use="FOR"><bold>for</bold> <inline-formula><mml:math id="M68" display="inline"><mml:mrow><mml:mi>j</mml:mi><mml:mo>←</mml:mo><mml:mn mathvariant="normal">0</mml:mn></mml:mrow></mml:math></inline-formula> to <inline-formula><mml:math id="M69" display="inline"><mml:mrow><mml:msub><mml:mi>N</mml:mi><mml:mi mathvariant="normal">levels</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> <bold>do</bold> <list>
    <list-item>
      <p id="d2e1506" specific-use="STATE"><inline-formula><mml:math id="M70" display="inline"><mml:mrow><mml:msub><mml:mi>o</mml:mi><mml:mi mathvariant="normal">tmp</mml:mi></mml:msub><mml:mo>←</mml:mo><mml:mi>i</mml:mi><mml:mo>⋅</mml:mo><mml:mo>(</mml:mo><mml:msub><mml:mi>N</mml:mi><mml:mi mathvariant="normal">levels</mml:mi></mml:msub><mml:mo>+</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mo>)</mml:mo><mml:mo>+</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:math></inline-formula> {temporary memory offset}</p></list-item>
    <list-item>
      <p id="d2e1545" specific-use="STATE"><inline-formula><mml:math id="M71" display="inline"><mml:mrow><mml:mi>k</mml:mi><mml:mo>←</mml:mo><mml:mtext mathvariant="bold">where</mml:mtext><mml:mo>(</mml:mo><mml:msub><mml:mi>S</mml:mi><mml:mi mathvariant="normal">total</mml:mi></mml:msub><mml:mo>[</mml:mo><mml:mi>i</mml:mi><mml:mo>]</mml:mo><mml:mo>≥</mml:mo><mml:msub><mml:mi>T</mml:mi><mml:mi mathvariant="normal">sto</mml:mi></mml:msub><mml:mo>[</mml:mo><mml:msub><mml:mi>o</mml:mi><mml:mi mathvariant="normal">tmp</mml:mi></mml:msub><mml:mo>]</mml:mo><mml:mo>,</mml:mo><mml:mi>j</mml:mi><mml:mo>,</mml:mo><mml:mi>k</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> {branchless update}</p></list-item></list></p>
              </list-item>

    <list-item>

      <p id="d2e1603" specific-use="ENDFOR"><bold>end</bold> <bold>for</bold></p>
              </list-item>

    <list-item>

      <p id="d2e1613" specific-use="STATE">{2. Compute flood depth based on the identified level (<inline-formula><mml:math id="M72" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula>)}</p>
              </list-item>

    <list-item>

      <p id="d2e1627" specific-use="IF"><bold>if</bold> <inline-formula><mml:math id="M73" display="inline"><mml:mrow><mml:mi>k</mml:mi><mml:mo>&lt;</mml:mo><mml:mn mathvariant="normal">0</mml:mn></mml:mrow></mml:math></inline-formula> <bold>then</bold> <list>
    <list-item>
      <p id="d2e1650" specific-use="STATE"><inline-formula><mml:math id="M74" display="inline"><mml:mrow><mml:msub><mml:mi>D</mml:mi><mml:mi mathvariant="normal">fld</mml:mi></mml:msub><mml:mo>[</mml:mo><mml:mi>i</mml:mi><mml:mo>]</mml:mo><mml:mo>←</mml:mo><mml:mn mathvariant="normal">0</mml:mn></mml:mrow></mml:math></inline-formula></p></list-item></list></p>
              </list-item>

    <list-item>

      <p id="d2e1675" specific-use="ELSE"><bold>else</bold> <list>
    <list-item>
      <p id="d2e1683" specific-use="STATE">{compute final memory offset o for level <inline-formula><mml:math id="M75" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula>}</p></list-item>
    <list-item>
      <p id="d2e1696" specific-use="STATE"><inline-formula><mml:math id="M76" display="inline"><mml:mrow><mml:mi>o</mml:mi><mml:mo>←</mml:mo><mml:mi>i</mml:mi><mml:mo>⋅</mml:mo><mml:mo>(</mml:mo><mml:msub><mml:mi>N</mml:mi><mml:mi mathvariant="normal">levels</mml:mi></mml:msub><mml:mo>+</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mo>)</mml:mo><mml:mo>+</mml:mo><mml:mi>k</mml:mi></mml:mrow></mml:math></inline-formula></p></list-item>
    <list-item>
      <p id="d2e1730" specific-use="STATE"><inline-formula><mml:math id="M77" display="inline"><mml:mrow><mml:msub><mml:mi>S</mml:mi><mml:mi mathvariant="normal">prev</mml:mi></mml:msub><mml:mo>←</mml:mo><mml:msub><mml:mi>T</mml:mi><mml:mi mathvariant="normal">sto</mml:mi></mml:msub><mml:mo>[</mml:mo><mml:mi>o</mml:mi><mml:mo>]</mml:mo></mml:mrow></mml:math></inline-formula>; <inline-formula><mml:math id="M78" display="inline"><mml:mrow><mml:msub><mml:mi>D</mml:mi><mml:mi mathvariant="normal">prev</mml:mi></mml:msub><mml:mo>←</mml:mo><mml:msub><mml:mi>T</mml:mi><mml:mi mathvariant="normal">dep</mml:mi></mml:msub><mml:mo>[</mml:mo><mml:mi>o</mml:mi><mml:mo>]</mml:mo></mml:mrow></mml:math></inline-formula>; <inline-formula><mml:math id="M79" display="inline"><mml:mrow><mml:msub><mml:mi>W</mml:mi><mml:mi mathvariant="normal">prev</mml:mi></mml:msub><mml:mo>←</mml:mo><mml:msub><mml:mi>T</mml:mi><mml:mi mathvariant="normal">wdt</mml:mi></mml:msub><mml:mo>[</mml:mo><mml:mi>o</mml:mi><mml:mo>]</mml:mo></mml:mrow></mml:math></inline-formula>; <inline-formula><mml:math id="M80" display="inline"><mml:mrow><mml:msub><mml:mi>G</mml:mi><mml:mi mathvariant="normal">prev</mml:mi></mml:msub><mml:mo>←</mml:mo><mml:msub><mml:mi>T</mml:mi><mml:mi mathvariant="normal">grad</mml:mi></mml:msub><mml:mo>[</mml:mo><mml:mi>o</mml:mi><mml:mo>]</mml:mo></mml:mrow></mml:math></inline-formula></p></list-item>
    <list-item>
      <p id="d2e1829" specific-use="IF"><bold>if</bold> <inline-formula><mml:math id="M81" display="inline"><mml:mrow><mml:mi>k</mml:mi><mml:mo>=</mml:mo><mml:msub><mml:mi>N</mml:mi><mml:mi mathvariant="normal">levels</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> <bold>then</bold> <list>
    <list-item>
      <p id="d2e1855" specific-use="STATE"><inline-formula><mml:math id="M82" display="inline"><mml:mrow><mml:msub><mml:mi>D</mml:mi><mml:mi mathvariant="normal">fld</mml:mi></mml:msub><mml:mo>[</mml:mo><mml:mi>i</mml:mi><mml:mo>]</mml:mo><mml:mo>←</mml:mo><mml:msub><mml:mi>D</mml:mi><mml:mi mathvariant="normal">prev</mml:mi></mml:msub><mml:mo>+</mml:mo><mml:mo>(</mml:mo><mml:msub><mml:mi>S</mml:mi><mml:mi mathvariant="normal">total</mml:mi></mml:msub><mml:mo>[</mml:mo><mml:mi>i</mml:mi><mml:mo>]</mml:mo><mml:mo>-</mml:mo><mml:msub><mml:mi>S</mml:mi><mml:mi mathvariant="normal">prev</mml:mi></mml:msub><mml:mo>)</mml:mo><mml:mo>/</mml:mo><mml:mo>(</mml:mo><mml:msub><mml:mi>W</mml:mi><mml:mi mathvariant="normal">prev</mml:mi></mml:msub><mml:mo>⋅</mml:mo><mml:msub><mml:mi>L</mml:mi><mml:mi mathvariant="normal">riv</mml:mi></mml:msub><mml:mo>[</mml:mo><mml:mi>i</mml:mi><mml:mo>]</mml:mo><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula></p></list-item></list></p></list-item>
    <list-item>
      <p id="d2e1932" specific-use="ELSE"><bold>else</bold> <list>
    <list-item>
      <p id="d2e1940" specific-use="STATE"><inline-formula><mml:math id="M83" display="inline"><mml:mrow><mml:mi mathvariant="normal">Δ</mml:mi><mml:mi>W</mml:mi><mml:mo>←</mml:mo><mml:msqrt><mml:mrow><mml:msubsup><mml:mi>W</mml:mi><mml:mi mathvariant="normal">prev</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msubsup><mml:mo>+</mml:mo><mml:mn mathvariant="normal">2</mml:mn><mml:mo>⋅</mml:mo><mml:mo>(</mml:mo><mml:msub><mml:mi>S</mml:mi><mml:mi mathvariant="normal">total</mml:mi></mml:msub><mml:mo>[</mml:mo><mml:mi>i</mml:mi><mml:mo>]</mml:mo><mml:mo>-</mml:mo><mml:msub><mml:mi>S</mml:mi><mml:mi mathvariant="normal">prev</mml:mi></mml:msub><mml:mo>)</mml:mo><mml:mo>/</mml:mo><mml:mo>(</mml:mo><mml:msub><mml:mi>G</mml:mi><mml:mi mathvariant="normal">prev</mml:mi></mml:msub><mml:mo>⋅</mml:mo><mml:msub><mml:mi>L</mml:mi><mml:mi mathvariant="normal">riv</mml:mi></mml:msub><mml:mo>[</mml:mo><mml:mi>i</mml:mi><mml:mo>]</mml:mo><mml:mo>)</mml:mo></mml:mrow></mml:msqrt><mml:mo>-</mml:mo><mml:msub><mml:mi>W</mml:mi><mml:mi mathvariant="normal">prev</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula></p></list-item>
    <list-item>
      <p id="d2e2024" specific-use="STATE"><inline-formula><mml:math id="M84" display="inline"><mml:mrow><mml:msub><mml:mi>D</mml:mi><mml:mi mathvariant="normal">fld</mml:mi></mml:msub><mml:mo>[</mml:mo><mml:mi>i</mml:mi><mml:mo>]</mml:mo><mml:mo>←</mml:mo><mml:msub><mml:mi>D</mml:mi><mml:mi mathvariant="normal">prev</mml:mi></mml:msub><mml:mo>+</mml:mo><mml:mi mathvariant="normal">Δ</mml:mi><mml:mi>W</mml:mi><mml:mo>⋅</mml:mo><mml:msub><mml:mi>G</mml:mi><mml:mi mathvariant="normal">prev</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula></p></list-item></list></p></list-item>
    <list-item>
      <p id="d2e2065" specific-use="ENDIF"><bold>end</bold> <bold>if</bold></p></list-item></list></p>
              </list-item>

    <list-item>

      <p id="d2e2075" specific-use="ENDIF"><bold>end</bold> <bold>if</bold></p>
              </list-item>
            </list></disp-quote></boxed-text>
      <p id="d2e2084">Porting this concept to a GPU requires an implementation that preserves massive parallelism. A naive, direct translation of the CPU logic might use conditional branching (e.g., “if-else” statements) to find the correct interval in the lookup table for each catchment. However, such branching can severely degrade GPU performance by causing threads within the same warp to diverge and execute different code paths. To avoid this, we adopted a branchless approach. The topographic profiles for all catchments are organized as 2D lookup tables (catchment ID <inline-formula><mml:math id="M85" display="inline"><mml:mo>×</mml:mo></mml:math></inline-formula> profile level), which are then flattened into one-dimensional arrays in GPU memory. We then implement a dedicated GPU kernel in which each thread, assigned to a single catchment, iterates through all predefined profile levels in a fixed-length loop. A <monospace>where</monospace> clause (a conditional assignment) updates the level index <italic>k</italic> only when the total storage is at least the storage threshold at that level. Every thread therefore performs the same sequence of operations during the search, eliminating thread divergence. Once the correct level is identified, the kernel computes the memory offset for that catchment's profile data and interpolates to find the flood depth. The process, detailed in Algorithm <xref ref-type="other" rid="Ch1.Prog1"/>, is executed in parallel for all catchments. This diagnostic step is embarrassingly parallel, because the depth calculation for each catchment is entirely self-contained and does not depend on any other. The inundation diagnosis is based on the uniform-water-surface assumption within each unit-catchment, so it adds no inter-GPU communication. 
Water exchange outside the main downstream river path is represented only where the CaMa-Flood bifurcation and overbank-flow scheme defines additional routing links; those links are treated as explicit edges of the routing graph and are included in the basin-group decomposition. Each thread can therefore work on its own catchment without communicating with or waiting for others. This allows the model to update the hydraulic state (water level, depth) for millions of catchments simultaneously, fully leveraging the GPU's massively parallel architecture and making the diagnosis a highly efficient component of the simulation loop.</p>
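      <p>For readers who prefer code to pseudocode, the sketch below transcribes Algorithm <xref ref-type="other" rid="Ch1.Prog1"/> into plain Python. It is a sequential reference under stated assumptions, not the GPU kernel itself: the function name <monospace>diagnose_flood_depth</monospace> and the three-level example profile are hypothetical, and the outer loop over catchments stands in for one GPU thread per catchment.</p>
      <preformat preformat-type="code">
```python
import math

def diagnose_flood_depth(S_total, T_sto, T_dep, T_wdt, T_grad, L_riv, n_levels):
    """Flood depth per catchment from flattened lookup tables (Algorithm 1).

    Each table holds n_levels + 1 entries per catchment, stored row by row.
    """
    D_fld = []
    for i in range(len(S_total)):             # one GPU thread per catchment
        # 1. Fixed-length scan; the conditional expression plays the role
        #    of the `where` clause (conditional assignment).
        k = -1
        for j in range(n_levels + 1):
            o_tmp = i * (n_levels + 1) + j
            k = j if S_total[i] >= T_sto[o_tmp] else k
        # 2. Depth from the identified level.
        if k == -1:
            D_fld.append(0.0)                 # storage below the first threshold
            continue
        o = i * (n_levels + 1) + k
        S_prev, D_prev, W_prev, G_prev = T_sto[o], T_dep[o], T_wdt[o], T_grad[o]
        dS = S_total[i] - S_prev
        if k == n_levels:
            # Above the top level: extend the profile with a constant width.
            D_fld.append(D_prev + dS / (W_prev * L_riv[i]))
        else:
            dW = math.sqrt(W_prev ** 2 + 2.0 * dS / (G_prev * L_riv[i])) - W_prev
            D_fld.append(D_prev + dW * G_prev)
    return D_fld


# Hypothetical profile with n_levels = 2, shared by three catchments.
n_levels = 2
profile_sto = [1000.0, 5000.0, 21000.0]    # cumulative storage thresholds
profile_dep = [0.0, 0.2, 0.6]              # depths at the thresholds
profile_wdt = [10.0, 30.0, 50.0]           # total widths at the thresholds
profile_grad = [0.01, 0.02, 0.0]           # last gradient is never used

S_total = [500.0, 2500.0, 22000.0]
n = len(S_total)
D_fld = diagnose_flood_depth(S_total, profile_sto * n, profile_dep * n,
                             profile_wdt * n, profile_grad * n,
                             L_riv=[1000.0] * n, n_levels=n_levels)
print(D_fld)
```
      </preformat>
      <p>The three example catchments exercise the three cases of the algorithm: storage below the first threshold (zero depth), storage within the profile (depth of about 0.1), and storage above the top level (depth of about 0.62).</p>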
</sec>
<sec id="Ch1.S2.SS1.SSS4">
  <label>2.1.4</label><title>Communication and overhead</title>
      <p id="d2e2107">The remaining key challenge in scaling hydrodynamic models to multiple GPUs is managing communication and data handling overhead. For multi-GPU runs, we decompose the reordered river network by basin group rather than by individual unit-catchment. Let <inline-formula><mml:math id="M86" display="inline"><mml:mi mathvariant="script">C</mml:mi></mml:math></inline-formula> be the set of all unit-catchments and let <inline-formula><mml:math id="M87" display="inline"><mml:mrow><mml:msub><mml:mi>B</mml:mi><mml:mi>m</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> be the primitive basin draining to river mouth <inline-formula><mml:math id="M88" display="inline"><mml:mi>m</mml:mi></mml:math></inline-formula>. Cross-basin bifurcation links define an undirected graph among these primitive basins; each connected component of this graph is treated as one basin group <inline-formula><mml:math id="M89" display="inline"><mml:mrow><mml:msub><mml:mi>G</mml:mi><mml:mi>k</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>. 
The global state vector is then ordered as <inline-formula><mml:math id="M90" display="inline"><mml:mrow><mml:mi mathvariant="bold-italic">x</mml:mi><mml:mo>=</mml:mo><mml:mo>[</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">x</mml:mi><mml:mrow><mml:msub><mml:mi>G</mml:mi><mml:mn mathvariant="normal">1</mml:mn></mml:msub></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">x</mml:mi><mml:mrow><mml:msub><mml:mi>G</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:mi mathvariant="normal">…</mml:mi><mml:mo>,</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">x</mml:mi><mml:mrow><mml:msub><mml:mi>G</mml:mi><mml:mi>K</mml:mi></mml:msub></mml:mrow></mml:msub><mml:mo>]</mml:mo></mml:mrow></mml:math></inline-formula>, where basin groups are sorted by decreasing size and the entries within each <inline-formula><mml:math id="M91" display="inline"><mml:mrow><mml:msub><mml:mi>G</mml:mi><mml:mi>k</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> follow the upstream-to-downstream topological order of the main river network. For <inline-formula><mml:math id="M92" display="inline"><mml:mi>P</mml:mi></mml:math></inline-formula> GPU ranks, basin groups are assigned by a longest-processing-time-first (LPT) greedy rule: processing <inline-formula><mml:math id="M93" display="inline"><mml:mrow><mml:msub><mml:mi>G</mml:mi><mml:mi>k</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> in decreasing <inline-formula><mml:math id="M94" display="inline"><mml:mrow><mml:mo>|</mml:mo><mml:msub><mml:mi>G</mml:mi><mml:mi>k</mml:mi></mml:msub><mml:mo>|</mml:mo></mml:mrow></mml:math></inline-formula>, we assign it to the rank with the smallest current load. The subdomain owned by rank <inline-formula><mml:math id="M95" display="inline"><mml:mi>p</mml:mi></mml:math></inline-formula> is therefore the union of all basin groups assigned to that rank. 
Because no bifurcation-formed basin group is split across ranks, all main-channel and bifurcation links used by the local inertial update connect unit-catchments inside the same rank. The layout keeps state arrays contiguous, balances the number of local unit-catchments among GPUs, and removes the need for peer-to-peer halo exchange during time stepping. With this multi-GPU decomposition in place, the remaining scalability bottlenecks no longer come from neighboring subdomains exchanging halo data, but from the global data movement and synchronization that still couple the ranks. Efficient GPU computation can then be easily undermined by bottlenecks in two main areas: (1) reading and distributing input data (e.g., runoff forcing) across all GPUs at each time step, and (2) synchronizing diagnostic variables and logging results, especially when using features like adaptive time-stepping. For input data, reading large files from disk is an I/O-bound operation that can stall the entire simulation if each GPU performs it independently. For outputs and internal diagnostics, frequent synchronization across GPUs – for example, to calculate a global time step or to verify mass balance – can introduce significant latency from communication, limiting overall performance.</p>
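      <p>The decomposition described above can be illustrated with a minimal Python sketch. It is not the CaMa-Flood-GPU source code: the function names <monospace>basin_groups</monospace> and <monospace>lpt_assign</monospace>, the toy basin sizes and the links are hypothetical, and a union-find structure is assumed here as one convenient way to obtain the connected components.</p>
      <preformat preformat-type="code">
```python
from collections import defaultdict


def basin_groups(basin_sizes, bifurcation_links):
    """Merge primitive basins joined by cross-basin links (union-find)."""
    parent = {b: b for b in basin_sizes}

    def find(b):
        while parent[b] != b:
            parent[b] = parent[parent[b]]  # path halving
            b = parent[b]
        return b

    for a, b in bifurcation_links:
        parent[find(a)] = find(b)

    groups = defaultdict(list)
    for b in basin_sizes:
        groups[find(b)].append(b)
    return list(groups.values())


def lpt_assign(groups, basin_sizes, n_ranks):
    """Give each group, largest first, to the currently lightest rank."""
    load = [0] * n_ranks
    owner = [[] for _ in range(n_ranks)]
    ordered = sorted(groups, key=lambda g: -sum(basin_sizes[b] for b in g))
    for g in ordered:
        p = load.index(min(load))  # rank with the smallest current load
        owner[p].append(sorted(g))
        load[p] += sum(basin_sizes[b] for b in g)
    return owner, load


# Hypothetical toy input: primitive basin id and its number of unit-catchments.
sizes = {"A": 50, "B": 30, "C": 20, "D": 40, "E": 10}
links = [("A", "C"), ("D", "E")]  # undirected cross-basin bifurcation links
groups = basin_groups(sizes, links)
owner, load = lpt_assign(groups, sizes, n_ranks=2)
```
</preformat>
      <p>In this toy case the two links merge five primitive basins into three basin groups, and no group is split between the two ranks, so every link connects unit-catchments owned by the same rank.</p>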

      <fig id="F5" specific-use="star"><label>Figure 5</label><caption><p id="d2e2249">Runtime workflow for asynchronous input preparation, multi-GPU computation, and logging over two consecutive time steps.</p></caption>
            <graphic xlink:href="https://gmd.copernicus.org/articles/19/5623/2026/gmd-19-5623-2026-f05.png"/>

          </fig>

      <p id="d2e2258">To address these overheads, we implemented a suite of asynchronous and optimized data handling strategies (Fig. <xref ref-type="fig" rid="F5"/>). For input data, we use a multi-process data loader. While the GPUs are computing the current step, background worker processes are already pre-fetching and preparing the data for the next step, placing it into a memory buffer. This ensures that when the GPUs are ready for new input, the data is already in memory and can be quickly broadcast, effectively hiding the I/O latency. For output, we offload file writing to separate threads, preventing the main simulation loop from stalling. By construction of the basin-group decomposition described above, every pair of unit-catchments coupled by the local inertial step, whether through the main network or through a bifurcation link, lies inside a single bifurcation-formed basin group, and every such group is owned by a single GPU rank. Reading the downstream water state therefore reduces to a contiguous local memory access, and no peer-to-peer halo exchange between GPUs is required at run time. The only inter-GPU traffic per step is the three collective operations enumerated below.</p>
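      <p>The prefetching idea can be sketched in a few lines of Python. This is an illustration only: the model uses background worker processes, whereas the sketch assumes a single background thread for brevity, and <monospace>load_runoff</monospace> and <monospace>prefetch</monospace> are hypothetical names standing in for the real forcing reader.</p>
      <preformat preformat-type="code">
```python
import queue
import threading


def load_runoff(step):
    """Hypothetical stand-in for reading one step of runoff forcing."""
    return [float(step)] * 4


def prefetch(n_steps, depth=2):
    """Yield forcing per step while a worker reads ahead into a buffer."""
    buffer = queue.Queue(maxsize=depth)  # bounded, so memory stays capped

    def worker():
        for step in range(n_steps):
            buffer.put(load_runoff(step))  # blocks while the buffer is full
        buffer.put(None)  # sentinel: no more data

    threading.Thread(target=worker, daemon=True).start()
    while True:
        item = buffer.get()
        if item is None:
            return
        yield item


totals = []
for runoff in prefetch(n_steps=5):
    totals.append(sum(runoff))  # placeholder for the routing computation
```
</preformat>
      <p>Because the buffer is bounded, the reader can run at most <monospace>depth</monospace> steps ahead of the consumer, which keeps host memory use predictable while the read latency overlaps with computation.</p>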
      <p id="d2e2264">For logging and synchronization, we employ a local accumulation and end-of-step reduction strategy. Instead of performing a global reduction of diagnostics at every sub-step, each GPU accumulates diagnostic variables in a local buffer, and only at the end of a full time step is a single collective communication performed to aggregate the results for logging. The adaptive sub-step itself is global: all GPUs share the same sub-step, taken as the minimum required <inline-formula><mml:math id="M96" display="inline"><mml:mrow><mml:mi mathvariant="normal">Δ</mml:mi><mml:mi>t</mml:mi></mml:mrow></mml:math></inline-formula> across the whole domain, so that the GPU integration trajectory remains identical to the reference CPU run regardless of the number of GPUs. This minimizes cross-GPU traffic, ensuring that communication occurs only when necessary. As a result, each time step involves three collective operations: the runoff broadcast, the all-rank minimum reduction used to select the global adaptive sub-step, and the end-of-step gather/reduction of diagnostic output. The runoff broadcast carries only <inline-formula><mml:math id="M97" display="inline"><mml:mrow><mml:msub><mml:mi>N</mml:mi><mml:mi>r</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> floats per step, on the order of tens of megabytes at the resolutions tested; its cost is set mainly by inter-GPU bandwidth and stays a small fraction of the per-step compute even at multi-GPU scale. This design significantly reduces communication overhead and allows the model to scale efficiently across multiple GPUs.</p>
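      <p>A serial Python sketch may help to fix the order of these operations. The ranks are emulated by list entries, and the way the time step is split into equal sub-steps is an assumption of this sketch, not a statement about the model's exact rule; <monospace>run_time_step</monospace> and its arguments are hypothetical names.</p>
      <preformat preformat-type="code">
```python
import math


def run_time_step(rank_dt_limits, rank_outflow, step_seconds):
    """Advance one output step with a sub-step shared by all ranks.

    rank_dt_limits: per-rank stable sub-step limits in seconds (assumed given).
    rank_outflow:   per-rank outflow rates, used as a stand-in diagnostic.
    """
    # All-rank minimum reduction: the most restrictive limit wins.
    dt_limit = min(rank_dt_limits)
    # Split the step into equal sub-steps no longer than that limit.
    n_sub = math.ceil(step_seconds / dt_limit)
    dt = step_seconds / n_sub

    # Each rank accumulates its diagnostic locally; no communication here.
    local_sum = [0.0] * len(rank_outflow)
    for _ in range(n_sub):
        for p, q in enumerate(rank_outflow):
            local_sum[p] += q * dt

    # Single end-of-step reduction across ranks for logging.
    return n_sub, dt, sum(local_sum)


n_sub, dt, volume = run_time_step(
    rank_dt_limits=[400, 250, 900],
    rank_outflow=[1.0, 2.0, 3.0],
    step_seconds=86400,
)
```
</preformat>
      <p>The number of sub-steps depends only on the domain-wide minimum, so it is unchanged if the same catchments are regrouped onto a different number of ranks.</p>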
</sec>
</sec>
<sec id="Ch1.S2.SS2">
  <label>2.2</label><title>Flexible customization</title>
      <p id="d2e2297">CaMa-Flood-GPU also allows customizing the simulation domain and outputs to focus on specific geographic locations of interest, such as gauge stations (Fig. <xref ref-type="fig" rid="F6"/>). Instead of running the model on the entire globe, users can define a domain centered on their locations of interest by providing a list of catchment IDs or coordinates corresponding to gauges. The model then trims the global catchment graph to preserve only the subgraph that hydrologically contributes to those locations, ensuring that upstream water transport relevant to the gauges is preserved while unrelated portions of the network are removed. This strategy minimizes unnecessary computation and enables efficient, targeted simulations for comparison with observations or for calibration at selected sites.</p>
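      <p>The trimming step amounts to an upstream traversal of the routing graph. The following Python sketch assumes that each unit-catchment stores a single downstream neighbour and follows the main network only; treatment of bifurcation links and basin groups is omitted, and <monospace>upstream_subgraph</monospace> and the toy network are hypothetical.</p>
      <preformat preformat-type="code">
```python
from collections import defaultdict, deque


def upstream_subgraph(downstream, targets):
    """Return every catchment that drains to at least one target.

    downstream: maps each catchment id to its downstream id (None at a mouth).
    """
    upstream = defaultdict(list)  # reverse the routing links
    for c, d in downstream.items():
        if d is not None:
            upstream[d].append(c)

    keep = set(targets)
    todo = deque(targets)
    while todo:  # breadth-first walk against the flow direction
        c = todo.popleft()
        for u in upstream[c]:
            if u not in keep:
                keep.add(u)
                todo.append(u)
    return keep


# Hypothetical network: 1 and 2 join at 3, which flows to 4 and the mouth 5;
# 6 flows to 7, a separate mouth.
downstream = {1: 3, 2: 3, 3: 4, 4: 5, 5: None, 6: 7, 7: None}
kept = upstream_subgraph(downstream, targets=[4])
```
</preformat>
      <p>With a gauge at catchment 4, the sketch keeps catchments 1 to 4 and drops both the reach below the gauge and the unrelated basin.</p>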

      <fig id="F6" specific-use="star"><label>Figure 6</label><caption><p id="d2e2304">Example of a customized simulation domain constructed after point-of-interest filtering and basin-group domain decomposition. The unit-catchment is the computational element; primitive basins are collections of unit-catchments draining to the same river mouth before cross-basin bifurcations are considered. Coloured patches denote retained basin groups, each containing the upstream unit-catchments needed for at least one point of interest; grey areas mark excluded basin groups. Some retained patches are spatially large because a downstream point of interest can require preserving an entire upstream primitive basin, and cross-basin bifurcation links can merge several primitive basins into one basin group. Markers indicate river-mouth locations and blue lines denote cross-basin bifurcation links.</p></caption>
          <graphic xlink:href="https://gmd.copernicus.org/articles/19/5623/2026/gmd-19-5623-2026-f06.png"/>

        </fig>

      <p id="d2e2313">Beyond spatial focus, the output variables and statistics produced by CaMa-Flood-GPU are configurable. The user can specify which model variables to record and which statistics to save for each variable at each time step. Supported statistical measures include, for instance, the mean over the time step (averaging across all sub-steps), the maximum and minimum during the time step, or simply the final value at the end of the time step. Moreover, instead of saving full 2D fields for all catchments globally, the user can limit the output to particular locations or catchments – such as the set of gauge points mentioned above, or any arbitrary subset of catchments of interest. This selective output greatly reduces storage requirements and post-processing effort for large simulations. To implement this efficiently, we developed a <monospace>StatisticsAggregator</monospace> class that performs on-the-fly computation of the requested statistics at the end of each time step, only for the selected output catchments, and streams the results to disk incrementally. The <monospace>StatisticsAggregator</monospace> uses a fused-kernel approach on the GPU to minimize overhead: it dynamically generates a single optimized GPU kernel (using the Triton just-in-time compiler) that computes all the requested statistics for a given group of saved catchments in one pass through the data, instead of launching separate GPU operations for each variable and each statistic. After each time step, the computed statistics are immediately written to an output NetCDF file in a streaming fashion. Each GPU writes its own output for the catchments it handles, and the results are indexed by time and location. Because writing is done incrementally and in parallel (using asynchronous write operations, as described in Sect. 
<xref ref-type="sec" rid="Ch1.S2.SS1.SSS4"/>), the approach keeps memory usage bounded and overlaps I/O with computation. By adjusting the list of saved catchments and the types of statistics collected, the user can obtain exactly the desired outputs from the simulation with minimal performance penalty. This flexible output system enables, for instance, efficient calibration or validation runs (focusing only on certain gauge stations and error metrics) and generally allows the model to integrate seamlessly into workflows where specific results need to be extracted on the fly.</p>
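      <p>The single-pass idea behind the aggregator can be shown on the CPU with plain Python. This sketch is not the <monospace>StatisticsAggregator</monospace> implementation and generates no Triton kernel; the class name <monospace>RunningStats</monospace> is hypothetical, and the mean is assumed to be an unweighted average over sub-steps.</p>
      <preformat preformat-type="code">
```python
class RunningStats:
    """Single-pass mean/max/min/last over the sub-steps of one time step."""

    def __init__(self, n_points):
        self.count = 0
        self.total = [0.0] * n_points
        self.maximum = [float("-inf")] * n_points
        self.minimum = [float("inf")] * n_points
        self.last = [0.0] * n_points

    def update(self, values):
        """Fold one sub-step of values into all statistics at once."""
        self.count += 1
        for i, v in enumerate(values):
            self.total[i] += v
            self.maximum[i] = max(self.maximum[i], v)
            self.minimum[i] = min(self.minimum[i], v)
            self.last[i] = v

    def result(self):
        mean = [t / self.count for t in self.total]
        return {"mean": mean, "max": self.maximum,
                "min": self.minimum, "last": self.last}


stats = RunningStats(n_points=2)  # two hypothetical gauge catchments
for sub_step in ([1.0, 10.0], [3.0, 8.0], [2.0, 6.0]):
    stats.update(sub_step)
out = stats.result()
```
</preformat>
      <p>Each value is read once per sub-step and updates every requested statistic, which is the property the fused kernel exploits; only the small per-catchment accumulators need to be kept until the end of the time step.</p>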
</sec>
</sec>
<sec id="Ch1.S3">
  <label>3</label><title>Results and evaluation</title>
<sec id="Ch1.S3.SS1">
  <label>3.1</label><title>Experimental setup</title>
      <p id="d2e2340">Our evaluation targets the two aspects that matter most for a production-quality global hydrodynamic model: simulation speed and numerical fidelity. To make the comparison fair and reproducible, CPU and GPU runs within each comparison share the same global parameterization (derived from MERIT Hydro), identical forcing and simulation periods, and a controlled I/O setup that minimizes non-compute bottlenecks. We benchmark across four representative compute environments to reflect common usage scenarios: (i) a personal custom workstation with an NVIDIA GeForce RTX 4070 Ti GPU (Ubuntu via Windows Subsystem for Linux 2), (ii) GPU servers with NVIDIA Tesla V100 GPUs, (iii) GPU servers with NVIDIA A100 GPUs, and (iv) CPU-only servers equipped with Intel Xeon processors. All three server configurations run CentOS 7. The CPU version used for comparison is CaMa-Flood v4.23. The software is portable across these environments: it runs both on a personal workstation with administrator privileges, where the latest CUDA toolchain is available, and on shared servers where users typically lack root access and must rely on older system packages. Hardware details are summarized in Table <xref ref-type="table" rid="T1"/>.</p>

<table-wrap id="T1" specific-use="star"><label>Table 1</label><caption><p id="d2e2348">Summary of computing nodes and their hardware configurations.</p></caption><oasis:table frame="topbot"><oasis:tgroup cols="6">
     <oasis:colspec colnum="1" colname="col1" align="left"/>
     <oasis:colspec colnum="2" colname="col2" align="left"/>
     <oasis:colspec colnum="3" colname="col3" align="left"/>
     <oasis:colspec colnum="4" colname="col4" align="left"/>
     <oasis:colspec colnum="5" colname="col5" align="right"/>
     <oasis:colspec colnum="6" colname="col6" align="right"/>
     <oasis:thead>
       <oasis:row>
         <oasis:entry colname="col1">Label</oasis:entry>
         <oasis:entry colname="col2">CPU</oasis:entry>
         <oasis:entry colname="col3">CPU Memory</oasis:entry>
         <oasis:entry colname="col4">GPU</oasis:entry>
         <oasis:entry colname="col5">CPU Cores</oasis:entry>
         <oasis:entry colname="col6">GPUs</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1"/>
         <oasis:entry colname="col2"/>
         <oasis:entry colname="col3">(per node)</oasis:entry>
         <oasis:entry colname="col4"/>
         <oasis:entry colname="col5">(per node)</oasis:entry>
         <oasis:entry colname="col6">(per node)</oasis:entry>
       </oasis:row>
     </oasis:thead>
     <oasis:tbody>
       <oasis:row>
         <oasis:entry colname="col1">4070 Ti</oasis:entry>
         <oasis:entry colname="col2">Intel Core i7-13700</oasis:entry>
         <oasis:entry colname="col3">64 GB</oasis:entry>
         <oasis:entry colname="col4">NVIDIA GeForce RTX 4070 Ti (12 GB)</oasis:entry>
         <oasis:entry colname="col5">16</oasis:entry>
         <oasis:entry colname="col6">1</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">V100</oasis:entry>
         <oasis:entry colname="col2">Dual Intel Xeon E5-2640 v4</oasis:entry>
         <oasis:entry colname="col3">128 GB</oasis:entry>
         <oasis:entry colname="col4">NVIDIA Tesla V100 (16 GB)</oasis:entry>
         <oasis:entry colname="col5">20</oasis:entry>
         <oasis:entry colname="col6">4</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">A100</oasis:entry>
         <oasis:entry colname="col2">Dual AMD EPYC 7543</oasis:entry>
         <oasis:entry colname="col3">256 GB</oasis:entry>
         <oasis:entry colname="col4">NVIDIA A100 (40 GB)</oasis:entry>
         <oasis:entry colname="col5">64</oasis:entry>
         <oasis:entry colname="col6">4</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">CPU</oasis:entry>
         <oasis:entry colname="col2">Dual Intel Xeon 6248R</oasis:entry>
         <oasis:entry colname="col3">256 GB</oasis:entry>
         <oasis:entry colname="col4">–</oasis:entry>
         <oasis:entry colname="col5">48</oasis:entry>
         <oasis:entry colname="col6">–</oasis:entry>
       </oasis:row>
     </oasis:tbody>
   </oasis:tgroup></oasis:table></table-wrap>

      <p id="d2e2504">To make the subsequent performance and numerical-stability analyses reproducible, we first summarize the common experimental setup and then distinguish the choices specific to the wall-clock benchmark and to the stability comparison. All performance benchmarks use CaMa-Flood global parameter sets derived from MERIT Hydro <xref ref-type="bibr" rid="bib1.bibx34" id="paren.25"/> at four spatial resolutions, 15, 6, 3 and 1 arcmin; the corresponding numbers of unit-catchments, bifurcation links and approximate adaptive sub-steps are listed in Table <xref ref-type="table" rid="T2"/>. The same model options are enabled in both implementations: adaptive sub-step integration, bifurcation routing and on-the-fly logging of the selected diagnostic output.</p>

<table-wrap id="T2"><label>Table 2</label><caption><p id="d2e2516">Catchment and bifurcation statistics at different spatial resolutions.</p></caption><oasis:table frame="topbot"><oasis:tgroup cols="4">
     <oasis:colspec colnum="1" colname="col1" align="left"/>
     <oasis:colspec colnum="2" colname="col2" align="right"/>
     <oasis:colspec colnum="3" colname="col3" align="right"/>
     <oasis:colspec colnum="4" colname="col4" align="right"/>
     <oasis:thead>
       <oasis:row>
         <oasis:entry colname="col1">Spatial</oasis:entry>
         <oasis:entry colname="col2">Number of</oasis:entry>
         <oasis:entry colname="col3">Number of</oasis:entry>
         <oasis:entry colname="col4">Number of</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">resolution</oasis:entry>
         <oasis:entry colname="col2">catchments</oasis:entry>
         <oasis:entry colname="col3">bifurcations</oasis:entry>
         <oasis:entry colname="col4">sub-steps</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1"/>
         <oasis:entry colname="col2"/>
         <oasis:entry colname="col3"/>
         <oasis:entry colname="col4">(approximate)</oasis:entry>
       </oasis:row>
     </oasis:thead>
     <oasis:tbody>
       <oasis:row>
         <oasis:entry colname="col1">15 arcmin</oasis:entry>
         <oasis:entry colname="col2">252 383</oasis:entry>
         <oasis:entry colname="col3">17 242</oasis:entry>
         <oasis:entry colname="col4">260</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">6 arcmin</oasis:entry>
         <oasis:entry colname="col2">1 562 463</oasis:entry>
         <oasis:entry colname="col3">207 858</oasis:entry>
         <oasis:entry colname="col4">590</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">3 arcmin</oasis:entry>
         <oasis:entry colname="col2">6 222 566</oasis:entry>
         <oasis:entry colname="col3">722 883</oasis:entry>
         <oasis:entry colname="col4">1200</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">1 arcmin</oasis:entry>
         <oasis:entry colname="col2">55 812 946</oasis:entry>
         <oasis:entry colname="col3">6 864 274</oasis:entry>
         <oasis:entry colname="col4">5300</oasis:entry>
       </oasis:row>
     </oasis:tbody>
   </oasis:tgroup></oasis:table></table-wrap>

      <p id="d2e2641">For the speed benchmark, the runoff forcing is the daily 1° binary sample runoff distributed with the CPU CaMa-Flood release; the sample forcing includes year 2000 and was prepared from the output of the Ensemble Land State Estimator (ELSE) <xref ref-type="bibr" rid="bib1.bibx17" id="paren.26"/>. To limit non-computational bottlenecks, we also control the I/O path so that the wall-clock comparison reflects routing performance rather than file-format overhead: each run saves only one key output variable, and the CPU reference writes that output in binary format. The GPU implementation overlaps forcing input with computation through its asynchronous data loader, whereas the CPU reference reads forcing synchronously. The CPU reference is CaMa-Flood v4.23 compiled with Intel Fortran using the flags “-O3 -fp-model precise” and run in hybrid MPI/OpenMP mode; we tested several MPI/OpenMP combinations and used the fastest configuration for each CPU benchmark, with 16 MPI ranks following the official CaMa-Flood recommendation on the multi-core hosts. GPU runs use a Triton block size of 128, which specifies the number of unit-catchments processed by one kernel instance. All reported wall-clock times are means over five repeated runs.</p>
      <p id="d2e2647">For the numerical-stability comparison, we use daily forcing series prepared from 0.1° ERA5-Land NetCDF runoff <xref ref-type="bibr" rid="bib1.bibx23" id="paren.27"/> and from the eartH2Observe (E2O) Tier-1 ensemble runoff <xref ref-type="bibr" rid="bib1.bibx26" id="paren.28"/>. The 1980–2014 period is adopted to remain consistent with the temporal coverage of the E2O Tier-1 dataset. For each numerical-stability comparison, CPU and GPU runs use the same 6 arcmin parameter set, identical forcing, initial states and adaptive sub-step settings, with the bifurcation module enabled in both runs. The E2O and ERA5-Land numerical-stability experiments are not used to quantify wall-clock performance overhead from forcing input; they are used to test numerical and input consistency under independent runoff products.</p>
</sec>
<sec id="Ch1.S3.SS2">
  <label>3.2</label><title>Performance comparison</title>
      <p id="d2e2664">Table <xref ref-type="table" rid="T3"/> summarizes the benchmark timing results for a 1-year simulation period at the four resolutions on various hardware configurations: the workstation GPU, the V100 server with 1–4 GPUs, the A100 server with 1–4 GPUs, and the CPU-only server (a combination of four CPU nodes) with different numbers of CPU cores. At the coarsest 15 arcmin resolution the gap between server-class GPUs and CPUs is small: on V100, runs with 1–4 GPUs take between 2 min 17 s and 2 min 24 s; on A100, runs with 1–4 GPUs take between 1 min 14 s and 1 min 31 s; and the CPU configuration with 48/96/192 cores takes 2 min 19 s/1 min 32 s/1 min 11 s, respectively. The only clear outlier is the workstation-grade 4070 Ti, finishing 15 arcmin in about 38 s. This behavior is expected for a light problem where wall-clock time is dominated by non-compute components and frequency/I/O headroom on desktops (e.g., aggressive CPU turbo and SSD-backed reads) can outweigh architectural differences. At finer resolutions, even the strongest CPU setup (192 cores) lags significantly behind single-GPU runs. At the 6 arcmin resolution, the CPU requires 18 min 37 s, whereas a single V100 finishes in 6 min 42 s and a single A100 in 4 min 24 s. The gap widens at finer resolutions: at 3 arcmin, the CPU takes 2 h 45 min 19 s, compared to 11 min 47 s on 4 <inline-formula><mml:math id="M98" display="inline"><mml:mo>×</mml:mo></mml:math></inline-formula> V100 and 7 min 21 s on 4 <inline-formula><mml:math id="M99" display="inline"><mml:mo>×</mml:mo></mml:math></inline-formula> A100; at 1 arcmin, the CPU needs 140 h 58 min 20 s, versus 6 h 51 min 36 s on 4 <inline-formula><mml:math id="M100" display="inline"><mml:mo>×</mml:mo></mml:math></inline-formula> V100 and 3 h 51 min 24 s on 4 <inline-formula><mml:math id="M101" display="inline"><mml:mo>×</mml:mo></mml:math></inline-formula> A100. 
At the 1 arcmin resolution, the aggregate GPU memory footprint is about 30 GB in total, including hydrodynamic state arrays, auxiliary routing and bifurcation buffers, runoff-mapping buffers, and temporary work arrays. This footprint is distributed across GPU ranks by the LPT assignment of bifurcation-formed basin groups. Configurations in which the per-GPU share exceeds the memory available on a single card are marked with an asterisk in Table <xref ref-type="table" rid="T3"/>. These results demonstrate that multi-GPU acceleration yields order-of-magnitude speedups, with 4 <inline-formula><mml:math id="M102" display="inline"><mml:mo>×</mml:mo></mml:math></inline-formula> A100 running up to 36.6 <inline-formula><mml:math id="M103" display="inline"><mml:mo>×</mml:mo></mml:math></inline-formula> faster than the best CPU configuration. Multi-GPU experiments reveal generally good scaling from 1 to 4 GPUs at finer resolution. At 3 arcmin resolution, runtimes on the V100 decrease from 40 min 5 s (1 GPU) to 11 min 47 s (4 GPUs), corresponding to a 3.4 <inline-formula><mml:math id="M104" display="inline"><mml:mo>×</mml:mo></mml:math></inline-formula> speedup. On the A100, the same case improves from 22 min 49 s to 7 min 21 s, a 3.1 <inline-formula><mml:math id="M105" display="inline"><mml:mo>×</mml:mo></mml:math></inline-formula> speedup. At the finest 1 arcmin resolution, where the workload and sub-step count are largest, scaling is even stronger: 816 min with 1 A100 is reduced to 231 min with 4 A100s, achieving a 3.5 <inline-formula><mml:math id="M106" display="inline"><mml:mo>×</mml:mo></mml:math></inline-formula> speedup. In contrast, at the coarsest 15 arcmin resolution, adding GPUs can degrade performance (e.g., 4 A100s slower than a single card) because parallelization overheads dominate once the per-GPU workload becomes too small. These results highlight the balance between computation and overhead (I/O, reductions, inter-GPU communication). 
In summary, CaMa-Flood-GPU delivers minutes-to-hours runtimes for global domains that previously required hours-to-days on large CPU machines, with the biggest wins at fine resolutions and longer runs where computation dwarfs overhead.</p>
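      <p>The speedup factors quoted above follow directly from the wall-clock times in Table 3. The short Python sketch below reproduces the arithmetic; the helper names <monospace>seconds</monospace> and <monospace>speedup</monospace> are hypothetical, and the parallel efficiency on the last line is an additional derived quantity, defined here as the speedup divided by the number of GPUs.</p>
      <preformat preformat-type="code">
```python
import re


def seconds(text):
    """Convert a duration such as '3 h 51 min 24 s' to seconds."""
    unit = {"h": 3600, "min": 60, "s": 1}
    parts = re.findall(r"(\d+) (h|min|s)\b", text)
    return sum(int(n) * unit[u] for n, u in parts)


def speedup(slow, fast):
    """Ratio of two wall-clock times given in the Table 3 text format."""
    return seconds(slow) / seconds(fast)


# Wall-clock times copied from Table 3.
cpu_vs_gpu = speedup("140 h 58 min 20 s", "3 h 51 min 24 s")  # 192 cores vs 4 A100
v100_3min = speedup("40 min 5 s", "11 min 47 s")              # 1 vs 4 V100
a100_3min = speedup("22 min 49 s", "7 min 21 s")              # 1 vs 4 A100
a100_1min = speedup("13 h 36 min 2 s", "3 h 51 min 24 s")     # 1 vs 4 A100
efficiency = a100_1min / 4  # fraction of ideal 4-GPU scaling
```
</preformat>
      <p>Rounded to one decimal place, these ratios give 36.6, 3.4, 3.1 and 3.5, and the 1 arcmin case on four A100 cards corresponds to a parallel efficiency of about 0.88.</p>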

<table-wrap id="T3" specific-use="star"><label>Table 3</label><caption><p id="d2e2738">Performance comparison across different hardware configurations for the year 2000 (1-year simulation).</p></caption><oasis:table frame="topbot"><oasis:tgroup cols="7">
     <oasis:colspec colnum="1" colname="col1" align="left"/>
     <oasis:colspec colnum="2" colname="col2" align="center"/>
     <oasis:colspec colnum="3" colname="col3" align="right"/>
     <oasis:colspec colnum="4" colname="col4" align="right"/>
     <oasis:colspec colnum="5" colname="col5" align="right"/>
     <oasis:colspec colnum="6" colname="col6" align="right"/>
     <oasis:colspec colnum="7" colname="col7" align="right"/>
     <oasis:thead>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">Label</oasis:entry>
         <oasis:entry colname="col2">GPUs</oasis:entry>
         <oasis:entry colname="col3">CPU cores</oasis:entry>
         <oasis:entry colname="col4">15 arcmin</oasis:entry>
         <oasis:entry colname="col5">6 arcmin</oasis:entry>
         <oasis:entry colname="col6">3 arcmin</oasis:entry>
         <oasis:entry colname="col7">1 arcmin</oasis:entry>
       </oasis:row>
     </oasis:thead>
     <oasis:tbody>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">4070 Ti</oasis:entry>
         <oasis:entry colname="col2">1</oasis:entry>
         <oasis:entry colname="col3">–</oasis:entry>
         <oasis:entry colname="col4">38 s</oasis:entry>
         <oasis:entry colname="col5">6 min 46 s</oasis:entry>
         <oasis:entry colname="col6">48 min 42 s</oasis:entry>
         <oasis:entry colname="col7"><sup>*</sup></oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">V100</oasis:entry>
         <oasis:entry colname="col2">1</oasis:entry>
         <oasis:entry colname="col3">–</oasis:entry>
         <oasis:entry colname="col4">2 min 21 s</oasis:entry>
         <oasis:entry colname="col5">6 min 42 s</oasis:entry>
         <oasis:entry colname="col6">40 min 5 s</oasis:entry>
         <oasis:entry colname="col7"><sup>*</sup></oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1"/>
         <oasis:entry colname="col2">2</oasis:entry>
         <oasis:entry colname="col3">–</oasis:entry>
         <oasis:entry colname="col4">2 min 17 s</oasis:entry>
         <oasis:entry colname="col5">4 min 26 s</oasis:entry>
         <oasis:entry colname="col6">21 min 7 s</oasis:entry>
         <oasis:entry colname="col7"><sup>*</sup></oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1"/>
         <oasis:entry colname="col2">3</oasis:entry>
         <oasis:entry colname="col3">–</oasis:entry>
         <oasis:entry colname="col4">2 min 23 s</oasis:entry>
         <oasis:entry colname="col5">4 min 22 s</oasis:entry>
         <oasis:entry colname="col6">14 min 58 s</oasis:entry>
         <oasis:entry colname="col7">9 h 1 min 58 s</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1"/>
         <oasis:entry colname="col2">4</oasis:entry>
         <oasis:entry colname="col3">–</oasis:entry>
         <oasis:entry colname="col4">2 min 24 s</oasis:entry>
         <oasis:entry colname="col5">4 min 16 s</oasis:entry>
         <oasis:entry colname="col6">11 min 47 s</oasis:entry>
         <oasis:entry colname="col7">6 h 51 min 36 s</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">A100</oasis:entry>
         <oasis:entry colname="col2">1</oasis:entry>
         <oasis:entry colname="col3">–</oasis:entry>
         <oasis:entry colname="col4">1 min 14 s</oasis:entry>
         <oasis:entry colname="col5">4 min 24 s</oasis:entry>
         <oasis:entry colname="col6">22 min 49 s</oasis:entry>
         <oasis:entry colname="col7">13 h 36 min 2 s</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1"/>
         <oasis:entry colname="col2">2</oasis:entry>
         <oasis:entry colname="col3">–</oasis:entry>
         <oasis:entry colname="col4">1 min 17 s</oasis:entry>
         <oasis:entry colname="col5">2 min 58 s</oasis:entry>
         <oasis:entry colname="col6">12 min 21 s</oasis:entry>
         <oasis:entry colname="col7">6 h 56 min 0 s</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1"/>
         <oasis:entry colname="col2">3</oasis:entry>
         <oasis:entry colname="col3">–</oasis:entry>
         <oasis:entry colname="col4">1 min 23 s</oasis:entry>
         <oasis:entry colname="col5">2 min 35 s</oasis:entry>
         <oasis:entry colname="col6">8 min 56 s</oasis:entry>
         <oasis:entry colname="col7">4 h 50 min 57 s</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1"/>
         <oasis:entry colname="col2">4</oasis:entry>
         <oasis:entry colname="col3">–</oasis:entry>
         <oasis:entry colname="col4">1 min 31 s</oasis:entry>
         <oasis:entry colname="col5">2 min 34 s</oasis:entry>
         <oasis:entry colname="col6">7 min 21 s</oasis:entry>
         <oasis:entry colname="col7">3 h 51 min 24 s</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">CPU</oasis:entry>
         <oasis:entry colname="col2">–</oasis:entry>
         <oasis:entry colname="col3">48</oasis:entry>
         <oasis:entry colname="col4">2 min 19 s</oasis:entry>
         <oasis:entry colname="col5">48 min 31 s</oasis:entry>
         <oasis:entry colname="col6">6 h 29 min 1 s</oasis:entry>
         <oasis:entry colname="col7"><inline-formula><mml:math id="M111" display="inline"><mml:mo>&gt;</mml:mo></mml:math></inline-formula> 14 d</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1"/>
         <oasis:entry colname="col2">–</oasis:entry>
         <oasis:entry colname="col3">96</oasis:entry>
         <oasis:entry colname="col4">1 min 32 s</oasis:entry>
         <oasis:entry colname="col5">28 min 50 s</oasis:entry>
         <oasis:entry colname="col6">3 h 59 min 16 s</oasis:entry>
         <oasis:entry colname="col7">198 h 0 min 12 s</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1"/>
         <oasis:entry colname="col2">–</oasis:entry>
         <oasis:entry colname="col3">192</oasis:entry>
         <oasis:entry colname="col4">1 min 11 s</oasis:entry>
         <oasis:entry colname="col5">18 min 37 s</oasis:entry>
         <oasis:entry colname="col6">2 h 45 min 19 s</oasis:entry>
         <oasis:entry colname="col7">140 h 58 min 20 s</oasis:entry>
       </oasis:row>
     </oasis:tbody>
   </oasis:tgroup></oasis:table><table-wrap-foot><p id="d2e2741"><sup>*</sup> Indicates that the task exceeded the available GPU memory.</p></table-wrap-foot></table-wrap>

</sec>
<sec id="Ch1.S3.SS3">
  <label>3.3</label><title>Numerical stability</title>
      <p id="d2e3132">The numerical-stability comparison follows the setup described in Sect. <xref ref-type="sec" rid="Ch1.S3.SS1"/> and covers 1980–2014 with two daily runoff forcing series: E2O Tier-1 ensemble runoff at 0.25° and a daily series prepared from ERA5-Land surface runoff at 0.1°. Using these two forcing datasets, we compare CaMa-Flood-GPU with the reference CPU implementation through spatial mean fields and station hydrographs, and assess whether floating-point differences remain bounded over the multi-decadal simulation.</p>

      <fig id="F7" specific-use="star"><label>Figure 7</label><caption><p id="d2e3139">CPU vs. GPU mean fields with the bifurcation module enabled, under E2O Tier-1 ensemble-mean runoff at 0.25° for 1980–2014. <bold>(a–c)</bold> Mean river outflow for CPU, GPU and their relative difference. <bold>(d–f)</bold> Mean floodplain outflow for CPU, GPU and their relative difference. <bold>(g–i)</bold> Mean river depth for CPU, GPU and their relative difference. The third-column colour bar is bipolar with a flat gray band at <inline-formula><mml:math id="M112" display="inline"><mml:mrow><mml:mo>±</mml:mo><mml:mn mathvariant="normal">0.001</mml:mn></mml:mrow></mml:math></inline-formula> % to mark the floating-point noise floor.</p></caption>
          <graphic xlink:href="https://gmd.copernicus.org/articles/19/5623/2026/gmd-19-5623-2026-f07.png"/>

        </fig>

      <fig id="F8" specific-use="star"><label>Figure 8</label><caption><p id="d2e3169">Same as Fig. <xref ref-type="fig" rid="F7"/>, but using the daily ERA5-Land runoff series prepared at 0.1° for 1980–2014.</p></caption>
          <graphic xlink:href="https://gmd.copernicus.org/articles/19/5623/2026/gmd-19-5623-2026-f08.png"/>

        </fig>

      <p id="d2e3181">Figure <xref ref-type="fig" rid="F7"/> presents the CPU–GPU comparison under E2O runoff. The CPU and GPU mean river outflow fields (Fig. <xref ref-type="fig" rid="F7"/>a, b) are visually indistinguishable, and the relative-difference map in Fig. <xref ref-type="fig" rid="F7"/>c is dominated by the gray “below-noise-floor” band of <inline-formula><mml:math id="M113" display="inline"><mml:mrow><mml:mo>±</mml:mo><mml:mn mathvariant="normal">0.001</mml:mn></mml:mrow></mml:math></inline-formula> %, with non-zero values appearing mainly on the largest river stems. Figure <xref ref-type="fig" rid="F7"/>d–f repeats the comparison for floodplain outflow and Fig. <xref ref-type="fig" rid="F7"/>g–i for river depth, both with the same noise-floor behaviour and the same concentration of residual signal on the main channels of large basins. Across the three variables, the field-mean relative difference remains below <inline-formula><mml:math id="M114" display="inline"><mml:mrow><mml:msup><mml:mn mathvariant="normal">10</mml:mn><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">3</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula> %. Figure <xref ref-type="fig" rid="F8"/> reproduces the same comparison under ERA5-Land runoff. The residual magnitude and spatial pattern remain essentially unchanged between the two forcing products, although the products differ in resolution and temporal variability. This consistency indicates that the residuals arise from the floating-point ordering of the routing calculation rather than from the runoff forcing itself.</p>

      <fig id="F9" specific-use="star"><label>Figure 9</label><caption><p id="d2e3223">Simulated daily river discharge at the locations of six GRDC gauges spanning four orders of magnitude in drainage area, for the last four years of the 1980–2014 simulation under E2O Tier-1 runoff at 0.25°. The gauge panels are ordered by drainage area from large to small. CPU (green solid) and GPU (orange dashed) curves overlap; the difference is plotted as a red solid line with values given on the right-hand axis.</p></caption>
          <graphic xlink:href="https://gmd.copernicus.org/articles/19/5623/2026/gmd-19-5623-2026-f09.png"/>

        </fig>

      <fig id="F10" specific-use="star"><label>Figure 10</label><caption><p id="d2e3234">Same as Fig. <xref ref-type="fig" rid="F9"/>, but with ERA5-Land surface runoff at 0.1°.</p></caption>
          <graphic xlink:href="https://gmd.copernicus.org/articles/19/5623/2026/gmd-19-5623-2026-f10.png"/>

        </fig>

      <p id="d2e3245">Figures <xref ref-type="fig" rid="F9"/> and <xref ref-type="fig" rid="F10"/> compare simulated discharge for the final four years of the full simulation at six GRDC gauge locations spanning four orders of magnitude in drainage area. At every gauge location, the CPU and GPU curves overlay each other almost exactly, and the right-hand red axis shows the day-by-day CPU–GPU difference at a magnified scale. The residual amplitude follows the depth of the upstream reduction at the station location rather than the absolute discharge magnitude. On the large main stems, many upstream contributions are accumulated through GPU <monospace>atomic_add</monospace> operations, whose execution order is not deterministic, whereas the CPU reduction follows a fixed order. Because floating-point addition is commutative but not associative, the two reductions return slightly different floating-point bit patterns, producing a small bounded difference at the m<sup>3</sup> s<sup>−1</sup> level, three to four orders of magnitude smaller than the simulated discharge. At the mid-scale tributary and headwater gauge locations, the reduction is shallower and the red difference curve collapses toward zero for most of the displayed period. Neither forcing shows a monotonic component in the residual over the multi-year window. We therefore do not see evidence of numerical drift accumulating through the integration.</p>
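      <p>The order sensitivity of such a reduction can be illustrated with a minimal sketch in plain Python. It is not taken from the CaMa-Flood-GPU source: the inflow values are synthetic, the shuffling only emulates the arrival order of atomic additions, and Python floats are double precision, so the residual shown here is much smaller than a single-precision run would give.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import random


def sequential_sum(values):
    """Accumulate left to right, like a reduction with a fixed order."""
    total = 0.0
    for value in values:
        total += value
    return total


# Floating-point addition is not associative: regrouping changes the last bits.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
print(left == right)  # False

# Hypothetical upstream contributions (m3 s-1) spanning several magnitudes.
rng = random.Random(0)
inflows = [rng.uniform(0.1, 1.0) * 10.0 ** rng.randint(-3, 4) for _ in range(10000)]

fixed_order = sequential_sum(inflows)

# Emulate the non-deterministic arrival order of atomic additions by
# summing the same values in 20 different random orders.
max_abs_diff = 0.0
for trial in range(20):
    shuffled = inflows[:]
    random.Random(trial + 1).shuffle(shuffled)
    max_abs_diff = max(max_abs_diff, abs(sequential_sum(shuffled) - fixed_order))

relative_diff = max_abs_diff / fixed_order
print(f"fixed-order sum: {fixed_order:.6f}")
print(f"largest order-induced difference: {max_abs_diff:.3e} (relative {relative_diff:.3e})")
```
]]></preformat>
      <p>In this sketch the difference stays bounded by rounding error for every ordering tried, because each reordered sum adds exactly the same terms. This matches the bounded, non-drifting residual seen in the hydrographs.</p>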
      <p id="d2e3276">Mass conservation on the irregular unit-catchment graph is preserved by construction. The GPU implementation applies the same source-to-target routing fluxes as the CPU algorithm: main-channel fluxes are accumulated through <monospace>scatter_add</monospace> reductions, and bifurcation fluxes are accumulated through <monospace>atomic_add</monospace> reductions rather than overwriting destination storage. Each outgoing flux is therefore added to the corresponding downstream storage term, with the CPU–GPU difference limited to the floating-point summation order discussed above. Because bifurcation routing is enabled together with the main-channel routing in the numerical-stability runs, the GPU executes the coupled routing workflow that includes both the base main-channel operations and the bifurcation-specific accumulation pathway. The comparison therefore provides an indirect CPU–GPU consistency check of the complete bifurcation-enabled model configuration, rather than a separate evaluation of individual bifurcation split ratios or point-by-point bifurcation hydraulics. The no-bifurcation configuration is a strict subset of these operations, so agreement with bifurcation enabled also supports agreement for the simpler routing case.</p>
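      <p>The accumulate-versus-overwrite distinction can be sketched on a toy network. The example below is an illustration under stated assumptions, not the model code: the five-catchment topology, the storages and the outflows are invented, and the pure-Python <monospace>scatter_add</monospace> helper only stands in for the tensor reduction of the same name.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def scatter_add(target, index, source):
    """Add source[i] into target[index[i]]; repeated indices accumulate."""
    for i, dst in enumerate(index):
        target[dst] += source[i]
    return target


def scatter_overwrite(target, index, source):
    """Faulty variant: each write replaces the previous one at the same index."""
    base = target[:]
    for i, dst in enumerate(index):
        target[dst] = base[dst] + source[i]
    return target


# Hypothetical 5-catchment network; downstream[i] receives the outflow of i.
# Catchment 4 is the outlet and has zero outflow in this sketch.
downstream = [2, 2, 4, 4, 4]
storage = [10.0, 20.0, 5.0, 8.0, 0.0]   # m3
outflow = [4.0, 6.0, 3.0, 2.0, 0.0]     # m3 leaving each catchment this step

total_before = sum(storage)

# Remove every outgoing flux from its source catchment.
drained = [s - q for s, q in zip(storage, outflow)]

# Accumulate each flux into its downstream target.
new_storage = scatter_add(drained[:], downstream, outflow)
lossy_storage = scatter_overwrite(drained[:], downstream, outflow)

print(new_storage, sum(new_storage) == total_before)
print(lossy_storage, sum(lossy_storage))
```
]]></preformat>
      <p>With accumulation, the total storage is unchanged after the step. With the overwrite variant, water is lost wherever two catchments share a downstream target, which is the failure mode that the additive reductions avoid.</p>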
      <p id="d2e3286">Taken together, the field comparisons, station hydrographs and routing-flux accounting show that CaMa-Flood-GPU reproduces the CPU reference without detectable numerical drift over multi-decadal integrations, across both forcing products and with bifurcations enabled.</p>
</sec>
</sec>
<sec id="Ch1.S4" sec-type="conclusions">
  <label>4</label><title>Conclusions</title>
      <p id="d2e3299">In this study, we sought to identify and overcome the fundamental challenges hindering the application of GPU acceleration to global river models. We addressed these challenges by developing CaMa-Flood-GPU, a version of the global flood model refactored for massively parallel hardware. Our work systematically pinpointed the primary bottlenecks for GPU computation – the irregular network topology, the mismatch between gridded inputs and catchments, the conditional logic in water depth diagnosis, and data communication overheads. In response, we explored and implemented a suite of targeted algorithmic solutions: reframing river routing as a parallel scatter-add operation, replacing traditional runoff interpolation with efficient sparse matrix-vector multiplication, designing a branchless kernel for water depth diagnosis, and minimizing data handling bottlenecks through asynchronous I/O. These solutions, implemented using Triton and PyTorch, preserve the model's physical integrity: in the multi-decadal comparison, the field-mean relative difference from the CPU version remains below 0.001 %.</p>
      <p id="d2e3302">The resulting performance improvements fulfill our objective of making kilometer-scale global simulations computationally practical. Simulations at high resolutions (e.g., 3 arcmin) that previously took hours on a multi-core server can now run in minutes on a multi-GPU system, making simulations at 1 arcmin resolution feasible within hours. This leap in performance broadens the model's applicability, enabling more detailed analyses and the generation of large ensembles for robust uncertainty assessment. The model's flexible, modular design further allows researchers to tailor it to their needs, facilitating its integration into larger Earth system modeling frameworks.</p>
      <p id="d2e3305">Looking ahead, there are several avenues for further development. First, additional physics and processes could be incorporated. CaMa-Flood-GPU realizes this through a sub-module layer in which each physical process is encapsulated as a self-contained component. A component declares its own per-unit-catchment state fields, registers one update kernel called once per time step by the main integrator, and contributes any fluxes back to the channel network through the same <monospace>scatter_add</monospace> reduction used by the core routing. New processes are therefore added by writing one such sub-module rather than by modifying the existing flood-routing kernels or the time-step loop. For reservoir operation, the CaMa-Flood community has already developed schemes such as the global flood-control reservoir module of <xref ref-type="bibr" rid="bib1.bibx8" id="text.29"/> and the H08–CaMa-Flood coupling of <xref ref-type="bibr" rid="bib1.bibx29" id="text.30"/>. These schemes both express reservoir storage and release as per-time-step updates at the unit-catchment level, which maps directly onto this interface. Sediment transport schemes such as the global sediment-dynamics model of <xref ref-type="bibr" rid="bib1.bibx9" id="text.31"/> similarly reduce, on the GPU side, to a small number of additional catchment-level state fields and one update kernel. These extensions can therefore be ported into the GPU framework as optional modules without rewriting the existing code.</p>
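      <p>One possible shape of such a sub-module is sketched below in plain Python. Every name in it (<monospace>SimpleReservoir</monospace>, <monospace>Integrator</monospace>, <monospace>declare_state</monospace>, <monospace>update</monospace>) is hypothetical and chosen for illustration only; it is not the interface of the released code. The reservoir rule is a deliberately crude fixed-fraction release, and main-channel routing is omitted so that the module contract stays visible.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def scatter_add(target, index, source):
    """Add source[i] into target[index[i]]; repeated indices accumulate."""
    for i, dst in enumerate(index):
        target[dst] += source[i]


class SimpleReservoir:
    """Hypothetical sub-module: impounds channel water at one dam catchment."""

    def __init__(self, dam, downstream_of_dam, release_fraction):
        self.dam = dam
        self.downstream_of_dam = downstream_of_dam
        self.release_fraction = release_fraction
        self.reservoir_storage = None

    def declare_state(self, n_catchments):
        # Per-unit-catchment state field owned by this sub-module.
        self.reservoir_storage = [0.0] * n_catchments

    def update(self, channel_storage):
        # Impound all channel water at the dam, then release a fixed fraction.
        inflow = channel_storage[self.dam]
        self.reservoir_storage[self.dam] += inflow
        release = self.release_fraction * self.reservoir_storage[self.dam]
        self.reservoir_storage[self.dam] -= release
        # Return (target indices, fluxes) for the shared scatter-add.
        return [self.dam, self.downstream_of_dam], [-inflow, release]


class Integrator:
    """Hypothetical main loop that calls each registered sub-module once per step."""

    def __init__(self, channel_storage):
        self.channel_storage = channel_storage
        self.modules = []

    def register(self, module):
        module.declare_state(len(self.channel_storage))
        self.modules.append(module)

    def step(self):
        for module in self.modules:
            index, flux = module.update(self.channel_storage)
            scatter_add(self.channel_storage, index, flux)


model = Integrator(channel_storage=[0.0, 100.0, 0.0])
reservoir = SimpleReservoir(dam=1, downstream_of_dam=2, release_fraction=0.25)
model.register(reservoir)
model.step()
model.step()
print(model.channel_storage, reservoir.reservoir_storage)
```
]]></preformat>
      <p>The integrator never inspects the reservoir logic. It only receives index and flux lists, so channel water plus reservoir water stays constant across steps in this sketch, and a further process could be registered without touching the loop.</p>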
      <p id="d2e3320">Second, a natural research direction for CaMa-Flood-GPU is end-to-end differentiable global routing. The integrator is built in PyTorch and uses custom Triton kernels for the core routing solver: the PyTorch layer already provides automatic differentiation for everything expressed as standard tensor operations, and the Triton kernels can be paired with hand-written backward kernels in the same way as in the wider PyTorch-Triton ecosystem. Once that pairing is in place, parameters such as Manning roughness, river width, floodplain elevation profile and even reservoir operating rules can be calibrated by gradient descent against discharge or altimetry observations, or trained jointly with PyTorch-based land-surface or AI components, in line with the differentiable-geoscience programme advocated by <xref ref-type="bibr" rid="bib1.bibx28" id="text.32"/>. As part of our commitment to community engagement, we will provide thorough documentation and example cases to encourage broader adoption.</p>
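      <p>The idea of pairing a forward computation with a hand-written backward pass can be sketched on a scalar toy problem. The example below is an assumption-laden stand-in and not the CaMa-Flood-GPU solver: it uses Manning's steady uniform-flow formula rather than the model's routing equations, the reach geometry and the synthetic observation are invented, and the learning rate is tuned by hand for this case only.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def manning_forward(n, area, hydraulic_radius, slope):
    """Manning discharge Q = A * R**(2/3) * S**0.5 / n."""
    return area * hydraulic_radius ** (2.0 / 3.0) * slope ** 0.5 / n


def manning_backward(n, discharge, grad_output):
    """Hand-written backward pass: dQ/dn = -Q / n, chained with grad_output."""
    return grad_output * (-discharge / n)


# Hypothetical reach geometry and a synthetic observation generated with n = 0.03.
area, hydraulic_radius, slope = 200.0, 2.0, 1.0e-4
q_obs = manning_forward(0.03, area, hydraulic_radius, slope)

# Check the analytic gradient against a central finite difference.
n0, h = 0.05, 1.0e-6
q0 = manning_forward(n0, area, hydraulic_radius, slope)
analytic = manning_backward(n0, q0, 1.0)
numeric = (manning_forward(n0 + h, area, hydraulic_radius, slope)
           - manning_forward(n0 - h, area, hydraulic_radius, slope)) / (2.0 * h)

# Plain gradient descent on the squared discharge error.
n = n0
learning_rate = 1.0e-8  # tuned by hand for this toy problem
for _ in range(500):
    q = manning_forward(n, area, hydraulic_radius, slope)
    grad_loss_wrt_q = 2.0 * (q - q_obs)
    n -= learning_rate * manning_backward(n, q, grad_loss_wrt_q)

print(f"calibrated n = {n:.6f}")
```
]]></preformat>
      <p>The finite-difference check plays the role that a gradient test would play for a custom kernel: the backward pass is trusted only once it agrees with a numerical derivative of the forward pass.</p>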
      <p id="d2e3327">In summary, the development of CaMa-Flood-GPU successfully bridges the gap between hydrological modeling and high-performance computing. It provides a scalable, physically consistent foundation for global flood simulations at resolutions and runtimes that were previously impractical. We believe this tool, offering a unique combination of speed, scale, and physical fidelity, will help advance global-scale hydrological assessments and risk analysis under climate change.</p>
</sec>

      
      </body>
    <back><notes notes-type="codedataavailability"><title>Code and data availability</title>

      <p id="d2e3334">The source code and scripts for CaMa-Flood-GPU are available from Zenodo at <ext-link xlink:href="https://doi.org/10.5281/zenodo.18137445" ext-link-type="DOI">10.5281/zenodo.18137445</ext-link> <xref ref-type="bibr" rid="bib1.bibx15" id="paren.33"/> under the Apache 2.0 license. The CPU version of CaMa-Flood (v4.23) used in this study is available from Zenodo at <ext-link xlink:href="https://doi.org/10.5281/zenodo.14214989" ext-link-type="DOI">10.5281/zenodo.14214989</ext-link> <xref ref-type="bibr" rid="bib1.bibx35" id="paren.34"/>. The input datasets used for the simulations, including the CaMa-Flood river topography maps, catchment parameters, and runoff forcing data, are publicly available from the CaMa-Flood project website (<uri>https://global-hydrodynamics.github.io/CaMa-Flood/</uri>, last access: 25 June 2026).</p>
  </notes><notes notes-type="authorcontribution"><title>Author contributions</title>

      <p id="d2e3355">S.K. coded the GPU model, performed the experiments, and wrote the initial draft. J.Y. and D.Y. contributed to the model design, provided supervision, and assisted with code optimization and debugging. All authors contributed to review and editing of the manuscript.</p>
  </notes><notes notes-type="competinginterests"><title>Competing interests</title>

      <p id="d2e3361">The contact author has declared that none of the authors has any competing interests.</p>
  </notes><notes notes-type="disclaimer"><title>Disclaimer</title>

      <p id="d2e3367">Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. The authors bear the ultimate responsibility for providing appropriate place names. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.</p>
  </notes><ack><title>Acknowledgements</title><p id="d2e3373">The numerical calculations in this paper have been performed on the supercomputing system in the Supercomputing Centre of Wuhan University. We also acknowledge the open-source community behind PyTorch and Triton, which made the GPU implementation efficient and attainable.</p></ack><notes notes-type="financialsupport"><title>Financial support</title>

      <p id="d2e3378">JY is supported by the National Natural Science Foundation of China (grant nos. T2522026, 52441902, W2521014). DY received support from the NSFC-JSPS Bilateral Joint Research Project (grant no. JPJSBP120257408). This work is also supported by the National Natural Science Foundation of China (grant nos. 52361145864, W2421111), the Natural Science Foundation of Hubei Province (grant no. 2024AFA055), and the Major Science and Technology Project of the Ministry of Water Resources (grant no. SKS-2025043).</p>
  </notes><notes notes-type="reviewstatement"><title>Review statement</title>

      <p id="d2e3384">This paper was edited by Thomas B. Wild and reviewed by two anonymous referees.</p>
  </notes><ref-list>
    <title>References</title>

      <ref id="bib1.bibx1"><label>Alvanos and Christoudias(2017)</label><mixed-citation>Alvanos, M. and Christoudias, T.: GPU-accelerated atmospheric chemical kinetics in the ECHAM/MESSy (EMAC) Earth system model (version 2.52), Geosci. Model Dev., 10, 3679–3693, <ext-link xlink:href="https://doi.org/10.5194/gmd-10-3679-2017" ext-link-type="DOI">10.5194/gmd-10-3679-2017</ext-link>, 2017.</mixed-citation></ref>
      <ref id="bib1.bibx2"><label>Bates et al.(2010)Bates, Horritt, and Fewtrell</label><mixed-citation>Bates, P. D., Horritt, M. S., and Fewtrell, T. J.: A simple inertial formulation of the shallow water equations for efficient two-dimensional flood inundation modelling, J. Hydrol., 387, 33–45, <ext-link xlink:href="https://doi.org/10.1016/j.jhydrol.2010.03.027" ext-link-type="DOI">10.1016/j.jhydrol.2010.03.027</ext-link>, 2010.</mixed-citation></ref>
      <ref id="bib1.bibx3"><label>Caviedes-Voullième et al.(2023)</label><mixed-citation>Caviedes-Voullième, D., Morales-Hernández, M., Norman, M. R., and Özgen-Xian, I.: SERGHEI (SERGHEI-SWE) v1.0: a performance-portable high-performance parallel-computing shallow-water solver for hydrology and environmental hydraulics, Geosci. Model Dev., 16, 977–1008, <ext-link xlink:href="https://doi.org/10.5194/gmd-16-977-2023" ext-link-type="DOI">10.5194/gmd-16-977-2023</ext-link>, 2023.</mixed-citation></ref>
      <ref id="bib1.bibx4"><label>Collins et al.(2024)Collins, David, Riggs, Allen, Pavelsky, Lin, Pan, Yamazaki, Meentemeyer, and Sanchez</label><mixed-citation>Collins, E. L., David, C. H., Riggs, R., Allen, G. H., Pavelsky, T. M., Lin, P., Pan, M., Yamazaki, D., Meentemeyer, R. K., and Sanchez, G. M.: Global patterns in river water storage dependent on residence time, Nat. Geosci., 17, 433–439, <ext-link xlink:href="https://doi.org/10.1038/s41561-024-01421-5" ext-link-type="DOI">10.1038/s41561-024-01421-5</ext-link>, 2024.</mixed-citation></ref>
      <ref id="bib1.bibx5"><label>De Almeida et al.(2012)De Almeida, Bates, Freer, and Souvignet</label><mixed-citation>De Almeida, G. A. M., Bates, P., Freer, J. E., and Souvignet, M.: Improving the stability of a simple formulation of the shallow water equations for 2‐D flood modeling, Water Resour. Res., 48, 2011WR011570, <ext-link xlink:href="https://doi.org/10.1029/2011WR011570" ext-link-type="DOI">10.1029/2011WR011570</ext-link>, 2012.</mixed-citation></ref>
      <ref id="bib1.bibx6"><label>Emerton et al.(2016)Emerton, Stephens, Pappenberger, Pagano, Weerts, Wood, Salamon, Brown, Hjerdt, Donnelly, Baugh, and Cloke</label><mixed-citation>Emerton, R. E., Stephens, E. M., Pappenberger, F., Pagano, T. C., Weerts, A. H., Wood, A. W., Salamon, P., Brown, J. D., Hjerdt, N., Donnelly, C., Baugh, C. A., and Cloke, H. L.: Continental and global scale flood forecasting systems, WIREs Water, 3, 391–418, <ext-link xlink:href="https://doi.org/10.1002/wat2.1137" ext-link-type="DOI">10.1002/wat2.1137</ext-link>, 2016.</mixed-citation></ref>
      <ref id="bib1.bibx7"><label>Hamitouche et al.(2025)Hamitouche, Fosser, Anav, He, and Lin</label><mixed-citation>Hamitouche, M., Fosser, G., Anav, A., He, C., and Lin, T.-S.: Impact of runoff schemes on global flow discharge: a comprehensive analysis using the Noah-MP and CaMa-Flood models, Hydrol. Earth Syst. Sci., 29, 1221–1240, <ext-link xlink:href="https://doi.org/10.5194/hess-29-1221-2025" ext-link-type="DOI">10.5194/hess-29-1221-2025</ext-link>, 2025.</mixed-citation></ref>
      <ref id="bib1.bibx8"><label>Hanazaki et al.(2022)Hanazaki, Yamazaki, and Yoshimura</label><mixed-citation>Hanazaki, R., Yamazaki, D., and Yoshimura, K.: Development of a reservoir flood control scheme for global flood models, J. Adv. Model. Earth Sy., 14, e2021MS002944, <ext-link xlink:href="https://doi.org/10.1029/2021MS002944" ext-link-type="DOI">10.1029/2021MS002944</ext-link>, 2022.</mixed-citation></ref>
      <ref id="bib1.bibx9"><label>Hatono and Yoshimura(2020)</label><mixed-citation>Hatono, M. and Yoshimura, K.: Development of a global sediment dynamics model, Prog. Earth Planet. Sc., 7, 59, <ext-link xlink:href="https://doi.org/10.1186/s40645-020-00368-6" ext-link-type="DOI">10.1186/s40645-020-00368-6</ext-link>, 2020.</mixed-citation></ref>
      <ref id="bib1.bibx10"><label>Heinicke et al.(2024)</label><mixed-citation>Heinicke, S., Volkholz, J., Schewe, J., Gosling, S. N., Müller Schmied, H., Zimmermann, S., Mengel, M., Sauer, I. J., Burek, P., Chang, J., Kou-Giesbrecht, S., Grillakis, M., Guillaumot, L., Hanasaki, N., Koutroulis, A., Otta, K., Qi, W., Satoh, Y., Stacke, T., Yokohata, T., and Frieler, K.: Global hydrological models continue to overestimate river discharge, Environ. Res. Lett., 19, 074005, <ext-link xlink:href="https://doi.org/10.1088/1748-9326/ad52b0" ext-link-type="DOI">10.1088/1748-9326/ad52b0</ext-link>, 2024.</mixed-citation></ref>
      <ref id="bib1.bibx11"><label>Hirabayashi et al.(2013)Hirabayashi, Mahendran, Koirala, Konoshima, Yamazaki, Watanabe, Kim, and Kanae</label><mixed-citation>Hirabayashi, Y., Mahendran, R., Koirala, S., Konoshima, L., Yamazaki, D., Watanabe, S., Kim, H., and Kanae, S.: Global flood risk under climate change, Nat. Clim. Change, 3, 816–821, <ext-link xlink:href="https://doi.org/10.1038/nclimate1911" ext-link-type="DOI">10.1038/nclimate1911</ext-link>, 2013.</mixed-citation></ref>
      <ref id="bib1.bibx12"><label>Hokkanen et al.(2021)Hokkanen, Kollet, Kraus, Herten, Hrywniak, and Pleiter</label><mixed-citation>Hokkanen, J., Kollet, S., Kraus, J., Herten, A., Hrywniak, M., and Pleiter, D.: Leveraging HPC accelerator architectures with modern techniques: hydrologic modeling on GPUs with ParFlow, Comput. Geosci., 25, 1579–1590, <ext-link xlink:href="https://doi.org/10.1007/s10596-021-10051-4" ext-link-type="DOI">10.1007/s10596-021-10051-4</ext-link>, 2021.</mixed-citation></ref>
      <ref id="bib1.bibx13"><label>Huang and Hattermann(2018)</label><mixed-citation>Huang, S. and Hattermann, F. F.: Coupling a global hydrodynamic algorithm and a regional hydrological model for large-scale flood inundation simulations, Hydrol. Res., 49, 438–449, <ext-link xlink:href="https://doi.org/10.2166/nh.2017.061" ext-link-type="DOI">10.2166/nh.2017.061</ext-link>, 2018.</mixed-citation></ref>
      <ref id="bib1.bibx14"><label>Hunter et al.(2005)Hunter, Horritt, Bates, Wilson, and Werner</label><mixed-citation>Hunter, N. M., Horritt, M. S., Bates, P. D., Wilson, M. D., and Werner, M. G.: An adaptive time step solution for raster-based storage cell modelling of floodplain inundation, Adv. Water Resour., 28, 975–991, <ext-link xlink:href="https://doi.org/10.1016/j.advwatres.2005.03.007" ext-link-type="DOI">10.1016/j.advwatres.2005.03.007</ext-link>, 2005.</mixed-citation></ref>
      <ref id="bib1.bibx15"><label>Kang(2026)</label><mixed-citation>Kang, S.: CaMa-Flood-GPU, Zenodo [code], <ext-link xlink:href="https://doi.org/10.5281/zenodo.18137445" ext-link-type="DOI">10.5281/zenodo.18137445</ext-link>, 2026.</mixed-citation></ref>
      <ref id="bib1.bibx16"><label>Kang et al.(2025)Kang, Yin, Slater, Liu, Sun, Liu, and Xia</label><mixed-citation>Kang, S., Yin, J., Slater, L., Liu, P., Sun, F., Liu, D., and Xia, J.: Global Flood Projection and Socioeconomic Implications Under a Deep Learning Framework, Water Resour. Res., 61, <ext-link xlink:href="https://doi.org/10.1029/2024wr037139" ext-link-type="DOI">10.1029/2024wr037139</ext-link>, 2025.</mixed-citation></ref>
      <ref id="bib1.bibx17"><label>Kim et al.(2009)Kim, Yeh, Oki, and Kanae</label><mixed-citation>Kim, H., Yeh, P. J.-F., Oki, T., and Kanae, S.: Role of rivers in the seasonal variations of terrestrial water storage over global basins, Geophys. Res. Lett., 36, L17402, <ext-link xlink:href="https://doi.org/10.1029/2009GL039006" ext-link-type="DOI">10.1029/2009GL039006</ext-link>, 2009.</mixed-citation></ref>
      <ref id="bib1.bibx18"><label>Kimura et al.(2023)Kimura, Hirabayashi, Kita, Zhou, and Yamazaki</label><mixed-citation>Kimura, Y., Hirabayashi, Y., Kita, Y., Zhou, X., and Yamazaki, D.: Methodology for constructing a flood-hazard map for a future climate, Hydrol. Earth Syst. Sci., 27, 1627–1644, <ext-link xlink:href="https://doi.org/10.5194/hess-27-1627-2023" ext-link-type="DOI">10.5194/hess-27-1627-2023</ext-link>, 2023.</mixed-citation></ref>
      <ref id="bib1.bibx19"><label>Marthews et al.(2022)Marthews, Dadson, Clark, Blyth, Hayman, Yamazaki, Becher, Martínez-de la Torre, Prigent, and Jiménez</label><mixed-citation>Marthews, T. R., Dadson, S. J., Clark, D. B., Blyth, E. M., Hayman, G. D., Yamazaki, D., Becher, O. R. E., Martínez-de la Torre, A., Prigent, C., and Jiménez, C.: Inundation prediction in tropical wetlands from JULES-CaMa-Flood global land surface simulations, Hydrol. Earth Syst. Sci., 26, 3151–3175, <ext-link xlink:href="https://doi.org/10.5194/hess-26-3151-2022" ext-link-type="DOI">10.5194/hess-26-3151-2022</ext-link>, 2022.</mixed-citation></ref>
      <ref id="bib1.bibx20"><label>Mateo et al.(2017)Mateo, Yamazaki, Kim, Champathong, Vaze, and Oki</label><mixed-citation>Mateo, C. M. R., Yamazaki, D., Kim, H., Champathong, A., Vaze, J., and Oki, T.: Impacts of spatial resolution and representation of flow connectivity on large-scale simulation of floods, Hydrol. Earth Syst. Sci., 21, 5143–5163, <ext-link xlink:href="https://doi.org/10.5194/hess-21-5143-2017" ext-link-type="DOI">10.5194/hess-21-5143-2017</ext-link>, 2017.</mixed-citation></ref>
      <ref id="bib1.bibx21"><label>Mizukami et al.(2016)</label><mixed-citation>Mizukami, N., Clark, M. P., Sampson, K., Nijssen, B., Mao, Y., McMillan, H., Viger, R. J., Markstrom, S. L., Hay, L. E., Woods, R., Arnold, J. R., and Brekke, L. D.: mizuRoute version 1: a river network routing tool for a continental domain water resources applications, Geosci. Model Dev., 9, 2223–2238, <ext-link xlink:href="https://doi.org/10.5194/gmd-9-2223-2016" ext-link-type="DOI">10.5194/gmd-9-2223-2016</ext-link>, 2016.</mixed-citation></ref>
      <ref id="bib1.bibx22"><label>Morales-Hernández et al.(2021)</label><mixed-citation>Morales-Hernández, M., Sharif, M. B., Kalyanapu, A., Ghafoor, S. K., Dullo, T. T., Gangrade, S., Kao, S.-C., Norman, M. R., and Evans, K. J.: TRITON: a multi-GPU open source 2D hydrodynamic flood model, Environ. Model. Softw., 141, 105034, <ext-link xlink:href="https://doi.org/10.1016/j.envsoft.2021.105034" ext-link-type="DOI">10.1016/j.envsoft.2021.105034</ext-link>, 2021.</mixed-citation></ref>
      <ref id="bib1.bibx23"><label>Muñoz-Sabater et al.(2021)Muñoz-Sabater, Dutra, Agusti-Panareda, Albergel, Arduini et al.</label><mixed-citation>Muñoz-Sabater, J., Dutra, E., Agustí-Panareda, A., Albergel, C., Arduini, G., Balsamo, G., Boussetta, S., Choulga, M., Harrigan, S., Hersbach, H., Martens, B., Miralles, D. G., Piles, M., Rodríguez-Fernández, N. J., Zsoter, E., Buontempo, C., and Thépaut, J.-N.: ERA5-Land: a state-of-the-art global reanalysis dataset for land applications, Earth Syst. Sci. Data, 13, 4349–4383, <ext-link xlink:href="https://doi.org/10.5194/essd-13-4349-2021" ext-link-type="DOI">10.5194/essd-13-4349-2021</ext-link>, 2021.</mixed-citation></ref>
      <ref id="bib1.bibx24"><label>Neal et al.(2021)Neal, Hawker, Savage, Durand, Bates, and Sampson</label><mixed-citation>Neal, J., Hawker, L., Savage, J., Durand, M., Bates, P., and Sampson, C.: Estimating River Channel Bathymetry in Large Scale Flood Inundation Models, Water Resour. Res., 57, e2020WR028301, <ext-link xlink:href="https://doi.org/10.1029/2020WR028301" ext-link-type="DOI">10.1029/2020WR028301</ext-link>, 2021.</mixed-citation></ref>
      <ref id="bib1.bibx25"><label>Rong et al.(2024)Rong, Bates, and Neal</label><mixed-citation>Rong, Y., Bates, P., and Neal, J.: GPU‐Accelerated Urban Flood Modeling Using a Nonuniform Structured Grid and a Super Grid Scale River Channel, Water Resour. Res., 60, e2023WR036128, <ext-link xlink:href="https://doi.org/10.1029/2023WR036128" ext-link-type="DOI">10.1029/2023WR036128</ext-link>, 2024.</mixed-citation></ref>
      <ref id="bib1.bibx26"><label>Schellekens et al.(2017)Schellekens, Dutra, Martínez-de la Torre, Balsamo, van Dijk et al.</label><mixed-citation>Schellekens, J., Dutra, E., Martínez-de la Torre, A., Balsamo, G., van Dijk, A., Sperna Weiland, F., Minvielle, M., Calvet, J.-C., Decharme, B., Eisner, S., Fink, G., Flörke, M., Peßenteiner, S., van Beek, R., Polcher, J., Beck, H., Orth, R., Calton, B., Burke, S., Dorigo, W., and Weedon, G. P.: A global water resources ensemble of hydrological models: the eartH2Observe Tier-1 dataset, Earth Syst. Sci. Data, 9, 389–413, <ext-link xlink:href="https://doi.org/10.5194/essd-9-389-2017" ext-link-type="DOI">10.5194/essd-9-389-2017</ext-link>, 2017.</mixed-citation></ref>
      <ref id="bib1.bibx27"><label>Sharifian et al.(2023)Sharifian, Kesserwani, Chowdhury, Neal, and Bates</label><mixed-citation>Sharifian, M. K., Kesserwani, G., Chowdhury, A. A., Neal, J., and Bates, P.: LISFLOOD-FP 8.1: new GPU-accelerated solvers for faster fluvial/pluvial flood simulations, Geosci. Model Dev., 16, 2391–2413, <ext-link xlink:href="https://doi.org/10.5194/gmd-16-2391-2023" ext-link-type="DOI">10.5194/gmd-16-2391-2023</ext-link>, 2023.</mixed-citation></ref>
      <ref id="bib1.bibx28"><label>Shen et al.(2023)Shen, Appling, Gentine et al.</label><mixed-citation>Shen, C., Appling, A. P., Gentine, P., Bandai, T., Gupta, H., Tartakovsky, A., Baity-Jesi, M., Fenicia, F., Kifer, D., Li, L., Liu, X., Ren, W., Zheng, Y., Harman, C. J., Clark, M., Farthing, M., Feng, D., Kumar, P., Aboelyazeed, D., Rahmani, F., Song, Y., Beck, H. E., Bindas, T., Dwivedi, D., Fang, K., Höge, M., Rackauckas, C., Mohanty, B., Roy, T., Xu, C., and Lawson, K.: Differentiable modelling to unify machine learning and physical models for geosciences, Nat. Rev. Earth Environ., 4, 552–567, <ext-link xlink:href="https://doi.org/10.1038/s43017-023-00450-9" ext-link-type="DOI">10.1038/s43017-023-00450-9</ext-link>, 2023.</mixed-citation></ref>
      <ref id="bib1.bibx29"><label>Shin et al.(2020)Shin, Pokhrel, Yamazaki, Huang, Torbick, Qi, Pattanakiat, Ngo-Duc, and Nguyen</label><mixed-citation>Shin, S., Pokhrel, Y., Yamazaki, D., Huang, X., Torbick, N., Qi, J., Pattanakiat, S., Ngo-Duc, T., and Nguyen, T. D.: High resolution modeling of river-floodplain-reservoir inundation dynamics in the Mekong River Basin, Water Resour. Res., 56, e2019WR026449, <ext-link xlink:href="https://doi.org/10.1029/2019WR026449" ext-link-type="DOI">10.1029/2019WR026449</ext-link>, 2020.</mixed-citation></ref>
      <ref id="bib1.bibx30"><label>Tayefi et al.(2007)Tayefi, Lane, Hardy, and Yu</label><mixed-citation>Tayefi, V., Lane, S. N., Hardy, R. J., and Yu, D.: A comparison of one- and two-dimensional approaches to modelling flood inundation over complex upland floodplains, Hydrol. Process., 21, 3190–3202, <ext-link xlink:href="https://doi.org/10.1002/hyp.6523" ext-link-type="DOI">10.1002/hyp.6523</ext-link>, 2007.</mixed-citation></ref>
      <ref id="bib1.bibx31"><label>Yamazaki(2025)</label><mixed-citation>Yamazaki, D.: Advancing global river hydrodynamics simulations by catchment-based macro-scale floodplain modeling approach, Geosci. Lett., 12, 72, <ext-link xlink:href="https://doi.org/10.1186/s40562-025-00452-z" ext-link-type="DOI">10.1186/s40562-025-00452-z</ext-link>, 2025.</mixed-citation></ref>
      <ref id="bib1.bibx32"><label>Yamazaki et al.(2011)Yamazaki, Kanae, Kim, and Oki</label><mixed-citation>Yamazaki, D., Kanae, S., Kim, H., and Oki, T.: A physically based description of floodplain inundation dynamics in a global river routing model, Water Resour. Res., 47, 2010WR009726, <ext-link xlink:href="https://doi.org/10.1029/2010WR009726" ext-link-type="DOI">10.1029/2010WR009726</ext-link>, 2011.</mixed-citation></ref>
      <ref id="bib1.bibx33"><label>Yamazaki et al.(2014)Yamazaki, Sato, Kanae, Hirabayashi, and Bates</label><mixed-citation>Yamazaki, D., Sato, T., Kanae, S., Hirabayashi, Y., and Bates, P. D.: Regional flood dynamics in a bifurcating mega delta simulated in a global river model, Geophys. Res. Lett., 41, 3127–3135, <ext-link xlink:href="https://doi.org/10.1002/2014GL059744" ext-link-type="DOI">10.1002/2014GL059744</ext-link>, 2014.</mixed-citation></ref>
      <ref id="bib1.bibx34"><label>Yamazaki et al.(2019)Yamazaki, Ikeshima, Sosa, Bates, Allen, and Pavelsky</label><mixed-citation>Yamazaki, D., Ikeshima, D., Sosa, J., Bates, P. D., Allen, G. H., and Pavelsky, T. M.: MERIT Hydro: A High‐Resolution Global Hydrography Map Based on Latest Topography Dataset, Water Resour. Res., 55, 5053–5073, <ext-link xlink:href="https://doi.org/10.1029/2019WR024873" ext-link-type="DOI">10.1029/2019WR024873</ext-link>, 2019.</mixed-citation></ref>
      <ref id="bib1.bibx35"><label>Yamazaki et al.(2024)Yamazaki, Revel, Hatono, Hanazaki, Nitta, Wortmann, Zhou, DirkEilander, Kang, Pilz, and Zhao</label><mixed-citation>Yamazaki, D., Revel, M., Hatono, M., Hanazaki, R., Nitta, T., Wortmann, M., Zhou, X., DirkEilander, Kang, S., Pilz, T., and Zhao, F.: global-hydrodynamics/CaMa-Flood_v4: Release_v4.23, Zenodo [code], <ext-link xlink:href="https://doi.org/10.5281/zenodo.14214989" ext-link-type="DOI">10.5281/zenodo.14214989</ext-link>, 2024.</mixed-citation></ref>
      <ref id="bib1.bibx36"><label>Zahura et al.(2020)Zahura, Goodall, Sadler, Shen, Morsy, and Behl</label><mixed-citation>Zahura, F. T., Goodall, J. L., Sadler, J. M., Shen, Y., Morsy, M. M., and Behl, M.: Training Machine Learning Surrogate Models From a High‐Fidelity Physics‐Based Model: Application for Real‐Time Street‐Scale Flood Prediction in an Urban Coastal Community, Water Resour. Res., 56, e2019WR027038, <ext-link xlink:href="https://doi.org/10.1029/2019WR027038" ext-link-type="DOI">10.1029/2019WR027038</ext-link>, 2020.</mixed-citation></ref>
      <ref id="bib1.bibx37"><label>Zhao et al.(2017)Zhao, Veldkamp, Frieler, Schewe, Ostberg, Willner, Schauberger, Gosling, Müller Schmied, Portmann, Leng, Huang, Liu, Tang, Hanasaki, Biemans, Gerten, Satoh, Pokhrel, Stacke, Ciais, Chang, Ducharne, Guimberteau, Wada, Kim, and Yamazaki</label><mixed-citation>Zhao, F., Veldkamp, T. I. E., Frieler, K., Schewe, J., Ostberg, S., Willner, S., Schauberger, B., Gosling, S. N., Müller Schmied, H., Portmann, F. T., Leng, G., Huang, M., Liu, X., Tang, Q., Hanasaki, N., Biemans, H., Gerten, D., Satoh, Y., Pokhrel, Y., Stacke, T., Ciais, P., Chang, J., Ducharne, A., Guimberteau, M., Wada, Y., Kim, H., and Yamazaki, D.: The critical role of the routing scheme in simulating peak river discharge in global hydrological models, Environ. Res. Lett., 12, 075003, <ext-link xlink:href="https://doi.org/10.1088/1748-9326/aa7250" ext-link-type="DOI">10.1088/1748-9326/aa7250</ext-link>, 2017.</mixed-citation></ref>

  </ref-list></back>
</article>
