<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing with OASIS Tables v3.0 20080202//EN" "https://jats.nlm.nih.gov/nlm-dtd/publishing/3.0/journalpub-oasis3.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:oasis="http://docs.oasis-open.org/ns/oasis-exchange/table" xml:lang="en" dtd-version="3.0" article-type="research-article">
  <front>
    <journal-meta><journal-id journal-id-type="publisher">GMD</journal-id><journal-title-group>
    <journal-title>Geoscientific Model Development</journal-title>
    <abbrev-journal-title abbrev-type="publisher">GMD</abbrev-journal-title><abbrev-journal-title abbrev-type="nlm-ta">Geosci. Model Dev.</abbrev-journal-title>
  </journal-title-group><issn pub-type="epub">1991-9603</issn><publisher>
    <publisher-name>Copernicus Publications</publisher-name>
    <publisher-loc>Göttingen, Germany</publisher-loc>
  </publisher></journal-meta>
    <article-meta>
      <article-id pub-id-type="doi">10.5194/gmd-18-5351-2025</article-id><title-group><article-title>GPTCast: a weather language model for precipitation nowcasting</article-title><alt-title>GPTCast</alt-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author" corresp="yes" rid="aff1">
          <name><surname>Franch</surname><given-names>Gabriele</given-names></name>
          <email>franch@fbk.eu</email>
        <ext-link>https://orcid.org/0000-0002-1264-0529</ext-link></contrib>
        <contrib contrib-type="author" corresp="no" rid="aff1">
          <name><surname>Tomasi</surname><given-names>Elena</given-names></name>
          
        <ext-link>https://orcid.org/0000-0002-1801-4991</ext-link></contrib>
        <contrib contrib-type="author" corresp="no" rid="aff1">
          <name><surname>Wanjari</surname><given-names>Rishabh</given-names></name>
          
        </contrib>
        <contrib contrib-type="author" corresp="no" rid="aff2">
          <name><surname>Poli</surname><given-names>Virginia</given-names></name>
          
        </contrib>
        <contrib contrib-type="author" corresp="no" rid="aff2">
          <name><surname>Cardinali</surname><given-names>Chiara</given-names></name>
          
        </contrib>
        <contrib contrib-type="author" corresp="no" rid="aff2">
          <name><surname>Alberoni</surname><given-names>Pier Paolo</given-names></name>
          
        <ext-link>https://orcid.org/0000-0003-2107-0289</ext-link></contrib>
        <contrib contrib-type="author" corresp="no" rid="aff1">
          <name><surname>Cristoforetti</surname><given-names>Marco</given-names></name>
          
        </contrib>
        <aff id="aff1"><label>1</label><institution>Fondazione Bruno Kessler, Trento, Italy</institution>
        </aff>
        <aff id="aff2"><label>2</label><institution>Arpae Emilia-Romagna, Bologna, Italy</institution>
        </aff>
      </contrib-group>
      <author-notes><corresp id="corr1">Gabriele Franch (franch@fbk.eu)</corresp></author-notes><pub-date><day>27</day><month>August</month><year>2025</year></pub-date>
      
      <volume>18</volume>
      <issue>16</issue>
      <fpage>5351</fpage><lpage>5371</lpage>
      <history>
        <date date-type="received"><day>25</day><month>September</month><year>2024</year></date>
           <date date-type="rev-request"><day>10</day><month>October</month><year>2024</year></date>
           <date date-type="rev-recd"><day>18</day><month>April</month><year>2025</year></date>
           <date date-type="accepted"><day>9</day><month>June</month><year>2025</year></date>
      </history>
      <permissions>
        <copyright-statement>Copyright: © 2025 Gabriele Franch et al.</copyright-statement>
        <copyright-year>2025</copyright-year>
      <license license-type="open-access"><license-p>This work is licensed under the Creative Commons Attribution 4.0 International License. To view a copy of this licence, visit <ext-link ext-link-type="uri" xlink:href="https://creativecommons.org/licenses/by/4.0/">https://creativecommons.org/licenses/by/4.0/</ext-link></license-p></license></permissions><self-uri xlink:href="https://gmd.copernicus.org/articles/18/5351/2025/gmd-18-5351-2025.html">This article is available from https://gmd.copernicus.org/articles/18/5351/2025/gmd-18-5351-2025.html</self-uri><self-uri xlink:href="https://gmd.copernicus.org/articles/18/5351/2025/gmd-18-5351-2025.pdf">The full text article is available as a PDF file from https://gmd.copernicus.org/articles/18/5351/2025/gmd-18-5351-2025.pdf</self-uri>
      <abstract><title>Abstract</title>

      <p id="d2e142">This work introduces GPTCast, a generative deep learning method for ensemble nowcasting of radar-based precipitation, inspired by advancements in large language models (LLMs). We employ a generative pre-trained transformer (GPT) model as a forecaster to learn spatiotemporal precipitation dynamics using tokenized radar images. The tokenizer is based on a Variational Quantized Autoencoder (VQGAN) featuring a novel reconstruction loss tailored for the skewed distribution of precipitation that promotes faithful reconstruction of high rainfall rates. This approach produces realistic ensemble forecasts and provides probabilistic outputs with accurate uncertainty estimation. The core architecture operates deterministically during the forward pass; ensemble variability arises from sampling the categorical probability distribution predicted by the forecaster during inference, rather than requiring external random inputs such as noise injection common in other generative models. All forecast variability is thus learned solely from the data distribution. We train and test GPTCast using a 6-year radar dataset over the Emilia-Romagna region in northern Italy, showing superior results compared to state-of-the-art ensemble extrapolation methods.</p>
  </abstract>
    </article-meta>
  </front>
<body>
      

<sec id="Ch1.S1" sec-type="intro">
  <label>1</label><title>Introduction and prior work</title>
      <p id="d2e154">Precipitation nowcasting, i.e., short-term forecasting up to 6 h, is a crucial tool for mitigating water-related hazards <xref ref-type="bibr" rid="bib1.bibx51" id="paren.1"/>. Sudden precipitation can result in landslides and floods, frequently compounded by strong winds, lightning, and hailstorms, which can seriously jeopardize human safety and damage infrastructure. The foundation of very short-term (up to 2 h) precipitation nowcasting systems is the application of extrapolation techniques to weather radar reflectivity sequences <xref ref-type="bibr" rid="bib1.bibx4" id="paren.2"/> that ingest the current and <inline-formula><mml:math id="M1" display="inline"><mml:mi>n</mml:mi></mml:math></inline-formula> previous observations <inline-formula><mml:math id="M2" display="inline"><mml:mrow><mml:msub><mml:mi>T</mml:mi><mml:mrow><mml:mo>-</mml:mo><mml:mi>n</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:mi mathvariant="normal">…</mml:mi><mml:mo>,</mml:mo><mml:msub><mml:mi>T</mml:mi><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>T</mml:mi><mml:mn mathvariant="normal">0</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula> with the aim of extrapolating <inline-formula><mml:math id="M3" display="inline"><mml:mi>m</mml:mi></mml:math></inline-formula> future time steps <inline-formula><mml:math id="M4" display="inline"><mml:mrow><mml:msub><mml:mi>T</mml:mi><mml:mn mathvariant="normal">1</mml:mn></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>T</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub><mml:mo>,</mml:mo><mml:mi mathvariant="normal">…</mml:mi><mml:mo>,</mml:mo><mml:msub><mml:mi>T</mml:mi><mml:mi>m</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>.
These short-term precipitation forecasts are essential for emergency response when released in a timely manner and communicated properly via early warning systems <xref ref-type="bibr" rid="bib1.bibx22" id="paren.3"/>.</p>
      <p id="d2e245">The main contenders to extrapolation techniques are numerical weather prediction (NWP) models, which can be used to forecast the probability and estimate the intensity of precipitation across large regions, but their accuracy is limited at smaller geographical and temporal scales <xref ref-type="bibr" rid="bib1.bibx45" id="paren.4"/>. Convective precipitation, which produces high rainfall rates and small cells, is especially difficult to forecast correctly for NWP models <xref ref-type="bibr" rid="bib1.bibx44" id="paren.5"/>. For these reasons, operational weather agencies recognize the great value offered by short-term extrapolation forecasts and make heavy use of statistical and, more recently, data-driven models that utilize the most recent weather radar observations for nowcasting <xref ref-type="bibr" rid="bib1.bibx55 bib1.bibx47" id="paren.6"/>.</p>
      <p id="d2e257">Lagrangian extrapolation is the best-known method for nowcasting precipitation <xref ref-type="bibr" rid="bib1.bibx3" id="paren.7"/>. It generates motion vectors to forecast the future direction of precipitation systems by applying optical-flow algorithms to a series of radar-derived rain fields. However, this approach becomes less accurate as lead time increases, particularly in convective situations where precipitation can intensify or decay quickly. Several alternative techniques have been studied to overcome these constraints, such as the seamless integration of nowcasting and NWP forecasts <xref ref-type="bibr" rid="bib1.bibx43 bib1.bibx5" id="paren.8"/> and the integration of orography data <xref ref-type="bibr" rid="bib1.bibx12 bib1.bibx31" id="paren.9"/>. Other, more sophisticated nowcasting methods improve the Lagrangian approach by generating ensemble nowcasts and preserving the precipitation field's structural characteristics. These ensembles aid in the assessment of forecast uncertainty by presenting multiple future scenarios. The most widespread example of this approach is the Short-Term Ensemble Prediction System (STEPS) <xref ref-type="bibr" rid="bib1.bibx5 bib1.bibx41" id="paren.10"/>.</p>
      <p id="d2e272">The most recent advancements in nowcasting precipitation have seen the application of data-driven methods and, more prominently, of deep neural networks (DNNs) and generative AI techniques to enhance forecast accuracy and realism. Deterministic DNNs have been instrumental in predicting the dynamics of precipitation, including its development and dissipation, overcoming one of the major shortcomings of extrapolation methods <xref ref-type="bibr" rid="bib1.bibx42 bib1.bibx1 bib1.bibx49 bib1.bibx15 bib1.bibx2" id="paren.11"/>. However, deterministic forecasts lose precision as lead time grows: the increasing uncertainty manifests itself as a forecast field that becomes progressively smoother. As with Lagrangian extrapolation, ensemble deep learning methods have been introduced to overcome this limitation. Generative methods have significantly improved the generation of realistic precipitation fields beyond deterministic average predictions. The forefront of this technology is represented by generative adversarial networks (GANs) <xref ref-type="bibr" rid="bib1.bibx59 bib1.bibx37" id="paren.12"/>, which enable more accurate and detailed precipitation forecasts by learning to mimic real weather patterns closely, and, more recently, by latent diffusion models <xref ref-type="bibr" rid="bib1.bibx27 bib1.bibx19" id="paren.13"/>, which not only generate realistic rainfall forecasts but also produce reliable ensembles that provide accurate uncertainty quantification of future scenarios. Many of these techniques originated in the field of computer vision and have subsequently been adapted to the weather forecasting domain with resounding success <xref ref-type="bibr" rid="bib1.bibx21 bib1.bibx39" id="paren.14"/>.</p>
      <p id="d2e288">In this study, we take inspiration from the successful trend of applying large language model (LLM) architectures <xref ref-type="bibr" rid="bib1.bibx48 bib1.bibx54" id="paren.15"/> born in the field of natural language processing (NLP) to other disciplines <xref ref-type="bibr" rid="bib1.bibx8 bib1.bibx29" id="paren.16"/>, including the medium-range weather forecasting domain <xref ref-type="bibr" rid="bib1.bibx26 bib1.bibx28" id="paren.17"/>, intending to transfer this knowledge to the nowcasting domain. To do so, in our work, we follow a strategy that mimics the setup of natural language processing: a tokenization step, where an input tokenizer splits and maps the input to a finite vocabulary, and an autoregressive model trained on the tokens produced by the tokenizer. We show that such an approach produces realistic and reliable ensemble forecasts. Given the different characteristics of our input data compared to LLMs (i.e., spatiotemporal precipitation fields vs. texts or images), our adaptation introduces several novel contributions instrumental to our task.</p>
</sec>
<sec id="Ch1.S2">
  <label>2</label><title>GPTCast model architecture</title>
      <p id="d2e308">There are two main components of our approach, which we call GPTCast: <list list-type="bullet"><list-item>
      <p id="d2e313"><italic>Spatial tokenizer (VQGAN).</italic> An image compression and discretization model that learns to map patches of the radar image from/to a finite number of possible representations (tokens). The learned codebook of tokens can be used to express a compact representation of any precipitation field. The tokenizer thus has a dual role: learning how to compress and decompress the information in the input image and how to discretize the compressed information (i.e., learn an optimal codebook).</p></list-item><list-item>
      <p id="d2e319"><italic>Spatiotemporal forecaster (generative pre-trained transformer, GPT).</italic> A model trained on token sequences to causally learn the evolutionary dynamics of precipitation over space and time. Given a tokenized spatiotemporal context (a compressed precipitation sequence), the model outputs probabilities over the fixed codebook for the next expected token for the context. The output probabilities can be leveraged for ensemble generation.</p></list-item></list></p>
      <p id="d2e324">This dual-stage architecture is an adaptation of the work of <xref ref-type="bibr" rid="bib1.bibx9" id="author.18"/>, which we repurposed from the task of image generation to the task of precipitation nowcasting by introducing two key modifications:</p>
      <p id="d2e330"><list list-type="bullet">
          <list-item>

      <p id="d2e335">In the spatial tokenizer (VQGAN) model, we replace the standard reconstruction loss (mean absolute error, MAE) with a loss tailored to the reconstruction of precipitation patterns (magnitude weighted absolute error, MWAE). The new loss also promotes token utilization: with it, we achieve 100 % codebook utilization.</p>
          </list-item>
          <list-item>

      <p id="d2e341">The token sequences used to train the GPT model represent a fixed three-dimensional context of time <inline-formula><mml:math id="M5" display="inline"><mml:mo>×</mml:mo></mml:math></inline-formula> height <inline-formula><mml:math id="M6" display="inline"><mml:mo>×</mml:mo></mml:math></inline-formula> width of precipitation patterns. This allows the model to learn spatiotemporal dynamics of the evolution of radar sequences.</p>
          </list-item>
        </list></p>
      <p id="d2e360">The two components of the model are trained independently in cascade, starting with the tokenizer. This deliberate dual-stage design is crucial for stable training: training the VQGAN and the GPT simultaneously, end to end, would introduce significant instability. As a probabilistic sequence model, the GPT relies on a fixed, finite vocabulary for stable operation: attempting to learn the token representation (vocabulary) concurrently with the complex spatiotemporal dynamics would force the GPT to learn dependencies over a constantly evolving vocabulary, likely hindering convergence. Furthermore, the fundamentally different architectures (CNN-based VQGAN with its specific loss functions versus the autoregressive transformer GPT) and the challenges of backpropagation through the VQGAN's discrete quantization step would exacerbate training instability. By first establishing a robust and fixed vocabulary through the VQGAN, we create a stable foundation for the GPT to learn the spatiotemporal dynamics of precipitation. This separation allows specialized and stable optimization of each component, ultimately enabling both realistic ensemble generation and accurate uncertainty estimation at the spatiotemporal (token) level, which are instrumental in meeting the requirements of operational nowcasting systems run by meteorological services.</p>
      <p id="d2e364">Another notable feature of GPTCast is that its core architecture operates deterministically, meaning it does not require stochastic elements such as injected noise during the forward pass for either training or inference. This contrasts with models such as GANs or diffusion models <xref ref-type="bibr" rid="bib1.bibx37 bib1.bibx27 bib1.bibx59" id="paren.19"/>, which often rely on random inputs to generate variability. In GPTCast, variability for ensemble generation stems from the learned data patterns: the tokenizer learns a discrete representation, allowing the forecaster to output a categorical probability distribution over the token vocabulary for each prediction step. Sampling from this distribution during autoregressive inference generates diverse ensemble members, ensuring all variability originates from the learned conditional probability of future states given the past, rather than external randomness (note: standard stochasticity in parameter initialization and optimization, e.g., stochastic gradient descent, is still employed during training).</p>
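      <p>As a minimal sketch of the sampling idea described above, the snippet below draws one candidate next token per ensemble member from a categorical distribution over a 1024-token codebook. The <monospace>logits</monospace> array is a random placeholder standing in for the forecaster output, and the helper names (<monospace>softmax</monospace>, <monospace>sample_next_tokens</monospace>) and the ensemble size of 20 are illustrative assumptions, not part of the GPTCast code.</p>
      <preformat><![CDATA[
```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def sample_next_tokens(logits, n_members, rng):
    """Draw one next-token index per ensemble member from the categorical
    distribution defined by `logits` (shape: [vocab_size])."""
    probs = softmax(logits)
    return rng.choice(probs.shape[-1], size=n_members, p=probs)


# Placeholder logits (assumption): a real forecaster would compute them
# from the tokenized spatiotemporal context.
rng = np.random.default_rng(seed=0)
logits = rng.normal(size=1024)

# Each ensemble member receives its own sampled token index.
members = sample_next_tokens(logits, n_members=20, rng=rng)
```
]]></preformat>
      <p>In an autoregressive rollout, each member would append its sampled token to its own context before the next prediction step, so the members diverge only through these draws.</p>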
      <p id="d2e370">We describe the details of the model setup and novel contributions in the following subsections.</p>
<sec id="Ch1.S2.SS1">
  <label>2.1</label><title>Spatial tokenizer: VQGAN</title>
      <p id="d2e380">The spatial tokenizer is a Variational Quantized Autoencoder (VQGAN) featuring an adversarial loss <xref ref-type="bibr" rid="bib1.bibx9" id="paren.20"/> and a novel reconstruction loss specifically tailored to improve the reconstruction of precipitation. We carefully tune the architecture of the VQGAN to obtain a model that provides the highest possible compression while maintaining good reconstruction performance and manageable computational complexity. The architecture of the tokenizer is visually summarized in Fig. <xref ref-type="fig" rid="F1"/>.</p>
      <p id="d2e388">The encoder (<inline-formula><mml:math id="M7" display="inline"><mml:mi>E</mml:mi></mml:math></inline-formula>) and decoder (<inline-formula><mml:math id="M8" display="inline"><mml:mi>G</mml:mi></mml:math></inline-formula>) of the autoencoder are symmetric in design and formed mainly by convolutional blocks, with <inline-formula><mml:math id="M9" display="inline"><mml:mrow><mml:mi mathvariant="italic">α</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">4</mml:mn></mml:mrow></mml:math></inline-formula> steps of downsampling and upsampling, respectively. With this setup, each latent vector at the bottleneck summarizes a patch of <inline-formula><mml:math id="M10" display="inline"><mml:mrow><mml:msup><mml:mn mathvariant="normal">2</mml:mn><mml:mi mathvariant="italic">α</mml:mi></mml:msup><mml:mo>×</mml:mo><mml:msup><mml:mn mathvariant="normal">2</mml:mn><mml:mi mathvariant="italic">α</mml:mi></mml:msup><mml:mo>=</mml:mo><mml:mn mathvariant="normal">16</mml:mn><mml:mo>×</mml:mo><mml:mn mathvariant="normal">16</mml:mn></mml:mrow></mml:math></inline-formula> pixels of the input image. Following recent studies <xref ref-type="bibr" rid="bib1.bibx57" id="paren.21"/>, we set the number of channels at the bottleneck (i.e., the length of the latent vector) to 8. This choice, informed by the cited literature and our preliminary experiments, provides a good balance between efficient codebook utilization, training stability, and the effective capture of essential features in a space of reduced dimension. The latent vectors at the bottleneck are discretized using a quantization layer that maps them to a finite codebook (<inline-formula><mml:math id="M11" display="inline"><mml:mi>Z</mml:mi></mml:math></inline-formula>) by finding the closest vector in the codebook. We define a codebook size of 1024 tokens in the quantization layer.
The codebook vectors are initialized randomly and then learned during training.</p>
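      <p>The quantization step can be illustrated with a small NumPy sketch. It assumes squared Euclidean distance as the notion of "closest", which is the common choice for vector-quantized autoencoders but is not stated explicitly here; the function name <monospace>quantize</monospace>, the random codebook, and the 12 × 12 latent grid are illustrative placeholders.</p>
      <preformat><![CDATA[
```python
import numpy as np


def quantize(latents, codebook):
    """Map each latent vector to the index of its nearest codebook entry.

    latents:  array of shape (H, W, C)
    codebook: array of shape (K, C)
    returns:  (token indices of shape (H, W), quantized latents (H, W, C))
    """
    flat = latents.reshape(-1, latents.shape[-1])  # (H*W, C)
    # Squared Euclidean distance between every latent and every code
    # (assumed metric; see the lead-in paragraph).
    dists = ((flat[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    indices = dists.argmin(axis=1)
    quantized = codebook[indices].reshape(latents.shape)
    return indices.reshape(latents.shape[:-1]), quantized


rng = np.random.default_rng(0)
codebook = rng.normal(size=(1024, 8))   # 1024 tokens, 8 channels each
latents = rng.normal(size=(12, 12, 8))  # illustrative bottleneck grid
tokens, z_q = quantize(latents, codebook)
```
]]></preformat>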
      <p id="d2e454">As an example, with an input precipitation map of <inline-formula><mml:math id="M12" display="inline"><mml:mrow><mml:mn mathvariant="normal">192</mml:mn><mml:mo>×</mml:mo><mml:mn mathvariant="normal">192</mml:mn></mml:mrow></mml:math></inline-formula> pixels with a dynamic range of 601 possible values for each pixel (from 0 to 60 dBZ with a 0.1 dBZ step, as described later in Table <xref ref-type="table" rid="T2"/>), the resulting feature vector at the bottleneck will have a dimensionality of <inline-formula><mml:math id="M13" display="inline"><mml:mrow><mml:mn mathvariant="normal">12</mml:mn><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mi>H</mml:mi><mml:mo>×</mml:mo><mml:mn mathvariant="normal">12</mml:mn><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mi>W</mml:mi><mml:mo>×</mml:mo><mml:mn mathvariant="normal">8</mml:mn></mml:mrow></mml:math></inline-formula> channels. Each 8-channel vector is then mapped to one of the possible 1024 vectors in the codebook, resulting in a compressed and discretized representation of <inline-formula><mml:math id="M14" display="inline"><mml:mrow><mml:mn mathvariant="normal">12</mml:mn><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mi>H</mml:mi><mml:mo>×</mml:mo><mml:mn mathvariant="normal">12</mml:mn><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mi>W</mml:mi></mml:mrow></mml:math></inline-formula> with a dynamic range of 1024 values. 
The resulting total compression ratio of the spatial tokenizer is <inline-formula><mml:math id="M15" display="inline"><mml:mrow><mml:mstyle displaystyle="false"><mml:mfrac style="text"><mml:mrow><mml:mn mathvariant="normal">192</mml:mn><mml:mo>⋅</mml:mo><mml:mn mathvariant="normal">192</mml:mn><mml:mo>⋅</mml:mo><mml:mn mathvariant="normal">601</mml:mn></mml:mrow><mml:mrow><mml:mn mathvariant="normal">12</mml:mn><mml:mo>⋅</mml:mo><mml:mn mathvariant="normal">12</mml:mn><mml:mo>⋅</mml:mo><mml:mn mathvariant="normal">1024</mml:mn></mml:mrow></mml:mfrac></mml:mstyle><mml:mo>≈</mml:mo><mml:mn mathvariant="normal">150</mml:mn></mml:mrow></mml:math></inline-formula> times.</p>
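      <p>The arithmetic of this example can be checked with a few lines of Python. The function names are illustrative, and the sketch assumes that each downsampling step halves the spatial resolution, consistent with the 16 × 16 patch size given above.</p>
      <preformat><![CDATA[
```python
def token_grid(height, width, downsampling_steps):
    """Spatial size of the token grid after halving the resolution
    `downsampling_steps` times."""
    patch = 2 ** downsampling_steps
    return height // patch, width // patch


def compression_ratio(height, width, levels, downsampling_steps, codebook_size):
    """Ratio as defined in the text:
    (pixels * value levels) / (tokens * codebook size)."""
    rows, cols = token_grid(height, width, downsampling_steps)
    return (height * width * levels) / (rows * cols * codebook_size)


grid = token_grid(192, 192, downsampling_steps=4)
ratio = compression_ratio(
    192, 192, levels=601, downsampling_steps=4, codebook_size=1024
)
print(grid, round(ratio, 2))  # prints: (12, 12) 150.25
```
]]></preformat>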
      <p id="d2e545">To support such a high compression ratio while maintaining good reconstruction ability, especially for the extreme values, we developed a novel reconstruction loss that we use in place of commonly used reconstruction losses (<inline-formula><mml:math id="M16" display="inline"><mml:mrow><mml:msub><mml:mi>l</mml:mi><mml:mn mathvariant="normal">1</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula> or <inline-formula><mml:math id="M17" display="inline"><mml:mrow><mml:msub><mml:mi>l</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula>, i.e., mean absolute error or mean squared error, respectively), defined as follows:

            <disp-formula id="Ch1.E1" content-type="numbered"><label>1</label><mml:math id="M18" display="block"><mml:mrow><mml:mtext>MWAE</mml:mtext><mml:mo>(</mml:mo><mml:mi mathvariant="bold-italic">x</mml:mi><mml:mo>,</mml:mo><mml:mi mathvariant="bold-italic">y</mml:mi><mml:mo>)</mml:mo><mml:mo>=</mml:mo><mml:munderover><mml:mo movablelimits="false">∑</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow><mml:mi>n</mml:mi></mml:munderover><mml:mfenced open="|" close="|"><mml:mrow><mml:mi mathvariant="italic">σ</mml:mi><mml:mo>(</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>)</mml:mo><mml:mo>-</mml:mo><mml:mi mathvariant="italic">σ</mml:mi><mml:mo>(</mml:mo><mml:msub><mml:mi>y</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>)</mml:mo></mml:mrow></mml:mfenced><mml:mo>⋅</mml:mo><mml:mi mathvariant="italic">σ</mml:mi><mml:mo>(</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>)</mml:mo><mml:mo>,</mml:mo></mml:mrow></mml:math></disp-formula>

          where <inline-formula><mml:math id="M19" display="inline"><mml:mi mathvariant="italic">σ</mml:mi></mml:math></inline-formula> is the sigmoid function <inline-formula><mml:math id="M20" display="inline"><mml:mrow><mml:mi mathvariant="italic">σ</mml:mi><mml:mo>(</mml:mo><mml:mi>z</mml:mi><mml:mo>)</mml:mo><mml:mo>=</mml:mo><mml:mstyle displaystyle="false"><mml:mfrac style="text"><mml:mn mathvariant="normal">1</mml:mn><mml:mrow><mml:mn mathvariant="normal">1</mml:mn><mml:mo>+</mml:mo><mml:msup><mml:mi>e</mml:mi><mml:mrow><mml:mo>-</mml:mo><mml:mi>z</mml:mi></mml:mrow></mml:msup></mml:mrow></mml:mfrac></mml:mstyle></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M21" display="inline"><mml:mi mathvariant="bold-italic">x</mml:mi></mml:math></inline-formula> and <inline-formula><mml:math id="M22" display="inline"><mml:mi mathvariant="bold-italic">y</mml:mi></mml:math></inline-formula> are the input and output vectors of the autoencoder, respectively. We call this loss the magnitude weighted absolute error (MWAE). By giving more weight to pixels with higher rain rates (magnitude), this loss simultaneously serves two purposes: the first is to nudge the tokenizer towards reserving more learning capacity for the reconstruction of extremes, and the other is to help to rebalance the notoriously skewed distribution of precipitation data, which by nature leans towards low rain rates. While the sigmoid function can saturate for very large input values, potentially diminishing the sensitivity to differences in extreme rain rates, this effect is mitigated by our data preprocessing. The input radar reflectivity values (0–60 dBZ) are linearly rescaled to the range <inline-formula><mml:math id="M23" display="inline"><mml:mrow><mml:mo>[</mml:mo><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mo>,</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mo>]</mml:mo></mml:mrow></mml:math></inline-formula> before being fed into the VQGAN. 
Within this range, the sigmoid function operates in a quasi-linear manner, ensuring that the absolute difference term <inline-formula><mml:math id="M24" display="inline"><mml:mrow><mml:mo>|</mml:mo><mml:mi mathvariant="italic">σ</mml:mi><mml:mo>(</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>)</mml:mo><mml:mo>-</mml:mo><mml:mi mathvariant="italic">σ</mml:mi><mml:mo>(</mml:mo><mml:msub><mml:mi>y</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>)</mml:mo><mml:mo>|</mml:mo></mml:mrow></mml:math></inline-formula> appropriately reflects differences between the scaled true and reconstructed values, even for high rain rates within the considered 0–60 dBZ range. The primary reason for using the sigmoid, rather than a purely linear weighting, is to provide robustness against potential out-of-range predictions from the decoder during training, which can occur due to the perturbations introduced by adversarial training. The sigmoid gracefully handles such out-of-range values without assigning excessively large loss values, thereby improving training stability.</p>
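      <p>A minimal NumPy sketch of Eq. (<xref ref-type="disp-formula" rid="Ch1.E1"/>) is given below, including the linear rescaling of reflectivity to the range [-1, 1]. The function names and the three-pixel test vectors are illustrative assumptions; an actual training implementation would operate on batched tensors in a deep learning framework and might average rather than sum over pixels.</p>
      <preformat><![CDATA[
```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def rescale_dbz(dbz, low=0.0, high=60.0):
    """Linearly map reflectivity in [low, high] dBZ to [-1, 1]."""
    return 2.0 * (dbz - low) / (high - low) - 1.0


def mwae(x, y):
    """Magnitude weighted absolute error, Eq. (1):
    sum_i |sigmoid(x_i) - sigmoid(y_i)| * sigmoid(x_i),
    where x is the input and y the reconstruction."""
    sx, sy = sigmoid(x), sigmoid(y)
    return float(np.sum(np.abs(sx - sy) * sx))


# Three illustrative pixels: no rain, moderate, and maximum reflectivity.
target = rescale_dbz(np.array([0.0, 30.0, 60.0]))
# The same 6 dBZ error, placed at the lightest or at the heaviest pixel.
recon_light = rescale_dbz(np.array([6.0, 30.0, 60.0]))
recon_heavy = rescale_dbz(np.array([0.0, 30.0, 54.0]))

loss_light = mwae(target, recon_light)
loss_heavy = mwae(target, recon_heavy)
```
]]></preformat>
      <p>With these inputs, the error at the heaviest pixel yields the larger loss, which is the weighting behaviour the text describes.</p>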
      <p id="d2e753">Alongside MWAE and the adversarial loss, the model incorporates the Learned Perceptual Image Patch Similarity (LPIPS) loss <xref ref-type="bibr" rid="bib1.bibx58" id="paren.22"/>, as shown in Fig. <xref ref-type="fig" rid="F1"/>, which further encourages perceptually realistic reconstructions by comparing feature activations in a pre-trained network. In our preliminary experiments, this loss term did not affect the final reconstruction performance but enabled faster model convergence.</p>
      <p id="d2e761">The interactions between loss terms during training follow the original VQGAN implementation <xref ref-type="bibr" rid="bib1.bibx9" id="paren.23"/>. The VQGAN model has a total of 90 million trainable parameters.</p>

      <fig id="F1" specific-use="star"><label>Figure 1</label><caption><p id="d2e769">The spatial tokenizer architecture. The four loss terms (MWAE reconstruction loss, adversarial loss, LPIPS perceptual loss, and codebook loss) are enclosed in boxes with green borders. The blue square (<inline-formula><mml:math id="M25" display="inline"><mml:mi mathvariant="bold-italic">i</mml:mi></mml:math></inline-formula>) is the input image, and the yellow square (<inline-formula><mml:math id="M26" display="inline"><mml:mi mathvariant="bold-italic">o</mml:mi></mml:math></inline-formula>) is the reconstructed autoencoder output. The codebook loss is formed by two complementary parts: the gradient from the detached <inline-formula><mml:math id="M27" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="bold-italic">z</mml:mi><mml:mi>q</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> primarily updates the codebook weights, while the gradient involving the detached <inline-formula><mml:math id="M28" display="inline"><mml:mi mathvariant="bold-italic">z</mml:mi></mml:math></inline-formula> primarily updates the encoder layers preceding the quantization, encouraging them to produce easily quantizable representations.</p></caption>
          <graphic xlink:href="https://gmd.copernicus.org/articles/18/5351/2025/gmd-18-5351-2025-f01.png"/>

        </fig>

</sec>
<sec id="Ch1.S2.SS2">
  <label>2.2</label><title>Spatiotemporal forecaster: GPT</title>
      <p id="d2e818">Similarly to <xref ref-type="bibr" rid="bib1.bibx9" id="author.24"/>, the core predictive component of GPTCast is an autoregressive transformer model based on the GPT-2 architecture <xref ref-type="bibr" rid="bib1.bibx36" id="paren.25"/>. We chose this specific architecture, as it represents a well-established, robust, and widely understood foundation, allowing us to focus on the novel application of the tokenization and autoregressive generation paradigm to radar nowcasting, rather than optimizing for the latest transformer variants. GPT-2 provides a strong baseline whose components are readily adaptable for spatiotemporal forecasting tasks.</p>
      <p id="d2e827">The GPTCast transformer utilizes 24 layers and 16 attention heads, resulting in a total of 304 million trainable parameters for this forecasting component. When combined with the VQGAN tokenizer (approximately 90 million parameters; see Sect. <xref ref-type="sec" rid="Ch1.S2.SS1"/>), the entire GPTCast system comprises roughly 394 million parameters. While potentially smaller than the largest models currently used in natural language processing, this scale is substantial within the atmospheric sciences. For context, it exceeds the size of ECMWF's operational AI Forecasting System (AIFS; approx. 253 million parameters according to its public checkpoint <xref ref-type="bibr" rid="bib1.bibx26" id="paren.26"/>), is comparable to recent diffusion models for dynamical downscaling (e.g., approx. 300 million parameters in <xref ref-type="bibr" rid="bib1.bibx46" id="altparen.27"/>), and is significantly larger than prominent graph-based models such as GraphCast (36.7 million parameters; <xref ref-type="bibr" rid="bib1.bibx25" id="altparen.28"/>). This highlights that GPTCast, despite using an established architecture, represents a large-scale deep learning approach for precipitation nowcasting. While GPT-2 serves as an effective proof of concept, future work could certainly explore the potential benefits of more recent or specialized transformer architectures (e.g., those optimized for efficiency or long-context modeling) for this task.</p>
      <p id="d2e841">We train two configurations, one with a spatiotemporal context size of 8 time steps (40 min) <inline-formula><mml:math id="M29" display="inline"><mml:mrow><mml:mo>×</mml:mo><mml:mn mathvariant="normal">256</mml:mn><mml:mo>×</mml:mo><mml:mn mathvariant="normal">256</mml:mn></mml:mrow></mml:math></inline-formula> pixels and one with 8 time steps <inline-formula><mml:math id="M30" display="inline"><mml:mrow><mml:mo>×</mml:mo><mml:mn mathvariant="normal">128</mml:mn><mml:mo>×</mml:mo><mml:mn mathvariant="normal">128</mml:mn></mml:mrow></mml:math></inline-formula> pixels. At the token level, the two configurations correspond to context lengths of 2048 (<inline-formula><mml:math id="M31" display="inline"><mml:mrow><mml:mn mathvariant="normal">8</mml:mn><mml:mo>×</mml:mo><mml:mn mathvariant="normal">16</mml:mn><mml:mo>×</mml:mo><mml:mn mathvariant="normal">16</mml:mn></mml:mrow></mml:math></inline-formula> tokens) and 512 (<inline-formula><mml:math id="M32" display="inline"><mml:mrow><mml:mn mathvariant="normal">8</mml:mn><mml:mo>×</mml:mo><mml:mn mathvariant="normal">8</mml:mn><mml:mo>×</mml:mo><mml:mn mathvariant="normal">8</mml:mn></mml:mrow></mml:math></inline-formula> tokens), and we refer to the two models as GPTCast-16x16 and GPTCast-8x8, respectively. In a GPT-like transformer model, the context size (or sequence length) does not affect the number of parameters; instead, it influences the computational complexity and memory requirements of the model during training and (more crucially) inference. Because timely forecasts are crucial for nowcasting, the context size must be chosen to balance computational cost against model performance. The settings of the two GPT models are summarized in Table <xref ref-type="table" rid="T1"/>.</p>
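      <p>The token counts above follow from simple arithmetic, sketched below. The spatial downsampling factor of 16 is inferred from the stated sizes (256 pixels map to 16 tokens per side), and the quadratic scaling of self-attention cost with sequence length is the generic property of dense attention, not a measurement of GPTCast itself.</p>
<preformat preformat-type="code"><![CDATA[
```python
def context_length(time_steps: int, height_px: int, width_px: int,
                   downsample: int = 16) -> int:
    """Number of tokens in a spatiotemporal context of the given pixel size."""
    if height_px % downsample or width_px % downsample:
        raise ValueError("image size must be a multiple of the downsampling factor")
    return time_steps * (height_px // downsample) * (width_px // downsample)


def attention_cost_ratio(len_a: int, len_b: int) -> float:
    """Relative cost of dense self-attention, which grows with the squared length."""
    return (len_a / len_b) ** 2


large = context_length(8, 256, 256)   # GPTCast-16x16
small = context_length(8, 128, 128)   # GPTCast-8x8
print(large, small, attention_cost_ratio(large, small))
```
]]></preformat>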

<table-wrap id="T1" specific-use="star"><label>Table 1</label><caption><p id="d2e910">GPTCast model configurations with large and small spatial domain.</p></caption><oasis:table frame="topbot"><oasis:tgroup cols="3">
     <oasis:colspec colnum="1" colname="col1" align="left"/>
     <oasis:colspec colnum="2" colname="col2" align="left"/>
     <oasis:colspec colnum="3" colname="col3" align="left"/>
     <oasis:thead>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">Configuration/model name</oasis:entry>
         <oasis:entry colname="col2">GPTCast-16x16</oasis:entry>
         <oasis:entry colname="col3">GPTCast-8x8</oasis:entry>
       </oasis:row>
     </oasis:thead>
     <oasis:tbody>
       <oasis:row>
         <oasis:entry colname="col1">Vocabulary size</oasis:entry>
         <oasis:entry colname="col2">1024</oasis:entry>
         <oasis:entry colname="col3">1024</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Context length</oasis:entry>
         <oasis:entry colname="col2">2048 (<inline-formula><mml:math id="M33" display="inline"><mml:mrow><mml:mn mathvariant="normal">8</mml:mn><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mi>T</mml:mi><mml:mo>×</mml:mo><mml:mn mathvariant="normal">16</mml:mn><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mi>H</mml:mi><mml:mo>×</mml:mo><mml:mn mathvariant="normal">16</mml:mn><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mi>W</mml:mi></mml:mrow></mml:math></inline-formula> tokens)</oasis:entry>
         <oasis:entry colname="col3">512 (<inline-formula><mml:math id="M34" display="inline"><mml:mrow><mml:mn mathvariant="normal">8</mml:mn><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mi>T</mml:mi><mml:mo>×</mml:mo><mml:mn mathvariant="normal">8</mml:mn><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mi>H</mml:mi><mml:mo>×</mml:mo><mml:mn mathvariant="normal">8</mml:mn><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mi>W</mml:mi></mml:mrow></mml:math></inline-formula> tokens)</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Number of layers</oasis:entry>
         <oasis:entry colname="col2">24</oasis:entry>
         <oasis:entry colname="col3">24</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Number of heads</oasis:entry>
         <oasis:entry colname="col2">16</oasis:entry>
         <oasis:entry colname="col3">16</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Embedding dimension</oasis:entry>
         <oasis:entry colname="col2">1024</oasis:entry>
         <oasis:entry colname="col3">1024</oasis:entry>
       </oasis:row>
     </oasis:tbody>
   </oasis:tgroup></oasis:table></table-wrap>

      <p id="d2e1052">The training process of the forecaster is schematized in Fig. <xref ref-type="fig" rid="F2"/>: contiguous spatiotemporal sequences of radar data are retrieved from the training dataset, encoded into codebook indices by the frozen VQGAN encoder, and passed to the GPT model as training samples. The GPT forecaster is trained autoregressively to predict the probability distribution for each token <inline-formula><mml:math id="M35" display="inline"><mml:mrow><mml:msub><mml:mi>z</mml:mi><mml:mi mathvariant="normal">t</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> given the sequence of preceding tokens <inline-formula><mml:math id="M36" display="inline"><mml:mrow><mml:msub><mml:mi>z</mml:mi><mml:mrow><mml:mo>&lt;</mml:mo><mml:mi mathvariant="normal">t</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula>. The tokens are ordered starting with the oldest image, and each image is flattened in row-first order. This ordering is instrumental to the nowcasting task: at inference time, we can provide the model with a context that is pre-filled with the past seven time steps to generate the tokens for the eighth time step.</p>
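      <p>The token ordering can be illustrated with a small sketch. The nested-list layout <monospace>frames[t][i][j]</monospace> and the helper names are illustrative assumptions and do not come from the GPTCast code base.</p>
<preformat preformat-type="code"><![CDATA[
```python
def flatten_tokens(frames):
    """Flatten frames[t][i][j] into one sequence: oldest frame first, row by row."""
    return [token for frame in frames for row in frame for token in row]


def flat_position(t, i, j, height, width):
    """Index of token (t, i, j) in the flattened sequence."""
    return (t * height + i) * width + j


def split_context(n_frames, height, width):
    """Tokens pre-filled from past frames and tokens left to generate (last frame)."""
    per_frame = height * width
    return (n_frames - 1) * per_frame, per_frame


# Toy example: 2 frames of 2 x 3 tokens, each token value equal to its flat index.
frames = [[[0, 1, 2], [3, 4, 5]], [[6, 7, 8], [9, 10, 11]]]
sequence = flatten_tokens(frames)
print(sequence)
print(split_context(8, 16, 16))  # 8 time steps of 16 x 16 tokens
```
]]></preformat>
      <p>For a context of 8 time steps with 16 by 16 tokens each, seven frames (1792 tokens) are pre-filled and the remaining 256 tokens are generated one at a time.</p>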

      <fig id="F2" specific-use="star"><label>Figure 2</label><caption><p id="d2e1084">The spatiotemporal forecaster architecture. During the training of the forecaster, the tokenizer encoder (<inline-formula><mml:math id="M37" display="inline"><mml:mi>E</mml:mi></mml:math></inline-formula>) weights are frozen.</p></caption>
          <graphic xlink:href="https://gmd.copernicus.org/articles/18/5351/2025/gmd-18-5351-2025-f02.png"/>

        </fig>

</sec>
<sec id="Ch1.S2.SS3">
  <label>2.3</label><title>Inference</title>
      <p id="d2e1108">At inference time, the two models are combined in a sandwich-like configuration, with the encoding of the context input images through the VQGAN encoder, the autoregressive generation of the indices of multiple forecast steps via the transformer model, and the final decoding of the tokens back to pixel space using the VQGAN decoder (see Fig. <xref ref-type="fig" rid="F3"/>). To obtain multiple ensemble members, the autoregressive generation of the indices can be repeated multiple times while applying a multinomial draw over the output probabilities to pick different tokens.</p>
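      <p>A minimal sketch of ensemble generation by multinomial sampling is given below. The callable <monospace>next_logits</monospace> is a hypothetical stand-in for the trained transformer (it maps the tokens generated so far to one score per codebook entry); the encoder and decoder steps are omitted, and only the Python standard library is used.</p>
<preformat preformat-type="code"><![CDATA[
```python
import math
import random


def softmax(logits):
    """Convert raw scores into probabilities (shifted by the max for stability)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(logits, rng):
    """Multinomial draw of one codebook index from the predicted distribution."""
    probs = softmax(logits)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def generate_member(next_logits, context, n_new, rng):
    """Autoregressively extend the context by n_new sampled tokens."""
    tokens = list(context)
    for _ in range(n_new):
        tokens.append(sample_token(next_logits(tokens), rng))
    return tokens[len(context):]


def generate_ensemble(next_logits, context, n_new, n_members, seed=0):
    """Repeat the generation; different random draws give different members."""
    rng = random.Random(seed)
    return [generate_member(next_logits, context, n_new, rng)
            for _ in range(n_members)]


# Toy stand-in for the transformer: a uniform distribution over 4 codebook entries.
members = generate_ensemble(lambda tokens: [0.0] * 4, [1, 2, 3],
                            n_new=5, n_members=3)
print(members)
```
]]></preformat>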

      <fig id="F3" specific-use="star"><label>Figure 3</label><caption><p id="d2e1115">The GPTCast architecture during inference. The trained tokenizer and forecaster are combined (tokenizer encoder (<inline-formula><mml:math id="M38" display="inline"><mml:mi>E</mml:mi></mml:math></inline-formula>) <inline-formula><mml:math id="M39" display="inline"><mml:mo>→</mml:mo></mml:math></inline-formula> forecaster <inline-formula><mml:math id="M40" display="inline"><mml:mo>→</mml:mo></mml:math></inline-formula> tokenizer decoder (<inline-formula><mml:math id="M41" display="inline"><mml:mi>G</mml:mi></mml:math></inline-formula>)) to generate forecasts. In the standard unconditional setting, the next token is chosen by applying a multinomial draw over the codebook probabilities to generate different ensemble members.</p></caption>
          <graphic xlink:href="https://gmd.copernicus.org/articles/18/5351/2025/gmd-18-5351-2025-f03.png"/>

        </fig>

      <p id="d2e1152">To generate forecasts for spatial domains larger than the specific training context size, we employ a sliding window inference strategy, illustrated in Fig. <xref ref-type="fig" rid="F4"/> and detailed in Algorithm <xref ref-type="other" rid="Ch1.Prog1"/>. We process the target forecast frame sequentially, following the row-first raster scan order. To predict the token index <inline-formula><mml:math id="M42" display="inline"><mml:mrow><mml:msub><mml:mi>z</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula> for a specific spatial location <inline-formula><mml:math id="M43" display="inline"><mml:mrow><mml:mo>(</mml:mo><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> in the forecast frame, we construct an input context sequence for the transformer. This sequence comprises relevant tokens from previous time steps within a defined spatiotemporal window around <inline-formula><mml:math id="M44" display="inline"><mml:mrow><mml:mo>(</mml:mo><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula>, along with any tokens already predicted in the current forecast frame that precede <inline-formula><mml:math id="M45" display="inline"><mml:mrow><mml:mo>(</mml:mo><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> in the row-first sequential order. The transformer then predicts the probability distribution for the next token based on this context. Sampling from this distribution yields the predicted token <inline-formula><mml:math id="M46" display="inline"><mml:mrow><mml:msub><mml:mi>z</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula>. 
This sequential, conditioned generation helps maintain spatial and temporal consistency across the domain through the transformer's attention mechanism, as each token prediction depends on its previously generated neighbors in space and time. Domain edges are handled naturally, since the context available within the sliding window adapts to the target token's position.</p>

      <fig id="F4" specific-use="star"><label>Figure 4</label><caption><p id="d2e1243">An illustration of the sliding window approach for a forecaster trained with a context length of 4 steps <inline-formula><mml:math id="M47" display="inline"><mml:mo>×</mml:mo></mml:math></inline-formula> 3 height <inline-formula><mml:math id="M48" display="inline"><mml:mo>×</mml:mo></mml:math></inline-formula> 3 width (36 tokens). Forecasts for domains of arbitrary sizes can be generated by moving the context window across the forecasting domain to predict a target token in the larger domain (starting with the token at the top-left position). A fixed start-of-sequence token (index 0) is prepended to the context to provide an initial conditioning for the first token.</p></caption>
          <graphic xlink:href="https://gmd.copernicus.org/articles/18/5351/2025/gmd-18-5351-2025-f04.png"/>

        </fig>

<boxed-text content-type="algorithm" position="float" id="Ch1.Prog1"><label>Algorithm 1</label><caption><p id="d2e1268">Pseudocode for sliding window prediction algorithm.</p></caption><disp-quote content-type="algorithmic" specific-use="numbering{0}"><list>

    <list-item><label><bold>Require:</bold></label>

      <p id="d2e1278" specific-use="REQUIRE">input_indices {Tensor of shape <inline-formula><mml:math id="M49" display="inline"><mml:mrow><mml:mo>[</mml:mo><mml:mi>B</mml:mi><mml:mo>,</mml:mo><mml:mi>S</mml:mi><mml:mo>,</mml:mo><mml:mi>H</mml:mi><mml:mo>,</mml:mo><mml:mi>W</mml:mi><mml:mo>]</mml:mo></mml:mrow></mml:math></inline-formula>}</p>
            </list-item>

    <list-item><label><bold>Require:</bold></label>

      <p id="d2e1312" specific-use="REQUIRE">c_indices {Conditioning tokens (Start of Sequence)}</p>
            </list-item>

    <list-item><label><bold>Require:</bold></label>

      <p id="d2e1322" specific-use="REQUIRE">window_size {Size of sliding context window}</p>
            </list-item>

    <list-item><label><bold>Ensure:</bold></label>

      <p id="d2e1332" specific-use="ENSURE">predicted_indices {Next frame token indices}</p>
            </list-item>

    <list-item>

      <p id="d2e1339" specific-use="STATE"><inline-formula><mml:math id="M50" display="inline"><mml:mrow><mml:mi>B</mml:mi><mml:mo>,</mml:mo><mml:mi mathvariant="italic">_</mml:mi><mml:mo>,</mml:mo><mml:mi>H</mml:mi><mml:mo>,</mml:mo><mml:mi>W</mml:mi><mml:mo>←</mml:mo><mml:mtext>shape</mml:mtext><mml:mo>(</mml:mo><mml:mtext>input_indices</mml:mtext><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula></p>
            </list-item>

    <list-item>

      <p id="d2e1375" specific-use="STATE"><inline-formula><mml:math id="M51" display="inline"><mml:mrow><mml:mtext>half_window</mml:mtext><mml:mo>←</mml:mo><mml:mo>⌊</mml:mo><mml:mtext>window_size</mml:mtext><mml:mo>/</mml:mo><mml:mn mathvariant="normal">2</mml:mn><mml:mo>⌋</mml:mo></mml:mrow></mml:math></inline-formula></p>
            </list-item>

    <list-item>

      <p id="d2e1399" specific-use="STATE"><inline-formula><mml:math id="M52" display="inline"><mml:mrow><mml:mtext>predicted_indices</mml:mtext><mml:mo>←</mml:mo><mml:mtext>Tensor</mml:mtext><mml:mo>(</mml:mo><mml:mi>B</mml:mi><mml:mo>,</mml:mo><mml:mi>H</mml:mi><mml:mo>,</mml:mo><mml:mi>W</mml:mi><mml:mo>)</mml:mo><mml:mtext> filled with </mml:mtext><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:math></inline-formula></p>
            </list-item>

    <list-item>

      <p id="d2e1436" specific-use="STATE"><inline-formula><mml:math id="M53" display="inline"><mml:mrow><mml:mtext>conditioning</mml:mtext><mml:mo>←</mml:mo><mml:mtext>reshape</mml:mtext><mml:mo>(</mml:mo><mml:mtext>c_indices</mml:mtext><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> {Flatten conditioning}</p>
            </list-item>

    <list-item>

      <p id="d2e1460" specific-use="FOR"><bold>for</bold> <inline-formula><mml:math id="M54" display="inline"><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">0</mml:mn></mml:mrow></mml:math></inline-formula> <bold>to</bold> <inline-formula><mml:math id="M55" display="inline"><mml:mrow><mml:mi>H</mml:mi><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:math></inline-formula> <bold>do</bold> <list>
    <list-item>
      <p id="d2e1498" specific-use="FOR"><bold>for</bold> <inline-formula><mml:math id="M56" display="inline"><mml:mrow><mml:mi>j</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">0</mml:mn></mml:mrow></mml:math></inline-formula> <bold>to</bold> <inline-formula><mml:math id="M57" display="inline"><mml:mrow><mml:mi>W</mml:mi><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:math></inline-formula> <bold>do</bold> <list>
    <list-item>
      <p id="d2e1536" specific-use="STATE">/* Calculate window boundaries with edge handling */</p></list-item>
    <list-item>
      <p id="d2e1541" specific-use="STATE"><inline-formula><mml:math id="M58" display="inline"><mml:mrow><mml:msub><mml:mi>i</mml:mi><mml:mtext>start</mml:mtext></mml:msub><mml:mo>←</mml:mo><mml:mo>max⁡</mml:mo><mml:mo>(</mml:mo><mml:mn mathvariant="normal">0</mml:mn><mml:mo>,</mml:mo><mml:mi>i</mml:mi><mml:mo>-</mml:mo><mml:mtext>half_window</mml:mtext><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula></p></list-item>
    <list-item>
      <p id="d2e1573" specific-use="STATE"><inline-formula><mml:math id="M59" display="inline"><mml:mrow><mml:msub><mml:mi>i</mml:mi><mml:mtext>end</mml:mtext></mml:msub><mml:mo>←</mml:mo><mml:mo>min⁡</mml:mo><mml:mo>(</mml:mo><mml:mi>H</mml:mi><mml:mo>,</mml:mo><mml:msub><mml:mi>i</mml:mi><mml:mtext>start</mml:mtext></mml:msub><mml:mo>+</mml:mo><mml:mtext>window_size</mml:mtext><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula></p></list-item>
    <list-item>
      <p id="d2e1608" specific-use="STATE"><inline-formula><mml:math id="M60" display="inline"><mml:mrow><mml:msub><mml:mi>i</mml:mi><mml:mtext>start</mml:mtext></mml:msub><mml:mo>←</mml:mo><mml:mo>max⁡</mml:mo><mml:mo>(</mml:mo><mml:mn mathvariant="normal">0</mml:mn><mml:mo>,</mml:mo><mml:msub><mml:mi>i</mml:mi><mml:mtext>end</mml:mtext></mml:msub><mml:mo>-</mml:mo><mml:mtext>window_size</mml:mtext><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> {Adjust if at bottom edge}</p></list-item>
    <list-item>
      <p id="d2e1645" specific-use="STATE"><inline-formula><mml:math id="M61" display="inline"><mml:mrow><mml:msub><mml:mi>j</mml:mi><mml:mtext>start</mml:mtext></mml:msub><mml:mo>←</mml:mo><mml:mo>max⁡</mml:mo><mml:mo>(</mml:mo><mml:mn mathvariant="normal">0</mml:mn><mml:mo>,</mml:mo><mml:mi>j</mml:mi><mml:mo>-</mml:mo><mml:mtext>half_window</mml:mtext><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula></p></list-item>
    <list-item>
      <p id="d2e1678" specific-use="STATE"><inline-formula><mml:math id="M62" display="inline"><mml:mrow><mml:msub><mml:mi>j</mml:mi><mml:mtext>end</mml:mtext></mml:msub><mml:mo>←</mml:mo><mml:mo>min⁡</mml:mo><mml:mo>(</mml:mo><mml:mi>W</mml:mi><mml:mo>,</mml:mo><mml:msub><mml:mi>j</mml:mi><mml:mtext>start</mml:mtext></mml:msub><mml:mo>+</mml:mo><mml:mtext>window_size</mml:mtext><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula></p></list-item>
    <list-item>
      <p id="d2e1713" specific-use="STATE"><inline-formula><mml:math id="M63" display="inline"><mml:mrow><mml:msub><mml:mi>j</mml:mi><mml:mtext>start</mml:mtext></mml:msub><mml:mo>←</mml:mo><mml:mo>max⁡</mml:mo><mml:mo>(</mml:mo><mml:mn mathvariant="normal">0</mml:mn><mml:mo>,</mml:mo><mml:msub><mml:mi>j</mml:mi><mml:mtext>end</mml:mtext></mml:msub><mml:mo>-</mml:mo><mml:mtext>window_size</mml:mtext><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> {Adjust if at right edge}</p></list-item>
    <list-item>
      <p id="d2e1750" specific-use="STATE">/* Extract past context and already predicted tokens */</p></list-item>
    <list-item>
      <p id="d2e1755" specific-use="STATE"><inline-formula><mml:math id="M64" display="inline"><mml:mrow><mml:mtext>past_tokens</mml:mtext><mml:mo>←</mml:mo><mml:mtext>flatten</mml:mtext><mml:mo>(</mml:mo><mml:mtext>input_indices</mml:mtext><mml:mo>[</mml:mo><mml:mo>:</mml:mo><mml:mo>,</mml:mo><mml:mo>:</mml:mo><mml:mo>,</mml:mo><mml:msub><mml:mi>i</mml:mi><mml:mtext>start</mml:mtext></mml:msub><mml:mo>:</mml:mo><mml:msub><mml:mi>i</mml:mi><mml:mtext>end</mml:mtext></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>j</mml:mi><mml:mtext>start</mml:mtext></mml:msub><mml:mo>:</mml:mo><mml:msub><mml:mi>j</mml:mi><mml:mtext>end</mml:mtext></mml:msub><mml:mo>]</mml:mo><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula></p></list-item>
    <list-item>
      <p id="d2e1815" specific-use="STATE"><inline-formula><mml:math id="M65" display="inline"><mml:mrow><mml:mtext>pred_patch</mml:mtext><mml:mo>←</mml:mo><mml:mtext>predicted_indices</mml:mtext><mml:mo>[</mml:mo><mml:mo>:</mml:mo><mml:mo>,</mml:mo><mml:msub><mml:mi>i</mml:mi><mml:mtext>start</mml:mtext></mml:msub><mml:mo>:</mml:mo><mml:msub><mml:mi>i</mml:mi><mml:mtext>end</mml:mtext></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>j</mml:mi><mml:mtext>start</mml:mtext></mml:msub><mml:mo>:</mml:mo><mml:msub><mml:mi>j</mml:mi><mml:mtext>end</mml:mtext></mml:msub><mml:mo>]</mml:mo></mml:mrow></mml:math></inline-formula></p></list-item>
    <list-item>
      <p id="d2e1865" specific-use="STATE"><inline-formula><mml:math id="M66" display="inline"><mml:mrow><mml:msub><mml:mtext>window_pos</mml:mtext><mml:mi>i</mml:mi></mml:msub><mml:mo>←</mml:mo><mml:mi>i</mml:mi><mml:mo>-</mml:mo><mml:msub><mml:mi>i</mml:mi><mml:mtext>start</mml:mtext></mml:msub></mml:mrow></mml:math></inline-formula></p></list-item>
    <list-item>
      <p id="d2e1891" specific-use="STATE"><inline-formula><mml:math id="M67" display="inline"><mml:mrow><mml:msub><mml:mtext>window_pos</mml:mtext><mml:mi>j</mml:mi></mml:msub><mml:mo>←</mml:mo><mml:mi>j</mml:mi><mml:mo>-</mml:mo><mml:msub><mml:mi>j</mml:mi><mml:mtext>start</mml:mtext></mml:msub></mml:mrow></mml:math></inline-formula></p></list-item>
    <list-item>
      <p id="d2e1916" specific-use="STATE"><inline-formula><mml:math id="M68" display="inline"><mml:mrow><mml:mtext>tokens_count</mml:mtext><mml:mo>←</mml:mo><mml:msub><mml:mtext>window_pos</mml:mtext><mml:mi>i</mml:mi></mml:msub><mml:mo>×</mml:mo><mml:mo>(</mml:mo><mml:msub><mml:mi>j</mml:mi><mml:mtext>end</mml:mtext></mml:msub><mml:mo>-</mml:mo><mml:msub><mml:mi>j</mml:mi><mml:mtext>start</mml:mtext></mml:msub><mml:mo>)</mml:mo><mml:mo>+</mml:mo><mml:msub><mml:mtext>window_pos</mml:mtext><mml:mi>j</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula></p></list-item>
    <list-item>
      <p id="d2e1959" specific-use="STATE"><inline-formula><mml:math id="M69" display="inline"><mml:mrow><mml:mtext>pred_tokens</mml:mtext><mml:mo>←</mml:mo><mml:mtext>first </mml:mtext><mml:mtext>tokens_count</mml:mtext><mml:mtext> elements from flattened</mml:mtext></mml:mrow></mml:math></inline-formula>
pred_patch</p></list-item>
    <list-item>
      <p id="d2e1979" specific-use="STATE">/* Build context and predict next token */</p></list-item>
    <list-item>
      <p id="d2e1984" specific-use="STATE"><inline-formula><mml:math id="M70" display="inline"><mml:mrow><mml:mtext>context</mml:mtext><mml:mo>←</mml:mo><mml:mtext>concatenate</mml:mtext><mml:mo>(</mml:mo><mml:mtext>conditioning</mml:mtext><mml:mo>,</mml:mo><mml:mtext>past_tokens</mml:mtext><mml:mo>,</mml:mo></mml:mrow></mml:math></inline-formula>
<inline-formula><mml:math id="M71" display="inline"><mml:mrow><mml:mtext>pred_tokens</mml:mtext><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula></p></list-item>
    <list-item>
      <p id="d2e2019" specific-use="STATE"><inline-formula><mml:math id="M72" display="inline"><mml:mrow><mml:mtext>next_token</mml:mtext><mml:mo>←</mml:mo><mml:mtext>predict_next_index</mml:mtext><mml:mo>(</mml:mo><mml:mtext>context</mml:mtext><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula></p></list-item>
    <list-item>
      <p id="d2e2041" specific-use="STATE"><inline-formula><mml:math id="M73" display="inline"><mml:mrow><mml:mtext>predicted_indices</mml:mtext><mml:mo>[</mml:mo><mml:mo>:</mml:mo><mml:mo>,</mml:mo><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi><mml:mo>]</mml:mo><mml:mo>←</mml:mo><mml:mtext>next_token.squeeze()</mml:mtext></mml:mrow></mml:math></inline-formula> {Drop the singleton dimension to match the target shape}</p></list-item></list></p></list-item>
    <list-item>
      <p id="d2e2072" specific-use="ENDFOR"><bold>end</bold> <bold>for</bold></p></list-item></list></p>
            </list-item>

    <list-item>

      <p id="d2e2082" specific-use="ENDFOR"><bold>end</bold> <bold>for</bold></p>
            </list-item>

    <list-item>

      <p id="d2e2092" specific-use="RETURN"><bold>return</bold>  predicted_indices</p>
            </list-item>
          </list></disp-quote></boxed-text>
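      <p>For readers who prefer executable code, Algorithm <xref ref-type="other" rid="Ch1.Prog1"/> can be transcribed into plain Python as sketched below. This is a simplified single-sample version (the batch dimension is dropped and nested lists replace tensors), and <monospace>predict_next_index</monospace> is a placeholder for the transformer plus sampling step; it is not the released implementation.</p>
<preformat preformat-type="code"><![CDATA[
```python
def sliding_window_predict(input_indices, conditioning, window_size,
                           predict_next_index):
    """Predict one frame of tokens; input_indices is indexed as [step][row][col]."""
    steps = len(input_indices)
    height = len(input_indices[0])
    width = len(input_indices[0][0])
    half_window = window_size // 2
    predicted = [[-1] * width for _ in range(height)]  # -1 marks "not yet predicted"

    for i in range(height):
        for j in range(width):
            # Window boundaries, shifted inwards at the bottom and right edges.
            i_start = max(0, i - half_window)
            i_end = min(height, i_start + window_size)
            i_start = max(0, i_end - window_size)
            j_start = max(0, j - half_window)
            j_end = min(width, j_start + window_size)
            j_start = max(0, j_end - window_size)

            # Past frames inside the window, oldest first, row by row.
            past_tokens = [input_indices[t][r][c]
                           for t in range(steps)
                           for r in range(i_start, i_end)
                           for c in range(j_start, j_end)]

            # Tokens of the current frame that precede (i, j) inside the window.
            pred_patch = [predicted[r][c]
                          for r in range(i_start, i_end)
                          for c in range(j_start, j_end)]
            tokens_count = (i - i_start) * (j_end - j_start) + (j - j_start)
            pred_tokens = pred_patch[:tokens_count]

            context = list(conditioning) + past_tokens + pred_tokens
            predicted[i][j] = predict_next_index(context)
    return predicted


# Toy run: 4 past frames on a 5 x 4 token grid, 3 x 3 window, start token 0.
frames = [[[t] * 4 for _ in range(5)] for t in range(4)]
context_sizes = sliding_window_predict(frames, [0], 3, len)
print(context_sizes)
```
]]></preformat>
      <p>Passing <monospace>len</monospace> as the predictor, as in the toy run, records the context length seen at each position, which makes it easy to check how the window fills up along the raster scan and adapts at the domain edges.</p>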
</sec>
</sec>
<sec id="Ch1.S3">
  <label>3</label><title>Dataset</title>
      <p id="d2e2109">The dataset used in this study is the radar reflectivity composite produced by the Hydrometeorological Service of the Regional Agency for the Environment and Energy of Emilia-Romagna Region in northern Italy (Arpae Emilia-Romagna). The agency operates two dual-polarization C-band radars in the area of the Po Valley, located in Gattatico (<inline-formula><mml:math id="M74" display="inline"><mml:mrow><mml:mn mathvariant="normal">44</mml:mn><mml:mi mathvariant="italic">°</mml:mi><mml:msup><mml:mn mathvariant="normal">47</mml:mn><mml:mo>′</mml:mo></mml:msup><mml:msup><mml:mn mathvariant="normal">27</mml:mn><mml:mrow><mml:mo>′</mml:mo><mml:mo>′</mml:mo></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula> N, <inline-formula><mml:math id="M75" display="inline"><mml:mrow><mml:mn mathvariant="normal">10</mml:mn><mml:mi mathvariant="italic">°</mml:mi><mml:msup><mml:mn mathvariant="normal">29</mml:mn><mml:mo>′</mml:mo></mml:msup><mml:msup><mml:mn mathvariant="normal">54</mml:mn><mml:mrow><mml:mo>′</mml:mo><mml:mo>′</mml:mo></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula> E) and San Pietro Capofiume (<inline-formula><mml:math id="M76" display="inline"><mml:mrow><mml:mn mathvariant="normal">44</mml:mn><mml:mi mathvariant="italic">°</mml:mi><mml:msup><mml:mn mathvariant="normal">39</mml:mn><mml:mo>′</mml:mo></mml:msup><mml:msup><mml:mn mathvariant="normal">19</mml:mn><mml:mrow><mml:mo>′</mml:mo><mml:mo>′</mml:mo></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula> N, <inline-formula><mml:math id="M77" display="inline"><mml:mrow><mml:mn mathvariant="normal">11</mml:mn><mml:mi mathvariant="italic">°</mml:mi><mml:msup><mml:mn mathvariant="normal">37</mml:mn><mml:mo>′</mml:mo></mml:msup><mml:msup><mml:mn mathvariant="normal">23</mml:mn><mml:mrow><mml:mo>′</mml:mo><mml:mo>′</mml:mo></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula> E).
The scanning strategy allows coverage of the entire region every 5 min. The area is characterized by a complex morphology: it spans from the flat basin of the Po Valley in the north to the upper Apennines in the south and from the Ligurian coast in the west to the Adriatic Sea in the east. For this work, we use scans with a radius of 125 km, giving a total coverage of 71 172 km<sup>2</sup>, as shown in Fig. <xref ref-type="fig" rid="F5"/>.</p>

      <fig id="F5" specific-use="star"><label>Figure 5</label><caption><p id="d2e2218">Extent of the dataset. Effective coverage is the composite of the 125 km range of the Gattatico and San Pietro Capofiume radars (green area). The hatched area is the Emilia-Romagna region.</p></caption>
        <graphic xlink:href="https://gmd.copernicus.org/articles/18/5351/2025/gmd-18-5351-2025-f05.png"/>

      </fig>

      <p id="d2e2227">Arpae fully manages both the radar acquisition strategy and the data processing pipeline, including several stages of data quality control and error correction developed to reduce the effect of topographical beam blockage, ground clutter, and anomalous propagation <xref ref-type="bibr" rid="bib1.bibx13" id="paren.29"/>. Specific corrections are applied over the vertical reflectivity profile to improve precipitation estimates at the ground level <xref ref-type="bibr" rid="bib1.bibx14" id="paren.30"/>. While these quality controls mitigate major issues, residual errors inherent to radar measurements remain and also affect the corresponding quantitative precipitation estimation (QPE). No rain gauge correction is applied, given the challenges of reconciling the two sources at the short integration time of 5 min.</p>
      <p id="d2e2237">The resulting product is a 2D reflectivity composite map on a <inline-formula><mml:math id="M79" display="inline"><mml:mrow><mml:mn mathvariant="normal">290</mml:mn><mml:mo>×</mml:mo><mml:mn mathvariant="normal">373</mml:mn></mml:mrow></mml:math></inline-formula> km grid at a resolution of 1 km<sup>2</sup> per pixel, with a time step of 5 min. The data are provided in units of dBZ (reflectivity factor), with original values ranging from <inline-formula><mml:math id="M81" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula>20 to 60 dBZ. To further minimize the presence of spurious echoes and drizzle, the reflectivity values are clipped to the range of 0 to 60 dBZ, where 0 dBZ is treated as no precipitation and 60 dBZ corresponds to a rain rate of approximately 205 mm h<sup>−1</sup> (the radar saturation point). The conversion from dBZ to rain rate uses the standard Marshall–Palmer <inline-formula><mml:math id="M83" display="inline"><mml:mi>Z</mml:mi></mml:math></inline-formula>-<inline-formula><mml:math id="M84" display="inline"><mml:mi>R</mml:mi></mml:math></inline-formula> <xref ref-type="bibr" rid="bib1.bibx30" id="paren.31"/> transformation with parameters <inline-formula><mml:math id="M85" display="inline"><mml:mrow><mml:mi>a</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">200</mml:mn></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M86" display="inline"><mml:mrow><mml:mi>b</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1.6</mml:mn></mml:mrow></mml:math></inline-formula>.</p>
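      <p>The conversion can be sketched as follows, assuming the usual power-law form of the Marshall–Palmer relation, Z = a R^b, with Z the linear reflectivity factor obtained from dBZ as 10^(dBZ/10). The function names are illustrative and are not taken from the Arpae or GPTCast software.</p>
<preformat preformat-type="code"><![CDATA[
```python
import math


def clip_dbz(dbz, lo=0.0, hi=60.0):
    """Clip reflectivity to the range used in the dataset."""
    return min(max(dbz, lo), hi)


def dbz_to_rain_rate(dbz, a=200.0, b=1.6):
    """Rain rate in mm/h from reflectivity in dBZ, assuming Z = a * R**b."""
    z_linear = 10.0 ** (dbz / 10.0)
    return (z_linear / a) ** (1.0 / b)


def rain_rate_to_dbz(rate, a=200.0, b=1.6):
    """Inverse relation: reflectivity in dBZ from a positive rain rate in mm/h."""
    return 10.0 * math.log10(a * rate ** b)


# Rain rate at the upper clipping bound of 60 dBZ.
print(round(dbz_to_rain_rate(clip_dbz(60.0)), 1))
```
]]></preformat>
      <p>With a = 200 and b = 1.6, the upper bound of 60 dBZ maps to roughly 205 mm h<sup>−1</sup>, consistent with the saturation value quoted above.</p>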
<sec id="Ch1.S3.SS1">
  <label>3.1</label><title>Data selection, preprocessing, and augmentation</title>
      <p id="d2e2329">For the purposes of our study, we extract all contiguous precipitating sequences in the 6 years from 2015 to 2020. Non-precipitating sequences are discarded, resulting in the selection of 179 264 time steps out of 630 720 (71.5 % of the data is discarded). Specifically, we remove all periods of at least 1 h in which the average precipitation over all pixels in the entire domain is less than 0.01 mm h<sup>−1</sup>. The remaining time steps are retained only if they form a contiguous sequence of at least 3 h. This focus on precipitating events aims to concentrate the model's learning on the complex dynamics of precipitation itself. The handling of non-precipitating inputs, which are common in operational scenarios, is discussed further in Sect. <xref ref-type="sec" rid="Ch1.S5"/> and addressed empirically in Sect. <xref ref-type="sec" rid="Ch1.S4.SS2.SSS5"/>, where we test the model's behavior with entirely non-precipitating synthetic inputs.</p>
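      <p>One possible reading of this selection rule is sketched below. It assumes a list of domain-averaged rain rates at 5 min resolution (so 1 h corresponds to 12 steps and 3 h to 36 steps), treats dry spells shorter than 1 h as part of the surrounding sequence, and returns half-open index ranges; the actual preprocessing code may differ in these details.</p>
<preformat preformat-type="code"><![CDATA[
```python
def runs(flags):
    """Yield (start, end) half-open ranges of consecutive True values."""
    start = None
    for k, flag in enumerate(list(flags) + [False]):
        if flag and start is None:
            start = k
        elif not flag and start is not None:
            yield start, k
            start = None


def select_sequences(mean_rates, threshold=0.01, dry_steps=12, min_steps=36):
    """Drop dry spells of at least dry_steps, keep sequences of at least min_steps."""
    keep = [True] * len(mean_rates)
    for start, end in runs(rate < threshold for rate in mean_rates):
        if end - start >= dry_steps:           # dry for 1 h or longer: discard
            keep[start:end] = [False] * (end - start)
    return [(start, end) for start, end in runs(keep)
            if end - start >= min_steps]       # retain sequences of 3 h or longer


# Toy series: 40 wet, 12 dry, 20 wet, 5 dry, 20 wet, 12 dry steps.
series = [1.0] * 40 + [0.0] * 12 + [1.0] * 20 + [0.0] * 5 + [1.0] * 20 + [0.0] * 12
selected = select_sequences(series)
print(selected)
```
]]></preformat>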
      <p id="d2e2348">The precipitating sequences are divided between training, validation, and test sets, and the data values are preprocessed by rounding the values to the first decimal digit, resulting in an effective dynamic range of 601 values (from 0 to 60 with a 0.1 step) per pixel.</p>
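      <p>The count of 601 levels follows directly from the clipping range and the rounding step, as the short sketch below illustrates. Representing each value by an integer level index is an illustrative choice made here to avoid floating-point comparisons, not a description of the stored data format.</p>
<preformat preformat-type="code"><![CDATA[
```python
def quantize_dbz(dbz, lo=0.0, hi=60.0, decimals=1):
    """Clip to [lo, hi] and return the index of the nearest 0.1 dBZ level.

    Note: Python's round() sends exact halves to the nearest even integer.
    """
    clipped = min(max(dbz, lo), hi)
    return round((clipped - lo) * 10 ** decimals)


def level_to_dbz(level, lo=0.0, decimals=1):
    """Map a level index back to its reflectivity value in dBZ."""
    return lo + level / 10 ** decimals


def n_levels(lo=0.0, hi=60.0, decimals=1):
    """Number of distinct values after clipping and rounding."""
    return round((hi - lo) * 10 ** decimals) + 1


print(n_levels(), quantize_dbz(12.34), level_to_dbz(quantize_dbz(12.34)))
```
]]></preformat>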
      <p id="d2e2351">We prepare two test sets, one for the spatial tokenizer and one for the forecaster. To test the spatial tokenizer, we isolate all time steps belonging to the days in 2019 and 2020 on which extreme events occurred, identified by analyzing historical weather reports, resulting in a total of 21 871 radar images (time steps). We call this the <italic>Tokenizer Test Set</italic> (TTS). To test the forecaster, we follow the same validation approach as <xref ref-type="bibr" rid="bib1.bibx33" id="text.32"/> and extract from the TTS 10 sequences of 12 h each, representative of the most relevant events. This 120 h subset, namely the <italic>Forecaster Test Set</italic> (FTS), is used for the testing of the forecaster.</p>
      <p id="d2e2363">The remaining sequences are randomly divided between training and validation, with the following final result: 149 524 steps for training, 7869 steps for validation, and 21 871 steps for the TTS including 1450 steps (12 h <inline-formula><mml:math id="M88" display="inline"><mml:mo>×</mml:mo></mml:math></inline-formula> 10 events) of the FTS. During training, we apply random cropping, random 90° rotation, and flipping to the training data. The primary motivation for this augmentation strategy is pragmatic: to increase the effective size and variability of the training dataset and, crucially, to mitigate overfitting. We observed, particularly for the larger GPTCast-16x16 model, that training without augmentation led to overfitting relatively early, as seen in the performance on the validation set. Introducing these random transformations allows significantly longer training periods and improves the model's generalization by encouraging invariance to the orientation of precipitation features.</p>
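      <p>A minimal sketch of such an augmentation pipeline is shown below. It assumes square crops and that the same randomly chosen crop, rotation, and flip are applied to every frame of a sequence so that temporal coherence is preserved; these details, like the helper names, are illustrative assumptions rather than the exact training code.</p>
<preformat preformat-type="code"><![CDATA[
```python
import random


def crop(frame, top, left, size):
    """Square crop of a 2D list."""
    return [row[left:left + size] for row in frame[top:top + size]]


def rot90(frame):
    """Rotate a 2D list by 90 degrees counterclockwise."""
    return [list(row) for row in zip(*frame)][::-1]


def augment_sequence(frames, crop_size, rng):
    """Apply one random crop, rotation, and flip to all frames of a sequence."""
    height, width = len(frames[0]), len(frames[0][0])
    top = rng.randint(0, height - crop_size)    # randint includes both bounds
    left = rng.randint(0, width - crop_size)
    quarter_turns = rng.randrange(4)            # 0, 90, 180, or 270 degrees
    flip = rng.random() < 0.5                   # horizontal flip with probability 1/2
    augmented = []
    for frame in frames:
        patch = crop(frame, top, left, crop_size)
        for _ in range(quarter_turns):
            patch = rot90(patch)
        if flip:
            patch = [row[::-1] for row in patch]
        augmented.append(patch)
    return augmented


# Toy sequence: two 4 x 4 frames, the second offset by 100 to tell them apart.
base = [[r * 4 + c for c in range(4)] for r in range(4)]
sequence = [base, [[v + 100 for v in row] for row in base]]
result = augment_sequence(sequence, crop_size=2, rng=random.Random(0))
print(result)
```
]]></preformat>
      <p>Combining quarter-turn rotations with a single flip axis already reaches all eight symmetries of a square patch, so a second flip axis would add no new orientations.</p>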
      <p id="d2e2374">We acknowledge that this approach has trade-offs. By making the dataset invariant to orientation, we prevent the model from explicitly learning geographically fixed patterns, such as precipitation enhancement due to specific orography or effects related to dominant wind directions within the fixed geographical domain. We do not provide additional contextual information (e.g., topography, large-scale wind fields) to the model, partly to maintain a fair comparison with the baseline extrapolation methods (introduced in Sect. <xref ref-type="sec" rid="Ch1.S4.SS2.SSS1"/>), which also operate solely on the precipitation fields. The chosen augmentation strategy therefore prioritizes learning the inherent dynamics, structure, and evolution of precipitation patterns themselves, aiming for a model that generalizes well to these dynamics regardless of their orientation within the frame, at the expense of capturing location-specific effects.</p>
      <p id="d2e2379">Table <xref ref-type="table" rid="T2"/> summarizes the resulting dataset characteristics.</p>

<table-wrap id="T2" specific-use="star"><label>Table 2</label><caption><p id="d2e2387">Summary of dataset characteristics.</p></caption><oasis:table frame="topbot"><oasis:tgroup cols="2">
     <oasis:colspec colnum="1" colname="col1" align="left"/>
     <oasis:colspec colnum="2" colname="col2" align="left"/>
     <oasis:thead>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">Attribute</oasis:entry>
         <oasis:entry colname="col2">Details</oasis:entry>
       </oasis:row>
     </oasis:thead>
     <oasis:tbody>
       <oasis:row>
         <oasis:entry colname="col1">Product description</oasis:entry>
         <oasis:entry colname="col2">Arpae radar reflectivity composite (northern Italy)</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Map size</oasis:entry>
         <oasis:entry colname="col2"><inline-formula><mml:math id="M89" display="inline"><mml:mrow><mml:mn mathvariant="normal">290</mml:mn><mml:mo>×</mml:mo><mml:mn mathvariant="normal">373</mml:mn></mml:mrow></mml:math></inline-formula> pixels</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Pixel size</oasis:entry>
         <oasis:entry colname="col2">1 km resolution</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Time step</oasis:entry>
         <oasis:entry colname="col2">5 min</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Reflectivity range</oasis:entry>
         <oasis:entry colname="col2"><inline-formula><mml:math id="M90" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula>20 to 60 dBZ (clipped to 0–60 dBZ, 0.1 step <inline-formula><mml:math id="M91" display="inline"><mml:mo>=</mml:mo></mml:math></inline-formula> 601 values of dynamic range)</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Date range</oasis:entry>
         <oasis:entry colname="col2">Precipitation sequences in the years 2015–2020</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Dataset size</oasis:entry>
         <oasis:entry colname="col2">630 720 total time steps (179 264 precipitating time steps selected)</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Training and validation</oasis:entry>
         <oasis:entry colname="col2">149 524 time steps for training, 7869 for validation</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Test datasets</oasis:entry>
         <oasis:entry colname="col2">TTS: 21 871 time steps; FTS: 1450 time steps (10 events of 12 h)</oasis:entry>
       </oasis:row>
     </oasis:tbody>
   </oasis:tgroup></oasis:table></table-wrap>

</sec>
</sec>
<sec id="Ch1.S4">
  <label>4</label><title>Results</title>
      <p id="d2e2529">Before presenting the quantitative and qualitative results, we clarify the roles of the different data subsets used throughout model development and evaluation.</p>
      <p id="d2e2532">All model development, hyperparameter tuning, and selection processes were performed using only the training and validation sets. This includes the selection of the final VQGAN tokenizer architecture (based on reconstruction fidelity and downstream performance on the validation set, comparing MAE and MWAE variants) and the selection of the best-performing GPTCast forecaster checkpoint (based on metrics evaluated exclusively on the validation set).</p>
      <p id="d2e2535">The two test sets (FTS and TTS) were used for the final evaluation presented in the following sections, after all model architectures and checkpoints were finalized based on validation performance. To further assess generalization to independent data beyond the scope of the original dataset, we also present an evaluation on a separate, out-of-distribution dataset over Germany in Sect. <xref ref-type="sec" rid="Ch1.S4.SS2.SSS4"/>.</p>
      <p id="d2e2540">We analyze the performance of our model at two stages: firstly, we analyze the amount of information loss introduced by the data compression in the tokenizer, and then we analyze the performance of GPTCast as a whole for the nowcasting of precipitation up to 2 h in the future. All scores and measures in the Results section are computed on rain rate values (after applying <inline-formula><mml:math id="M92" display="inline"><mml:mi>Z</mml:mi></mml:math></inline-formula>-<inline-formula><mml:math id="M93" display="inline"><mml:mi>R</mml:mi></mml:math></inline-formula> conversion).</p>
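      <p>For readers who wish to reproduce the conversion from reflectivity to rain rate, the following Python sketch inverts a power-law relation of the form Z = a R^b. The coefficients a = 200 and b = 1.6 (the classical Marshall–Palmer values) are an assumption of this example; they are chosen because they map 60 dBZ to roughly 205 mm h<sup>−1</sup>, which matches the saturation value quoted in Sect. <xref ref-type="sec" rid="Ch1.S4.SS1"/>. The function name is illustrative.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import numpy as np

def dbz_to_rain_rate(dbz, a=200.0, b=1.6):
    """Convert reflectivity (dBZ) to rain rate (mm/h) by inverting Z = a * R**b.

    a and b are assumed Marshall-Palmer coefficients, not values
    confirmed for the dataset used in this study.
    """
    # dBZ is a logarithmic unit: Z_linear = 10 ** (dBZ / 10).
    z_linear = 10.0 ** (np.asarray(dbz, dtype=float) / 10.0)
    return (z_linear / a) ** (1.0 / b)

rates = dbz_to_rain_rate([20.0, 40.0, 60.0])
print(rates)  # the last entry is roughly 205 mm/h
```
]]></preformat>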
<sec id="Ch1.S4.SS1">
  <label>4.1</label><title>Spatial tokenizer reconstruction performance</title>
      <p id="d2e2565">Given the high compression ratio that we introduce in the VQGAN, it is crucial to understand how much and what type of information is lost during the compression and discretization step performed by the tokenizer. Depending on the nature of the information loss, certain phenomena may be completely lost, and this can compromise the ability of the transformer to learn and forecast some precipitation dynamics (e.g., extreme events). The new MWAE loss introduced in Sect. <xref ref-type="sec" rid="Ch1.S2.SS1"/> is specifically built to improve the reconstruction performance of the tokenizer and reach a good level of data reconstruction while maintaining a high compression factor.</p>
      <p id="d2e2570">Table <xref ref-type="table" rid="T3"/> compares the reconstruction performance on the TTS of a VQGAN trained with a standard mean absolute error (MAE) reconstruction loss and one trained with our proposed MWAE loss. We consider global regression scores, namely the MAE, the root mean squared error (RMSE), and the structural similarity index measure (SSIM; <xref ref-type="bibr" rid="bib1.bibx50" id="altparen.33"/>), along with categorical scores, namely the critical success index (CSI) and the frequency bias (BIAS), computed by thresholding the precipitation at multiple rain rates (1, 10, and 50 mm h<sup>−1</sup>).</p>

<table-wrap id="T3" specific-use="star"><label>Table 3</label><caption><p id="d2e2593">Reconstruction performance on the TTS of VQGAN trained with mean absolute error (MAE) loss and with our proposed MWAE loss. <inline-formula><mml:math id="M95" display="inline"><mml:mrow><mml:mo>(</mml:mo><mml:mo>↓</mml:mo><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> means lower is better, and <inline-formula><mml:math id="M96" display="inline"><mml:mrow><mml:mo>(</mml:mo><mml:mo>↑</mml:mo><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> means higher is better; for frequency bias (BIAS), closer to 1 is better. The best model is in bold.</p></caption><oasis:table frame="topbot"><oasis:tgroup cols="7">
     <oasis:colspec colnum="1" colname="col1" align="left"/>
     <oasis:colspec colnum="2" colname="col2" align="right"/>
     <oasis:colspec colnum="3" colname="col3" align="right"/>
     <oasis:colspec colnum="4" colname="col4" align="right"/>
     <oasis:colspec colnum="5" colname="col5" align="right"/>
     <oasis:colspec colnum="6" colname="col6" align="right"/>
     <oasis:colspec colnum="7" colname="col7" align="right"/>
     <oasis:thead>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">Model/performance</oasis:entry>
         <oasis:entry colname="col2">MAE <inline-formula><mml:math id="M97" display="inline"><mml:mrow><mml:mo>(</mml:mo><mml:mo>↓</mml:mo><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col3">RMSE <inline-formula><mml:math id="M98" display="inline"><mml:mrow><mml:mo>(</mml:mo><mml:mo>↓</mml:mo><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col4">SSIM <inline-formula><mml:math id="M99" display="inline"><mml:mrow><mml:mo>(</mml:mo><mml:mo>↑</mml:mo><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col5">CSI <inline-formula><mml:math id="M100" display="inline"><mml:mrow><mml:mo>(</mml:mo><mml:mo>↑</mml:mo><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula>/BIAS @ 1 mm h<sup>−1</sup></oasis:entry>
         <oasis:entry colname="col6">CSI/BIAS @ 10 mm h<sup>−1</sup></oasis:entry>
         <oasis:entry colname="col7">CSI/BIAS @ 50 mm h<sup>−1</sup></oasis:entry>
       </oasis:row>
     </oasis:thead>
     <oasis:tbody>
       <oasis:row>
         <oasis:entry colname="col1">VQGAN MWAE</oasis:entry>
         <oasis:entry colname="col2"><bold>0.204</bold></oasis:entry>
         <oasis:entry colname="col3"><bold>2.02</bold></oasis:entry>
         <oasis:entry colname="col4"><bold>0.988</bold></oasis:entry>
         <oasis:entry colname="col5"><bold>0.81</bold>/<bold>1.03</bold></oasis:entry>
         <oasis:entry colname="col6"><bold>0.56</bold>/<bold>0.94</bold></oasis:entry>
         <oasis:entry colname="col7"><bold>0.44</bold>/<bold>0.92</bold></oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">VQGAN MAE</oasis:entry>
         <oasis:entry colname="col2">0.265</oasis:entry>
         <oasis:entry colname="col3">2.66</oasis:entry>
         <oasis:entry colname="col4">0.981</oasis:entry>
         <oasis:entry colname="col5">0.74/0.93</oasis:entry>
         <oasis:entry colname="col6">0.38/0.62</oasis:entry>
         <oasis:entry colname="col7">0.13/0.22</oasis:entry>
       </oasis:row>
     </oasis:tbody>
   </oasis:tgroup></oasis:table></table-wrap>

      <p id="d2e2815">The autoencoder trained with MWAE improves on all the considered metrics, and the improvements are most pronounced at higher rain rates, whose frequency is almost exactly recovered by the autoencoder. This is clearly visible in the BIAS at 50 mm h<sup>−1</sup>, defined as the ratio of the number of pixels exceeding 50 mm h<sup>−1</sup> in the reconstruction to the number of pixels exceeding the same threshold in the input image, which rises from 0.22 to 0.92 (where 0 is total underestimation, 1 is the perfect score, and values greater than 1 indicate overestimation).</p>
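      <p>The categorical scores used here can be computed from a simple contingency count. The Python sketch below treats the input image as the reference and the reconstruction as the estimate; the function name and the toy values are illustrative, and the use of a strict inequality at the threshold is an assumption.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import numpy as np

def csi_and_bias(reference, estimate, threshold):
    """Critical success index and frequency bias above a rain-rate threshold."""
    ref = np.asarray(reference) > threshold
    est = np.asarray(estimate) > threshold
    hits = int(np.sum(ref & est))
    misses = int(np.sum(ref & ~est))
    false_alarms = int(np.sum(~ref & est))
    union = hits + misses + false_alarms
    # CSI: fraction of exceedances (in either field) that coincide.
    csi = hits / union if union else float("nan")
    # BIAS: estimated exceedance count divided by reference exceedance count.
    n_ref = hits + misses
    bias = (hits + false_alarms) / n_ref if n_ref else float("nan")
    return csi, bias

# Toy rain rates in mm/h: two reference pixels exceed 50, one is recovered.
reference = np.array([0.0, 2.0, 12.0, 60.0, 55.0])
estimate = np.array([0.0, 3.0, 8.0, 52.0, 40.0])
print(csi_and_bias(reference, estimate, threshold=50.0))  # (0.5, 0.5)
```
]]></preformat>
      <p>A BIAS of 0.5 in this toy case means that the estimate contains half as many exceedances as the reference, i.e., an underestimation of frequency.</p>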
      <p id="d2e2842">The recovery in frequency is also confirmed by analyzing the radially averaged power spectral density (i.e., the amount of energy) of the input and reconstruction: as shown in Fig. <xref ref-type="fig" rid="F6"/>, the average power spectra of the MWAE autoencoder closely resemble those of the input (albeit with an overestimation at the smallest wavelengths), while the spectra of the standard autoencoder are consistently shifted and underestimated at all wavelengths.</p>
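      <p>A radially averaged power spectral density can be obtained by averaging the two-dimensional Fourier power over rings of equal wavenumber. The Python sketch below is one minimal way to do this for a square field; the function name, the integer-wavenumber binning, and the normalization are assumptions of the example and may differ from the implementation used for Fig. <xref ref-type="fig" rid="F6"/>.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import numpy as np

def radially_averaged_psd(field):
    """Return (wavenumbers, psd) for a square 2-D field.

    Power is averaged over rings of integer radial wavenumber
    k = 1 .. n // 2 (cycles per domain length).
    """
    field = np.asarray(field, dtype=float)
    n = field.shape[0]
    if field.shape != (n, n):
        raise ValueError("this sketch expects a square field")
    power = np.abs(np.fft.fftshift(np.fft.fft2(field))) ** 2 / (n * n)
    # Integer wavenumbers along each axis, centered on zero.
    freqs = np.fft.fftshift(np.fft.fftfreq(n)) * n
    ky, kx = np.meshgrid(freqs, freqs, indexing="ij")
    radius = np.rint(np.hypot(kx, ky)).astype(int)
    sums = np.bincount(radius.ravel(), weights=power.ravel())
    counts = np.bincount(radius.ravel())
    k = np.arange(1, n // 2 + 1)
    return k, sums[k] / counts[k]

# Synthetic check: a wave with 4 cycles across a 64-pixel domain.
x = np.arange(64)
wave = np.tile(np.cos(2 * np.pi * 4 * x / 64), (64, 1))
k, psd = radially_averaged_psd(wave)
print(int(k[np.argmax(psd)]))  # 4
```
]]></preformat>
      <p>With the 1 km pixel size of this dataset, wavenumber k on an n-pixel domain corresponds to a wavelength of n/k km.</p>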

      <fig id="F6" specific-use="star"><label>Figure 6</label><caption><p id="d2e2849">Comparison of radially averaged power spectral density reconstruction performance by adopting the MWAE loss function compared to MAE. The adoption of MWAE improves the ability of the autoencoder to reproduce the energy distribution of precipitation at all wavelengths.</p></caption>
          <graphic xlink:href="https://gmd.copernicus.org/articles/18/5351/2025/gmd-18-5351-2025-f06.png"/>

        </fig>

      <p id="d2e2858">The improvement in CSI is also substantial (at 50 mm h<sup>−1</sup>, more than 3 times higher), albeit not as complete as the frequency recovery. This implies that the remaining source of error is that the reconstructed precipitation fields have either a different structure or a different location when compared to the input (i.e., the amounts of the reconstructed precipitation are correct but misplaced at the spatial level).</p>
      <p id="d2e2873">To better characterize this remaining source of error, we compute the SAL measure <xref ref-type="bibr" rid="bib1.bibx52 bib1.bibx53" id="paren.34"/>, which evaluates three key aspects of the precipitation field within a specified domain: structure (S), amplitude (A), and location (L). The amplitude component (A) measures the relative deviation of the domain-averaged reconstructed precipitation amount from the input. Positive values indicate an overestimation of total precipitation, while negative values indicate an underestimation. The structure component (S) assesses the shape and size of the reconstructed precipitation areas. Positive values occur when these areas are too large or too flat, while negative values indicate that they are too small or too peaked. The location component (L) evaluates the accuracy of the reconstructed location of precipitation. It combines information about the displacement of the reconstructed precipitation field’s center of mass compared to the input and the error in the weighted average distance of the precipitation objects from the center of the total field. A perfect reconstruction yields zero for all three components, indicating no deviation between input and reconstructed precipitation patterns.</p>
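      <p>Of the three components, amplitude is the simplest to state explicitly: it is the difference between the two domain averages, normalized by their mean. The Python sketch below computes only this component, following the standard SAL definition; the function name is illustrative, and the structure and location components (which require identifying precipitation objects) are deliberately omitted.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import numpy as np

def sal_amplitude(reference, estimate):
    """Amplitude component A of SAL, bounded between -2 and 2.

    A = (D_est - D_ref) / (0.5 * (D_est + D_ref)), where D is the
    domain-averaged precipitation of each field.
    """
    d_ref = float(np.mean(reference))
    d_est = float(np.mean(estimate))
    return (d_est - d_ref) / (0.5 * (d_est + d_ref))

reference = np.full((4, 4), 1.0)
estimate = np.full((4, 4), 1.5)
print(sal_amplitude(reference, estimate))  # 0.4 (overestimation)
```
]]></preformat>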

      <fig id="F7" specific-use="star"><label>Figure 7</label><caption><p id="d2e2882">Structure, amplitude, and location (SAL) plot that compares the performance of the MAE and MWAE autoencoders. Each dot on the plot represents the scores of one image in the TTS. Structure and amplitude are plotted on the horizontal and vertical axes, respectively, while the location component is represented by the color. The dashed vertical and horizontal lines indicate the median values of the structure (S) and amplitude (A) scores, respectively. The rectangular box represents the area between the 25th and 75th percentiles (i.e., the vertical and horizontal sides of the box contain 50 % of the points). The numbers on the top right show the mean absolute values.</p></caption>
          <graphic xlink:href="https://gmd.copernicus.org/articles/18/5351/2025/gmd-18-5351-2025-f07.png"/>

        </fig>

      <p id="d2e2891">The SAL analysis plot for both autoencoders is shown in Fig. <xref ref-type="fig" rid="F7"/>. The MWAE autoencoder improves over the baseline autoencoder on all scores, with a median value that is close to zero for all three components. A residual source of absolute error remains in the structure component, while both amplitude and location errors are negligible.</p>
      <p id="d2e2896">In summary, divergences in the size and shape of the reconstructed precipitation patterns account for the majority of the error for our new autoencoder, while the locations, frequencies, and energy contents of the precipitation patches are mostly accurate. Overall, this is a good compromise for the nowcasting task: errors in structure are more tolerable, whereas systematic errors in amplitude, frequency, or location can seriously impair the forecaster's ability to accurately predict the evolutionary dynamics of precipitation. Some qualitative examples of the input and reconstruction from both autoencoders are presented in Fig. <xref ref-type="fig" rid="F8"/>.</p>

      <fig id="F8" specific-use="star"><label>Figure 8</label><caption><p id="d2e2903">Qualitative comparison between precipitation snapshots reconstructed by the VQGAN autoencoder trained with MWAE loss and MAE loss, taken from the TTS. The autoencoder trained with MWAE loss shows a marked improvement in the reconstruction of precipitation, with crucial improvements in the reconstruction of higher rain rates (thunderstorms). </p></caption>
          <graphic xlink:href="https://gmd.copernicus.org/articles/18/5351/2025/gmd-18-5351-2025-f08.jpg"/>

        </fig>

      <p id="d2e2912">The last test involves an assessment of the ability of the autoencoder to reconstruct saturation-level inputs. We create a synthetic image with a saturated <inline-formula><mml:math id="M107" display="inline"><mml:mrow><mml:mn mathvariant="normal">64</mml:mn><mml:mo>×</mml:mo><mml:mn mathvariant="normal">64</mml:mn></mml:mrow></mml:math></inline-formula> km patch of 205 mm h<sup>−1</sup> (60 dBZ) at the center, encode it through the tokenizer, and decode the resulting token map. The reconstruction in Fig. <xref ref-type="fig" rid="F9"/> visually confirms that end-of-scale values are much better represented in the learned codebook of the MWAE autoencoder, which is able to express rain rates up to the saturation level, although not for large extents like the one provided in the input. This limitation is expected due to the absence of such extensive saturated areas in the training data. Consequently, this could potentially affect the model's performance when encountering record-breaking extreme events that might exhibit such large areas of maximum intensity.</p>

      <fig id="F9" specific-use="star"><label>Figure 9</label><caption><p id="d2e2944">Qualitative comparison between precipitation snapshots reconstructed by the VQGAN autoencoder trained with MWAE loss and MAE loss on a synthetic saturated image. The MWAE-trained model can reach saturation-level intensities, although only over small areas.</p></caption>
          <graphic xlink:href="https://gmd.copernicus.org/articles/18/5351/2025/gmd-18-5351-2025-f09.png"/>

        </fig>

</sec>
<sec id="Ch1.S4.SS2">
  <label>4.2</label><title>GPTCast nowcasting performance</title>
<sec id="Ch1.S4.SS2.SSS1">
  <label>4.2.1</label><title>Baseline model: LINDA</title>
      <p id="d2e2968">We examine and compare GPTCast forecasting performance with that of the Lagrangian INtegro-Difference equation model with Autoregression (LINDA) <xref ref-type="bibr" rid="bib1.bibx35" id="paren.35"/>, the state-of-the-art ensemble nowcasting model included in the pySTEPS package <xref ref-type="bibr" rid="bib1.bibx33" id="paren.36"/>. LINDA is a nowcasting technique intended to provide superior forecast skill in situations with intense localized rainfall compared to other extrapolation methods (S-PROG or STEPS). Extrapolation, S-PROG <xref ref-type="bibr" rid="bib1.bibx40" id="paren.37"/>, STEPS <xref ref-type="bibr" rid="bib1.bibx5" id="paren.38"/>, ANVIL <xref ref-type="bibr" rid="bib1.bibx34" id="paren.39"/>, an integro-difference equation (IDE), and cell tracking techniques <xref ref-type="bibr" rid="bib1.bibx7" id="paren.40"/> are all combined in this model.</p>
</sec>
<sec id="Ch1.S4.SS2.SSS2">
  <label>4.2.2</label><title>Verification scores</title>
      <p id="d2e2998">For verification assessment, we rely on the continuous ranked probability score (CRPS) and the rank histogram, which are essential tools for verifying ensemble forecasts. By showing the frequency of observed values among the forecast ranks, the rank histogram evaluates the dispersion and reliability of ensemble forecasts and highlights biases such as under- or over-dispersion. By comparing the prediction's cumulative distribution function to the actual value, CRPS calculates a numerical score for forecast skill that indicates how accurate a probabilistic forecast is. The two scores complement each other, with the CRPS providing a measure of forecast accuracy as a whole and the rank histogram emphasizing the ensemble spread and reliability.</p>
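      <p>Both scores can be estimated directly from the ensemble members. The Python sketch below uses the common empirical CRPS estimator (mean absolute error of the members minus half the mean absolute difference between members) and a rank histogram that counts, for each grid point, how many members fall below the observation. The function names are illustrative, and the handling of ties is simplified: with many zero-rain pixels, operational implementations typically break ties at random, which this sketch does not do.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import numpy as np

def ensemble_crps(members, observation):
    """Empirical CRPS per grid point.

    members: array of shape (M, ...); observation: array of shape (...).
    Note: builds an (M, M, ...) array, so memory grows with M squared.
    """
    members = np.asarray(members, dtype=float)
    observation = np.asarray(observation, dtype=float)
    term_obs = np.mean(np.abs(members - observation), axis=0)
    pairwise = np.abs(members[:, None] - members[None, :])
    term_spread = 0.5 * np.mean(pairwise, axis=(0, 1))
    return term_obs - term_spread

def rank_histogram(members, observation):
    """Counts of the observation rank among M members (M + 1 bins)."""
    members = np.asarray(members, dtype=float)
    observation = np.asarray(observation, dtype=float)
    ranks = np.sum(members < observation, axis=0)
    return np.bincount(ranks.ravel(), minlength=members.shape[0] + 1)

# Two members (1 and 3 mm/h) at a single point, observation 2 mm/h.
members = np.array([[1.0], [3.0]])
observation = np.array([2.0])
print(ensemble_crps(members, observation))   # [0.5]
print(rank_histogram(members, observation))  # [0 1 0]
```
]]></preformat>
      <p>A flat rank histogram indicates a well-calibrated ensemble, whereas a U shape indicates under-dispersion and a skew towards one side indicates a systematic bias.</p>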
</sec>
<sec id="Ch1.S4.SS2.SSS3">
  <label>4.2.3</label><title>Performance on the forecast test set</title>
      <p id="d2e3009">We use the FTS for our main performance comparison. Out of the 10 events in the FTS, 7 are convective events occurring in spring or summer and 3 are winter precipitation events. For each event, we produce a forecast every 30 min, and each forecast is a 20-member ensemble forecast with 5 min time steps and a maximum lead time of 2 h (i.e., 24 forecasting steps) for both LINDA and GPTCast. This results in a total of 200 forecasts (20 forecasts per event) generated per model. For GPTCast, we test both model configurations, GPTCast-16x16 and GPTCast-8x8.</p>

      <fig id="F10" specific-use="star"><label>Figure 10</label><caption><p id="d2e3014">Continuous ranked probability score (CRPS) comparison of GPTCast and LINDA over the FTS (lower is better) at different lead times.</p></caption>
            <graphic xlink:href="https://gmd.copernicus.org/articles/18/5351/2025/gmd-18-5351-2025-f10.png"/>

          </fig>

      <p id="d2e3023">The CRPS for each of the three models – LINDA, GPTCast-16x16, and GPTCast-8x8 – is displayed in Fig. <xref ref-type="fig" rid="F10"/>: both variants of GPTCast outperform LINDA across all lead times, with GPTCast-16x16 performing best. This result suggests that the model learns the dynamics of the evolution of precipitation patterns more thoroughly when the context is more spatially extended. This improvement comes with a non-negligible increase in computational time at inference, which in our experiments was close to 1 order of magnitude (GPTCast-8x8 computes a time step in 2 s compared to 17 s for the larger model on an NVIDIA RTX 4090).</p>
      <p id="d2e3029">Figure <xref ref-type="fig" rid="F11"/> analyzes the rank histogram at different lead times for all three models, including information on the Kullback–Leibler (KL) divergence from the uniform distribution. Both versions of GPTCast provide a better overall score than LINDA, which tends to be under-dispersed, with GPTCast-8x8 being the best model. Moreover, GPTCast-8x8 shows a rank distribution close to optimal up to the first hour, with a KL divergence from the uniform distribution of 0.006 at 60 min lead time (12 steps). GPTCast-16x16 displays an overall better rank histogram than LINDA up to the first 60 min, with a tendency to underestimate that compounds over time. We attribute this behavior to the increased ability of GPTCast-16x16 to capture the training distribution, which has a higher ratio of dissipating precipitation events than the FTS (which is filtered to contain only extreme events).</p>
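      <p>The KL divergence of a rank histogram from the uniform distribution can be computed as in the following Python sketch. The direction of the divergence (histogram relative to uniform) and the natural logarithm are assumptions of this example, and the function name is illustrative.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import numpy as np

def kl_from_uniform(counts):
    """KL(p || uniform) for a histogram given as raw counts (natural log)."""
    p = np.asarray(counts, dtype=float)
    p = p / p.sum()
    uniform = 1.0 / p.size
    # Empty bins contribute zero to the sum (0 * log 0 is taken as 0).
    nonzero = p > 0
    return float(np.sum(p[nonzero] * np.log(p[nonzero] / uniform)))

flat = [10, 10, 10, 10]       # perfectly calibrated ensemble
u_shaped = [30, 5, 5, 30]     # under-dispersed ensemble
print(kl_from_uniform(flat))  # 0.0
print(kl_from_uniform(u_shaped) > kl_from_uniform(flat))  # True
```
]]></preformat>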
      <p id="d2e3034">Figure <xref ref-type="fig" rid="F12"/> shows an example of nowcast for a convective case in the FTS, with two ensemble members and the ensemble mean for both LINDA and GPTCast. GPTCast generates two realistic and diverse forecasts, with an ensemble mean that features a better location accuracy than LINDA compared to the observations.</p>

      <fig id="F11" specific-use="star"><label>Figure 11</label><caption><p id="d2e3041">Rank histogram comparison of GPTCast and LINDA on the FTS. The horizontal gray line represents the ideal value (the closer the better). The numbers in the legend indicate the Kullback–Leibler divergence from the uniform distribution (lower is better). </p></caption>
            <graphic xlink:href="https://gmd.copernicus.org/articles/18/5351/2025/gmd-18-5351-2025-f11.png"/>

          </fig>

      <fig id="F12" specific-use="star"><label>Figure 12</label><caption><p id="d2e3052">Example comparison of GPTCast-16x16 and LINDA nowcast on a convective case in the Forecaster Test Set (8 June 2020, 11:00 UTC). The domain is cropped on the central area for visualization convenience.</p></caption>
            <graphic xlink:href="https://gmd.copernicus.org/articles/18/5351/2025/gmd-18-5351-2025-f12.jpg"/>

          </fig>

</sec>
<sec id="Ch1.S4.SS2.SSS4">
  <label>4.2.4</label><title>Out-of-distribution evaluation on German radar data</title>
      <p id="d2e3069">To assess the generalization capability of GPTCast beyond the primary dataset used for training and testing, we perform an additional evaluation on an independent dataset from a different geographical region and source. We utilized the radar dataset over Germany presented alongside RainNet <xref ref-type="bibr" rid="bib1.bibx2" id="paren.41"/>. From the first 150 000 time steps available in this dataset, we selected the 10 cases exhibiting the highest domain-average precipitation to focus on challenging forecasting scenarios.</p>
      <p id="d2e3075">For each selected case, we extracted the central <inline-formula><mml:math id="M109" display="inline"><mml:mrow><mml:mn mathvariant="normal">256</mml:mn><mml:mo>×</mml:mo><mml:mn mathvariant="normal">256</mml:mn></mml:mrow></mml:math></inline-formula> pixel domain, matching the spatial dimensions used in our primary experiments. We then generated 60 min precipitation forecasts using a 20-member ensemble for both GPTCast (specifically, GPTCast-16x16) and LINDA.</p>
      <p id="d2e3090">The results indicate that GPTCast achieves a lower (better) average CRPS compared to LINDA over these 10 selected cases, suggesting better overall probabilistic forecast skill in this out-of-distribution setting. However, the rank histogram for GPTCast still exhibits a tendency towards lower ranks, consistent with the underestimation characteristic observed in the primary evaluation (Sect. <xref ref-type="sec" rid="Ch1.S4.SS2.SSS3"/>).</p>
      <p id="d2e3095">It is important to interpret these results with caution. Firstly, the evaluation comprises only 10 cases, which limits the statistical significance of the findings. Secondly, as noted by <xref ref-type="bibr" rid="bib1.bibx38" id="text.42"/>, LINDA's performance is often optimized for and excels during high-intensity convective events. The case selection based on domain-average precipitation might not perfectly align with the scenarios where LINDA demonstrates its peak performance relative to other models. Nonetheless, this preliminary out-of-distribution evaluation provides encouraging evidence that the precipitation dynamics learned by GPTCast possess a degree of transferability to different geographical regions and data sources.</p>

      <fig id="F13" specific-use="star"><label>Figure 13</label><caption><p id="d2e3104">CRPS and rank histogram of GPTCast-16x16 and LINDA on 10 precipitation events over central Germany.</p></caption>
            <graphic xlink:href="https://gmd.copernicus.org/articles/18/5351/2025/gmd-18-5351-2025-f13.png"/>

          </fig>

</sec>
<sec id="Ch1.S4.SS2.SSS5">
  <label>4.2.5</label><title>Behavior with non-precipitating input</title>
      <p id="d2e3122">To examine the model's behavior when presented with input sequences entirely devoid of precipitation (a scenario excluded during training), we conducted an additional experiment using synthetic data. We initialized the GPTCast-16x16 model with an input sequence consisting entirely of zero-value radar reflectivity images (representing “all clear” conditions) across the <inline-formula><mml:math id="M110" display="inline"><mml:mrow><mml:mn mathvariant="normal">256</mml:mn><mml:mo>×</mml:mo><mml:mn mathvariant="normal">256</mml:mn></mml:mrow></mml:math></inline-formula> pixel domain for the standard 7-time-step context window. We then generated an ensemble forecast of 20 members for the next time step.</p>
      <p id="d2e3137">The results show that most ensemble members correctly predict continued zero (or near-zero) precipitation, consistent with the persistence forecast expected under such conditions. However, one ensemble member generates a substantial spurious precipitation pattern, albeit one that is localized and physically plausible-looking. This highlights a potential drawback of the generative nature of the model: the possibility of “hallucinating” precipitation features when initialized with data far outside its training distribution (i.e., entirely empty sequences). While infrequent in this test (1 member out of 20), this behavior warrants consideration for operational deployment and is discussed further in Sect. <xref ref-type="sec" rid="Ch1.S5"/>. Figure <xref ref-type="fig" rid="F14"/> illustrates the behavior of the members and the generated pattern from the deviating ensemble member.</p>

      <fig id="F14" specific-use="star"><label>Figure 14</label><caption><p id="d2e3146">Behavior of the model initialized from a zero-precipitation input sequence over a <inline-formula><mml:math id="M111" display="inline"><mml:mrow><mml:mn mathvariant="normal">256</mml:mn><mml:mo>×</mml:mo><mml:mn mathvariant="normal">256</mml:mn></mml:mrow></mml:math></inline-formula> pixel domain. Only 1 out of the 20 ensemble members develops a significant precipitating pattern.</p></caption>
            <graphic xlink:href="https://gmd.copernicus.org/articles/18/5351/2025/gmd-18-5351-2025-f14.png"/>

          </fig>

</sec>
</sec>
</sec>
<sec id="Ch1.S5" sec-type="conclusions">
  <label>5</label><title>Discussion and future work</title>
<sec id="Ch1.S5.SS1">
  <label>5.1</label><title>Summary and contributions</title>
      <p id="d2e3185">GPTCast introduces a novel approach to ensemble nowcasting of radar-based precipitation, leveraging a GPT model and a specialized spatial tokenizer to produce realistic and accurate ensemble forecasts. We show that this approach can provide reliable forecasts, outperforming the state-of-the-art extrapolation method in both accuracy and uncertainty estimation.</p>
      <p id="d2e3188">GPTCast's deterministic architecture enhances interpretability and reliability by generating realistic ensemble forecasts without random noise inputs. The model can be scaled to different sizes, both in context length and in terms of parameters (which we postponed to future analyses), allowing a balance in the trade-off between accuracy and computational demands and providing flexibility for different operational settings.</p>
      <p id="d2e3191">We believe that our method, by adopting an architecture influenced by large language models (LLMs), paves the way for future promising research in precipitation nowcasting that can incorporate all the improvements and developments from the quickly developing field of LLM research. This includes more efficient architectures, improved training techniques, and better interpretability tools. Such integration can potentially enhance GPTCast's performance, scalability, and usability, ensuring that it remains a state-of-the-art nowcasting tool.</p>
</sec>
<sec id="Ch1.S5.SS2">
  <label>5.2</label><title>Implementation challenges</title>
      <p id="d2e3202">Despite its strengths, the approach poses specific challenges that must be considered for the operational usage of the model.</p>
      <p id="d2e3205">The approach requires the training of two models in cascade, each with its own set of challenges. In our experiments, it was hard to find a stable configuration for training the spatial tokenizer, which has to balance multiple competing losses. The MWAE reconstruction loss we introduced helped substantially in terms of both convergence and stability, although at the cost of slower training induced by the smoothing effect of the sigmoid (<inline-formula><mml:math id="M112" display="inline"><mml:mi mathvariant="italic">σ</mml:mi></mml:math></inline-formula>) terms in the loss. On the other hand, we found the forecaster to be very stable in training (as expected for transformers) but computationally intensive in inference, especially for the long context configuration (GPTCast-16x16), making its use in a real-time application such as nowcasting challenging without significant resources.</p>
</sec>
<sec id="Ch1.S5.SS3">
  <label>5.3</label><title>Handling non-precipitating conditions and generative artifacts</title>
      <p id="d2e3223">The ability of the model to effectively capture the training distribution is both its main strength and potential pitfall. A key aspect of our training strategy was the exclusion of entirely non-precipitating sequences, representing a significant portion (71.5 %) of the raw data. This decision aimed to focus the model's learning capacity on the core challenge: capturing the complex dynamics of precipitation initiation, evolution, and decay, rather than diluting the learning signal with vast amounts of “all clear” data. Operationally, if the recent radar sequence shows no precipitation, a simple persistence forecast (predicting continued “no precipitation”) is often sufficient and computationally inexpensive for the very short term, making the deployment of a complex model such as GPTCast potentially wasteful in such specific situations. Our training strategy thus aligns with a targeted use case where the model is primarily invoked when precipitation is present or developing.</p>
      <p id="d2e3226">However, this raises the question of how the model behaves when presented with the non-precipitating inputs it might encounter operationally. While the model learns to handle the cessation of precipitation within partly precipitating sequences present in the training data, it was never explicitly trained on entirely clear inputs. Our analysis in Sect. <xref ref-type="sec" rid="Ch1.S4.SS2.SSS5"/>, using synthetic all-zero inputs, showed that, while the model predominantly predicts continued clear conditions as expected, a small fraction of ensemble members (1 out of 20 in our test) can generate spurious precipitation patterns (“hallucinations”). This generative artifact, occurring when the input is significantly outside the training distribution, represents a potential drawback. Although infrequent, it calls for caution if the model were to be deployed in scenarios where it might frequently receive entirely non-precipitating inputs: either post-processing checks on the output or, alternatively, a simple check that bypasses the deep learning model when inputs are non-precipitating. Further investigation could explore fine-tuning strategies or architectural modifications to mitigate such behavior, although the current targeted training approach already aligns well with typical operational workflows where nowcasting models are most crucial during active precipitation events. Moreover, strategies exist to exert more control over the generation process during inference and potentially reduce the occurrence of undesirable outcomes. One common technique, adapted from natural language processing, is top-<inline-formula><mml:math id="M113" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula> sampling <xref ref-type="bibr" rid="bib1.bibx11 bib1.bibx23" id="paren.43"/>. 
Instead of sampling from the entire probability distribution over the VQGAN codebook indices predicted by the transformer, top-<inline-formula><mml:math id="M114" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula> sampling restricts the selection pool to only the <inline-formula><mml:math id="M115" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula> tokens (codebook indices) with the highest predicted probabilities at each step. By filtering out low-probability options, this can make the generated sequences more focused and less likely to contain highly improbable or spurious transitions. However, this comes at the cost of potentially reduced forecast diversity and the risk of suppressing genuinely rare but physically valid meteorological events. Choosing an appropriate value for <inline-formula><mml:math id="M116" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula>, or exploring related techniques such as nucleus sampling (top-<inline-formula><mml:math id="M117" display="inline"><mml:mi>p</mml:mi></mml:math></inline-formula>) <xref ref-type="bibr" rid="bib1.bibx23" id="paren.44"/>, involves a trade-off between forecast creativity/diversity and robustness against potential hallucinations. Further investigation into optimal decoding strategies for precipitation nowcasting with GPTCast, possibly incorporating physical constraints or adaptive sampling methods, remains an area for future research to enhance reliability for operational use.</p>
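      <p>The two decoding strategies discussed above can be illustrated on a single predicted distribution over codebook indices. The sketch below uses plain NumPy and is not taken from the GPTCast code base; the function names <monospace>top_k_filter</monospace>, <monospace>top_p_filter</monospace>, and <monospace>sample_token</monospace> are hypothetical, and the input is assumed to be an already normalized probability vector.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import numpy as np


def top_k_filter(probs, k):
    """Keep the k most probable tokens, zero the rest, renormalize."""
    probs = np.asarray(probs, dtype=np.float64)
    k = max(1, min(k, probs.size))
    keep = np.argsort(probs)[-k:]          # indices of the k largest entries
    filtered = np.zeros_like(probs)
    filtered[keep] = probs[keep]
    return filtered / filtered.sum()


def top_p_filter(probs, p):
    """Keep the smallest set of most probable tokens with total mass >= p."""
    probs = np.asarray(probs, dtype=np.float64)
    order = np.argsort(probs)[::-1]        # most probable first
    cumulative = np.cumsum(probs[order])
    # First position where the cumulative mass reaches p (at least 1 token).
    n_keep = min(int(np.searchsorted(cumulative, p)) + 1, probs.size)
    keep = order[:n_keep]
    filtered = np.zeros_like(probs)
    filtered[keep] = probs[keep]
    return filtered / filtered.sum()


def sample_token(probs, rng, k=None, p=None):
    """Draw one codebook index after optional top-k and/or top-p filtering."""
    if k is not None:
        probs = top_k_filter(probs, k)
    if p is not None:
        probs = top_p_filter(probs, p)
    return int(rng.choice(len(probs), p=probs))
```
]]></preformat>
      <p>For example, calling <monospace>sample_token(probs, np.random.default_rng(0), k=50)</monospace> at each autoregressive step restricts every draw to the 50 most likely tokens; smaller values of the cutoff trade ensemble diversity for robustness, as noted in the text.</p>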
</sec>
<sec id="Ch1.S5.SS4">
  <label>5.4</label><title>Geographical generalizability</title>
      <p id="d2e3281">A further consideration regarding the generalizability of GPTCast pertains to the geographical scope of the data used for training and primary evaluation. Our main experiments were conducted using radar data covering the Emilia-Romagna region, which possesses distinct topographical features and precipitation characteristics. Consequently, the model's performance might differ when applied to regions with significantly different environments, such as coastal areas or large flat plains, which exhibit distinct precipitation regimes or atmospheric dynamics.</p>
      <p id="d2e3284">To provide an initial assessment of the model's robustness beyond its training domain, we performed an additional evaluation on a completely independent dataset comprising recent precipitation events over Germany, a region with different geographical characteristics (Sect. <xref ref-type="sec" rid="Ch1.S4.SS2.SSS4"/>). The promising results obtained in this out-of-distribution setting suggest that GPTCast learns representations of precipitation dynamics that possess some degree of geographical transferability. While these findings are encouraging, they represent only a first step. More extensive validation across a wider variety of geographical regions and climatological conditions would be necessary to fully establish the broad applicability and potential regional biases of the model, representing an important avenue for future research.</p>
</sec>
<sec id="Ch1.S5.SS5">
  <label>5.5</label><title>Inference efficiency and optimization strategies</title>
      <p id="d2e3300">Another important practical consideration for deploying large autoregressive transformer models such as GPTCast in operational settings is their computational cost during inference. While powerful, the attention mechanism and the sheer number of parameters can lead to significant latency and memory requirements. However, the field has developed numerous optimization techniques specifically targeting these challenges, which could be applied to GPTCast to enhance its real-time feasibility.</p>
      <p id="d2e3303">One major advancement is the development of optimized attention algorithms, such as FlashAttention <xref ref-type="bibr" rid="bib1.bibx6" id="paren.45"/>, which reduces the memory footprint and increases the speed of the attention computation by avoiding materialization of the large attention matrix. Furthermore, model quantization techniques <xref ref-type="bibr" rid="bib1.bibx20" id="paren.46"/> can significantly reduce the model size and accelerate inference by representing weights and activations using lower-precision integer formats (e.g., INT8) instead of floating-point numbers, often with minimal impact on predictive performance. Relatedly, inference can be performed using reduced precision formats such as FP8 <xref ref-type="bibr" rid="bib1.bibx24" id="paren.47"/>, which speeds up matrix multiplications on hardware accelerators supporting these formats. For autoregressive generation, efficiently managing the key-value (KV) cache is crucial <xref ref-type="bibr" rid="bib1.bibx32" id="paren.48"/>; techniques optimizing KV cache storage and retrieval avoid redundant computations for previously processed tokens, drastically speeding up the generation of subsequent forecast steps. While the implementation and evaluation of these optimizations are beyond the scope of this initial study, their successful application in other domains suggests that they represent a viable path towards deploying models like GPTCast efficiently in time-critical operational nowcasting workflows.</p>
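      <p>The benefit of key-value caching mentioned above can be illustrated with a toy single-head causal attention layer. This NumPy sketch is an assumption-laden illustration rather than the GPTCast implementation: the class <monospace>KVCache</monospace>, the helper names, and the weight matrices are hypothetical, and real transformers add multiple heads, layers, and positional information.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over the last axis."""
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)


def causal_attention(x, wq, wk, wv):
    """Full recomputation: causal attention for every position of x (T, D)."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(k.shape[-1])
    future = np.triu(np.ones_like(scores, dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)   # hide future tokens
    return softmax(scores) @ v


class KVCache:
    """Stores keys and values of past tokens so they are computed only once."""

    def __init__(self):
        self.keys = []
        self.values = []

    def step(self, x_t, wq, wk, wv):
        """Attention output for one new token x_t (D,), reusing cached K/V."""
        self.keys.append(x_t @ wk)
        self.values.append(x_t @ wv)
        k = np.stack(self.keys)
        v = np.stack(self.values)
        q = x_t @ wq
        scores = k @ q / np.sqrt(k.shape[-1])
        return softmax(scores) @ v
```
]]></preformat>
      <p>Feeding tokens one at a time through <monospace>KVCache.step</monospace> gives the same outputs as <monospace>causal_attention</monospace> on the full sequence, but each step projects only the newest token instead of reprocessing the whole context.</p>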
</sec>
<sec id="Ch1.S5.SS6">
  <label>5.6</label><title>Future work and outlook</title>
      <p id="d2e3326">Finally, in future studies, we also plan to explore the interpretability of the model as a means to control and condition it for different tasks. The peculiar characteristics of GPTCast open the possibility of guiding the generative process of the model by combining the probabilistic output of the forecaster with the interpretability of the learned codebook in terms of physical quantities. We envision leveraging GPTCast for tasks such as seamless forecasting (a.k.a. blending), generation of what-if scenarios, forecast conditioning, weather generation, and observation correction.</p>
</sec>
</sec>

      
      </body>
    <back><notes notes-type="codedataavailability"><title>Code and data availability</title>

      <p id="d2e3334">Data are from Arpae Emilia-Romagna. The full, preprocessed dataset used for the presented experiments is available on Zenodo (<ext-link xlink:href="https://doi.org/10.5281/zenodo.13692016" ext-link-type="DOI">10.5281/zenodo.13692016</ext-link>; <xref ref-type="bibr" rid="bib1.bibx16" id="altparen.49"/>), including the generated ensemble forecasts to reproduce the verification scores. The pre-trained models are available on Zenodo (<ext-link xlink:href="https://doi.org/10.5281/zenodo.13594332" ext-link-type="DOI">10.5281/zenodo.13594332</ext-link>; <xref ref-type="bibr" rid="bib1.bibx18" id="altparen.50"/>). A dedicated GitHub repository (<uri>https://github.com/DSIP-FBK/GPTCast</uri>, last access: 20 August 2025) hosts the PyTorch Lightning (<ext-link xlink:href="https://doi.org/10.5281/zenodo.3828935" ext-link-type="DOI">10.5281/zenodo.3828935</ext-link>; <xref ref-type="bibr" rid="bib1.bibx10" id="altparen.51"/>) code of the models described in this paper, based on the Lightning-Hydra-Template (<uri>https://github.com/facebookresearch/hydra</uri>; <xref ref-type="bibr" rid="bib1.bibx56" id="altparen.52"/>), licensed under the MIT License. The repository also hosts the code to reproduce the images shown in this paper. The GPTCast v1.0 GitHub release is archived on Zenodo (<ext-link xlink:href="https://doi.org/10.5281/zenodo.13832526" ext-link-type="DOI">10.5281/zenodo.13832526</ext-link>; <xref ref-type="bibr" rid="bib1.bibx17" id="altparen.53"/>) and allows users to download the code to reproduce the presented experiments.</p>
  </notes><notes notes-type="authorcontribution"><title>Author contributions</title>

      <p id="d2e3374">GF conceived and conceptualized the study, designed the GPTCast architecture, implemented the code, and ran the experiments. GF and ET performed the analysis and verification of the results and wrote the article. VP, CC, and PPA provided the data and performed the data extraction, data selection, and data quality control. RW performed data format conversion. All authors revised the results and reviewed the article. MC supervised the study from end to end.</p>
  </notes><notes notes-type="competinginterests"><title>Competing interests</title>

      <p id="d2e3380">The contact author has declared that none of the authors has any competing interests.</p>
  </notes><notes notes-type="disclaimer"><title>Disclaimer</title>

      <p id="d2e3386">Publisher’s note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors.</p>
  </notes><ack><title>Acknowledgements</title><p id="d2e3392">We acknowledge the CINECA Consortium for providing the GPU resources for training and running the experiments presented in this study.</p></ack><notes notes-type="reviewstatement"><title>Review statement</title>

      <p id="d2e3397">This paper was edited by David Topping and reviewed by two anonymous referees.</p>
  </notes><ref-list>
    <title>References</title>

      <ref id="bib1.bibx1"><label>Agrawal et al.(2019)Agrawal, Barrington, Bromberg, Burge, Gazen, and Hickey</label><mixed-citation>Agrawal, S., Barrington, L., Bromberg, C., Burge, J., Gazen, C., and Hickey, J.: Machine Learning for Precipitation Nowcasting from Radar Images, CoRR, arXiv [preprint], <ext-link xlink:href="https://doi.org/10.48550/arXiv.1912.12132" ext-link-type="DOI">10.48550/arXiv.1912.12132</ext-link>, 2019.</mixed-citation></ref>
      <ref id="bib1.bibx2"><label>Ayzel et al.(2020)Ayzel, Scheffer, and Heistermann</label><mixed-citation>Ayzel, G., Scheffer, T., and Heistermann, M.: RainNet v1.0: a convolutional neural network for radar-based precipitation nowcasting, Geosci. Model Dev., 13, 2631–2644, <ext-link xlink:href="https://doi.org/10.5194/gmd-13-2631-2020" ext-link-type="DOI">10.5194/gmd-13-2631-2020</ext-link>, 2020.</mixed-citation></ref>
      <ref id="bib1.bibx3"><label>Bellon and Austin(1978)</label><mixed-citation> Bellon, A. and Austin, G. L.: The evaluation of two years of real-time operation of a short-term precipitation forecasting procedure (SHARP), J. Appl. Meteorol., 17, 1778–1787, 1978.</mixed-citation></ref>
      <ref id="bib1.bibx4"><label>Bojinski et al.(2023)Bojinski, Blaauboer, Calbet, de Coning, Debie, Montmerle, Nietosvaara, Norman, Bañón Peregrín, Schmid, Strelec Mahović, and Wapler</label><mixed-citation>Bojinski, S., Blaauboer, D., Calbet, X., de Coning, E., Debie, F., Montmerle, T., Nietosvaara, V., Norman, K., Bañón Peregrín, L., Schmid, F., Strelec Mahović, N., and Wapler, K.: Towards nowcasting in Europe in 2030, Meteorol. Appl., 30, e2124, <ext-link xlink:href="https://doi.org/10.1002/met.2124" ext-link-type="DOI">10.1002/met.2124</ext-link>, 2023.</mixed-citation></ref>
      <ref id="bib1.bibx5"><label>Bowler et al.(2006)Bowler, Pierce, and Seed</label><mixed-citation>Bowler, N. E., Pierce, C. E., and Seed, A. W.: STEPS: A probabilistic precipitation forecasting scheme which merges an extrapolation nowcast with downscaled NWP, Q. J. Roy. Meteor. Soc., 132, 2127–2155, <ext-link xlink:href="https://doi.org/10.1256/qj.04.100" ext-link-type="DOI">10.1256/qj.04.100</ext-link>, 2006.</mixed-citation></ref>
      <ref id="bib1.bibx6"><label>Dao et al.(2022)Dao, Fu, Ermon, Rudra, and Ré</label><mixed-citation>Dao, T., Fu, D. Y., Ermon, S., Rudra, A., and Ré, C.: FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness, Advances in Neural Information Processing Systems (NeurIPS), <uri>https://dl.acm.org/doi/10.5555/3600270.3601459</uri> (last access: 20 August 2025), 2022.</mixed-citation></ref>
      <ref id="bib1.bibx7"><label>Dixon and Wiener(1993)</label><mixed-citation> Dixon, M. and Wiener, G.: TITAN: Thunderstorm Identification, Tracking, Analysis, and Nowcasting – A radar-based methodology, J. Atmos. Ocean. Tech., 10, 785–797, 1993.</mixed-citation></ref>
      <ref id="bib1.bibx8"><label>Dosovitskiy et al.(2020)Dosovitskiy, Beyer, Kolesnikov, Weissenborn, Zhai, Unterthiner, Dehghani, Minderer, Heigold, Gelly et al.</label><mixed-citation>Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S.,  Uszkoreit, J., and Houlsby, N.: An image is worth 16x16 words: Transformers for image recognition at scale, arXiv [preprint], <ext-link xlink:href="https://doi.org/10.48550/arXiv.2010.11929" ext-link-type="DOI">10.48550/arXiv.2010.11929</ext-link>, 2020.</mixed-citation></ref>
      <ref id="bib1.bibx9"><label>Esser et al.(2021)Esser, Rombach, and Ommer</label><mixed-citation>Esser, P., Rombach, R., and Ommer, B.: Taming transformers for high-resolution image synthesis, in: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, Nashville, TN, USA, 20–25 June 2021, 12873–12883,  <ext-link xlink:href="https://doi.org/10.1109/CVPR46437.2021.01268" ext-link-type="DOI">10.1109/CVPR46437.2021.01268</ext-link>, 2021.</mixed-citation></ref>
      <ref id="bib1.bibx10"><label>Falcon and The PyTorch Lightning team(2019)</label><mixed-citation>Falcon, W. and The PyTorch Lightning team: PyTorch Lightning, Zenodo [code], <ext-link xlink:href="https://doi.org/10.5281/zenodo.3828935" ext-link-type="DOI">10.5281/zenodo.3828935</ext-link>, 2019.</mixed-citation></ref>
      <ref id="bib1.bibx11"><label>Fan et al.(2018)Fan, Lewis, and Dauphin</label><mixed-citation>Fan, A., Lewis, M., and Dauphin, Y.: Hierarchical Neural Story Generation, in: Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (ACL), <ext-link xlink:href="https://doi.org/10.18653/v1/P18-1082" ext-link-type="DOI">10.18653/v1/P18-1082</ext-link>, 2018.</mixed-citation></ref>
      <ref id="bib1.bibx12"><label>Foresti et al.(2018)Foresti, Sideris, Panziera, Nerini, and Germann</label><mixed-citation>Foresti, L., Sideris, I. V., Panziera, L., Nerini, D., and Germann, U.: A 10-year radar-based analysis of orographic precipitation growth and decay patterns over the Swiss Alpine region, Q. J. Roy. Meteor. Soc., 144, 2277–2301, <ext-link xlink:href="https://doi.org/10.1002/qj.3364" ext-link-type="DOI">10.1002/qj.3364</ext-link>, 2018.</mixed-citation></ref>
      <ref id="bib1.bibx13"><label>Fornasiero et al.(2006)Fornasiero, Bech, and Alberoni</label><mixed-citation>Fornasiero, A., Bech, J., and Alberoni, P. P.: Enhanced radar precipitation estimates using a combined clutter and beam blockage correction technique, Nat. Hazards Earth Syst. Sci., 6, 697–710, <ext-link xlink:href="https://doi.org/10.5194/nhess-6-697-2006" ext-link-type="DOI">10.5194/nhess-6-697-2006</ext-link>, 2006.</mixed-citation></ref>
      <ref id="bib1.bibx14"><label>Fornasiero et al.(2008)Fornasiero, Amorati, and Alberoni</label><mixed-citation> Fornasiero, A., Amorati, R., and Alberoni, P. P.: Radar Quantitative Precipitation Estimation at Arpa-Sim: A Critical Approach to Retrieve the Rainfall Rate at the Ground Level, in: Proceedings of the 5th European Radar Conference, Helsinki, vol. 30, ISBN 9789516976764, 2008.</mixed-citation></ref>
      <ref id="bib1.bibx15"><label>Franch et al.(2020)Franch, Nerini, Pendesini, Coviello, Jurman, and Furlanello</label><mixed-citation>Franch, G., Nerini, D., Pendesini, M., Coviello, L., Jurman, G., and Furlanello, C.: Precipitation Nowcasting with Orographic Enhanced Stacked Generalization: Improving Deep Learning Predictions on Extreme Events, Atmosphere, 11, 267, <ext-link xlink:href="https://doi.org/10.3390/atmos11030267" ext-link-type="DOI">10.3390/atmos11030267</ext-link>, 2020.</mixed-citation></ref>
      <ref id="bib1.bibx16"><label>Franch et al.(2024a)Franch, Tomasi, Cardinali, Poli, Alberoni, and Cristoforetti</label><mixed-citation>Franch, G., Tomasi, E., Cardinali, C., Poli, V., Alberoni, P. P., and Cristoforetti, M.: Dataset for “GPTCast: a weather language model for precipitation nowcasting”, Zenodo [data set], <ext-link xlink:href="https://doi.org/10.5281/zenodo.13692016" ext-link-type="DOI">10.5281/zenodo.13692016</ext-link>, 2024a.</mixed-citation></ref>
      <ref id="bib1.bibx17"><label>Franch et al.(2024b)Franch, Tomasi, and Cristoforetti</label><mixed-citation>Franch, G., Tomasi, E., and Cristoforetti, M.: Code for “GPTCast: a weather language model for precipitation nowcasting”, Zenodo [code], <ext-link xlink:href="https://doi.org/10.5281/zenodo.13832526" ext-link-type="DOI">10.5281/zenodo.13832526</ext-link>, 2024b.</mixed-citation></ref>
      <ref id="bib1.bibx18"><label>Franch et al.(2024c)Franch, Tomasi, and Cristoforetti</label><mixed-citation>Franch, G., Tomasi, E., and Cristoforetti, M.: Pretrained models for “GPTCast: a weather language model for precipitation nowcasting”, Zenodo [code], <ext-link xlink:href="https://doi.org/10.5281/zenodo.13594332" ext-link-type="DOI">10.5281/zenodo.13594332</ext-link>, 2024c.</mixed-citation></ref>
      <ref id="bib1.bibx19"><label>Gao et al.(2023)Gao, Shi, Han, Wang, Jin, Maddix, Zhu, Li, and Wang</label><mixed-citation> Gao, Z., Shi, X., Han, B., Wang, H., Jin, X., Maddix, D., Zhu, Y., Li, M., and Wang, Y.: PreDiff: precipitation nowcasting with latent diffusion models, in: Proceedings of the 37th International Conference on Neural Information Processing Systems (NeurIPS '23), New Orleans, LA, USA, 10–16 December 2023, Curran Associates, Inc., Red Hook, NY, USA, 3439, 36 pp., ISBN 9781713899921, 2023.</mixed-citation></ref>
      <ref id="bib1.bibx20"><label>Gholami et al.(2021)Gholami, Kim, Dong, Yao, Mahoney, and Keutzer</label><mixed-citation>Gholami, A., Kim, S., Dong, Z., Yao, Z., Mahoney, M. W., and Keutzer, K.: A Survey of Quantization Methods for Efficient Neural Network Inference, arXiv [preprint], <ext-link xlink:href="https://doi.org/10.48550/arXiv.2103.13630" ext-link-type="DOI">10.48550/arXiv.2103.13630</ext-link>, 2021.</mixed-citation></ref>
      <ref id="bib1.bibx21"><label>Goodfellow et al.(2014)Goodfellow, Pouget-Abadie, Mirza, Xu, Warde-Farley, Ozair, Courville, and Bengio</label><mixed-citation>Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., and Bengio, Y.: Generative adversarial nets, Association for Computing Machinery, New York, NY, USA, 139–144, <ext-link xlink:href="https://doi.org/10.1145/3422622" ext-link-type="DOI">10.1145/3422622</ext-link>, 2014.</mixed-citation></ref>
      <ref id="bib1.bibx22"><label>Göber et al.(2023)Göber, Christel, Hoffmann, Mooney, Rodriguez, Becker, Ebert, Fearnley, Fundel, Geiger, Golding, Jeurig, Kelman, Kox, Magro, Perrels, Postigo, Potter, Robbins, Rust, Schoster, Tan, Taylor, and Williams</label><mixed-citation>Göber, M., Christel, I., Hoffmann, D., Mooney, C. J., Rodriguez, L., Becker, N., Ebert, E. E., Fearnley, C., Fundel, V. J., Geiger, T., Golding, B., Jeurig, J., Kelman, I., Kox, T., Magro, F.-A., Perrels, A., Postigo, J. C., Potter, S. H., Robbins, J., Rust, H., Schoster, D., Tan, M. L., Taylor, A., and Williams, H.: Enhancing the Value of Weather and Climate Services in Society: Identified Gaps and Needs as Outcomes of the First WMO WWRP/SERA Weather and Society Conference, B. Am. Meteorol. Soc., 104, E645–E651, <ext-link xlink:href="https://doi.org/10.1175/BAMS-D-22-0199.1" ext-link-type="DOI">10.1175/BAMS-D-22-0199.1</ext-link>, 2023.</mixed-citation></ref>
      <ref id="bib1.bibx23"><label>Holtzman et al.(2020)Holtzman, Buys, Du, Forbes, and Choi</label><mixed-citation> Holtzman, A., Buys, J., Du, L., Forbes, M., and Choi, Y.: The Curious Case of Neural Text Degeneration, in: International Conference on Learning Representations (ICLR), ISBN 979-8-3313-2198-7, 2020.</mixed-citation></ref>
      <ref id="bib1.bibx24"><label>Kuzmin et al.(2022)Kuzmin, Van Baalen, Ren, Nagel, Peters, and Blankevoort</label><mixed-citation> Kuzmin, A., Van Baalen, M., Ren, Y., Nagel, M., Peters, J., and Blankevoort, T.: FP8 quantization: the power of the exponent, in: Proceedings of the 36th International Conference on Neural Information Processing Systems (NeurIPS '22), New Orleans, LA, USA, 28 November–9 December 2022, Curran Associates, Inc., Red Hook, NY, USA, 1065, 12 pp., ISBN 9781713871088, 2022.</mixed-citation></ref>
      <ref id="bib1.bibx25"><label>Lam et al.(2023)</label><mixed-citation>Lam, R., Pascanu, R., Puigdomènech Gimenez, M., Agrawal, S., Dapogny, C., Schmidt, M., Keck, T., Mudigonda, M., Brutlag, P., Wang, J., Chantry, M., Norman, C., Dudhia, A., Clark, R., Otte, N., Tirilly, P., Wiklendt, S., Zimmer, A., Merose, A., Petersen, S., Visram, R., Valter, D., Hess, F., See, A., Fritz, F., Bodin, T., Untema, B., Thurman, R., Targett, P., Ravenscroft, A., McGuire, P., Kabra, M., Keeling, J., Gopal, A., Cheng, H., Piotrowski, T., Battaglia, P., Kohli, P., Heess, N., and Hassabis, D.: GraphCast: AI model for faster and more accurate global weather forecasting, Science, 382, 1416–1421, <ext-link xlink:href="https://doi.org/10.1126/science.adi2336" ext-link-type="DOI">10.1126/science.adi2336</ext-link>, 2023.</mixed-citation></ref>
      <ref id="bib1.bibx26"><label>Lang et al.(2024)</label><mixed-citation>Lang, S., Alexe, M., Chantry, M., Dramsch, J., Pinault, F., Raoult, B., Clare, M. C. A., Lessig, C., Maier-Gerber, M., Magnusson, L., Ben Bouallègue, Z., Prieto Nemesio, A., Dueben, P. D., Brown, A., Pappenberger, F., and Rabier, F.: AIFS-ECMWF's data-driven forecasting system, arXiv [preprint], <ext-link xlink:href="https://doi.org/10.48550/arXiv.2406.01465" ext-link-type="DOI">10.48550/arXiv.2406.01465</ext-link>, 2024.</mixed-citation></ref>
      <ref id="bib1.bibx27"><label>Leinonen et al.(2023)Leinonen, Hamann, Nerini, Germann, and Franch</label><mixed-citation>Leinonen, J., Hamann, U., Nerini, D., Germann, U., and Franch, G.: Latent diffusion models for generative precipitation nowcasting with accurate uncertainty quantification,  arXiv [preprint], <ext-link xlink:href="https://doi.org/10.48550/arXiv.2304.12891" ext-link-type="DOI">10.48550/arXiv.2304.12891</ext-link>, 2023.</mixed-citation></ref>
      <ref id="bib1.bibx28"><label>Lessig et al.(2023)Lessig, Luise, Gong, Langguth, Stadler, and Schultz</label><mixed-citation>Lessig, C., Luise, I., Gong, B., Langguth, M., Stadler, S., and Schultz, M.: AtmoRep: A stochastic model of atmosphere dynamics using large scale representation learning, arXiv [preprint], <ext-link xlink:href="https://doi.org/10.48550/arXiv.2308.13280" ext-link-type="DOI">10.48550/arXiv.2308.13280</ext-link>, 2023.</mixed-citation></ref>
      <ref id="bib1.bibx29"><label>Liu et al.(2021)Liu, Lin, Cao, Hu, Wei, Zhang, Lin, and Guo</label><mixed-citation>Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., and Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows, in: Proceedings of the IEEE/CVF international conference on computer vision, 11–17 October 2021, 10012–10022, <ext-link xlink:href="https://doi.org/10.1109/ICCV48922.2021.00986" ext-link-type="DOI">10.1109/ICCV48922.2021.00986</ext-link>, 2021.</mixed-citation></ref>
      <ref id="bib1.bibx30"><label>Marshall and Palmer(1948)</label><mixed-citation>Marshall, J. S. and Palmer, W. M. K.: The distribution of raindrops with size, J. Atmos. Sci., 5, 165–166, <ext-link xlink:href="https://doi.org/10.1175/1520-0469(1948)005&lt;0165:TDORWS&gt;2.0.CO;2" ext-link-type="DOI">10.1175/1520-0469(1948)005&lt;0165:TDORWS&gt;2.0.CO;2</ext-link>, 1948.</mixed-citation></ref>
      <ref id="bib1.bibx31"><label>Panziera et al.(2011)Panziera, Germann, Gabella, and Mandapaka</label><mixed-citation>Panziera, L., Germann, U., Gabella, M., and Mandapaka, P. V.: NORA – Nowcasting of Orographic Rainfall by means of Analogues, Q. J. Roy. Meteor. Soc., 137, 2106–2123, <ext-link xlink:href="https://doi.org/10.1002/qj.878" ext-link-type="DOI">10.1002/qj.878</ext-link>, 2011.</mixed-citation></ref>
      <ref id="bib1.bibx32"><label>Pope et al.(2023)Pope, Douglas, Chowdhery, Devlin, Bradbury, Levskaya, Heek, Xiao, Agrawal, and Dean</label><mixed-citation>Pope, R., Douglas, S., Chowdhery, A., Devlin, J., Bradbury, J., Levskaya, A., Heek, J., Xiao, K., Agrawal, S., and Dean, J.: Efficiently scaling transformer inference, arXiv [preprint], <ext-link xlink:href="https://doi.org/10.48550/arXiv.2211.05102" ext-link-type="DOI">10.48550/arXiv.2211.05102</ext-link>, 2023.</mixed-citation></ref>
      <ref id="bib1.bibx33"><label>Pulkkinen et al.(2019)Pulkkinen, Nerini, Pérez Hortal, Velasco-Forero, Seed, Germann, and Foresti</label><mixed-citation>Pulkkinen, S., Nerini, D., Pérez Hortal, A. A., Velasco-Forero, C., Seed, A., Germann, U., and Foresti, L.: Pysteps: an open-source Python library for probabilistic precipitation nowcasting (v1.0), Geosci. Model Dev., 12, 4185–4219, <ext-link xlink:href="https://doi.org/10.5194/gmd-12-4185-2019" ext-link-type="DOI">10.5194/gmd-12-4185-2019</ext-link>, 2019.</mixed-citation></ref>
      <ref id="bib1.bibx34"><label>Pulkkinen et al.(2020)Pulkkinen, Chandrasekar, von Lerber, and Harri</label><mixed-citation>Pulkkinen, S., Chandrasekar, V., von Lerber, A., and Harri, A.-M.: Nowcasting of Convective Rainfall Using Volumetric Radar Observations, IEEE T. Geosci.  Remote S., 58, 7845–7859, <ext-link xlink:href="https://doi.org/10.1109/TGRS.2020.2984594" ext-link-type="DOI">10.1109/TGRS.2020.2984594</ext-link>, 2020.</mixed-citation></ref>
      <ref id="bib1.bibx35"><label>Pulkkinen et al.(2021)Pulkkinen, Chandrasekar, and Niemi</label><mixed-citation>Pulkkinen, S., Chandrasekar, V., and Niemi, T.: Lagrangian Integro-Difference Equation Model for Precipitation Nowcasting, J. Atmos. Ocean. Tech., 38, 2125–2145, <ext-link xlink:href="https://doi.org/10.1175/JTECH-D-21-0013.1" ext-link-type="DOI">10.1175/JTECH-D-21-0013.1</ext-link>, 2021.</mixed-citation></ref>
      <ref id="bib1.bibx36"><label>Radford et al.(2019)Radford, Wu, Child, Luan, Amodei, and Sutskever</label><mixed-citation>Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., and Sutskever, I.: Language Models are Unsupervised Multitask Learners, OpenAI Blog, 1, <uri>https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf</uri> (last access: 20 August 2025), 2019.</mixed-citation></ref>
      <ref id="bib1.bibx37"><label>Ravuri et al.(2021)Ravuri, Lenc, Willson, Kangin, Lam, Mirowski, Fitzsimons, Athanassiadou, Kashem, Madge, Prudden, Mandhane, Clark, Brock, Simonyan, Hadsell, Robinson, Clancy, Arribas, and Mohamed</label><mixed-citation> Ravuri, S., Lenc, K., Willson, M., Kangin, D., Lam, R., Mirowski, P., Fitzsimons, M., Athanassiadou, M., Kashem, S., Madge, S., Prudden, R., Mandhane, A., Clark, A., Brock, A., Simonyan, K., Hadsell, R., Robinson, N., Clancy, E., Arribas, A., and Mohamed, S.: Skilful precipitation nowcasting using deep generative models of radar, Nature, 597, 672–677, 2021.</mixed-citation></ref>
      <ref id="bib1.bibx38"><label>Ritvanen et al.(2025)Ritvanen, Pulkkinen, Moisseev, and Nerini</label><mixed-citation>Ritvanen, J., Pulkkinen, S., Moisseev, D., and Nerini, D.: Cell-tracking-based framework for assessing nowcasting model skill in reproducing growth and decay of convective rainfall, Geosci. Model Dev., 18, 1851–1878, <ext-link xlink:href="https://doi.org/10.5194/gmd-18-1851-2025" ext-link-type="DOI">10.5194/gmd-18-1851-2025</ext-link>, 2025.</mixed-citation></ref>
      <ref id="bib1.bibx39"><label>Rombach et al.(2022)Rombach, Blattmann, Lorenz, Esser, and Ommer</label><mixed-citation>Rombach, R., Blattmann, A., Lorenz, D., Esser, P., and Ommer, B.: High-resolution image synthesis with latent diffusion models, in: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 10684–10695, <ext-link xlink:href="https://doi.org/10.1109/CVPR52688.2022.01042" ext-link-type="DOI">10.1109/CVPR52688.2022.01042</ext-link>, 2022.</mixed-citation></ref>
      <ref id="bib1.bibx40"><label>Seed(2003)</label><mixed-citation>Seed, A. W.: A Dynamic and Spatial Scaling Approach to Advection Forecasting, J. Appl. Meteorol., 42, 381–388, <ext-link xlink:href="https://doi.org/10.1175/1520-0450(2003)042&lt;0381:ADASSA&gt;2.0.CO;2" ext-link-type="DOI">10.1175/1520-0450(2003)042&lt;0381:ADASSA&gt;2.0.CO;2</ext-link>, 2003.</mixed-citation></ref>
      <ref id="bib1.bibx41"><label>Seed et al.(2013)Seed, Pierce, and Norman</label><mixed-citation>Seed, A. W., Pierce, C. E., and Norman, K.: Formulation and evaluation of a scale decomposition-based stochastic precipitation nowcast scheme, Water Resour. Res., 49, 6624–6641, <ext-link xlink:href="https://doi.org/10.1002/wrcr.20536" ext-link-type="DOI">10.1002/wrcr.20536</ext-link>, 2013.</mixed-citation></ref>
      <ref id="bib1.bibx42"><label>Shi et al.(2015)Shi, Chen, Wang, Yeung, Wong, and Woo</label><mixed-citation> Shi, X., Chen, Z., Wang, H., Yeung, D.-Y., Wong, W.-K., and Woo, W.-C.: Convolutional LSTM network: A machine learning approach for precipitation nowcasting, Adv. Neur. In., 28, 802–810, ISBN 9781510825024, 2015.</mixed-citation></ref>
      <ref id="bib1.bibx43"><label>Sideris et al.(2020)Sideris, Foresti, Nerini, and Germann</label><mixed-citation>Sideris, I. V., Foresti, L., Nerini, D., and Germann, U.: NowPrecip: localized precipitation nowcasting in the complex terrain of Switzerland, Q. J. Roy. Meteor. Soc., 146, 1768–1800, <ext-link xlink:href="https://doi.org/10.1002/qj.3766" ext-link-type="DOI">10.1002/qj.3766</ext-link>, 2020.</mixed-citation></ref>
      <ref id="bib1.bibx44"><label>Sun et al.(2014)Sun, Xue, Wilson, Zawadzki, Ballard, Onvlee-Hooimeyer, Joe, Barker, Li, Golding, Xu, and Pinto</label><mixed-citation>Sun, J., Xue, M., Wilson, J. W., Zawadzki, I., Ballard, S. P., Onvlee-Hooimeyer, J., Joe, P., Barker, D. M., Li, P.-W., Golding, B., Xu, M., and Pinto, J.: Use of NWP for Nowcasting Convective Precipitation: Recent Progress and Challenges, B. Am. Meteorol. Soc., 95, 409–426, <ext-link xlink:href="https://doi.org/10.1175/BAMS-D-11-00263.1" ext-link-type="DOI">10.1175/BAMS-D-11-00263.1</ext-link>, 2014.</mixed-citation></ref>
      <ref id="bib1.bibx45"><label>Surcel et al.(2015)Surcel, Zawadzki, and Yau</label><mixed-citation>Surcel, M., Zawadzki, I., and Yau, M. K.: A Study on the Scale Dependence of the Predictability of Precipitation Patterns, J. Atmos. Sci., 72, 216–235, <ext-link xlink:href="https://doi.org/10.1175/JAS-D-14-0071.1" ext-link-type="DOI">10.1175/JAS-D-14-0071.1</ext-link>, 2015.</mixed-citation></ref>
      <ref id="bib1.bibx46"><label>Tomasi et al.(2025)Tomasi, Franch, and Cristoforetti</label><mixed-citation>Tomasi, E., Franch, G., and Cristoforetti, M.: Can AI be enabled to perform dynamical downscaling? A latent diffusion model to mimic kilometer-scale COSMO5.0_CLM9 simulations, Geosci. Model Dev., 18, 2051–2078, <ext-link xlink:href="https://doi.org/10.5194/gmd-18-2051-2025" ext-link-type="DOI">10.5194/gmd-18-2051-2025</ext-link>, 2025.</mixed-citation></ref>
      <ref id="bib1.bibx47"><label>Turner et al.(2004)Turner, Zawadzki, and Germann</label><mixed-citation>Turner, B. J., Zawadzki, I., and Germann, U.: Predictability of Precipitation from Continental Radar Images. Part III: Operational Nowcasting Implementation (MAPLE), J. Appl. Meteorol., 43, 231–248, <ext-link xlink:href="https://doi.org/10.1175/1520-0450(2004)043&lt;0231:POPFCR&gt;2.0.CO;2" ext-link-type="DOI">10.1175/1520-0450(2004)043&lt;0231:POPFCR&gt;2.0.CO;2</ext-link>, 2004.</mixed-citation></ref>
      <ref id="bib1.bibx48"><label>Vaswani et al.(2017)Vaswani, Shazeer, Parmar, Uszkoreit, Jones, Gomez, Kaiser, and Polosukhin</label><mixed-citation> Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., and Polosukhin, I.: Attention is all you need, Adv. Neur. In., 30, 5999–6009, ISBN 9781510860964, 2017.</mixed-citation></ref>
      <ref id="bib1.bibx49"><label>Wang et al.(2018)Wang, Gao, Long, Wang, and Yu</label><mixed-citation> Wang, Y., Gao, Z., Long, M., Wang, J., and Yu, P. S.: PredRNN++: Towards a resolution of the deep-in-time dilemma in spatiotemporal predictive learning, in: International Conference on Machine Learning, PMLR, 5123–5132, ISBN 9781510867963, 2018.</mixed-citation></ref>
      <ref id="bib1.bibx50"><label>Wang et al.(2004)Wang, Bovik, Sheikh, and Simoncelli</label><mixed-citation> Wang, Z., Bovik, A. C., Sheikh, H. R., and Simoncelli, E. P.: Image quality assessment: from error visibility to structural similarity, IEEE T. Image Process., 13, 600–612, 2004.</mixed-citation></ref>
      <ref id="bib1.bibx51"><label>Werner and Cranston(2009)</label><mixed-citation> Werner, M. and Cranston, M.: Understanding the value of radar rainfall nowcasts in flood forecasting and warning in flashy catchments, Meteorol. Appl., 16, 41–55, 2009.</mixed-citation></ref>
      <ref id="bib1.bibx52"><label>Wernli et al.(2008)Wernli, Paulat, Hagen, and Frei</label><mixed-citation> Wernli, H., Paulat, M., Hagen, M., and Frei, C.: SAL – A novel quality measure for the verification of quantitative precipitation forecasts, Mon. Weather Rev., 136, 4470–4487, 2008.</mixed-citation></ref>
      <ref id="bib1.bibx53"><label>Wernli et al.(2009)Wernli, Hofmann, and Zimmer</label><mixed-citation> Wernli, H., Hofmann, C., and Zimmer, M.: Spatial forecast verification methods intercomparison project: Application of the SAL technique, Weather Forecast., 24, 1472–1484, 2009.</mixed-citation></ref>
      <ref id="bib1.bibx54"><label>Wolf et al.(2020)Wolf, Debut, Sanh, Chaumond, Delangue, Moi, Cistac, Rault, Louf, Funtowicz, Davison, Shleifer, von Platen, Ma, Jernite, Plu, Xu, Le Scao, Gugger, Drame, Lhoest, and Rush</label><mixed-citation>Wolf, T., Debut, L., Sanh, V., Chaumond, J., Delangue, C., Moi, A., Cistac, P., Rault, T., Louf, R., Funtowicz, M., Davison, J., Shleifer, S., von Platen, P., Ma, C., Jernite, Y., Plu, J., Xu, C., Le Scao, T., Gugger, S., Drame, M., Lhoest, Q., and Rush, A.: Transformers: State-of-the-Art Natural Language Processing, in: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, edited by: Liu, Q. and Schlangen, D., Association for Computational Linguistics, Online, 38–45, <ext-link xlink:href="https://doi.org/10.18653/v1/2020.emnlp-demos.6" ext-link-type="DOI">10.18653/v1/2020.emnlp-demos.6</ext-link>, 2020.</mixed-citation></ref>
      <ref id="bib1.bibx55"><label>Woo and Wong(2017)</label><mixed-citation>Woo, W.-C. and Wong, W.-K.: Operational Application of Optical Flow Techniques to Radar-Based Rainfall Nowcasting, Atmosphere, 8, 48, <ext-link xlink:href="https://doi.org/10.3390/atmos8030048" ext-link-type="DOI">10.3390/atmos8030048</ext-link>, 2017.</mixed-citation></ref>
      <ref id="bib1.bibx56"><label>Yadan(2019)</label><mixed-citation>Yadan, O.: Hydra – A framework for elegantly configuring complex applications, GitHub [code], <uri>https://github.com/facebookresearch/hydra</uri> (last access: 20 August 2025), 2019.</mixed-citation></ref>
      <ref id="bib1.bibx57"><label>Yu et al.(2022)Yu, Li, Koh, Zhang, Pang, Qin, Ku, Xu, Baldridge, and Wu</label><mixed-citation>Yu, J., Li, X., Koh, J. Y., Zhang, H., Pang, R., Qin, J., Ku, A., Xu, Y., Baldridge, J., and Wu, Y.: Vector-quantized Image Modeling with Improved VQGAN, in: International Conference on Learning Representations, <uri>https://openreview.net/forum?id=pfNyExj7z2</uri> (last access: 20 August 2025), 2022.</mixed-citation></ref>
      <ref id="bib1.bibx58"><label>Zhang et al.(2018)Zhang, Isola, Efros, Shechtman, and Wang</label><mixed-citation>Zhang, R., Isola, P., Efros, A. A., Shechtman, E., and Wang, O.: The unreasonable effectiveness of deep features as a perceptual metric, in: Proceedings of the IEEE conference on computer vision and pattern recognition, 586–595, <uri>https://doi.ieeecomputersociety.org/10.1109/CVPR.2018.00068</uri> (last access: 20 August 2025), 2018.</mixed-citation></ref>
      <ref id="bib1.bibx59"><label>Zhang et al.(2023)Zhang, Long, Chen, Xing, Jin, Jordan, and Wang</label><mixed-citation> Zhang, Y., Long, M., Chen, K., Xing, L., Jin, R., Jordan, M. I., and Wang, J.: Skilful nowcasting of extreme precipitation with NowcastNet, Nature, 619, 526–532, 2023.</mixed-citation></ref>

  </ref-list></back>
    <!--<article-title-html>GPTCast: a weather language model for precipitation nowcasting</article-title-html>
<abstract-html/>
<ref-html id="bib1.bib1"><label>Agrawal et al.(2019)Agrawal, Barrington, Bromberg, Burge, Gazen, and
Hickey</label><mixed-citation>
      
Agrawal, S., Barrington, L., Bromberg, C., Burge, J., Gazen, C., and Hickey,
J.: Machine Learning for Precipitation Nowcasting from Radar Images, CoRR,
arXiv [preprint],
<a href="https://doi.org/10.48550/arXiv.1912.12132" target="_blank">https://doi.org/10.48550/arXiv.1912.12132</a>, 2019.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib2"><label>Ayzel et al.(2020)Ayzel, Scheffer, and Heistermann</label><mixed-citation>
      
Ayzel, G., Scheffer, T., and Heistermann, M.: RainNet v1.0: a convolutional neural network for radar-based precipitation nowcasting, Geosci. Model Dev., 13, 2631–2644, <a href="https://doi.org/10.5194/gmd-13-2631-2020" target="_blank">https://doi.org/10.5194/gmd-13-2631-2020</a>, 2020.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib3"><label>Bellon and Austin(1978)</label><mixed-citation>
      
Bellon, A. and Austin, G. L.: The evaluation of two years of real-time
operation of a short-term precipitation forecasting procedure (SHARP),
J. Appl. Meteorol., 17, 1778–1787, 1978.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib4"><label>Bojinski et al.(2023)Bojinski, Blaauboer, Calbet, de Coning, Debie,
Montmerle, Nietosvaara, Norman, Bañón Peregrín, Schmid, Strelec Mahović,
and Wapler</label><mixed-citation>
      
Bojinski, S., Blaauboer, D., Calbet, X., de Coning, E., Debie, F., Montmerle,
T., Nietosvaara, V., Norman, K., Bañón Peregrín, L., Schmid, F.,
Strelec Mahović, N., and Wapler, K.: Towards nowcasting in Europe in 2030,
Meteorol. Appl., 30, e2124,
<a href="https://doi.org/10.1002/met.2124" target="_blank">https://doi.org/10.1002/met.2124</a>, 2023.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib5"><label>Bowler et al.(2006)Bowler, Pierce, and Seed</label><mixed-citation>
      
Bowler, N. E., Pierce, C. E., and Seed, A. W.: STEPS: A probabilistic
precipitation forecasting scheme which merges an extrapolation nowcast with
downscaled NWP, Q. J. Roy. Meteor. Soc., 132,
2127–2155, <a href="https://doi.org/10.1256/qj.04.100" target="_blank">https://doi.org/10.1256/qj.04.100</a>, 2006.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib6"><label>Dao et al.(2022)Dao, Fu, Ermon, Rudra, and
Ré</label><mixed-citation>
      
Dao, T., Fu, D. Y., Ermon, S., Rudra, A., and Ré, C.: FlashAttention:
Fast and Memory-Efficient Exact Attention with IO-Awareness, Advances in
Neural Information Processing Systems (NeurIPS), <a href="https://dl.acm.org/doi/10.5555/3600270.3601459" target="_blank"/> (last access: 20 August 2025), 2022.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib7"><label>Dixon and Wiener(1993)</label><mixed-citation>
      
Dixon, M. and Wiener, G.: TITAN: Thunderstorm Identification, Tracking,
Analysis, and Nowcasting – A radar-based methodology, J. Atmos. Ocean.
Tech., 10, 785–797, 1993.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib8"><label>Dosovitskiy et al.(2020)Dosovitskiy, Beyer, Kolesnikov, Weissenborn,
Zhai, Unterthiner, Dehghani, Minderer, Heigold, Gelly et al.</label><mixed-citation>
      
Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X.,
Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S.,  Uszkoreit, J., and Houlsby, N.:
An image is worth 16x16 words: Transformers for image recognition at scale,
arXiv [preprint], <a href="https://doi.org/10.48550/arXiv.2010.11929" target="_blank">https://doi.org/10.48550/arXiv.2010.11929</a>, 2020.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib9"><label>Esser et al.(2021)Esser, Rombach, and Ommer</label><mixed-citation>
      
Esser, P., Rombach, R., and Ommer, B.: Taming transformers for high-resolution
image synthesis, in: Proceedings of the IEEE/CVF conference on computer
vision and pattern recognition, Nashville, TN, USA, 20–25 June 2021, 12873–12883,  <a href="https://doi.org/10.1109/CVPR46437.2021.01268" target="_blank">https://doi.org/10.1109/CVPR46437.2021.01268</a>, 2021.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib10"><label>Falcon and The PyTorch Lightning
team(2019)</label><mixed-citation>
      
Falcon, W. and The PyTorch Lightning team: PyTorch Lightning, Zenodo [code],
<a href="https://doi.org/10.5281/zenodo.3828935" target="_blank">https://doi.org/10.5281/zenodo.3828935</a>, 2019.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib11"><label>Fan et al.(2018)Fan, Lewis, and Dauphin</label><mixed-citation>
      
Fan, A., Lewis, M., and Dauphin, Y.: Hierarchical Neural Story Generation, in:
Proceedings of the 56th Annual Meeting of the Association for Computational
Linguistics (ACL), <a href="https://doi.org/10.18653/v1/P18-1082" target="_blank">https://doi.org/10.18653/v1/P18-1082</a>, 2018.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib12"><label>Foresti et al.(2018)Foresti, Sideris, Panziera, Nerini, and
Germann</label><mixed-citation>
      
Foresti, L., Sideris, I. V., Panziera, L., Nerini, D., and Germann, U.: A
10-year radar-based analysis of orographic precipitation growth and decay
patterns over the Swiss Alpine region, Q. J. Roy.
Meteor. Soc., 144, 2277–2301,
<a href="https://doi.org/10.1002/qj.3364" target="_blank">https://doi.org/10.1002/qj.3364</a>, 2018.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib13"><label>Fornasiero et al.(2006)Fornasiero, Bech, and
Alberoni</label><mixed-citation>
      
Fornasiero, A., Bech, J., and Alberoni, P. P.: Enhanced radar precipitation estimates using a combined clutter and beam blockage correction technique, Nat. Hazards Earth Syst. Sci., 6, 697–710, <a href="https://doi.org/10.5194/nhess-6-697-2006" target="_blank">https://doi.org/10.5194/nhess-6-697-2006</a>, 2006.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib14"><label>Fornasiero et al.(2008)Fornasiero, Amorati, and
Alberoni</label><mixed-citation>
      
Fornasiero, A., Amorati, R., and Alberoni, P. P.: Radar Quantitative
Precipitation Estimation at Arpa-Sim: A Critical Approach to Retrieve the
Rainfall Rate at the Ground Level, in: Proceedings of the 5th European
Radar Conference, Helsinki, vol. 30, ISBN 9789516976764, 2008.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib15"><label>Franch et al.(2020)Franch, Nerini, Pendesini, Coviello, Jurman, and
Furlanello</label><mixed-citation>
      
Franch, G., Nerini, D., Pendesini, M., Coviello, L., Jurman, G., and
Furlanello, C.: Precipitation Nowcasting with Orographic Enhanced Stacked
Generalization: Improving Deep Learning Predictions on Extreme Events,
Atmosphere, 11, 267, <a href="https://doi.org/10.3390/atmos11030267" target="_blank">https://doi.org/10.3390/atmos11030267</a>, 2020.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib16"><label>Franch et al.(2024a)Franch, Tomasi, Cardinali, Poli,
Alberoni, and Cristoforetti</label><mixed-citation>
      
Franch, G., Tomasi, E., Cardinali, C., Poli, V., Alberoni, P. P., and
Cristoforetti, M.: Dataset for “GPTCast: a weather language model for
precipitation nowcasting”, Zenodo [data set], <a href="https://doi.org/10.5281/zenodo.13692016" target="_blank">https://doi.org/10.5281/zenodo.13692016</a>,
2024a.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib17"><label>Franch et al.(2024b)Franch, Tomasi, and
Cristoforetti</label><mixed-citation>
      
Franch, G., Tomasi, E., and Cristoforetti, M.: Code for “GPTCast: a weather
language model for precipitation nowcasting”, Zenodo [code],
<a href="https://doi.org/10.5281/zenodo.13832526" target="_blank">https://doi.org/10.5281/zenodo.13832526</a>, 2024b.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib18"><label>Franch et al.(2024c)Franch, Tomasi, and
Cristoforetti</label><mixed-citation>
      
Franch, G., Tomasi, E., and Cristoforetti, M.: Pretrained models for “GPTCast:
a weather language model for precipitation nowcasting”, Zenodo [code],
<a href="https://doi.org/10.5281/zenodo.13594332" target="_blank">https://doi.org/10.5281/zenodo.13594332</a>, 2024c.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib19"><label>Gao et al.(2023)Gao, Shi, Han, Wang, Jin, Maddix, Zhu, Li, and Wang</label><mixed-citation>
      
Gao, Z., Shi, X., Han, B., Wang, H., Jin, X., Maddix, D., Zhu, Y., Li, M., and Wang, Y.: PreDiff: precipitation nowcasting with latent diffusion models, in: Proceedings of the 37th International Conference on Neural Information Processing Systems (NeurIPS '23), New Orleans, LA, USA, 10–16 December 2023, Curran Associates, Inc., Red Hook, NY, USA, 3439, 36 pp., ISBN 9781713899921, 2023.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib20"><label>Gholami et al.(2021)Gholami, Kim, Dong, Yao, Mahoney, and
Keutzer</label><mixed-citation>
      
Gholami, A., Kim, S., Dong, Z., Yao, Z., Mahoney, M. W., and Keutzer, K.: A
Survey of Quantization Methods for Efficient Neural Network Inference, arXiv
[preprint],
<a href="https://doi.org/10.48550/arXiv.2103.13630" target="_blank">https://doi.org/10.48550/arXiv.2103.13630</a>, 2021.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib21"><label>Goodfellow et al.(2014)Goodfellow, Pouget-Abadie, Mirza, Xu,
Warde-Farley, Ozair, Courville, and Bengio</label><mixed-citation>
      
Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair,
S., Courville, A., and Bengio, Y.: Generative adversarial nets, Association for Computing Machinery, New York, NY, USA, 139–144, <a href="https://doi.org/10.1145/3422622" target="_blank">https://doi.org/10.1145/3422622</a>, 2014.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib22"><label>Göber et al.(2023)Göber, Christel, Hoffmann, Mooney, Rodriguez,
Becker, Ebert, Fearnley, Fundel, Geiger, Golding, Jeurig, Kelman, Kox, Magro,
Perrels, Postigo, Potter, Robbins, Rust, Schoster, Tan, Taylor, and
Williams</label><mixed-citation>
      
Göber, M., Christel, I., Hoffmann, D., Mooney, C. J., Rodriguez, L., Becker,
N., Ebert, E. E., Fearnley, C., Fundel, V. J., Geiger, T., Golding, B.,
Jeurig, J., Kelman, I., Kox, T., Magro, F.-A., Perrels, A., Postigo, J. C.,
Potter, S. H., Robbins, J., Rust, H., Schoster, D., Tan, M. L., Taylor, A.,
and Williams, H.: Enhancing the Value of Weather and Climate Services in
Society: Identified Gaps and Needs as Outcomes of the First WMO WWRP/SERA
Weather and Society Conference, B. Am. Meteor.
Soc., 104, E645–E651, <a href="https://doi.org/10.1175/BAMS-D-22-0199.1" target="_blank">https://doi.org/10.1175/BAMS-D-22-0199.1</a>, 2023.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib23"><label>Holtzman et al.(2020)Holtzman, Buys, Du, Forbes, and
Choi</label><mixed-citation>
      
Holtzman, A., Buys, J., Du, L., Forbes, M., and Choi, Y.: The Curious Case of
Neural Text Degeneration, in: International Conference on Learning
Representations (ICLR), ISBN 979-8-3313-2198-7, 2020.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib24"><label>Kuzmin et al.(2022)Kuzmin, Van Baalen, Ren, Nagel, Peters, and Blankevoort</label><mixed-citation>
      
Kuzmin, A., Van Baalen, M., Ren, Y., Nagel, M., Peters, J., and Blankevoort, T.: FP8 quantization: the power of the exponent, in: Proceedings of the 36th International Conference on Neural Information Processing Systems (NeurIPS '22), New Orleans, LA, USA, 28 November–9 December 2022, Curran Associates, Inc., Red Hook, NY, USA, 1065, 12 pp., ISBN 9781713871088, 2022.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib25"><label>Lam et al.(2023)</label><mixed-citation>
      
Lam, R., Pascanu, R., Puigdomènech Gimenez, M., Agrawal, S., Dapogny, C.,
Schmidt, M., Keck, T., Mudigonda, M., Brutlag, P., Wang, J., Chantry, M.,
Norman, C., Dudhia, A., Clark, R., Otte, N., Tirilly, P., Wiklendt, S.,
Zimmer, A., Merose, A., Petersen, S., Visram, R., Valter, D., Hess, F., See,
A., Fritz, F., Bodin, T., Untema, B., Thurman, R., Targett, P., Ravenscroft,
A., McGuire, P., Kabra, M., Keeling, J., Gopal, A., Cheng, H., Piotrowski,
T., Battaglia, P., Kohli, P., Heess, N., and Hassabis, D.: GraphCast: AI
model for faster and more accurate global weather forecasting, Science, 382,
1416–1421, <a href="https://doi.org/10.1126/science.adi2336" target="_blank">https://doi.org/10.1126/science.adi2336</a>, 2023.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib26"><label>Lang et al.(2024)</label><mixed-citation>
      
Lang, S., Alexe, M., Chantry, M., Dramsch, J., Pinault, F., Raoult, B., Clare, M. C. A., Lessig, C., Maier-Gerber, M., Magnusson, L., Ben Bouallègue, Z., Prieto Nemesio, A., Dueben, P. D., Brown, A., Pappenberger, F., and Rabier, F.: AIFS-ECMWF's data-driven forecasting system, arXiv [preprint], <a href="https://doi.org/10.48550/arXiv.2406.01465" target="_blank">https://doi.org/10.48550/arXiv.2406.01465</a>, 2024.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib27"><label>Leinonen et al.(2023)Leinonen, Hamann, Nerini, Germann, and
Franch</label><mixed-citation>
      
Leinonen, J., Hamann, U., Nerini, D., Germann, U., and Franch, G.: Latent
diffusion models for generative precipitation nowcasting with accurate
uncertainty quantification,  arXiv [preprint],
<a href="https://doi.org/10.48550/arXiv.2304.12891" target="_blank">https://doi.org/10.48550/arXiv.2304.12891</a>, 2023.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib28"><label>Lessig et al.(2023)Lessig, Luise, Gong, Langguth, Stadler, and
Schultz</label><mixed-citation>
      
Lessig, C., Luise, I., Gong, B., Langguth, M., Stadler, S., and Schultz, M.:
AtmoRep: A stochastic model of atmosphere dynamics using large scale
representation learning, arXiv [preprint],
<a href="https://doi.org/10.48550/arXiv.2308.13280" target="_blank">https://doi.org/10.48550/arXiv.2308.13280</a>, 2023.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib29"><label>Liu et al.(2021)Liu, Lin, Cao, Hu, Wei, Zhang, Lin, and Guo</label><mixed-citation>
      
Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., and Guo, B.:
Swin transformer: Hierarchical vision transformer using shifted windows, in:
Proceedings of the IEEE/CVF international conference on computer vision, 11–17 October 2021,
10012–10022, <a href="https://doi.org/10.1109/ICCV48922.2021.00986" target="_blank">https://doi.org/10.1109/ICCV48922.2021.00986</a>, 2021.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib30"><label>Marshall and Palmer(1948)</label><mixed-citation>
      
Marshall, J. S. and Palmer, W. M. K.: The distribution of raindrops with size,
J. Atmos. Sci., 5, 165–166,
<a href="https://doi.org/10.1175/1520-0469(1948)005&lt;0165:TDORWS&gt;2.0.CO;2" target="_blank">https://doi.org/10.1175/1520-0469(1948)005&lt;0165:TDORWS&gt;2.0.CO;2</a>, 1948.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib31"><label>Panziera et al.(2011)Panziera, Germann, Gabella, and
Mandapaka</label><mixed-citation>
      
Panziera, L., Germann, U., Gabella, M., and Mandapaka, P. V.: NORA – Nowcasting
of Orographic Rainfall by means of Analogues, Q. J. Roy.
Meteor. Soc., 137, 2106–2123,
<a href="https://doi.org/10.1002/qj.878" target="_blank">https://doi.org/10.1002/qj.878</a>, 2011.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib32"><label>Pope et al.(2023)Pope, Douglas, Chowdhery, Devlin, Bradbury, Levskaya, Heek, Xiao, Agrawal, and Dean</label><mixed-citation>
      
Pope, R., Douglas, S., Chowdhery, A., Devlin, J., Bradbury, J., Levskaya, A., Heek, J., Xiao, K., Agrawal, S., and Dean, J.: Efficiently scaling transformer inference, arXiv [preprint], <a href="https://doi.org/10.48550/arXiv.2211.05102" target="_blank">https://doi.org/10.48550/arXiv.2211.05102</a>, 2023.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib33"><label>Pulkkinen et al.(2019)Pulkkinen, Nerini, Pérez Hortal,
Velasco-Forero, Seed, Germann, and Foresti</label><mixed-citation>
      
Pulkkinen, S., Nerini, D., Pérez Hortal, A. A., Velasco-Forero, C., Seed, A., Germann, U., and Foresti, L.: Pysteps: an open-source Python library for probabilistic precipitation nowcasting (v1.0), Geosci. Model Dev., 12, 4185–4219, <a href="https://doi.org/10.5194/gmd-12-4185-2019" target="_blank">https://doi.org/10.5194/gmd-12-4185-2019</a>, 2019.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib34"><label>Pulkkinen et al.(2020)Pulkkinen, Chandrasekar, von Lerber, and
Harri</label><mixed-citation>
      
Pulkkinen, S., Chandrasekar, V., von Lerber, A., and Harri, A.-M.: Nowcasting
of Convective Rainfall Using Volumetric Radar Observations, IEEE T. Geosci.  Remote S., 58, 7845–7859,
<a href="https://doi.org/10.1109/TGRS.2020.2984594" target="_blank">https://doi.org/10.1109/TGRS.2020.2984594</a>, 2020.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib35"><label>Pulkkinen et al.(2021)Pulkkinen, Chandrasekar, and
Niemi</label><mixed-citation>
      
Pulkkinen, S., Chandrasekar, V., and Niemi, T.: Lagrangian Integro-Difference
Equation Model for Precipitation Nowcasting, J. Atmos.
Ocean. Tech., 38, 2125–2145, <a href="https://doi.org/10.1175/JTECH-D-21-0013.1" target="_blank">https://doi.org/10.1175/JTECH-D-21-0013.1</a>, 2021.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib36"><label>Radford et al.(2019)Radford, Wu, Child, Luan, Amodei, and
Sutskever</label><mixed-citation>
      
Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., and Sutskever, I.:
Language Models are Unsupervised Multitask Learners, OpenAI Blog, 1, <a href="https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf" target="_blank"/> (last access: 20 August 2025), 2019.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib37"><label>Ravuri et al.(2021)Ravuri, Lenc, Willson, Kangin, Lam, Mirowski, Fitzsimons, Athanassiadou, Kashem, Madge, Prudden, Mandhane, Clark, Brock, Simonyan, Hadsell, Robinson, Clancy, Arribas, and Mohamed</label><mixed-citation>
      
Ravuri, S., Lenc, K., Willson, M., Kangin, D., Lam, R., Mirowski, P., Fitzsimons, M., Athanassiadou, M., Kashem, S., Madge, S., Prudden, R., Mandhane, A., Clark, A., Brock, A., Simonyan, K., Hadsell, R., Robinson, N., Clancy, E., Arribas, A., and Mohamed, S.: Skilful precipitation nowcasting using deep generative models of radar, Nature, 597, 672–677, 2021.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib38"><label>Ritvanen et al.(2025)Ritvanen, Pulkkinen, Moisseev, and
Nerini</label><mixed-citation>
      
Ritvanen, J., Pulkkinen, S., Moisseev, D., and Nerini, D.: Cell-tracking-based framework for assessing nowcasting model skill in reproducing growth and decay of convective rainfall, Geosci. Model Dev., 18, 1851–1878, <a href="https://doi.org/10.5194/gmd-18-1851-2025" target="_blank">https://doi.org/10.5194/gmd-18-1851-2025</a>, 2025.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib39"><label>Rombach et al.(2022)Rombach, Blattmann, Lorenz, Esser, and
Ommer</label><mixed-citation>
      
Rombach, R., Blattmann, A., Lorenz, D., Esser, P., and Ommer, B.:
High-resolution image synthesis with latent diffusion models, in: Proceedings
of the IEEE/CVF conference on computer vision and pattern recognition,
10684–10695, <a href="https://doi.org/10.1109/CVPR46437.2021.01268" target="_blank">https://doi.org/10.1109/CVPR46437.2021.01268</a>, 2022.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib40"><label>Seed(2003)</label><mixed-citation>
      
Seed, A. W.: A Dynamic and Spatial Scaling Approach to Advection Forecasting,
J. Appl. Meteorol., 42, 381–388,
<a href="https://doi.org/10.1175/1520-0450(2003)042&lt;0381:ADASSA&gt;2.0.CO;2" target="_blank">https://doi.org/10.1175/1520-0450(2003)042&lt;0381:ADASSA&gt;2.0.CO;2</a>, 2003.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib41"><label>Seed et al.(2013)Seed, Pierce, and Norman</label><mixed-citation>
      
Seed, A. W., Pierce, C. E., and Norman, K.: Formulation and evaluation of a
scale decomposition-based stochastic precipitation nowcast scheme, Water
Resour. Res., 49, 6624–6641, <a href="https://doi.org/10.1002/wrcr.20536" target="_blank">https://doi.org/10.1002/wrcr.20536</a>,
2013.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib42"><label>Shi et al.(2015)Shi, Chen, Wang, Yeung, Wong, and Woo</label><mixed-citation>
      
Shi, X., Chen, Z., Wang, H., Yeung, D.-Y., Wong, W.-K., and Woo, W.-C.:
Convolutional LSTM network: A machine learning approach for precipitation
nowcasting, Adv. Neur. In., 28, 802–810, ISBN 9781510825024, 2015.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib43"><label>Sideris et al.(2020)Sideris, Foresti, Nerini, and
Germann</label><mixed-citation>
      
Sideris, I. V., Foresti, L., Nerini, D., and Germann, U.: NowPrecip: localized
precipitation nowcasting in the complex terrain of Switzerland, Q.
J. Roy. Meteor. Soc., 146, 1768–1800,
<a href="https://doi.org/10.1002/qj.3766" target="_blank">https://doi.org/10.1002/qj.3766</a>, 2020.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib44"><label>Sun et al.(2014)Sun, Xue, Wilson, Zawadzki, Ballard,
Onvlee-Hooimeyer, Joe, Barker, Li, Golding, Xu, and
Pinto</label><mixed-citation>
      
Sun, J., Xue, M., Wilson, J. W., Zawadzki, I., Ballard, S. P.,
Onvlee-Hooimeyer, J., Joe, P., Barker, D. M., Li, P.-W., Golding, B., Xu, M.,
and Pinto, J.: Use of NWP for Nowcasting Convective Precipitation: Recent
Progress and Challenges, B. Am. Meteorol. Soc., 95,
409–426, <a href="https://doi.org/10.1175/BAMS-D-11-00263.1" target="_blank">https://doi.org/10.1175/BAMS-D-11-00263.1</a>, 2014.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib45"><label>Surcel et al.(2015)Surcel, Zawadzki, and
Yau</label><mixed-citation>
      
Surcel, M., Zawadzki, I., and Yau, M. K.: A Study on the Scale Dependence of
the Predictability of Precipitation Patterns, J. Atmos.
Sci., 72, 216–235, <a href="https://doi.org/10.1175/JAS-D-14-0071.1" target="_blank">https://doi.org/10.1175/JAS-D-14-0071.1</a>, 2015.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib46"><label>Tomasi et al.(2025)Tomasi, Franch, and
Cristoforetti</label><mixed-citation>
      
Tomasi, E., Franch, G., and Cristoforetti, M.: Can AI be enabled to perform dynamical downscaling? A latent diffusion model to mimic kilometer-scale COSMO5.0_CLM9 simulations, Geosci. Model Dev., 18, 2051–2078, <a href="https://doi.org/10.5194/gmd-18-2051-2025" target="_blank">https://doi.org/10.5194/gmd-18-2051-2025</a>, 2025.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib47"><label>Turner et al.(2004)Turner, Zawadzki, and
Germann</label><mixed-citation>
      
Turner, B. J., Zawadzki, I., and Germann, U.: Predictability of Precipitation
from Continental Radar Images. Part III: Operational Nowcasting
Implementation (MAPLE), J. Appl. Meteorol., 43, 231–248,
<a href="https://doi.org/10.1175/1520-0450(2004)043&lt;0231:POPFCR&gt;2.0.CO;2" target="_blank">https://doi.org/10.1175/1520-0450(2004)043&lt;0231:POPFCR&gt;2.0.CO;2</a>, 2004.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib48"><label>Vaswani et al.(2017)Vaswani, Shazeer, Parmar, Uszkoreit, Jones,
Gomez, Kaiser, and Polosukhin</label><mixed-citation>
      
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N.,
Kaiser, Ł., and Polosukhin, I.: Attention is all you need, Adv. Neur. In., 30, 5999–6009, ISBN 9781510860964, 2017.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib49"><label>Wang et al.(2018)Wang, Gao, Long, Wang, and Philip</label><mixed-citation>
      
Wang, Y., Gao, Z., Long, M., Wang, J., and Philip, S. Y.: Predrnn++: Towards a
resolution of the deep-in-time dilemma in spatiotemporal predictive learning,
in: International conference on machine learning,   PMLR,  5123–5132,ISBN 9781510867963, 2018.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib50"><label>Wang et al.(2004)Wang, Bovik, Sheikh, and Simoncelli</label><mixed-citation>
      
Wang, Z., Bovik, A. C., Sheikh, H. R., and Simoncelli, E. P.: Image quality
assessment: from error visibility to structural similarity, IEEE T.
Image Process., 13, 600–612, 2004.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib51"><label>Werner and Cranston(2009)</label><mixed-citation>
      
Werner, M. and Cranston, M.: Understanding the value of radar rainfall nowcasts
in flood forecasting and warning in flashy catchments, Meteorological
Applications: A journal of forecasting, practical applications, Training
Techniques And Modelling, 16, 41–55, 2009.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib52"><label>Wernli et al.(2008)Wernli, Paulat, Hagen, and Frei</label><mixed-citation>
      
Wernli, H., Paulat, M., Hagen, M., and Frei, C.: SAL – A novel quality measure
for the verification of quantitative precipitation forecasts, Mon. Weather
Rev., 136, 4470–4487, 2008.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib53"><label>Wernli et al.(2009)Wernli, Hofmann, and Zimmer</label><mixed-citation>
      
Wernli, H., Hofmann, C., and Zimmer, M.: Spatial forecast verification methods
intercomparison project: Application of the SAL technique, Weather
Forecast., 24, 1472–1484, 2009.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib54"><label>Wolf et al.(2020)Wolf, Debut, Sanh, Chaumond, Delangue, Moi, Cistac,
Rault, Louf, Funtowicz, Davison, Shleifer, von Platen, Ma, Jernite, Plu, Xu,
Le Scao, Gugger, Drame, Lhoest, and Rush</label><mixed-citation>
      
Wolf, T., Debut, L., Sanh, V., Chaumond, J., Delangue, C., Moi, A., Cistac, P.,
Rault, T., Louf, R., Funtowicz, M., Davison, J., Shleifer, S., von Platen,
P., Ma, C., Jernite, Y., Plu, J., Xu, C., Le Scao, T., Gugger, S., Drame, M.,
Lhoest, Q., and Rush, A.: Transformers: State-of-the-Art Natural Language
Processing, in: Proceedings of the 2020 Conference on Empirical Methods in
Natural Language Processing: System Demonstrations, edited by: Liu, Q. and
Schlangen, D., Association for Computational Linguistics, Online, 38–45,
<a href="https://doi.org/10.18653/v1/2020.emnlp-demos.6" target="_blank">https://doi.org/10.18653/v1/2020.emnlp-demos.6</a>, 2020.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib55"><label>Woo and
Wong(2017)</label><mixed-citation>
      
Woo, W.-C. and Wong, W.-K.: Operational Application of Optical Flow Techniques
to Radar-Based Rainfall Nowcasting, Atmosphere, 8, 48,
<a href="https://doi.org/10.3390/atmos8030048" target="_blank">https://doi.org/10.3390/atmos8030048</a>, 2017.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib56"><label>Yadan(2019)</label><mixed-citation>
      
Yadan, O.: Hydra – A framework for elegantly configuring complex applications,
GitHub [code], <a href="https://github.com/facebookresearch/hydra" target="_blank">https://github.com/facebookresearch/hydra</a> (last access: 20 August 2025), 2019.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib57"><label>Yu et al.(2022)Yu, Li, Koh, Zhang, Pang, Qin, Ku, Xu, Baldridge, and
Wu</label><mixed-citation>
      
Yu, J., Li, X., Koh, J. Y., Zhang, H., Pang, R., Qin, J., Ku, A., Xu, Y.,
Baldridge, J., and Wu, Y.: Vector-quantized Image Modeling with Improved
VQGAN, in: International Conference on Learning Representations,
<a href="https://openreview.net/forum?id=pfNyExj7z2" target="_blank">https://openreview.net/forum?id=pfNyExj7z2</a> (last access: 20 August 2025), 2022.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib58"><label>Zhang et al.(2018)Zhang, Isola, Efros, Shechtman, and
Wang</label><mixed-citation>
      
Zhang, R., Isola, P., Efros, A. A., Shechtman, E., and Wang, O.: The
unreasonable effectiveness of deep features as a perceptual metric, in:
Proceedings of the IEEE conference on computer vision and pattern
recognition, 586–595, <a href="https://doi.ieeecomputersociety.org/10.1109/CVPR.2018.00068" target="_blank">https://doi.ieeecomputersociety.org/10.1109/CVPR.2018.00068</a> (last access: 20 August 2025), 2018.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib59"><label>Zhang et al.(2023)Zhang, Long, Chen, Xing, Jin, Jordan, and
Wang</label><mixed-citation>
      
Zhang, Y., Long, M., Chen, K., Xing, L., Jin, R., Jordan, M. I., and Wang, J.:
Skilful nowcasting of extreme precipitation with NowcastNet, Nature, 619,
526–532, 2023.

    </mixed-citation></ref-html>--></article>
