<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing with OASIS Tables v3.0 20080202//EN" "https://jats.nlm.nih.gov/nlm-dtd/publishing/3.0/journalpub-oasis3.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:oasis="http://docs.oasis-open.org/ns/oasis-exchange/table" xml:lang="en" dtd-version="3.0" article-type="research-article">
  <front>
    <journal-meta><journal-id journal-id-type="publisher">GMD</journal-id><journal-title-group>
    <journal-title>Geoscientific Model Development</journal-title>
    <abbrev-journal-title abbrev-type="publisher">GMD</abbrev-journal-title><abbrev-journal-title abbrev-type="nlm-ta">Geosci. Model Dev.</abbrev-journal-title>
  </journal-title-group><issn pub-type="epub">1991-9603</issn><publisher>
    <publisher-name>Copernicus Publications</publisher-name>
    <publisher-loc>Göttingen, Germany</publisher-loc>
  </publisher></journal-meta>
    <article-meta>
      <article-id pub-id-type="doi">10.5194/gmd-18-5873-2025</article-id><title-group><article-title>Statistical summaries for streamed data from climate simulations: one-pass algorithms</article-title><alt-title>Statistical summaries for streamed data from climate simulations: one-pass algorithms</alt-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author" corresp="yes" rid="aff1">
          <name><surname>Grayson</surname><given-names>Katherine</given-names></name>
          <email>katherine.grayson@bsc.es</email>
        <ext-link>https://orcid.org/0000-0002-6776-7893</ext-link></contrib>
        <contrib contrib-type="author" corresp="no" rid="aff2">
          <name><surname>Thober</surname><given-names>Stephan</given-names></name>
          
        <ext-link>https://orcid.org/0000-0003-3939-1523</ext-link></contrib>
        <contrib contrib-type="author" corresp="no" rid="aff1">
          <name><surname>Lacima-Nadolnik</surname><given-names>Aleksander</given-names></name>
          
        <ext-link>https://orcid.org/0000-0002-1826-769X</ext-link></contrib>
        <contrib contrib-type="author" corresp="no" rid="aff1">
          <name><surname>Alsina-Ferrer</surname><given-names>Ivan</given-names></name>
          
        </contrib>
        <contrib contrib-type="author" corresp="no" rid="aff4">
          <name><surname>Lledó</surname><given-names>Llorenç</given-names></name>
          
        <ext-link>https://orcid.org/0000-0002-8628-6876</ext-link></contrib>
        <contrib contrib-type="author" corresp="no" rid="aff2">
          <name><surname>Sharifi</surname><given-names>Ehsan</given-names></name>
          
        </contrib>
        <contrib contrib-type="author" corresp="no" rid="aff1 aff3">
          <name><surname>Doblas-Reyes</surname><given-names>Francisco</given-names></name>
          
        <ext-link>https://orcid.org/0000-0002-6622-4280</ext-link></contrib>
        <aff id="aff1"><label>1</label><institution>Earth Sciences Department, Barcelona Supercomputing Center, Barcelona, 08034, Spain</institution>
        </aff>
        <aff id="aff2"><label>2</label><institution>Department of Computational Hydrosystems, Helmholtz Centre for Environmental Research, Leipzig, 04318, Germany</institution>
        </aff>
        <aff id="aff3"><label>3</label><institution>Institució Catalana de Recerca i Estudis Avançats, Barcelona, 08010, Spain</institution>
        </aff>
        <aff id="aff4"><label>4</label><institution>ECMWF, Bonn, 53177, Germany</institution>
        </aff>
      </contrib-group>
      <author-notes><corresp id="corr1">Katherine Grayson (katherine.grayson@bsc.es)</corresp></author-notes><pub-date><day>10</day><month>September</month><year>2025</year></pub-date>
      
      <volume>18</volume>
      <issue>17</issue>
      <fpage>5873</fpage><lpage>5890</lpage>
      <history>
        <date date-type="received"><day>6</day><month>January</month><year>2025</year></date>
           <date date-type="rev-request"><day>6</day><month>February</month><year>2025</year></date>
           <date date-type="rev-recd"><day>16</day><month>May</month><year>2025</year></date>
           <date date-type="accepted"><day>20</day><month>June</month><year>2025</year></date>
      </history>
      <permissions>
        <copyright-statement>Copyright: © 2025 Katherine Grayson et al.</copyright-statement>
        <copyright-year>2025</copyright-year>
      <license license-type="open-access"><license-p>This work is licensed under the Creative Commons Attribution 4.0 International License. To view a copy of this licence, visit <ext-link ext-link-type="uri" xlink:href="https://creativecommons.org/licenses/by/4.0/">https://creativecommons.org/licenses/by/4.0/</ext-link></license-p></license></permissions><self-uri xlink:href="https://gmd.copernicus.org/articles/18/5873/2025/gmd-18-5873-2025.html">This article is available from https://gmd.copernicus.org/articles/18/5873/2025/gmd-18-5873-2025.html</self-uri><self-uri xlink:href="https://gmd.copernicus.org/articles/18/5873/2025/gmd-18-5873-2025.pdf">The full text article is available as a PDF file from https://gmd.copernicus.org/articles/18/5873/2025/gmd-18-5873-2025.pdf</self-uri>
      <abstract><title>Abstract</title>

      <p id="d2e158">Global climate models (GCMs) are increasingly being run at finer resolutions to better capture small-scale dynamics and reduce uncertainties associated with parameterizations. Despite advances in high-performance computing (HPC), the resulting terabyte- to petabyte-scale data volumes produced by GCMs are overwhelming traditional long-term storage. To address this, some climate modelling projects are adopting a method known as data streaming, where model output is transmitted directly to downstream data consumers (any user of climate model data, e.g. an impact model) during model runtime, eliminating the need to archive complete datasets. This paper introduces the One_Pass Python package (v0.8.0), which enables users to compute statistics on streamed GCM output via one-pass algorithms – computational techniques that sequentially process data in a single pass without requiring access to the full time series. Crucially, packaging these algorithms independently, rather than relying on standardized statistics from GCMs, provides flexibility for a diverse range of downstream data consumers and allows for integration into various HPC workflows. We present these algorithms for four different statistics: mean, standard deviation, percentiles and histograms. Each statistic is presented in the context of a use case, showing its application to a relevant variable. For statistics that can be represented by a single floating-point value (i.e. mean, standard deviation, variance), the results are identical to “conventional” approaches within numerical precision limits, while the memory savings scale linearly with the period of time covered by the statistic.
For statistics that require a distribution (percentiles and histograms), we make use of the <inline-formula><mml:math id="M1" display="inline"><mml:mi>t</mml:mi></mml:math></inline-formula>-digest, an algorithm that ingests streamed data and reduces them to a set of key clusters representing the distribution. Using this algorithm, we achieve excellent accuracy for variables with near-normal distributions (e.g. wind speed) and acceptable accuracy for skewed distributions such as precipitation. We also provide guidance on the best compression factor (the memory vs. accuracy trade-off) to use for each variable. We conclude by exploring the concept of convergence in streamed statistics, an essential factor for downstream applications such as bias-adjusting streamed data.</p>
  </abstract>
    
<funding-group>
<award-group id="gs1">
<funding-source>Generalitat de Catalunya</funding-source>
<award-id>ARD209/22/000001</award-id>
</award-group>
</funding-group>
</article-meta>
  </front>
<body>
      

<sec id="Ch1.S1" sec-type="intro">
  <label>1</label><title>Introduction</title>
      <p id="d2e177">Climate change impacts are often felt at the local level, where decision makers require timely and localized information to anticipate and develop necessary adaptation measures <xref ref-type="bibr" rid="bib1.bibx28 bib1.bibx41" id="paren.1"/>. Projections from regional and global climate models (RCMs and GCMs) are regularly used to create such information for climate adaptation policies and socio-economic decisions <xref ref-type="bibr" rid="bib1.bibx44" id="paren.2"/> but often lack the suitable spatial resolution and temporal frequency to be fully exploited. To support this need, RCMs and GCMs are being run at increasingly finer spatio-temporal resolutions, which can better resolve the smaller-scale dynamics of the Earth's climate system <xref ref-type="bibr" rid="bib1.bibx42 bib1.bibx4 bib1.bibx24 bib1.bibx50 bib1.bibx45" id="paren.3"/>. This ongoing movement in the climate community towards increasingly higher spatio-temporal resolutions raises the question as to how these data will be managed <xref ref-type="bibr" rid="bib1.bibx22 bib1.bibx5" id="paren.4"/>. The growing size of the model output makes the current state-of-the-art archival method (e.g. Coordinated Regional Climate Downscaling Experiment (CORDEX) and Coupled Model Intercomparison Project (CMIP)) unfeasible. Moreover, the current archival method has left some data consumers without their required data, as climate model protocols either limit the number of variables stored or reduce their resolution and frequency (e.g. by storing monthly means or interpolated grids) to cope with the size. Here we refer to data consumers as external users who are not involved in the climate modelling process or diagnostics and traditionally access model output via static archives.</p>
      <p id="d2e192">In this context, initiatives such as Destination Earth (DestinE) <xref ref-type="bibr" rid="bib1.bibx5 bib1.bibx20 bib1.bibx54" id="paren.5"/> are processing the outputs of climate models as soon as they are produced, without having to save the full datasets to disc. This method, known as data streaming, transmits data in a continuous stream to data consumers, who can process them (e.g. via impact models) during runtime <xref ref-type="bibr" rid="bib1.bibx29 bib1.bibx35" id="paren.6"/>. The data stream allows those data consumers currently limited by reduced storage archives to access the full model output at the highest frequency available (e.g. hourly) and native spatial resolution. However, the advent of data streaming in the climate community poses its own set of challenges. Often, downstream data consumers require climate data that span long periods. For example, many hydrological impact models require daily, monthly or annual maximum precipitation values <xref ref-type="bibr" rid="bib1.bibx51 bib1.bibx46" id="paren.7"/>, while in the wind energy sector, accurate distribution functions of the wind speed are essential information <xref ref-type="bibr" rid="bib1.bibx43 bib1.bibx32" id="paren.8"/>. Moreover, as all models are subject to systematic biases, many users require bias-adjusted climate data. Normally, these computations would be done using conventional methods, which require the entire dataset to be available when the computation is performed. The streaming context raises the one-pass problem: how to compute summaries, diagnostics or derived quantities without having access to the whole time series.</p>
      <p id="d2e207">In this paper, we introduce the One_Pass package (v0.8.0), a flexible, bespoke Python package built to compute statistics and aid in the bias adjustment of streamed climate data using one-pass algorithms. While one-pass algorithms have been adopted in other fields such as online trading <xref ref-type="bibr" rid="bib1.bibx33" id="paren.9"/> and machine learning <xref ref-type="bibr" rid="bib1.bibx37" id="paren.10"/>, they have yet to find a foothold in climate science, mainly because their use has not been necessary until now. Unlike the conventional method, one-pass algorithms do not have access to the whole time series; rather, they process data incrementally every time a climate model outputs new time steps <xref ref-type="bibr" rid="bib1.bibx39" id="paren.11"/>. This is done by sequentially processing data chunks as they become available, with each chunk's value being incorporated into a rolling summary, which is then moved into an output buffer before processing the next chunk. Details of the package's design choices – including its requirements for flexibility – and its current use in high-performance computing (HPC) workflows are given in Sect. <xref ref-type="sec" rid="Ch1.S2"/>. The remainder of the paper focuses on the mathematical theory behind the one-pass algorithms and investigates their utility and accuracy for climate data analysis. We do not discuss the code implementation; however, examples of the package's use can be found in the provided Jupyter notebooks. In Sect. <xref ref-type="sec" rid="Ch1.S3"/> we present the mathematical notation used throughout this paper. Sections <xref ref-type="sec" rid="Ch1.S4"/> to <xref ref-type="sec" rid="Ch1.S6"/> then cover the algorithms used for the mean, standard deviation and statistical distributions. These statistics have been chosen because they represent the most commonly required statistics for climate data, although many other statistics (e.g. minimum, maximum and threshold exceedance)
have been implemented in the One_Pass package using the same approach. For each statistic, the one-pass algorithm is first presented, followed by an example use case which applies the algorithm to a relevant variable over a meaningful time span. With the aid of these use cases, the numerical accuracy is compared to the conventional approach (being able to read the dataset as a whole to compute the statistic), along with the achieved memory savings. In this paper, the conventional approach is always calculated with Python's NumPy package <xref ref-type="bibr" rid="bib1.bibx19" id="paren.12"/>.</p>
      <p id="d2e231">In Sect. <xref ref-type="sec" rid="Ch1.S7"/> we discuss challenges that arise when using streamed data from one-pass algorithms for certain applications. For example, climate model outputs are often bias-adjusted using probability distribution functions (PDFs) to perform a quantile–quantile mapping against a reference dataset <xref ref-type="bibr" rid="bib1.bibx31" id="paren.13"/>. The One_Pass package can be used to create these PDFs from the streamed data. However, these estimates will initially fluctuate and only converge as more data points are added. It is then crucial to know after how many samples the estimate (i.e. the PDF) is representative of the entire period – in other words, when it has converged. Only if convergence is attained can the PDF estimate be used for downstream applications (e.g. bias adjustment). Overall, the aim of this work is to both showcase the utility and highlight the current limitations of one-pass algorithms for climate data analysis. Moreover, through the provision of the One_Pass Python package, we show the foundations of the infrastructure required for the climate community to harness the capabilities of kilometre-scale climate data through data streaming.</p>
</sec>
<sec id="Ch1.S2">
  <label>2</label><title>Design and implementation in a workflow</title>
      <p id="d2e247">The Climate Change Adaptation Digital Twin (ClimateDT) is one of the components of Destination Earth, a flagship initiative of the European Commission <xref ref-type="bibr" rid="bib1.bibx5 bib1.bibx20 bib1.bibx54" id="paren.14"/>. The ClimateDT aims to combine kilometre-scale global climate modelling with impact sector modelling in a single end-to-end workflow. One of the overarching aims is therefore to provide climate impact information tailored to different user needs by directly transforming climate model output. However, a gap often exists between the needs of specific data consumers and the general requirements of models. Moreover, the ClimateDT runs on several different HPC platforms, with several different GCMs. While many modelling groups will agree on providing monthly means via online calculations for specific variables (e.g. <xref ref-type="bibr" rid="bib1.bibx13" id="altparen.15"/>), it is neither realistic nor feasible for GCMs to cater to the disparate needs of data consumers from the impact modelling side. For instance, a data consumer focused on renewable energy production might be interested in monthly wind speed percentiles at a specific height in a particular location. Implementing such output in climate models, whose goal is to provide outputs that are as general as possible, is impractical. The development of the ClimateDT as a flexible, scalable framework emerges as a challenge in addition to that posed by the increase in spatio-temporal resolution. The One_Pass package was conceived in the frame of the ClimateDT to fulfil the requirements of downstream data consumers while keeping the model outputs relatively agnostic.</p>
      <p id="d2e256">The data flow in the ClimateDT is orchestrated through a structured workflow that organizes a sequence of interdependent computational jobs. This workflow is managed using Autosubmit <xref ref-type="bibr" rid="bib1.bibx34" id="paren.16"/>, which automates the submission and monitoring of jobs on the target HPC platform. The One_Pass package runs as one of these jobs in the ClimateDT workflow, serving as an interface between the data production and its consumption. In this job, the One_Pass package operates via a central class <monospace>Opa</monospace>, which initially takes a request defined by the data consumer. The request primarily contains the information relevant to the aggregation of a variable. This information includes, among other things, the variable in question, the statistic to compute and the frequency of aggregation. Once this request is filed, and an instance of this class is created, the resulting object is set up to perform one-pass aggregations, taking in data chunks of arbitrary time length, which belong to the active streaming window. After performing a series of sanity checks on the incoming time steps, the instance of <monospace>Opa</monospace> updates its internal state to include the requested aggregation up to the time corresponding to the last incoming data chunk. Once enough data have been taken in to complete the requested statistic, the instance will output the requested variable aggregated at the specified frequency and move on to the next calculation.</p>
      <p id="d2e268">Because the One_Pass package runs in an HPC environment, it is designed in a persistent manner. This means that the job in which it runs is not required to sit idly and consume HPC hours while there is no model output ready to process. This saves computing resources; however, it does require the current status of the statistic to be saved between jobs. This is done via checkpointing, where each instance of the <monospace>Opa</monospace> class is stored after every update as a binary file. Similarly, upon initialization, it will load an existing state from a previous execution (if one exists) to use as a starting point. It will not store any internal state after the completion of the statistic. In the context of the ClimateDT, this is all handled by Autosubmit, the workflow manager, although the package can be used with other HPC workflow managers.</p>
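      <p>The checkpointing behaviour described above can be illustrated with a minimal sketch. It is not the package's implementation: the state is a plain dictionary rather than an <monospace>Opa</monospace> instance, the functions <monospace>load_or_create</monospace> and <monospace>update</monospace> and the file name are hypothetical, and Python's standard <monospace>pickle</monospace> module is assumed as the binary format. The sketch only shows the pattern of loading an existing state, saving after every update and leaving no state behind once the statistic is complete.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import os
import pickle
import tempfile


def load_or_create(path, c):
    """Resume from a checkpoint file if one exists, otherwise start fresh."""
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)
    # Hypothetical state: c steps needed, n steps seen, running sum as summary.
    return {"c": c, "n": 0, "total": 0.0}


def update(state, value, path):
    """Fold one time step into the state, then checkpoint or clean up."""
    state["n"] += 1
    state["total"] += value
    if state["n"] < state["c"]:
        # Statistic incomplete: persist the state so the job can exit.
        with open(path, "wb") as f:
            pickle.dump(state, f)
        return None
    # Statistic complete: emit the result and keep no state on disk.
    if os.path.exists(path):
        os.remove(path)
    return state["total"] / state["c"]


# Simulate three separate jobs, each seeing one time step of a c = 3 mean.
checkpoint = os.path.join(tempfile.mkdtemp(), "state.pkl")
results = []
for value in (1.0, 2.0, 6.0):
    state = load_or_create(checkpoint, c=3)  # a new "job" starts here
    results.append(update(state, value, checkpoint))
```
]]></preformat>
      <p>In this sketch the first two jobs return nothing and leave a checkpoint file behind, while the third completes the statistic and removes the file.</p>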
</sec>
<sec id="Ch1.S3">
  <label>3</label><title>Mathematical notation</title>
      <p id="d2e282">For a given dataset, the following mathematical notation is used to describe the one-pass algorithms: <list list-type="bullet"><list-item>
      <p id="d2e287"><inline-formula><mml:math id="M2" display="inline"><mml:mi>n</mml:mi></mml:math></inline-formula> is the current number of data samples (time steps) passed to the statistic.</p></list-item><list-item>
      <p id="d2e297"><inline-formula><mml:math id="M3" display="inline"><mml:mi>w</mml:mi></mml:math></inline-formula> is the length of the incoming data chunk (number of time steps).</p></list-item><list-item>
      <p id="d2e307"><inline-formula><mml:math id="M4" display="inline"><mml:mi>c</mml:mi></mml:math></inline-formula> is the number of time steps required to complete the statistic (e.g. if the model provides hourly output and we require a daily statistic, <inline-formula><mml:math id="M5" display="inline"><mml:mrow><mml:mi>c</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">24</mml:mn></mml:mrow></mml:math></inline-formula>).</p></list-item><list-item>
      <p id="d2e329"><inline-formula><mml:math id="M6" display="inline"><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mi>n</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> is one time step of the data at time <inline-formula><mml:math id="M7" display="inline"><mml:mrow><mml:mi>t</mml:mi><mml:mo>=</mml:mo><mml:mi>n</mml:mi></mml:mrow></mml:math></inline-formula>.</p></list-item><list-item>
      <p id="d2e355"><inline-formula><mml:math id="M8" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="bold-italic">X</mml:mi><mml:mi>n</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mo mathvariant="italic">{</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mn mathvariant="normal">1</mml:mn></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub><mml:mo>,</mml:mo><mml:mi mathvariant="normal">…</mml:mi><mml:mo>,</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mi>n</mml:mi></mml:msub><mml:mo mathvariant="italic">}</mml:mo></mml:mrow></mml:math></inline-formula> represents the full dataset up to <inline-formula><mml:math id="M9" display="inline"><mml:mrow><mml:mi>t</mml:mi><mml:mo>=</mml:mo><mml:mi>n</mml:mi></mml:mrow></mml:math></inline-formula>.</p></list-item><list-item>
      <p id="d2e410"><inline-formula><mml:math id="M10" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="bold-italic">X</mml:mi><mml:mi>w</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mo mathvariant="italic">{</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>n</mml:mi><mml:mo>+</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:mi mathvariant="normal">…</mml:mi><mml:mo>,</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>n</mml:mi><mml:mo>+</mml:mo><mml:mi>w</mml:mi></mml:mrow></mml:msub><mml:mo mathvariant="italic">}</mml:mo></mml:mrow></mml:math></inline-formula> is the incoming data chunk of length <inline-formula><mml:math id="M11" display="inline"><mml:mi>w</mml:mi></mml:math></inline-formula>.</p></list-item><list-item>
      <p id="d2e463"><inline-formula><mml:math id="M12" display="inline"><mml:mrow><mml:msub><mml:mi>S</mml:mi><mml:mi>n</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> is the rolling summary of the statistic before the new chunk at time <inline-formula><mml:math id="M13" display="inline"><mml:mrow><mml:mi>t</mml:mi><mml:mo>=</mml:mo><mml:mi>n</mml:mi></mml:mrow></mml:math></inline-formula>. This summary differs between statistics (e.g. for the mean statistic, <inline-formula><mml:math id="M14" display="inline"><mml:mrow><mml:msub><mml:mi>S</mml:mi><mml:mi>n</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mover accent="true"><mml:mi>X</mml:mi><mml:mo mathvariant="normal">¯</mml:mo></mml:mover><mml:mi>n</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>). This rolling summary will always be of length one in the time dimension.</p></list-item><list-item>
      <p id="d2e510"><inline-formula><mml:math id="M15" display="inline"><mml:mi>g</mml:mi></mml:math></inline-formula> is a one-pass function that updates the previous summary <inline-formula><mml:math id="M16" display="inline"><mml:mrow><mml:msub><mml:mi>S</mml:mi><mml:mi>n</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> with new incoming data <inline-formula><mml:math id="M17" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="bold-italic">X</mml:mi><mml:mi>w</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>.</p></list-item></list></p>
      <p id="d2e541">We introduce the chunk length <inline-formula><mml:math id="M18" display="inline"><mml:mi>w</mml:mi></mml:math></inline-formula> as, in many cases, an active data streaming window will contain a few consecutive time steps output from the climate model. In the case where the incoming data stream has only one time step, <inline-formula><mml:math id="M19" display="inline"><mml:mrow><mml:mo>(</mml:mo><mml:mi>w</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula>, <inline-formula><mml:math id="M20" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="bold-italic">X</mml:mi><mml:mi>w</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> reduces to <inline-formula><mml:math id="M21" display="inline"><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>n</mml:mi><mml:mo>+</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula>.</p>
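      <p>To make the notation concrete, the following sketch applies a generic one-pass update, in which the summary after the chunk is obtained by applying <monospace>g</monospace> to the previous summary and the incoming chunk, and emits the summary once the number of samples reaches <monospace>c</monospace>. The function names <monospace>stream_statistic</monospace> and <monospace>g_max</monospace> are hypothetical, the running maximum is used only as a simple example of a one-pass function, and the sketch assumes that chunk boundaries align with the completion of each statistic.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import numpy as np


def stream_statistic(chunks, g, c):
    """Apply S_{n+w} = g(S_n, X_w) chunk by chunk; emit S once n reaches c."""
    summary, n, outputs = None, 0, []
    for chunk in chunks:              # chunk has shape (w, ...), time first
        w = chunk.shape[0]
        summary = g(summary, chunk)
        n += w
        if n == c:                    # statistic complete: move it to the output
            outputs.append(summary)
            summary, n = None, 0      # reset before the next period
    return outputs


def g_max(summary, chunk):
    """One-pass maximum: the summary always has length one in time."""
    chunk_max = chunk.max(axis=0)
    return chunk_max if summary is None else np.maximum(summary, chunk_max)


# 48 hourly steps of a 2 x 3 field, streamed with w = 6, daily maxima (c = 24).
rng = np.random.default_rng(0)
data = rng.normal(size=(48, 2, 3))
chunks = [data[i:i + 6] for i in range(0, 48, 6)]
daily_max = stream_statistic(chunks, g_max, c=24)
```
]]></preformat>
      <p>Only the current summary and the incoming chunk are held at any time; the full array <monospace>data</monospace> exists here solely to simulate the stream and to check the result.</p>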
</sec>
<sec id="Ch1.S4">
  <label>4</label><title>Mean</title>
<sec id="Ch1.S4.SS1">
  <label>4.1</label><title>Algorithm description</title>
      <p id="d2e609">The one-pass algorithm for the mean is given by

            <disp-formula id="Ch1.E1" content-type="numbered"><label>1</label><mml:math id="M22" display="block"><mml:mrow><mml:msub><mml:mover accent="true"><mml:mi>X</mml:mi><mml:mo mathvariant="normal">¯</mml:mo></mml:mover><mml:mrow><mml:mi>n</mml:mi><mml:mo>+</mml:mo><mml:mi>w</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mi>g</mml:mi><mml:mo>(</mml:mo><mml:msub><mml:mi>S</mml:mi><mml:mi>n</mml:mi></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">X</mml:mi><mml:mi>w</mml:mi></mml:msub><mml:mo>)</mml:mo><mml:mo>=</mml:mo><mml:msub><mml:mover accent="true"><mml:mi>X</mml:mi><mml:mo mathvariant="normal">¯</mml:mo></mml:mover><mml:mi>n</mml:mi></mml:msub><mml:mo>+</mml:mo><mml:mi>w</mml:mi><mml:mo mathsize="2.0em">(</mml:mo><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mrow><mml:msub><mml:mover accent="true"><mml:mi>X</mml:mi><mml:mo mathvariant="normal">¯</mml:mo></mml:mover><mml:mi>w</mml:mi></mml:msub><mml:mo>-</mml:mo><mml:msub><mml:mover accent="true"><mml:mi>X</mml:mi><mml:mo mathvariant="normal">¯</mml:mo></mml:mover><mml:mi>n</mml:mi></mml:msub></mml:mrow><mml:mrow><mml:mi>n</mml:mi><mml:mo>+</mml:mo><mml:mi>w</mml:mi></mml:mrow></mml:mfrac></mml:mstyle><mml:mo mathsize="2.0em">)</mml:mo><mml:mo>,</mml:mo></mml:mrow></mml:math></disp-formula>

          where <inline-formula><mml:math id="M23" display="inline"><mml:mrow><mml:msub><mml:mover accent="true"><mml:mi>X</mml:mi><mml:mo mathvariant="normal">¯</mml:mo></mml:mover><mml:mrow><mml:mi>n</mml:mi><mml:mo>+</mml:mo><mml:mi>w</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula> is the updated rolling mean of the dataset with the new data chunk <inline-formula><mml:math id="M24" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="bold-italic">X</mml:mi><mml:mi>w</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>, and the rolling summary <inline-formula><mml:math id="M25" display="inline"><mml:mrow><mml:msub><mml:mi>S</mml:mi><mml:mi>n</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> is given by the rolling mean <inline-formula><mml:math id="M26" display="inline"><mml:mrow><mml:msub><mml:mover accent="true"><mml:mi>X</mml:mi><mml:mo mathvariant="normal">¯</mml:mo></mml:mover><mml:mi>n</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>. 
If <inline-formula><mml:math id="M27" display="inline"><mml:mrow><mml:mi>w</mml:mi><mml:mo>&gt;</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mo>,</mml:mo><mml:msub><mml:mover accent="true"><mml:mi>X</mml:mi><mml:mo mathvariant="normal">¯</mml:mo></mml:mover><mml:mi>w</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> is the conventional temporal mean over the incoming data chunk; however, if the data are streamed at the same frequency as the model output with <inline-formula><mml:math id="M28" display="inline"><mml:mrow><mml:mi>w</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:math></inline-formula>, then <inline-formula><mml:math id="M29" display="inline"><mml:mrow><mml:msub><mml:mover accent="true"><mml:mi>X</mml:mi><mml:mo mathvariant="normal">¯</mml:mo></mml:mover><mml:mi>w</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>n</mml:mi><mml:mo>+</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula>.</p>
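      <p>As an illustration, Eq. (<xref ref-type="disp-formula" rid="Ch1.E1"/>) can be written in a few lines of NumPy. This is a sketch rather than the package's code: the function name <monospace>update_mean</monospace> and the synthetic data are assumptions made for the example. Starting from zero samples and a zero-valued mean, the first update returns the mean of the first chunk.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import numpy as np


def update_mean(mean_n, n, chunk):
    """Eq. (1): fold a chunk X_w (time on axis 0) into the rolling mean."""
    w = chunk.shape[0]
    chunk_mean = chunk.mean(axis=0)   # conventional mean over the chunk only
    return mean_n + w * (chunk_mean - mean_n) / (n + w), n + w


rng = np.random.default_rng(1)
series = 280.0 + 10.0 * rng.standard_normal((24, 4, 5))  # synthetic field

mean, n = np.zeros((4, 5)), 0         # with n = 0 the first update yields chunk_mean
for start in range(0, 24, 6):         # chunks of length w = 6
    mean, n = update_mean(mean, n, series[start:start + 6])
```
]]></preformat>
      <p>After the loop, <monospace>mean</monospace> can be compared with <monospace>series.mean(axis=0)</monospace>; the two should agree to within floating-point rounding.</p>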
</sec>
<sec id="Ch1.S4.SS2">
  <label>4.2</label><title>2 m air temperature</title>
      <p id="d2e825">We apply the mean one-pass algorithm given in Eq. (<xref ref-type="disp-formula" rid="Ch1.E1"/>) for the analysis of 2 m air temperature data. European projects such as nextGEMS <xref ref-type="bibr" rid="bib1.bibx47" id="paren.17"/> and  <xref ref-type="bibr" rid="bib1.bibx9" id="text.18"/> are aiming to provide global climate projections for a variety of variables at spatial resolutions on the kilometre scale, ranging from 0.025 to 0.1°. This will allow for granular analysis of projected temperatures to inform climate adaptation at local to regional scales, although performing basic computations with such vast amounts of data can prove challenging. In this use case for the mean algorithm, we use data from the ECMWF's Integrated Forecasting System (IFS) model coupled with the Finite Element/Volume Sea-ice Ocean Model (FESOM) (tco2559-ng5-cycle3 experiment) <xref ref-type="bibr" rid="bib1.bibx30 bib1.bibx45" id="paren.19"/> (run as part of the nextGEMS project), looking at 2 m temperature in March 2020. We use hourly data at native model resolution (<inline-formula><mml:math id="M30" display="inline"><mml:mo lspace="0mm">∼</mml:mo></mml:math></inline-formula> 0.04°), resulting in a global map containing approximately 26.31 million spatial grid cells, 744 time steps and a full size of 145.82 GB with double precision (float64).</p>
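      <p>The quoted data volume follows from the grid size, the number of time steps and the 8 bytes of a float64 value. The short check below reproduces the arithmetic; it assumes that GB and MB denote binary units (2<sup>30</sup> and 2<sup>20</sup> bytes), which is an interpretation rather than something stated in the text, and it uses the rounded cell count given above, so small deviations from the quoted figure are expected.</p>
      <preformat preformat-type="code"><![CDATA[
```python
cells = 26.31e6          # approximate number of spatial grid cells (from the text)
steps = 744              # hourly time steps in March (31 days x 24 h)
bytes_per_value = 8      # float64

total_bytes = cells * steps * bytes_per_value
total_gib = total_bytes / 2**30               # assumption: "GB" means 2**30 bytes
step_mib = cells * bytes_per_value / 2**20    # size of one global time step

print(f"full series: {total_gib:.2f} GiB, one time step: {step_mib:.1f} MiB")
```
]]></preformat>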
      <p id="d2e846">We calculated the March monthly mean of these data using both the conventional method and the one-pass algorithm given in Eq. (<xref ref-type="disp-formula" rid="Ch1.E1"/>). Computing the temporal conventional mean of these data requires the full time series to be loaded into memory, summed across every cell and then divided by the length of the time dimension (in this case 744). Due to the high memory requirements of this dataset, this was performed on a high-memory node (256 GB) on the Levante HPC system. The dataset was re-chunked into 10 spatial chunks using the Python library Dask-Xarray <xref ref-type="bibr" rid="bib1.bibx8" id="paren.20"/>, where each chunk could fit into available memory. The conventional mean was then computed on each chunk. For the one-pass computation, we used a chunk length of <inline-formula><mml:math id="M31" display="inline"><mml:mrow><mml:mi>w</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:math></inline-formula> and called into memory each hour <inline-formula><mml:math id="M32" display="inline"><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mi>n</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> of the dataset to simulate streaming, iteratively updating Eq. (<xref ref-type="disp-formula" rid="Ch1.E1"/>) until <inline-formula><mml:math id="M33" display="inline"><mml:mrow><mml:mi>n</mml:mi><mml:mo>=</mml:mo><mml:mi>c</mml:mi></mml:mrow></mml:math></inline-formula>. These two methods highlight that with such large datasets, the conventional mean is not necessarily the simplest approach, as we still need specialized tools and high memory resources for computation. Rather than adding additional complexity, the one-pass approach allows for simpler handling and easier computation.</p>
      <p id="d2e892">The results of the one-pass mean algorithm can be seen in Fig. <xref ref-type="fig" rid="F1"/>a. We note that for plotting convenience, the native grid was interpolated to a 0.1° regular lat–long grid. Figure <xref ref-type="fig" rid="F1"/>b shows the difference between the conventional (NumPy) and the one-pass mean shown in (a). The difference appears as randomly distributed noise on the order of 10<sup>−12</sup>, which is consistent with the limit set by machine precision rather than with an algorithmic discrepancy.</p>

      <fig id="F1"><label>Figure 1</label><caption><p id="d2e914"><bold>(a)</bold> Global monthly mean of 2 m temperature in March 2020 using hourly data from the IFS model, computed using the one-pass algorithm given in Eq. (<xref ref-type="disp-formula" rid="Ch1.E1"/>). <bold>(b)</bold> The difference between <bold>(a)</bold> and the mean calculated using the conventional method.</p></caption>
          <graphic xlink:href="https://gmd.copernicus.org/articles/18/5873/2025/gmd-18-5873-2025-f01.png"/>

        </fig>

      <p id="d2e933">With regard to memory savings, the one-pass method requires only two data memory blocks, <inline-formula><mml:math id="M35" display="inline"><mml:mrow><mml:msub><mml:mover accent="true"><mml:mi>X</mml:mi><mml:mo mathvariant="normal">¯</mml:mo></mml:mover><mml:mi>w</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M36" display="inline"><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mi>n</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>, both approximately 200 MB with double precision (which is the size of one global time step), to be kept in memory at any point in time. If this computation were performed in real streaming mode, it would require only <inline-formula><mml:math id="M37" display="inline"><mml:mo>∼</mml:mo></mml:math></inline-formula> 400 MB of memory to compute the monthly mean, <inline-formula><mml:math id="M38" display="inline"><mml:mrow><mml:mn mathvariant="normal">2</mml:mn><mml:mo>/</mml:mo><mml:mn mathvariant="normal">744</mml:mn></mml:mrow></mml:math></inline-formula>th of the 145.82 GB memory requirements for the conventional method. We also note here that the memory cost of the one-pass method is independent of the length of the statistic time span. For the case of <inline-formula><mml:math id="M39" display="inline"><mml:mrow><mml:mi>w</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:math></inline-formula>, the memory requirements will always be twice those of storing a spatial field, as opposed to the conventional method, where the memory requirements increase linearly with the length of the time series required to compute the statistic. Moreover, this computation demonstrates that the one-pass algorithms do not merely provide memory savings; they provide user-oriented diagnostics that allow for easy computation. 
Due to this vast reduction in memory requirements, different variables can be computed in parallel to each other, further enhancing usability. The current paradigm of loading the entire dataset is not practical (and in some cases not possible) for data of this magnitude. Indeed, special tools such as Python's Dask-Xarray packages are often required.</p>
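      <p>The memory figures quoted in this section can be checked with simple arithmetic. The sketch below is illustrative only: it uses the rounded cell count of 26.31 million given in the text, and it assumes that the quoted MB and GB values are binary units (MiB and GiB). That assumption is not stated in the text but appears consistent with the quoted numbers. All variable names are hypothetical.</p>
      <preformat preformat-type="code" xml:space="preserve">
```python
n_cells = 26.31e6      # approximate number of grid cells (rounded value from the text)
n_steps = 744          # hourly time steps in March
bytes_per_value = 8    # double precision (float64)

field_bytes = n_cells * bytes_per_value   # one global time step
full_bytes = field_bytes * n_steps        # the full month

# Assumption: sizes are expressed in binary units (MiB, GiB).
field_mib = field_bytes / 2**20
full_gib = full_bytes / 2**30

# One-pass mean with w = 1: the running mean plus one incoming field.
one_pass_mib = 2 * field_mib
ratio = (2 * field_bytes) / full_bytes    # equals 2 / 744

print(round(field_mib, 1), round(one_pass_mib, 1), round(full_gib, 2), ratio)
```
</preformat>
      <p>Because the cell count is rounded, the computed full size agrees with the quoted 145.82 GB only approximately. The ratio, however, is exactly 2/744 and does not depend on the grid size.</p>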
</sec>
</sec>
<sec id="Ch1.S5">
  <label>5</label><title>Standard deviation</title>
<sec id="Ch1.S5.SS1">
  <label>5.1</label><title>Algorithm description</title>
      <p id="d2e1011">The one-pass algorithm for the standard deviation (and also variance) is calculated over the requested temporal frequency <inline-formula><mml:math id="M40" display="inline"><mml:mi>c</mml:mi></mml:math></inline-formula> by updating two estimates iteratively: the one-pass mean and the sum of the squared differences. First, let the conventional summary for the sum of the squared differences, <inline-formula><mml:math id="M41" display="inline"><mml:mrow><mml:msub><mml:mi>M</mml:mi><mml:mi>c</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>, be defined as

            <disp-formula id="Ch1.E2" content-type="numbered"><label>2</label><mml:math id="M42" display="block"><mml:mrow><mml:msub><mml:mi>M</mml:mi><mml:mi>c</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:munderover><mml:mo movablelimits="false">∑</mml:mo><mml:mrow><mml:mi>n</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow><mml:mi>c</mml:mi></mml:munderover><mml:mo mathsize="1.1em">(</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mi>n</mml:mi></mml:msub><mml:mo>-</mml:mo><mml:msub><mml:mover accent="true"><mml:mi>X</mml:mi><mml:mo mathvariant="normal">¯</mml:mo></mml:mover><mml:mi>c</mml:mi></mml:msub><mml:msup><mml:mo mathsize="1.1em">)</mml:mo><mml:mn mathvariant="normal">2</mml:mn></mml:msup><mml:mo>,</mml:mo></mml:mrow></mml:math></disp-formula>

          where <inline-formula><mml:math id="M43" display="inline"><mml:mrow><mml:msub><mml:mover accent="true"><mml:mi>X</mml:mi><mml:mo mathvariant="normal">¯</mml:mo></mml:mover><mml:mi>c</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> is the conventional mean of the whole dataset <inline-formula><mml:math id="M44" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="bold-italic">X</mml:mi><mml:mi>c</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> required for the statistic. For the one-pass calculation of the standard deviation, the rolling summary <inline-formula><mml:math id="M45" display="inline"><mml:mrow><mml:msub><mml:mi>S</mml:mi><mml:mi>n</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> defines the sum of the squared differences, <inline-formula><mml:math id="M46" display="inline"><mml:mrow><mml:msub><mml:mi>M</mml:mi><mml:mi>n</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>. In the case where <inline-formula><mml:math id="M47" display="inline"><mml:mrow><mml:mi>w</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:math></inline-formula>, the rolling summary is updated by

            <disp-formula id="Ch1.E3" content-type="numbered"><label>3</label><mml:math id="M48" display="block"><mml:mtable class="split" rowspacing="0.2ex" displaystyle="true" columnalign="right left"><mml:mtr><mml:mtd><mml:mrow><mml:msub><mml:mi>M</mml:mi><mml:mrow><mml:mi>n</mml:mi><mml:mo>+</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msub></mml:mrow></mml:mtd><mml:mtd><mml:mrow><mml:mo>=</mml:mo><mml:mi>g</mml:mi><mml:mo>(</mml:mo><mml:msub><mml:mi>S</mml:mi><mml:mi>n</mml:mi></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>n</mml:mi><mml:mo>+</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd/><mml:mtd><mml:mrow><mml:mo>=</mml:mo><mml:msub><mml:mi>M</mml:mi><mml:mi>n</mml:mi></mml:msub><mml:mo>+</mml:mo><mml:mo mathsize="1.1em">(</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>n</mml:mi><mml:mo>+</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msub><mml:mo>-</mml:mo><mml:msub><mml:mover accent="true"><mml:mi>X</mml:mi><mml:mo mathvariant="normal">¯</mml:mo></mml:mover><mml:mi>n</mml:mi></mml:msub><mml:mo mathsize="1.1em">)</mml:mo><mml:mo mathsize="1.1em">(</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>n</mml:mi><mml:mo>+</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msub><mml:mo>-</mml:mo><mml:msub><mml:mover accent="true"><mml:mi>X</mml:mi><mml:mo mathvariant="normal">¯</mml:mo></mml:mover><mml:mrow><mml:mi>n</mml:mi><mml:mo>+</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msub><mml:mo mathsize="1.1em">)</mml:mo><mml:mo>,</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>

          where <inline-formula><mml:math id="M49" display="inline"><mml:mrow><mml:msub><mml:mover accent="true"><mml:mi>X</mml:mi><mml:mo mathvariant="normal">¯</mml:mo></mml:mover><mml:mi>n</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M50" display="inline"><mml:mrow><mml:msub><mml:mover accent="true"><mml:mi>X</mml:mi><mml:mo mathvariant="normal">¯</mml:mo></mml:mover><mml:mrow><mml:mi>n</mml:mi><mml:mo>+</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula> are both given by the algorithm for the one-pass mean in Eq. (<xref ref-type="disp-formula" rid="Ch1.E1"/>). Equation (<xref ref-type="disp-formula" rid="Ch1.E3"/>) is known as Welford's algorithm. In the case where the incoming data have more than one time step (<inline-formula><mml:math id="M51" display="inline"><mml:mrow><mml:mi>w</mml:mi><mml:mo>&gt;</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:math></inline-formula>), <inline-formula><mml:math id="M52" display="inline"><mml:mrow><mml:msub><mml:mi>M</mml:mi><mml:mi>n</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> is updated by

            <disp-formula id="Ch1.E4" content-type="numbered"><label>4</label><mml:math id="M53" display="block"><mml:mtable class="split" rowspacing="0.2ex" displaystyle="true" columnalign="right left"><mml:mtr><mml:mtd><mml:mrow><mml:msub><mml:mi>M</mml:mi><mml:mrow><mml:mi>n</mml:mi><mml:mo>+</mml:mo><mml:mi>w</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:mtd><mml:mtd><mml:mrow><mml:mo>=</mml:mo><mml:mi>g</mml:mi><mml:mo>(</mml:mo><mml:msub><mml:mi>S</mml:mi><mml:mi>n</mml:mi></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">X</mml:mi><mml:mi>w</mml:mi></mml:msub><mml:mo>)</mml:mo></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd/><mml:mtd><mml:mrow><mml:mo>=</mml:mo><mml:msub><mml:mi>M</mml:mi><mml:mi>n</mml:mi></mml:msub><mml:mo>+</mml:mo><mml:msub><mml:mi>M</mml:mi><mml:mi>w</mml:mi></mml:msub><mml:mo>+</mml:mo><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mrow><mml:mi>w</mml:mi><mml:mi>n</mml:mi><mml:mo mathsize="1.1em">(</mml:mo><mml:msub><mml:mover accent="true"><mml:mi>X</mml:mi><mml:mo mathvariant="normal">¯</mml:mo></mml:mover><mml:mi>n</mml:mi></mml:msub><mml:mo>-</mml:mo><mml:msub><mml:mover accent="true"><mml:mi>X</mml:mi><mml:mo mathvariant="normal">¯</mml:mo></mml:mover><mml:mi>w</mml:mi></mml:msub><mml:msup><mml:mo mathsize="1.1em">)</mml:mo><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow><mml:mrow><mml:mi>n</mml:mi><mml:mo>+</mml:mo><mml:mi>w</mml:mi></mml:mrow></mml:mfrac></mml:mstyle><mml:mo>,</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>

          where <inline-formula><mml:math id="M54" display="inline"><mml:mrow><mml:msub><mml:mi>M</mml:mi><mml:mi>w</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> is the conventional sum of the squared differences over the incoming data block of length <inline-formula><mml:math id="M55" display="inline"><mml:mi>w</mml:mi></mml:math></inline-formula> (given by Eq. (<xref ref-type="disp-formula" rid="Ch1.E2"/>) with <inline-formula><mml:math id="M56" display="inline"><mml:mrow><mml:mi>c</mml:mi><mml:mo>=</mml:mo><mml:mi>w</mml:mi></mml:mrow></mml:math></inline-formula>), <inline-formula><mml:math id="M57" display="inline"><mml:mrow><mml:msub><mml:mover accent="true"><mml:mi>X</mml:mi><mml:mo mathvariant="normal">¯</mml:mo></mml:mover><mml:mi>n</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> is the one-pass mean at <inline-formula><mml:math id="M58" display="inline"><mml:mrow><mml:mi>t</mml:mi><mml:mo>=</mml:mo><mml:mi>n</mml:mi></mml:mrow></mml:math></inline-formula> calculated with Eq. (<xref ref-type="disp-formula" rid="Ch1.E1"/>), and <inline-formula><mml:math id="M59" display="inline"><mml:mrow><mml:msub><mml:mover accent="true"><mml:mi>X</mml:mi><mml:mo mathvariant="normal">¯</mml:mo></mml:mover><mml:mi>w</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> is the conventional mean of the incoming data block. See <xref ref-type="bibr" rid="bib1.bibx36" id="text.21"/> for details.</p>
      <p id="d2e1498">Once enough data have been added to the rolling summary <inline-formula><mml:math id="M60" display="inline"><mml:mrow><mml:msub><mml:mi>M</mml:mi><mml:mi>n</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> so that <inline-formula><mml:math id="M61" display="inline"><mml:mrow><mml:mi>n</mml:mi><mml:mo>=</mml:mo><mml:mi>c</mml:mi></mml:mrow></mml:math></inline-formula> we calculate the standard deviation <inline-formula><mml:math id="M62" display="inline"><mml:mi mathvariant="italic">σ</mml:mi></mml:math></inline-formula> using

            <disp-formula id="Ch1.E5" content-type="numbered"><label>5</label><mml:math id="M63" display="block"><mml:mrow><mml:mi mathvariant="italic">σ</mml:mi><mml:mo>=</mml:mo><mml:msqrt><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mrow><mml:msub><mml:mi>M</mml:mi><mml:mi>n</mml:mi></mml:msub></mml:mrow><mml:mrow><mml:mi>n</mml:mi><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:mfrac></mml:mstyle></mml:msqrt><mml:mo>,</mml:mo></mml:mrow></mml:math></disp-formula>

          where the divisor <italic>n</italic> − 1 gives the sample standard deviation (its square is the sample variance). Equation (<xref ref-type="disp-formula" rid="Ch1.E5"/>) also applies to the conventional summary <inline-formula><mml:math id="M64" display="inline"><mml:mrow><mml:msub><mml:mi>M</mml:mi><mml:mi>c</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>.</p>
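      <p>The two update rules in Eqs. (<xref ref-type="disp-formula" rid="Ch1.E3"/>) and (<xref ref-type="disp-formula" rid="Ch1.E4"/>), followed by Eq. (<xref ref-type="disp-formula" rid="Ch1.E5"/>), can be sketched in Python as follows. This is a minimal illustration on synthetic data and not the One_Pass package implementation. The function names update_w1 and update_block are hypothetical, and the mean updates assume the standard incremental form of Eq. (<xref ref-type="disp-formula" rid="Ch1.E1"/>).</p>
      <preformat preformat-type="code" xml:space="preserve">
```python
import numpy as np


def update_w1(n, mean, m2, x):
    """Eq. (3): add a single time step x to the running summaries."""
    new_mean = mean + (x - mean) / (n + 1)
    new_m2 = m2 + (x - mean) * (x - new_mean)
    return n + 1, new_mean, new_m2


def update_block(n, mean, m2, block):
    """Eq. (4): add a block of w time steps (the first axis is time)."""
    w = block.shape[0]
    mean_w = block.mean(axis=0)                  # conventional mean of the block
    m2_w = ((block - mean_w) ** 2).sum(axis=0)   # Eq. (2) with c = w
    new_m2 = m2 + m2_w + w * n * (mean - mean_w) ** 2 / (n + w)
    new_mean = mean + w * (mean_w - mean) / (n + w)
    return n + w, new_mean, new_m2


rng = np.random.default_rng(1)
data = rng.standard_normal((365, 500))  # synthetic stand-in: 365 days, 500 cells

# Stream with w = 1.
n, mean, m2 = 0, np.zeros(500), np.zeros(500)
for x in data:
    n, mean, m2 = update_w1(n, mean, m2, x)
sigma_w1 = np.sqrt(m2 / (n - 1))  # Eq. (5)

# Stream with w = 2; the final block holds the single leftover day.
n, mean, m2 = 0, np.zeros(500), np.zeros(500)
for start in range(0, data.shape[0], 2):
    n, mean, m2 = update_block(n, mean, m2, data[start:start + 2])
sigma_w2 = np.sqrt(m2 / (n - 1))  # Eq. (5)

sigma_ref = data.std(axis=0, ddof=1)  # conventional sample standard deviation
print(np.max(np.abs(sigma_w1 - sigma_ref)), np.max(np.abs(sigma_w2 - sigma_ref)))
```
</preformat>
      <p>Both streamed estimates should agree with the conventional NumPy result to within floating-point rounding. Note that ddof=1 is needed in the NumPy call to match the <italic>n</italic> − 1 divisor of Eq. (<xref ref-type="disp-formula" rid="Ch1.E5"/>).</p>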
</sec>
<sec id="Ch1.S5.SS2">
  <label>5.2</label><title>Sea surface height variability</title>
      <p id="d2e1581">We apply the one-pass algorithm for standard deviation to the sea surface height (SSH). The standard deviation of SSH can be used to quantify uncertainty between different model ensembles compared to satellite altimetry data. The SSH can be used to better understand ocean dynamics as its variability gives insights into the redistribution of mass, heat and salt within the water column <xref ref-type="bibr" rid="bib1.bibx6" id="paren.22"/>. We calculate the annual standard deviation using data from the FESOM model (tco2559-ng5-cycle3 experiment) <xref ref-type="bibr" rid="bib1.bibx45" id="paren.23"/>, again run as part of the nextGEMS project. We use daily data over 2021 at native model resolution (<inline-formula><mml:math id="M65" display="inline"><mml:mo lspace="0mm">∼</mml:mo></mml:math></inline-formula> 0.05°), giving an annual time series of 7.4 million grid cells and 365 time slices, with a size of 10.09 GB in single precision (float32). We first calculate the standard deviation using the conventional method defined in Eq. (<xref ref-type="disp-formula" rid="Ch1.E2"/>), followed by the one-pass method in Eq. (<xref ref-type="disp-formula" rid="Ch1.E4"/>), with <inline-formula><mml:math id="M66" display="inline"><mml:mrow><mml:mi>w</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">2</mml:mn></mml:mrow></mml:math></inline-formula>. As with the mean calculation, for the conventional calculation the data were spatially re-chunked into 10 chunks using the Python library Dask-Xarray, and each chunk was computed separately, while for the one-pass calculation, two daily time steps at a time were iteratively loaded into memory to simulate data streaming.</p>
      <p id="d2e1614">Figure <xref ref-type="fig" rid="F2"/>a shows the one-pass standard deviation calculated using Eqs. (<xref ref-type="disp-formula" rid="Ch1.E4"/>) and (<xref ref-type="disp-formula" rid="Ch1.E5"/>) (using the One_Pass package; see notebooks). Figure <xref ref-type="fig" rid="F2"/>b then shows the difference between the one-pass and the conventional calculation. Again, for plotting convenience, the native grid was interpolated to a 0.25° regular lat–long grid. Here, the order of magnitude of the difference is 10<sup>−16</sup>, even smaller than the mean difference in Fig. <xref ref-type="fig" rid="F1"/>b. Interestingly, we also see some structure emerging in the differences shown in Fig. <xref ref-type="fig" rid="F2"/>b, which correlates with areas of larger standard deviation. However, because the values are extremely small, this structure is negligible in comparison to the required accuracy of the statistic. As with the mean statistic, this difference can therefore be attributed to machine precision limitations.</p>

      <fig id="F2"><label>Figure 2</label><caption><p id="d2e1644"><bold>(a)</bold> Global annual standard deviation of the sea surface height in 2021 using daily data from the FESOM model, computed using the one-pass method given in Eqs. (<xref ref-type="disp-formula" rid="Ch1.E4"/>) and (<xref ref-type="disp-formula" rid="Ch1.E5"/>), implemented with the One_Pass package. <bold>(b)</bold> The difference between the one-pass computation and the conventional computation.</p></caption>
          <graphic xlink:href="https://gmd.copernicus.org/articles/18/5873/2025/gmd-18-5873-2025-f02.png"/>

          
        </fig>

      <p id="d2e1665">The memory savings for the standard deviation are slightly smaller than those for the mean one-pass algorithm because, in the case of <inline-formula><mml:math id="M68" display="inline"><mml:mrow><mml:mi>w</mml:mi><mml:mo>&gt;</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:math></inline-formula>, four data summaries must be kept in memory in addition to the current data memory block <inline-formula><mml:math id="M69" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="bold-italic">X</mml:mi><mml:mi>w</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>: <inline-formula><mml:math id="M70" display="inline"><mml:mrow><mml:msub><mml:mi>M</mml:mi><mml:mi>n</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>, <inline-formula><mml:math id="M71" display="inline"><mml:mrow><mml:msub><mml:mi>M</mml:mi><mml:mi>w</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>, <inline-formula><mml:math id="M72" display="inline"><mml:mrow><mml:msub><mml:mover accent="true"><mml:mi>X</mml:mi><mml:mo mathvariant="normal">¯</mml:mo></mml:mover><mml:mi>n</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M73" display="inline"><mml:mrow><mml:msub><mml:mover accent="true"><mml:mi>X</mml:mi><mml:mo mathvariant="normal">¯</mml:mo></mml:mover><mml:mi>w</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>. However, as with the mean, these memory requirements are independent of the time span (sample size) of the statistic and do not increase as the number of values required to complete the statistic (<inline-formula><mml:math id="M74" display="inline"><mml:mi>c</mml:mi></mml:math></inline-formula>) increases. 
In the example presented here, with <inline-formula><mml:math id="M75" display="inline"><mml:mrow><mml:mi>w</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">2</mml:mn></mml:mrow></mml:math></inline-formula>, approximately 112 MB of memory (single precision) is required, as opposed to the 10.09 GB of the full dataset. This is a reduction of 2 orders of magnitude in memory requirements.</p>
</sec>
</sec>
<sec id="Ch1.S6">
  <label>6</label><title>Distributions: percentiles and histograms</title>
<sec id="Ch1.S6.SS1">
  <label>6.1</label><title>Algorithm description</title>
      <p id="d2e1777">Unlike the one-pass algorithms for mean and standard deviation (and others such as minimum, maximum, threshold exceedance), where the rolling summary <inline-formula><mml:math id="M76" display="inline"><mml:mrow><mml:msub><mml:mi>S</mml:mi><mml:mi>n</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> can be described by one floating point value, estimates of a distribution cannot be condensed in such a way. The <inline-formula><mml:math id="M77" display="inline"><mml:mi>t</mml:mi></mml:math></inline-formula>-digest algorithm, developed by <xref ref-type="bibr" rid="bib1.bibx12" id="text.24"/> and <xref ref-type="bibr" rid="bib1.bibx11" id="text.25"/>, creates reliable estimates of PDFs with a single pass through the dataset. The <inline-formula><mml:math id="M78" display="inline"><mml:mi>t</mml:mi></mml:math></inline-formula>-digest algorithm is used here, to the best of our knowledge, for the first time in climate data analysis. Our One_Pass package (and all the results presented in Sect. <xref ref-type="sec" rid="Ch1.S6"/>) uses the Python package crick <xref ref-type="bibr" rid="bib1.bibx7" id="paren.26"/> for the implementation of the <inline-formula><mml:math id="M79" display="inline"><mml:mi>t</mml:mi></mml:math></inline-formula>-digest algorithm. We note that other packages may provide slightly different results and efficiency due to variations in the algorithm implementation; however, our preliminary analysis conducted when investigating different packages did not show these to be substantial.</p>
      <p id="d2e1824">The <inline-formula><mml:math id="M80" display="inline"><mml:mi>t</mml:mi></mml:math></inline-formula>-digest algorithm represents a dataset <inline-formula><mml:math id="M81" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="bold-italic">X</mml:mi><mml:mi>n</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> by a series of clusters that are defined by their arithmetic mean value and their corresponding weight (i.e. the number of samples that have contributed to the cluster/mean). For streamed data, each value (<inline-formula><mml:math id="M82" display="inline"><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mi>n</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>) is added to its nearest cluster, based on the value's proximity to the cluster's mean. As <inline-formula><mml:math id="M83" display="inline"><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mi>n</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> is added, the weight and mean of that cluster are updated using Eq. (<xref ref-type="disp-formula" rid="Ch1.E1"/>). The clusters are organized in such a way that those corresponding to the extremes of the distribution will contain fewer samples than those around the median quantile, meaning that the error is relative to the quantile, as opposed to a constant absolute error seen in previous methods <xref ref-type="bibr" rid="bib1.bibx12" id="paren.27"/>. The unequal sizes (i.e. the weights) of these clusters are set by the monotonically increasing scale function. While there is a range of possible scale functions, here we use the most common,

            <disp-formula id="Ch1.E6" content-type="numbered"><label>6</label><mml:math id="M84" display="block"><mml:mrow><mml:mi>k</mml:mi><mml:mo>(</mml:mo><mml:mi>q</mml:mi><mml:mo>)</mml:mo><mml:mo>=</mml:mo><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mi mathvariant="italic">δ</mml:mi><mml:mrow><mml:mn mathvariant="normal">2</mml:mn><mml:mi mathvariant="italic">π</mml:mi></mml:mrow></mml:mfrac></mml:mstyle><mml:msup><mml:mtext>sin</mml:mtext><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msup><mml:mo>(</mml:mo><mml:mn mathvariant="normal">2</mml:mn><mml:mi>q</mml:mi><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mo>)</mml:mo><mml:mo>,</mml:mo></mml:mrow></mml:math></disp-formula>

          where <inline-formula><mml:math id="M85" display="inline"><mml:mi>q</mml:mi></mml:math></inline-formula> is the quantile, and <inline-formula><mml:math id="M86" display="inline"><mml:mi mathvariant="italic">δ</mml:mi></mml:math></inline-formula> is the compression parameter. Figure <xref ref-type="fig" rid="F3"/>a provides a visual representation of the scale function in Eq. (<xref ref-type="disp-formula" rid="Ch1.E6"/>) for four different <inline-formula><mml:math id="M87" display="inline"><mml:mi mathvariant="italic">δ</mml:mi></mml:math></inline-formula> values (also see Fig. 1 in <xref ref-type="bibr" rid="bib1.bibx12" id="altparen.28"/>). Regardless of the compression parameter, there is a steeper gradient of <inline-formula><mml:math id="M88" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula> near <inline-formula><mml:math id="M89" display="inline"><mml:mrow><mml:mi>q</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">0</mml:mn></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M90" display="inline"><mml:mrow><mml:mi>q</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:math></inline-formula>. As the size of a given cluster is determined by the slope of <inline-formula><mml:math id="M91" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula>, this steeper gradient reduces the cluster weights over the tails of the distribution. For mathematical details of how the cluster sizes are determined, see <xref ref-type="bibr" rid="bib1.bibx12" id="text.29"/>. The compression parameter does not affect the shape of the scale function, but it does increase the range of <inline-formula><mml:math id="M92" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula>. 
The greater the range of <inline-formula><mml:math id="M93" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula>, the more clusters will be used to represent <inline-formula><mml:math id="M94" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="bold-italic">X</mml:mi><mml:mi>n</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>, providing more accuracy but also increasing the memory required. This is demonstrated by the shaded grey dots in Fig. <xref ref-type="fig" rid="F3"/>a, located on the lines for <inline-formula><mml:math id="M95" display="inline"><mml:mrow><mml:mi mathvariant="italic">δ</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">20</mml:mn></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M96" display="inline"><mml:mrow><mml:mi mathvariant="italic">δ</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">100</mml:mn></mml:mrow></mml:math></inline-formula>. The number of dots shows the number of clusters used to represent a dataset <inline-formula><mml:math id="M97" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="bold-italic">X</mml:mi><mml:mn mathvariant="normal">500</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula>, while the size of the dot indicates the weight. Significantly fewer clusters are used for <inline-formula><mml:math id="M98" display="inline"><mml:mrow><mml:mi mathvariant="italic">δ</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">20</mml:mn></mml:mrow></mml:math></inline-formula> compared to <inline-formula><mml:math id="M99" display="inline"><mml:mrow><mml:mi mathvariant="italic">δ</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">100</mml:mn></mml:mrow></mml:math></inline-formula>, reducing the accuracy along with the memory requirements. 
These clusters are ultimately converted to a percentile or a histogram (where bin densities may have non-integer values due to the underlying cluster representation), based on the closest cluster mean to the required percentile.</p>
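      <p>The behaviour of the scale function in Eq. (<xref ref-type="disp-formula" rid="Ch1.E6"/>) can be explored numerically. The following minimal Python sketch evaluates <italic>k</italic>(<italic>q</italic>), its total range over 0 ≤ <italic>q</italic> ≤ 1 and a finite-difference estimate of its slope. The helper names scale_k, k_range and local_slope are hypothetical. The sketch illustrates only the scale function itself and is not the cluster-construction logic of crick or of any other <italic>t</italic>-digest implementation.</p>
      <preformat preformat-type="code" xml:space="preserve">
```python
import math


def scale_k(q, delta):
    """Scale function of Eq. (6): k(q) = delta / (2 pi) * arcsin(2 q - 1)."""
    return delta / (2.0 * math.pi) * math.asin(2.0 * q - 1.0)


def k_range(delta):
    """Total range of k over q in [0, 1]; analytically this is delta / 2."""
    return scale_k(1.0, delta) - scale_k(0.0, delta)


def local_slope(q, delta, h=1e-4):
    """Centred finite-difference estimate of dk/dq at quantile q."""
    return (scale_k(q + h, delta) - scale_k(q - h, delta)) / (2.0 * h)


for delta in (20, 100):
    # Range of k, slope at the median and slope near the lower tail.
    print(delta, k_range(delta), local_slope(0.5, delta), local_slope(0.01, delta))
```
</preformat>
      <p>For any <italic>δ</italic>, the slope near <italic>q</italic> = 0.01 is several times larger than at the median, which is the property that keeps the clusters in the tails small. Changing <italic>δ</italic> rescales the range of <italic>k</italic> without changing the shape of the curve.</p>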

      <fig id="F3" specific-use="star"><label>Figure 3</label><caption><p id="d2e2081"><bold>(a)</bold> Graphical representation of the scale function in Eq. (<xref ref-type="disp-formula" rid="Ch1.E6"/>) for four different <inline-formula><mml:math id="M100" display="inline"><mml:mi mathvariant="italic">δ</mml:mi></mml:math></inline-formula> values. The grey dots for <inline-formula><mml:math id="M101" display="inline"><mml:mrow><mml:mi mathvariant="italic">δ</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">20</mml:mn></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M102" display="inline"><mml:mrow><mml:mi mathvariant="italic">δ</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">100</mml:mn></mml:mrow></mml:math></inline-formula> show the clusters that would be used to represent a dataset of <inline-formula><mml:math id="M103" display="inline"><mml:mrow><mml:mi>n</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">500</mml:mn></mml:mrow></mml:math></inline-formula>, with a mean value and associated quantile (<inline-formula><mml:math id="M104" display="inline"><mml:mi>q</mml:mi></mml:math></inline-formula>) and a weight represented by the size of the dot. <bold>(b)</bold> Number of clusters used to represent datasets of varying lengths as a function of the compression parameter <inline-formula><mml:math id="M105" display="inline"><mml:mi mathvariant="italic">δ</mml:mi></mml:math></inline-formula>. The corresponding memory consumption [kB] is given on the right-hand axis. Six random datasets (sampled from a uniform distribution) of lengths ranging from 50 to 1000 are used.</p></caption>
          <graphic xlink:href="https://gmd.copernicus.org/articles/18/5873/2025/gmd-18-5873-2025-f03.png"/>

        </fig>

      <p id="d2e2156">The effect of the compression parameter <inline-formula><mml:math id="M106" display="inline"><mml:mi mathvariant="italic">δ</mml:mi></mml:math></inline-formula> on the memory requirements is shown in Fig. <xref ref-type="fig" rid="F3"/>b, while the effects on accuracy of the percentile estimates are given in Sects. <xref ref-type="sec" rid="Ch1.S6.SS2"/> and <xref ref-type="sec" rid="Ch1.S6.SS3"/>. In Fig. <xref ref-type="fig" rid="F3"/>b, six random datasets (from a uniform distribution) of different lengths (<inline-formula><mml:math id="M107" display="inline"><mml:mi>n</mml:mi></mml:math></inline-formula>) are used to show how many clusters are required to represent the data as a function of <inline-formula><mml:math id="M108" display="inline"><mml:mi mathvariant="italic">δ</mml:mi></mml:math></inline-formula> (within the range 20 <inline-formula><mml:math id="M109" display="inline"><mml:mrow><mml:mo>≤</mml:mo><mml:mi mathvariant="italic">δ</mml:mi><mml:mo>≤</mml:mo></mml:mrow></mml:math></inline-formula> 140). For <inline-formula><mml:math id="M110" display="inline"><mml:mrow><mml:mi>n</mml:mi><mml:mo>&gt;</mml:mo><mml:mn mathvariant="normal">350</mml:mn></mml:mrow></mml:math></inline-formula>, shown by the darkest three samples, the number of clusters used to represent the data grows linearly with <inline-formula><mml:math id="M111" display="inline"><mml:mi mathvariant="italic">δ</mml:mi></mml:math></inline-formula>. This means that, for the range of <inline-formula><mml:math id="M112" display="inline"><mml:mi mathvariant="italic">δ</mml:mi></mml:math></inline-formula> tested, for all datasets with more than 350 values, the number of clusters is independent of the size of the dataset and is set only by <inline-formula><mml:math id="M113" display="inline"><mml:mi mathvariant="italic">δ</mml:mi></mml:math></inline-formula>; i.e. a dataset of 500 values will be represented with the same number of clusters as a dataset of 5 million values. 
For smaller dataset sizes (anything below 350, as shown by the three lighter blue/green samples in Fig. <xref ref-type="fig" rid="F3"/>b) there may be no memory savings (depending on <inline-formula><mml:math id="M114" display="inline"><mml:mi mathvariant="italic">δ</mml:mi></mml:math></inline-formula>) generated from using the algorithm, as each cluster requires two values (mean and weight) for its representation. For example, when considering <inline-formula><mml:math id="M115" display="inline"><mml:mrow><mml:mi mathvariant="italic">δ</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">140</mml:mn></mml:mrow></mml:math></inline-formula> and a sample size of 350, 180 clusters are used, requiring 360 values, which exceeds the length of the original dataset. Indeed, for very short datasets such as <inline-formula><mml:math id="M116" display="inline"><mml:mrow><mml:mi>n</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">50</mml:mn></mml:mrow></mml:math></inline-formula> (lightest green line in Fig. <xref ref-type="fig" rid="F3"/>b), the number of clusters cannot grow beyond 50 as the distribution is already represented in its entirety, with each cluster containing one data sample with a weight of one. For these shorter datasets, there will be no added benefit from increasing <inline-formula><mml:math id="M117" display="inline"><mml:mi mathvariant="italic">δ</mml:mi></mml:math></inline-formula>. However, for larger sample sizes and/or smaller <inline-formula><mml:math id="M118" display="inline"><mml:mi mathvariant="italic">δ</mml:mi></mml:math></inline-formula>, the memory savings generated are substantial.</p>
      <p id="d2e2285">In Sects. <xref ref-type="sec" rid="Ch1.S6.SS2"/> and <xref ref-type="sec" rid="Ch1.S6.SS3"/>, we examine the use of the <inline-formula><mml:math id="M119" display="inline"><mml:mi>t</mml:mi></mml:math></inline-formula>-digest algorithm with two case studies: wind energy in Sect. <xref ref-type="sec" rid="Ch1.S6.SS2"/> and extreme precipitation events in Sect. <xref ref-type="sec" rid="Ch1.S6.SS3"/>. These two examples have been chosen to examine how well the <inline-formula><mml:math id="M120" display="inline"><mml:mi>t</mml:mi></mml:math></inline-formula>-digest algorithm represents (a) a more normally distributed variable around the median percentiles and (b) extreme events at the tails of a heavily skewed distribution. Comparisons are made against the conventional method, calculated with NumPy, which has access to the full dataset and does not rely on one-pass methods. For the <inline-formula><mml:math id="M121" display="inline"><mml:mi>t</mml:mi></mml:math></inline-formula>-digest method, we used <inline-formula><mml:math id="M122" display="inline"><mml:mrow><mml:mi>w</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:math></inline-formula> in both examples, meaning each <inline-formula><mml:math id="M123" display="inline"><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mi>n</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> was added to its respective digest consecutively, to simulate data streaming.</p>
</sec>
<sec id="Ch1.S6.SS2">
  <label>6.2</label><title>Wind energy</title>
      <p id="d2e2349">We present here the application of the <inline-formula><mml:math id="M124" display="inline"><mml:mi>t</mml:mi></mml:math></inline-formula>-digest in the context of wind energy. With the decarbonization of the energy system turning into a global necessity, renewable energies such as solar and wind are becoming major contributors to the power network <xref ref-type="bibr" rid="bib1.bibx25" id="paren.30"/>. However, unlike with fossil fuels, wind energy production is heavily affected by atmospheric conditions, subject to both short-term variability (i.e. weather) and longer-term variations caused by seasonal and/or interannual variability <xref ref-type="bibr" rid="bib1.bibx16 bib1.bibx49" id="paren.31"/>. This volatility makes the integration of wind energy into the power network a challenging task <xref ref-type="bibr" rid="bib1.bibx27" id="paren.32"/>.</p>
      <p id="d2e2368">Having access to histograms of wind speed from high-frequency data (i.e. at hourly or sub-hourly scale) and at hub height (i.e. height of the turbine rotor) is among the requirements of wind farm operators and stakeholders to estimate the available wind resources at a particular location. This information can be combined with the power curve of the turbines installed at each location to give reasonable estimates of energy production over a period of time. Obtaining an accurate representation of the wind distribution is therefore crucial, as the distributions condense the information from climate simulations required by the wind energy industry and aid in both the understanding of future output from current farms and in the decision-making relating to the viability of a proposed wind farm location <xref ref-type="bibr" rid="bib1.bibx32" id="paren.33"/>. Currently, there are two main methods for describing wind speed distributions: through full histograms of time series data or through fitting probability distribution functions to the data <xref ref-type="bibr" rid="bib1.bibx38 bib1.bibx48" id="paren.34"/>. While the non-parametric approach (time series) generally outperforms the parametric (statistical) one in accurately characterizing the distribution <xref ref-type="bibr" rid="bib1.bibx53" id="paren.35"/>, it poses numerous challenges attributable to the large amounts of data required <xref ref-type="bibr" rid="bib1.bibx48" id="paren.36"/>.</p>
      <p id="d2e2383">We investigate the use of the <inline-formula><mml:math id="M125" display="inline"><mml:mi>t</mml:mi></mml:math></inline-formula>-digest algorithm to estimate the wind speed distribution from streamed climate data. We again use data from the ECMWF's IFS model (tco2559-ng5-cycle3 experiment), this time looking at the 10 m wind speeds over December 2020. As before, we use the hourly data at native model resolution (<inline-formula><mml:math id="M126" display="inline"><mml:mo lspace="0mm">∼</mml:mo></mml:math></inline-formula> 0.04°), giving a global field of approximately 26.31 million spatial grid cells and 744 time steps, with a full size of 145.82 GB in double precision (float64). However, for plotting convenience and storing in Zenodo <xref ref-type="bibr" rid="bib1.bibx18" id="paren.37"/>, the data have been spatially regridded to 1°. Wind speed is calculated as the square root of the sum of the squares of the 10 m hourly mean zonal and meridional components. We conduct a detailed analysis of two locations: the offshore Moray East wind farm, located at (58.25° N, 2.75° E) in the North Sea, and the onshore Roscoe wind farm, positioned at (32.35° N, 100.45° W) in Texas. Both are marked on the global map in Fig. <xref ref-type="fig" rid="F4"/>a in red and pink, respectively.</p>
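      <p>The wind speed calculation described above can be sketched as follows. This is a minimal illustration rather than the processing code used for the analysis; the function name <monospace>wind_speed</monospace> and the sample component values are hypothetical.</p>
<preformat>
```python
import math


def wind_speed(u, v):
    """Return the wind speed from the zonal (u) and meridional (v) components.

    Both components must share the same units (for example m s-1).
    """
    # Square root of the sum of the squares of the two components.
    return math.hypot(u, v)


# Hypothetical hourly mean 10 m components (m s-1), for illustration only.
hourly_u = [3.0, -6.0, 0.0]
hourly_v = [4.0, 8.0, 2.5]

speeds = [wind_speed(u, v) for u, v in zip(hourly_u, hourly_v)]
print(speeds)
```
</preformat>
      <p>In a streaming setting, each resulting speed would be passed to the digest of its grid cell as soon as it is computed, so the component fields do not need to be retained.</p>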

      <fig id="F4" specific-use="star"><label>Figure 4</label><caption><p id="d2e2408"><bold>(a)</bold> Global map showing the absolute mean difference between the <inline-formula><mml:math id="M127" display="inline"><mml:mi>t</mml:mi></mml:math></inline-formula>-digest (<inline-formula><mml:math id="M128" display="inline"><mml:mrow><mml:mi mathvariant="italic">δ</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">60</mml:mn></mml:mrow></mml:math></inline-formula>) and NumPy (using the linear interpolation method) estimate of all wind speed percentiles from 1 to 100 given as a percentage of the NumPy value. Wind speed hourly data in December 2020 from the IFS model. The red dot marks the offshore Moray East wind farm in the North Sea (58.25° N, 2.75° E) and the pink dot the onshore wind farm in Roscoe, Texas (32.35° N, 100.45° W). <bold>(b)</bold> Quantile–quantile plot for percentiles 1 to 100 comparing NumPy and the <inline-formula><mml:math id="M129" display="inline"><mml:mi>t</mml:mi></mml:math></inline-formula>-digest algorithm (with <inline-formula><mml:math id="M130" display="inline"><mml:mrow><mml:mi mathvariant="italic">δ</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">60</mml:mn></mml:mrow></mml:math></inline-formula>) for Moray East. The dashed black line represents the one-to-one fit, while the shaded grey area shows the range of wind speeds that most commonly used turbines operate in <xref ref-type="bibr" rid="bib1.bibx32" id="text.38"/>. <bold>(c)</bold> Same as <bold>(b)</bold> but for the Roscoe wind farm. <bold>(e, h)</bold> Histograms of the Moray East time series, <bold>(e)</bold> calculated with the <inline-formula><mml:math id="M131" display="inline"><mml:mi>t</mml:mi></mml:math></inline-formula>-digest (<inline-formula><mml:math id="M132" display="inline"><mml:mrow><mml:mi mathvariant="italic">δ</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">60</mml:mn></mml:mrow></mml:math></inline-formula>) and <bold>(h)</bold> NumPy. 
<bold>(f, i)</bold> Same as <bold>(e)</bold>, <bold>(h)</bold> but for the Roscoe time series. <bold>(d)</bold> The difference between the <inline-formula><mml:math id="M133" display="inline"><mml:mi>t</mml:mi></mml:math></inline-formula>-digest and NumPy calculation of the 50th percentile, given as a percentage of the NumPy value, as a function of <inline-formula><mml:math id="M134" display="inline"><mml:mi mathvariant="italic">δ</mml:mi></mml:math></inline-formula> for both marked locations. The error bars show the possible differences when employing all available NumPy interpolation schemes rather than the default linear interpolation method. <bold>(g)</bold> The same as <bold>(d)</bold> but for the 80th percentile.</p></caption>
          <graphic xlink:href="https://gmd.copernicus.org/articles/18/5873/2025/gmd-18-5873-2025-f04.png"/>

        </fig>

      <p id="d2e2532">Figure <xref ref-type="fig" rid="F4"/> compares in detail how the <inline-formula><mml:math id="M135" display="inline"><mml:mi>t</mml:mi></mml:math></inline-formula>-digest and the conventional method describe typical wind speed distributions. Figure <xref ref-type="fig" rid="F4"/>b and c show quantile–quantile plots for the NumPy percentile estimates against the <inline-formula><mml:math id="M136" display="inline"><mml:mi>t</mml:mi></mml:math></inline-formula>-digest estimates (using <inline-formula><mml:math id="M137" display="inline"><mml:mrow><mml:mi mathvariant="italic">δ</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">60</mml:mn></mml:mrow></mml:math></inline-formula>) for all percentiles ranging from 1 to 100 for (b) the Moray East wind farm time series and (c) the Roscoe time series. The shaded grey areas indicate the range of wind speeds in which most commonly used turbine classes operate <xref ref-type="bibr" rid="bib1.bibx32" id="paren.39"/>. For the offshore Moray East, the highest percentiles lie outside the grey shaded region, but the lower tail is within it, whereas for the onshore Roscoe farm, the opposite is true. This shows that an accurate representation of the full wind speed distribution is required in order to cover the typical range of wind speeds relevant to wind farms. Both (b) and (c) show an almost perfect linear relationship, indicating that the <inline-formula><mml:math id="M138" display="inline"><mml:mi>t</mml:mi></mml:math></inline-formula>-digest estimates closely match those from NumPy for typical wind speed distributions. This is further shown in (e), (f), (h) and (i), where the histograms of the distributions for the two locations are given using the <inline-formula><mml:math id="M139" display="inline"><mml:mi>t</mml:mi></mml:math></inline-formula>-digest method (e, f) and NumPy (h, i). 
The difference between the distributions provided by the two methods is minimal, with their main difference lying in the storage requirements. For the NumPy estimate, 5.95 kB is needed to store the full 744 value time series of one grid cell, compared to the 1.28 kB storage for the <inline-formula><mml:math id="M140" display="inline"><mml:mi>t</mml:mi></mml:math></inline-formula>-digest estimate based on 80 clusters (<inline-formula><mml:math id="M141" display="inline"><mml:mrow><mml:mi mathvariant="italic">δ</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">60</mml:mn></mml:mrow></mml:math></inline-formula>).</p>
      <p id="d2e2602">To ensure that the minor differences shown in Fig. <xref ref-type="fig" rid="F4"/>b and c are not specific to the two chosen locations, we calculated the absolute mean percentile difference (average across all percentile estimates from 1 to 100, i.e. over the quantile–quantile plots) for every global grid cell for <inline-formula><mml:math id="M142" display="inline"><mml:mrow><mml:mi mathvariant="italic">δ</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">60</mml:mn></mml:mrow></mml:math></inline-formula>, the results of which are shown in Fig. <xref ref-type="fig" rid="F4"/>a. In this global map, no difference exceeds 0.9 % of the NumPy value. Converting to actual terms, this translates to no absolute mean difference exceeding 0.068 m s<sup>−1</sup> across the globe. Taking the global spatial average of these mean differences gives 0.020 m s<sup>−1</sup>.</p>
      <p id="d2e2645">Given such a high level of accuracy achieved with <inline-formula><mml:math id="M145" display="inline"><mml:mrow><mml:mi mathvariant="italic">δ</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">60</mml:mn></mml:mrow></mml:math></inline-formula>, we further investigate the effects of compression in Fig. <xref ref-type="fig" rid="F4"/>d and g. For the same two locations, the difference in the estimate of the 50th and 80th percentiles between the two methods is shown as a function of <inline-formula><mml:math id="M146" display="inline"><mml:mi mathvariant="italic">δ</mml:mi></mml:math></inline-formula>. The difference is represented as a percentage of the NumPy percentile estimate, calculated using the default linear interpolation method. The error bars represent the range of possible differences between the <inline-formula><mml:math id="M147" display="inline"><mml:mi>t</mml:mi></mml:math></inline-formula>-digest and the NumPy estimates obtained using the eight different interpolation schemes available in NumPy. The difference between the interpolation schemes, outlined in <xref ref-type="bibr" rid="bib1.bibx23" id="text.40"/>, will be larger for a given percentile estimate if the data points are more sparsely distributed in the original dataset. Therefore, when any type of interpolation is required to estimate a percentile, there will always be a range of possible values depending on the interpolation method chosen. We see that the difference for the Roscoe wind farm in Texas, while very small, is slightly higher for both the 50th and 80th percentiles. This is due to the shape of the two distributions, as evident in the histograms in (f) and (i). Although the Moray East dataset has a larger variance, it more closely resembles a normal distribution, whereas the dataset for Roscoe is more uniform, with the peak skewed to the left. The shape of the scale function given in Eq. 
(<xref ref-type="disp-formula" rid="Ch1.E6"/>) will result in clusters with the lowest weight representing the distribution tails, while the clusters representing the middle of the distribution will have a larger weight. This is a useful characteristic of the algorithm: because the bulk of the data in a normal distribution is centred around the median, these middle clusters can afford to be larger and cover a broader range of values without impacting precision. As the data from the Roscoe site deviate slightly from a normal distribution, a small increase in the difference is observed.</p>
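      <p>The dependence of a percentile estimate on the interpolation scheme can be illustrated with a short sketch. It assumes NumPy 1.22 or later, where the keyword argument of <monospace>numpy.percentile</monospace> is named <monospace>method</monospace>. The list of methods below is an illustrative subset and is not necessarily the set of eight schemes used for the error bars; the helper <monospace>percentile_range</monospace> and the sample array are hypothetical.</p>
<preformat>
```python
import numpy as np

# Illustrative subset of the methods accepted by numpy.percentile
# (assumption: NumPy 1.22 or later; not necessarily the schemes used in the paper).
METHODS = (
    "linear", "lower", "higher", "midpoint", "nearest",
    "hazen", "weibull", "median_unbiased",
)


def percentile_range(data, q, methods=METHODS):
    """Return the smallest and largest estimate of percentile q over the methods."""
    estimates = [float(np.percentile(data, q, method=m)) for m in methods]
    return min(estimates), max(estimates)


# A short, sparse sample with one isolated large value in the upper tail.
sparse = np.array([0.0, 1.0, 2.0, 3.0, 10.0])

linear_90 = float(np.percentile(sparse, 90))  # default linear interpolation
low_90, high_90 = percentile_range(sparse, 90)
print(linear_90, low_90, high_90)
```
</preformat>
      <p>For this sample the 90th percentile falls between the two largest values, so the estimate ranges from 3 to 10 depending on the method, with the default linear interpolation giving 7.2. The sparser the data around the requested percentile, the wider this range becomes.</p>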
      <p id="d2e2682">However, despite this perceived higher difference for the Roscoe wind farm in Texas, the absolute maximum differences for the 50th and 80th percentiles are approximately 2 % and 0.6 %, respectively, for the lowest compression factor, further decreasing as we increase <inline-formula><mml:math id="M148" display="inline"><mml:mi mathvariant="italic">δ</mml:mi></mml:math></inline-formula>. In actual terms, the maximum differences (for <inline-formula><mml:math id="M149" display="inline"><mml:mrow><mml:mi mathvariant="italic">δ</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">20</mml:mn></mml:mrow></mml:math></inline-formula>) are 0.075 and 0.1 m s<sup>−1</sup>, respectively, errors which would be considered negligible for the end users. Moreover, while the 50th-percentile estimate at the Roscoe wind farm has the largest differences across all compression factors (light-pink data in (d)), it also has the largest error bars, indicating that the conventional method also contains greater uncertainty. This highlights that the different interpolation methods used by NumPy have a greater impact on the given result due to the sparser data, also explaining why the discrepancy between the one-pass and conventional methods is higher for this percentile estimate. We also observe that the differences level off around <inline-formula><mml:math id="M151" display="inline"><mml:mrow><mml:mi mathvariant="italic">δ</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">60</mml:mn></mml:mrow></mml:math></inline-formula> in Fig. <xref ref-type="fig" rid="F4"/>d and g, showing that further increasing the compression parameter would not significantly enhance the accuracy of the results for these wind speed distributions. 
Indeed, given the extremely small calculated difference, using <inline-formula><mml:math id="M152" display="inline"><mml:mrow><mml:mi mathvariant="italic">δ</mml:mi><mml:mo>=</mml:mo></mml:mrow></mml:math></inline-formula> 40 would likely be sufficient to capture the distribution of global wind speed data required by users.</p>
      <p id="d2e2741">Overall, given that wind speed is best described by a bimodal Weibull distribution <xref ref-type="bibr" rid="bib1.bibx38" id="paren.41"/>, the <inline-formula><mml:math id="M153" display="inline"><mml:mi>t</mml:mi></mml:math></inline-formula>-digest with <inline-formula><mml:math id="M154" display="inline"><mml:mrow><mml:mi mathvariant="italic">δ</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">60</mml:mn></mml:mrow></mml:math></inline-formula> is more than suitable for representing the full distribution of monthly wind speed data at hourly time steps, while reducing the size of the global monthly data from 145.82 to <inline-formula><mml:math id="M155" display="inline"><mml:mo>∼</mml:mo></mml:math></inline-formula> 33.2 GB. At the grid cell level, the monthly time series is compressed from 5.9 to 1.2 kB. For <inline-formula><mml:math id="M156" display="inline"><mml:mrow><mml:mi mathvariant="italic">δ</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">40</mml:mn></mml:mrow></mml:math></inline-formula>, this would further reduce to 0.85 kB. We further note that if we were interested in time series longer than the month shown here (for example, a full year), the hourly time series for one grid cell would require 8760 values (<inline-formula><mml:math id="M157" display="inline"><mml:mo lspace="0mm">∼</mml:mo></mml:math></inline-formula> 70 kB), while its representation with the <inline-formula><mml:math id="M158" display="inline"><mml:mi>t</mml:mi></mml:math></inline-formula>-digest would still only require 1.2 kB. 
In contrast, if our interest were in weekly datasets with a time step of 1 h, containing only 168 values (<inline-formula><mml:math id="M159" display="inline"><mml:mo lspace="0mm">∼</mml:mo></mml:math></inline-formula> 1.3 kB), using <inline-formula><mml:math id="M160" display="inline"><mml:mrow><mml:mi mathvariant="italic">δ</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">60</mml:mn></mml:mrow></mml:math></inline-formula> would still require 1.2 kB. Although no significant memory savings would be obtained, the histograms could still be provided in real time to the users due to the data streaming.</p>
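      <p>The per-grid-cell storage figures quoted in this section can be reproduced with simple arithmetic. The sketch below assumes that every stored value is a float64 (8 bytes), that each cluster stores exactly two values (mean and weight), and that 1 kB is 1000 bytes; these assumptions are consistent with the numbers quoted above but the function names are hypothetical.</p>
<preformat>
```python
BYTES_PER_FLOAT64 = 8  # assumption: all values stored in double precision


def series_bytes(n_values):
    """Bytes needed to store a full time series of n_values floats."""
    return n_values * BYTES_PER_FLOAT64


def digest_bytes(n_clusters):
    """Bytes needed for a digest, assuming a mean and a weight per cluster."""
    return 2 * n_clusters * BYTES_PER_FLOAT64


monthly = series_bytes(744)    # hourly data for a 31-day month
annual = series_bytes(8760)    # hourly data for a 365-day year
weekly = series_bytes(168)     # hourly data for one week
digest = digest_bytes(80)      # 80 clusters, as quoted for delta = 60

for label, size in [("monthly", monthly), ("annual", annual),
                    ("weekly", weekly), ("digest", digest)]:
    print(label, size / 1000, "kB")
```
</preformat>
      <p>Under these assumptions the digest size is fixed by the number of clusters, so the saving grows with the length of the series and almost vanishes for the weekly case.</p>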
</sec>
<sec id="Ch1.S6.SS3">
  <label>6.3</label><title>Precipitation</title>
      <p id="d2e2827">In this section we focus on the <inline-formula><mml:math id="M161" display="inline"><mml:mi>t</mml:mi></mml:math></inline-formula>-digest algorithm in the context of extreme precipitation events. It is necessary to examine these extreme events, such as intense rainfall and potential flood risk, as they pose great social, economic and environmental threats. Both theory and evidence show that anthropogenic climate change is increasing the risk of such extreme events, especially in areas with high moisture availability and during tropical monsoon seasons <xref ref-type="bibr" rid="bib1.bibx14 bib1.bibx52 bib1.bibx10 bib1.bibx3" id="paren.42"/>. The need for climate adaptation measures in vulnerable communities exposed to these risks is pressing, and, as with the other use cases in this paper, an accurate representation of the hazard is essential. Consequently, our focus here is on assessing how accurately the <inline-formula><mml:math id="M162" display="inline"><mml:mi>t</mml:mi></mml:math></inline-formula>-digest algorithm captures extreme events associated with the upper tail of precipitation distributions. In this analysis, we use data from the ICOsahedral Non-hydrostatic (ICON) model <xref ref-type="bibr" rid="bib1.bibx26 bib1.bibx21" id="paren.43"/> (ngc2009 experiment), looking at precipitation over August 2021. We use half-hourly data on the HEALPix spatial grid <xref ref-type="bibr" rid="bib1.bibx15" id="paren.44"/> (<inline-formula><mml:math id="M163" display="inline"><mml:mo lspace="0mm">∼</mml:mo></mml:math></inline-formula> 0.04°) containing 20.9 million grid cells. The full monthly dataset for this variable, containing 1488 time steps, requires 116.25 GB of memory using single precision (float32). As in Sect. <xref ref-type="sec" rid="Ch1.S6.SS2"/>, for plotting convenience and storing in Zenodo <xref ref-type="bibr" rid="bib1.bibx18" id="paren.45"/>, the data have been spatially regridded to 1°.</p>
      <p id="d2e2866">Figure <xref ref-type="fig" rid="F5"/>a–c illustrates the comparison between NumPy and the <inline-formula><mml:math id="M164" display="inline"><mml:mi>t</mml:mi></mml:math></inline-formula>-digest in their estimates of the 99th percentile of a precipitation distribution. We focus on four specific locations, each characterized by different distributions, with the locations in Brazil and the North Pacific specifically chosen to highlight the areas of largest discrepancy between the one-pass and conventional methods. The histograms in Fig. <xref ref-type="fig" rid="F5"/>d–k show the full distributions for the four locations, with (d–g) created using the <inline-formula><mml:math id="M165" display="inline"><mml:mi>t</mml:mi></mml:math></inline-formula>-digest and (h–k) using NumPy. A common theme among precipitation distributions is that they are heavily skewed, with the majority of the data falling around zero when there is very little to no precipitation. The dark-red histograms in (d, h) are for the location in Brazil and show an extremely dry month with almost all of the values in the 0–1 mm d<sup>−1</sup> bin (note the logarithmic scale). In contrast, the location in Colombia (e, i) reveals heavy precipitation over the month with maximum values above 400 mm d<sup>−1</sup> and a more even spread across the distribution. Despite the differing ranges in maximum values, all the <inline-formula><mml:math id="M168" display="inline"><mml:mi>t</mml:mi></mml:math></inline-formula>-digest histograms show a non-integer number of samples in some of the histogram bins, whereas the NumPy counterparts show integer density values, representing the exact values in the distribution. These non-integer values are due to a weighted contribution from the clusters to the histogram.</p>

      <fig id="F5" specific-use="star"><label>Figure 5</label><caption><p id="d2e2921"><bold>(a)</bold> The global map shows the absolute difference between the <inline-formula><mml:math id="M169" display="inline"><mml:mi>t</mml:mi></mml:math></inline-formula>-digest (<inline-formula><mml:math id="M170" display="inline"><mml:mrow><mml:mi mathvariant="italic">δ</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">80</mml:mn></mml:mrow></mml:math></inline-formula>) and NumPy (using the linear interpolation method) estimate of the 99th precipitation percentile given as a percentage of the NumPy value (see text for details). Half-hourly precipitation data from the ICON model (ngc2009 experiment) in August 2021 are used in each panel. The location of the four solid pink circles is given in the legend, and the following panels use the same colours to indicate the locations. <bold>(b)</bold> The absolute difference between the NumPy and <inline-formula><mml:math id="M171" display="inline"><mml:mi>t</mml:mi></mml:math></inline-formula>-digest 99th-percentile estimate, given as a percentage of the NumPy estimate, shown as a function of compression for the four marked locations in <bold>(a)</bold>. The upper grey axis shows the number of clusters used in each digest for each <inline-formula><mml:math id="M172" display="inline"><mml:mi mathvariant="italic">δ</mml:mi></mml:math></inline-formula>. The error bars represent the range of possible differences based on all available NumPy interpolation schemes, in contrast to the default linear interpolation method (see text for details). <bold>(c)</bold> Same as <bold>(b)</bold> but given as the actual difference in mm d<sup>−1</sup>. 
<bold>(d–g)</bold> Histograms calculated using the <inline-formula><mml:math id="M174" display="inline"><mml:mi>t</mml:mi></mml:math></inline-formula>-digest algorithm, with <inline-formula><mml:math id="M175" display="inline"><mml:mrow><mml:mi mathvariant="italic">δ</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">80</mml:mn></mml:mrow></mml:math></inline-formula> showing the distribution of the total precipitation (mm d<sup>−1</sup>) for each location. <bold>(h–k)</bold> Same as <bold>(d)</bold>–<bold>(g)</bold> but calculated using NumPy.</p></caption>
          <graphic xlink:href="https://gmd.copernicus.org/articles/18/5873/2025/gmd-18-5873-2025-f05.png"/>

        </fig>

      <p id="d2e3035">Figure <xref ref-type="fig" rid="F5"/>c shows the difference between the NumPy and <inline-formula><mml:math id="M177" display="inline"><mml:mi>t</mml:mi></mml:math></inline-formula>-digest estimate of the 99th percentile for the total precipitation as a function of <inline-formula><mml:math id="M178" display="inline"><mml:mi mathvariant="italic">δ</mml:mi></mml:math></inline-formula>. The corresponding number of clusters for each <inline-formula><mml:math id="M179" display="inline"><mml:mi mathvariant="italic">δ</mml:mi></mml:math></inline-formula> is indicated in grey along the upper axis. Here the North Pacific location shows the greatest difference in the 99th-percentile estimate. The reason for this is clear when examining the corresponding histograms for the <inline-formula><mml:math id="M180" display="inline"><mml:mi>t</mml:mi></mml:math></inline-formula>-digest (f) and NumPy (j). Focusing on the NumPy histogram (the actual distribution) at the upper end, the data are sparse, with only four values in the top quartile of the data range. Due to this sparseness, the 99th-percentile estimate falls in between two data points, so the interpolation method used by NumPy significantly impacts the estimate. This is reflected in the error bars for the North Pacific location in Fig. <xref ref-type="fig" rid="F5"/>c. While the difference between the <inline-formula><mml:math id="M181" display="inline"><mml:mi>t</mml:mi></mml:math></inline-formula>-digest and the NumPy estimate using linear interpolation is large (dots), other NumPy interpolation schemes result in a negative difference (not shown due to the logarithmic scale), showing that the <inline-formula><mml:math id="M182" display="inline"><mml:mi>t</mml:mi></mml:math></inline-formula>-digest estimate lies in between the different estimates obtained with the available NumPy interpolation schemes. 
These larger differences can therefore be better attributed to the low density of values at the upper tails as opposed to poor representation of these tails by the <inline-formula><mml:math id="M183" display="inline"><mml:mi>t</mml:mi></mml:math></inline-formula>-digest. While the location in Colombia also has a high range of values, as there are more samples in the upper tail of the distribution, the <inline-formula><mml:math id="M184" display="inline"><mml:mi>t</mml:mi></mml:math></inline-formula>-digest and NumPy estimates are more similar. As seen in Sect. <xref ref-type="sec" rid="Ch1.S6.SS2"/>, there is also a decrease in differences as <inline-formula><mml:math id="M185" display="inline"><mml:mi mathvariant="italic">δ</mml:mi></mml:math></inline-formula> increases.</p>
      <p id="d2e3109">Figure <xref ref-type="fig" rid="F5"/>b shows the same differences presented in Fig. <xref ref-type="fig" rid="F5"/>c but given as an absolute percentage of the NumPy estimate. Here, the largest error is seen for the location in Brazil. The reason for this large error is similar to the results in (c), where the sparseness of the data in other bins greater than 0–1 mm d<sup>−1</sup> is obvious in the NumPy histogram (h). However, as this area is so dry and the distribution so skewed, around 99 % of the data lie in the first bin, and the 99th-percentile estimate is 0.06 and 1.02 mm d<sup>−1</sup> for NumPy and the <inline-formula><mml:math id="M188" display="inline"><mml:mi>t</mml:mi></mml:math></inline-formula>-digest (<inline-formula><mml:math id="M189" display="inline"><mml:mrow><mml:mi mathvariant="italic">δ</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">80</mml:mn></mml:mrow></mml:math></inline-formula>), respectively. This highlights the challenges of working with precipitation values below 1, as percentage errors can show unrealistically poor results. To account for these unrealistically poor estimates we calculated the percentage difference for both Fig. 
<xref ref-type="fig" rid="F5"/>a and b using <inline-formula><mml:math id="M190" display="inline"><mml:mrow><mml:mtext>error</mml:mtext><mml:mo>=</mml:mo><mml:mn mathvariant="normal">100</mml:mn><mml:mo>|</mml:mo><mml:mo>(</mml:mo><mml:mi>a</mml:mi><mml:mo>-</mml:mo><mml:mi>b</mml:mi><mml:mo>)</mml:mo><mml:mo>|</mml:mo><mml:mo>/</mml:mo><mml:mo>(</mml:mo><mml:mi>b</mml:mi><mml:mo>+</mml:mo><mml:mi mathvariant="italic">ϵ</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula>, where <inline-formula><mml:math id="M191" display="inline"><mml:mi>a</mml:mi></mml:math></inline-formula> is the <inline-formula><mml:math id="M192" display="inline"><mml:mi>t</mml:mi></mml:math></inline-formula>-digest estimate, <inline-formula><mml:math id="M193" display="inline"><mml:mi>b</mml:mi></mml:math></inline-formula> is the NumPy estimate using linear interpolation (except for the error bars in Fig. <xref ref-type="fig" rid="F5"/>b), and <inline-formula><mml:math id="M194" display="inline"><mml:mrow><mml:mi mathvariant="italic">ϵ</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:math></inline-formula>. This small constant <inline-formula><mml:math id="M195" display="inline"><mml:mi mathvariant="italic">ϵ</mml:mi></mml:math></inline-formula> is introduced to stabilize the calculation when <inline-formula><mml:math id="M196" display="inline"><mml:mi>b</mml:mi></mml:math></inline-formula> is extremely small. The results are shown globally (using <inline-formula><mml:math id="M197" display="inline"><mml:mrow><mml:mi mathvariant="italic">δ</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">80</mml:mn></mml:mrow></mml:math></inline-formula>) in Fig. <xref ref-type="fig" rid="F5"/>a. Most of the differences fall between 1 % and 10 % of the NumPy value. However, for the reasons just described, drier regions show larger percentage errors. Indeed, larger values are seen around the Sahara Desert and in regions of eastern Australia. 
The average of the global differences is shown in Table <xref ref-type="table" rid="T1"/> as a function of <inline-formula><mml:math id="M198" display="inline"><mml:mi mathvariant="italic">δ</mml:mi></mml:math></inline-formula>, given as both the percentage difference and the absolute difference. We have included this table to highlight that, due to the extremely low precipitation values for some of the estimates, a higher percentage difference does not necessarily translate to a higher difference in actual terms. Even with a relatively small <inline-formula><mml:math id="M199" display="inline"><mml:mi mathvariant="italic">δ</mml:mi></mml:math></inline-formula> of 40, the overall relative difference of the 99th percentile (when averaged globally) is less than 4 % (difference of 1.27 mm d<sup>−1</sup>). These differences decrease to less than 2 % and 0.60 mm d<sup>−1</sup>, respectively, as <inline-formula><mml:math id="M202" display="inline"><mml:mi mathvariant="italic">δ</mml:mi></mml:math></inline-formula> increases to 120.</p>
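      <p>The effect of the stabilizing constant in the percentage difference can be checked directly with the Brazil values quoted above (1.02 mm d<sup>−1</sup> for the <inline-formula><mml:math display="inline"><mml:mi>t</mml:mi></mml:math></inline-formula>-digest and 0.06 mm d<sup>−1</sup> for NumPy). The following is a minimal sketch of the stated formula; the function name is hypothetical.</p>
<preformat>
```python
def stabilised_percentage_error(a, b, eps=1.0):
    """Return 100 * |a - b| / (b + eps).

    a   : one-pass (t-digest) estimate
    b   : reference (NumPy, linear interpolation) estimate
    eps : constant added to the denominator, 1 as stated in the text
    """
    return 100.0 * abs(a - b) / (b + eps)


# 99th-percentile estimates quoted in the text for the Brazil location (mm d-1).
tdigest_estimate = 1.02
numpy_estimate = 0.06

with_eps = stabilised_percentage_error(tdigest_estimate, numpy_estimate)
without_eps = stabilised_percentage_error(tdigest_estimate, numpy_estimate, eps=0.0)
print(round(with_eps, 1), round(without_eps, 1))
```
</preformat>
      <p>With the constant, the difference is roughly 91 %; without it, the same 0.96 mm d<sup>−1</sup> absolute difference would appear as about 1600 %, which shows why percentage errors alone are misleading for very dry locations.</p>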

<table-wrap id="T1"><label>Table 1</label><caption><p id="d2e3316">Global average of the 99th-percentile difference for different compression values, given both as the absolute difference in mm d<sup>−1</sup> and as a percentage.</p></caption><oasis:table frame="topbot"><oasis:tgroup cols="3">
     <oasis:colspec colnum="1" colname="col1" align="left"/>
     <oasis:colspec colnum="2" colname="col2" align="right"/>
     <oasis:colspec colnum="3" colname="col3" align="right"/>
     <oasis:thead>
       <oasis:row>
         <oasis:entry colname="col1"><inline-formula><mml:math id="M204" display="inline"><mml:mi mathvariant="italic">δ</mml:mi></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col2"><inline-formula><mml:math id="M205" display="inline"><mml:mrow><mml:mo>|</mml:mo><mml:mo>(</mml:mo><mml:mi>a</mml:mi><mml:mo>-</mml:mo><mml:mi>b</mml:mi><mml:mo>)</mml:mo><mml:mo>|</mml:mo></mml:mrow></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col3"><inline-formula><mml:math id="M206" display="inline"><mml:mrow><mml:mn mathvariant="normal">100</mml:mn><mml:mo>|</mml:mo><mml:mo>(</mml:mo><mml:mi>a</mml:mi><mml:mo>-</mml:mo><mml:mi>b</mml:mi><mml:mo>)</mml:mo><mml:mo>|</mml:mo><mml:mo>/</mml:mo><mml:mo>(</mml:mo><mml:mi>b</mml:mi><mml:mo>+</mml:mo><mml:mi mathvariant="italic">ϵ</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula></oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1"/>
         <oasis:entry colname="col2">(mm d<sup>−1</sup>)</oasis:entry>
         <oasis:entry colname="col3">(%)</oasis:entry>
       </oasis:row>
     </oasis:thead>
     <oasis:tbody>
       <oasis:row>
         <oasis:entry colname="col1">40</oasis:entry>
         <oasis:entry colname="col2">1.27</oasis:entry>
         <oasis:entry colname="col3">3.77</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">60</oasis:entry>
         <oasis:entry colname="col2">0.91</oasis:entry>
         <oasis:entry colname="col3">2.63</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">80</oasis:entry>
         <oasis:entry colname="col2">0.75</oasis:entry>
         <oasis:entry colname="col3">2.14</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">100</oasis:entry>
         <oasis:entry colname="col2">0.65</oasis:entry>
         <oasis:entry colname="col3">1.86</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">120</oasis:entry>
         <oasis:entry colname="col2">0.60</oasis:entry>
         <oasis:entry colname="col3">1.67</oasis:entry>
       </oasis:row>
     </oasis:tbody>
   </oasis:tgroup></oasis:table></table-wrap>
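<p>As an illustration of the difference metric used in Table 1, the following minimal Python sketch evaluates both the stabilized percentage difference (with ϵ = 1) and the unstabilized one (ϵ = 0). The function name and the sample values are hypothetical and are not taken from the One_Pass package or from the datasets analysed here; they only show how a very small reference value inflates the percentage difference.</p>
<preformat preformat-type="code"><![CDATA[
```python
import numpy as np


def percentage_difference(a, b, eps=1.0):
    """Return 100 * |a - b| / (b + eps) element-wise.

    a   : estimate under test (e.g. a t-digest percentile)
    b   : reference estimate (e.g. a NumPy percentile)
    eps : stabilizing constant added to the denominator
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return 100.0 * np.abs(a - b) / (b + eps)


# Hypothetical 99th-percentile estimates in mm d^-1 (illustrative only):
# one very dry grid cell, one moderate cell and one wet cell.
tdigest_p99 = np.array([0.02, 10.5, 51.0])
numpy_p99 = np.array([0.01, 10.0, 50.0])

stabilized = percentage_difference(tdigest_p99, numpy_p99)             # eps = 1
unstabilized = percentage_difference(tdigest_p99, numpy_p99, eps=0.0)  # no stabilization
```
]]></preformat>
<p>In this hypothetical example the dry cell has an absolute difference of only 0.01 mm d<sup>−1</sup>, yet its unstabilized percentage difference is far larger than that of the wet cell, which is why the absolute and percentage differences are reported side by side.</p>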

      <p id="d2e3499">Based on the results in Table <xref ref-type="table" rid="T1"/> and Fig. <xref ref-type="fig" rid="F5"/>b and c, the difference decreases with increasing compression factor and levels off around <inline-formula><mml:math id="M208" display="inline"><mml:mrow><mml:mi mathvariant="italic">δ</mml:mi><mml:mo>≈</mml:mo><mml:mn mathvariant="normal">80</mml:mn></mml:mrow></mml:math></inline-formula>. This is higher than the value at which the differences level off in Fig. <xref ref-type="fig" rid="F4"/>d and g, namely <inline-formula><mml:math id="M209" display="inline"><mml:mrow><mml:mi mathvariant="italic">δ</mml:mi><mml:mo>≈</mml:mo><mml:mn mathvariant="normal">60</mml:mn></mml:mrow></mml:math></inline-formula>. In general, there are no significant accuracy improvements from using a <inline-formula><mml:math id="M210" display="inline"><mml:mi mathvariant="italic">δ</mml:mi></mml:math></inline-formula> larger than approximately 80 (100 clusters) to represent the precipitation distributions, and we would not recommend exceeding this compression factor. Indeed, depending on the specific accuracy requirements, it may be beneficial to reduce it further. Based on the results for <inline-formula><mml:math id="M211" display="inline"><mml:mrow><mml:mi mathvariant="italic">δ</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">80</mml:mn></mml:mrow></mml:math></inline-formula>, the entire dataset of 116.25 GB could be represented with 36.57 GB, reducing the memory requirements to approximately a third of the original.</p>
      <p id="d2e3548">One interesting point to raise is how the scale function for the <inline-formula><mml:math id="M212" display="inline"><mml:mi>t</mml:mi></mml:math></inline-formula>-digest algorithm, given in Eq. (<xref ref-type="disp-formula" rid="Ch1.E6"/>), impacts these results. While the differences obtained for the precipitation distributions are well within the acceptable limits for most use cases, comparing them with the results in Sect. <xref ref-type="sec" rid="Ch1.S6.SS2"/>, we see poorer accuracy. This is due to the wind distributions more closely resembling a normal distribution, which is the distribution that the symmetric scale function describes best. While outside the scope of this investigation, we note that to better represent these skewed precipitation distributions, a non-symmetric scale function that would create larger clusters at the lower tail may more accurately capture the underlying distribution. Another method to improve the representation of the dataset would be to simply impose a cut-off (such as 1 mm d<sup>−1</sup>) for the data that are added to the digests. Removing this extremely large cluster close to zero would, in many cases, improve the representation of the data by the <inline-formula><mml:math id="M214" display="inline"><mml:mi>t</mml:mi></mml:math></inline-formula>-digest.</p>
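<p>The cut-off approach mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions: the function name, the synthetic precipitation record and the use of a plain NumPy array in place of a digest are all hypothetical, and the One_Pass package may handle such filtering differently.</p>
<preformat preformat-type="code"><![CDATA[
```python
import numpy as np


def apply_cutoff(samples, cutoff=1.0):
    """Keep only the samples at or above the cut-off (here in mm d^-1)."""
    samples = np.asarray(samples, dtype=float)
    return samples[samples >= cutoff]


rng = np.random.default_rng(42)

# Hypothetical skewed record: mostly (near-)dry time steps plus occasional rain.
dry = rng.uniform(0.0, 0.1, size=900)
wet = rng.gamma(shape=2.0, scale=10.0, size=100)
precip = np.concatenate([dry, wet])

# Only the retained samples would be passed on to the digest.
kept = apply_cutoff(precip, cutoff=1.0)
fraction_kept = kept.size / precip.size
```
]]></preformat>
<p>Note that percentiles computed from the retained samples describe the distribution of values above the cut-off only, so they are not directly comparable with percentiles of the full record.</p>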
</sec>
</sec>
<sec id="Ch1.S7">
  <label>7</label><title>Convergence</title>
      <p id="d2e3590">While earlier sections focused on the accuracy and efficiency of one-pass algorithms for computing statistics, the broader goal of the One_Pass package is to support flexible, real-time analysis of streamed climate data. Beyond statistics, it also integrates with a companion Python package (not detailed here) designed for performing bias adjustment on streamed model output. Typically, bias adjustment uses quantile–quantile (Q–Q) mapping <xref ref-type="bibr" rid="bib1.bibx31" id="paren.46"/> to correct statistical biases. The <inline-formula><mml:math id="M215" display="inline"><mml:mi>t</mml:mi></mml:math></inline-formula>-digest algorithm is used to construct a dynamic distribution that enables the bias adjustment on streamed data. However, Q–Q mapping requires a stable and well-sampled distribution (<inline-formula><mml:math id="M216" display="inline"><mml:mrow><mml:msub><mml:mi>S</mml:mi><mml:mi>n</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>) of the model variable to function effectively. This raises the key question of how long the data stream needs to run for the statistical summary <inline-formula><mml:math id="M217" display="inline"><mml:mrow><mml:msub><mml:mi>S</mml:mi><mml:mi>n</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> to stabilize. We answer this question using the convergence rate of the statistical summary <xref ref-type="bibr" rid="bib1.bibx17" id="paren.47"/>.</p>
      <p id="d2e3628">The convergence rate is used in numerical analysis to determine how long a sequence of computations needs to run before reaching asymptotic behaviour. In the context of streamed climate data, convergence means that the statistic is not only representative of the data seen so far but also stable enough to reflect long-term behaviour. Taking the mean temperature as an example, at the beginning of a time series, the mean may change significantly with each new data point, but as more data accumulate, the changes diminish, signalling that convergence has been reached. Our intent is not to suggest that climate variables become stationary or stop evolving; rather, we aim to determine when the rolling estimate of a one-pass statistical summary <inline-formula><mml:math id="M218" display="inline"><mml:mrow><mml:msub><mml:mi>S</mml:mi><mml:mi>n</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> has stabilized sufficiently to be used confidently for subsequent statistical calculations such as bias adjustment. Moreover, this analysis does not contradict the earlier benefits of one-pass methods (e.g. reduced memory usage, early access to useful summaries). Rather, it offers guidance on how the rolling summaries can be used when <inline-formula><mml:math id="M219" display="inline"><mml:mrow><mml:mi>n</mml:mi><mml:mo>≠</mml:mo><mml:mi>c</mml:mi></mml:mrow></mml:math></inline-formula> and how soon those summaries can be used for bias adjustment or similar post-processing steps.</p>
      <p id="d2e3654">This concept of convergence is explored by examining the rolling summary <inline-formula><mml:math id="M220" display="inline"><mml:mrow><mml:msub><mml:mi>S</mml:mi><mml:mi>n</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> for 2 m temperature, 10 m wind speed, and precipitation from the IFS and ICON models. The temperature and wind speed datasets are the same IFS datasets used in Sects. <xref ref-type="sec" rid="Ch1.S4"/> and <xref ref-type="sec" rid="Ch1.S6.SS2"/>, respectively, and the precipitation dataset consists of the same ICON data used in Sect. <xref ref-type="sec" rid="Ch1.S6.SS3"/>. For both temperature and wind speed the rolling summary <inline-formula><mml:math id="M221" display="inline"><mml:mrow><mml:msub><mml:mi>S</mml:mi><mml:mi>n</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mover accent="true"><mml:mi>X</mml:mi><mml:mo mathvariant="normal">¯</mml:mo></mml:mover><mml:mi>n</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>, while for precipitation <inline-formula><mml:math id="M222" display="inline"><mml:mrow><mml:msub><mml:mi>S</mml:mi><mml:mi>n</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> is the rolling 50th-percentile estimate. For all rolling summaries <inline-formula><mml:math id="M223" display="inline"><mml:mrow><mml:mi>w</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:math></inline-formula>, meaning that the number of samples, <inline-formula><mml:math id="M224" display="inline"><mml:mi>n</mml:mi></mml:math></inline-formula>, contributing to <inline-formula><mml:math id="M225" display="inline"><mml:mrow><mml:msub><mml:mi>S</mml:mi><mml:mi>n</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> grows by one each time. 
Unlike in the previous sections, where we were interested in the value of <inline-formula><mml:math id="M226" display="inline"><mml:mrow><mml:msub><mml:mi>S</mml:mi><mml:mi>n</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> at <inline-formula><mml:math id="M227" display="inline"><mml:mrow><mml:mi>n</mml:mi><mml:mo>=</mml:mo><mml:mi>c</mml:mi></mml:mrow></mml:math></inline-formula>, here <inline-formula><mml:math id="M228" display="inline"><mml:mrow><mml:msub><mml:mi>S</mml:mi><mml:mi>n</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> is stored at every time step, providing a time series of its development denoted as <inline-formula><mml:math id="M229" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="bold-italic">S</mml:mi><mml:mi>n</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mo mathvariant="italic">{</mml:mo><mml:msub><mml:mi>S</mml:mi><mml:mn mathvariant="normal">1</mml:mn></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>S</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub><mml:mo>,</mml:mo><mml:mi mathvariant="normal">…</mml:mi><mml:mo>,</mml:mo><mml:msub><mml:mi>S</mml:mi><mml:mi>n</mml:mi></mml:msub><mml:mo mathvariant="italic">}</mml:mo></mml:mrow></mml:math></inline-formula>. The rolling standard deviation (<inline-formula><mml:math id="M230" display="inline"><mml:mi mathvariant="italic">σ</mml:mi></mml:math></inline-formula>) is then taken of <inline-formula><mml:math id="M231" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="bold-italic">S</mml:mi><mml:mi>n</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> using Eqs. (<xref ref-type="disp-formula" rid="Ch1.E4"/>) and (<xref ref-type="disp-formula" rid="Ch1.E5"/>). 
This results in a time series of <inline-formula><mml:math id="M232" display="inline"><mml:mi mathvariant="italic">σ</mml:mi></mml:math></inline-formula> defined as <inline-formula><mml:math id="M233" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="bold-italic">σ</mml:mi><mml:mi>n</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mo mathvariant="italic">{</mml:mo><mml:msub><mml:mi mathvariant="italic">σ</mml:mi><mml:mn mathvariant="normal">1</mml:mn></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi mathvariant="italic">σ</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub><mml:mo>,</mml:mo><mml:mi mathvariant="normal">…</mml:mi><mml:mo>,</mml:mo><mml:msub><mml:mi mathvariant="italic">σ</mml:mi><mml:mrow><mml:mi>n</mml:mi><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi mathvariant="italic">σ</mml:mi><mml:mi>n</mml:mi></mml:msub><mml:mo mathvariant="italic">}</mml:mo></mml:mrow></mml:math></inline-formula>, where, for example, <inline-formula><mml:math id="M234" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="italic">σ</mml:mi><mml:mrow><mml:mi>n</mml:mi><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula> represents the standard deviation of the series <inline-formula><mml:math id="M235" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="bold-italic">S</mml:mi><mml:mrow><mml:mi>n</mml:mi><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula>.</p>
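<p>The construction of the two series can be sketched in Python as follows, for the case where the rolling summary is the running mean. The sketch uses a Welford-type incremental update, which is assumed here to play the role of Eqs. (4) and (5); the sample (n − 1) normalization, the function name and the synthetic input are likewise assumptions made for illustration rather than the One_Pass implementation.</p>
<preformat preformat-type="code"><![CDATA[
```python
import numpy as np


def rolling_summary_std(x):
    """Return the series S_1..S_n (running mean of x) and sigma_1..sigma_n,
    where sigma_k is the standard deviation of {S_1, ..., S_k}.

    Both series are updated one sample at a time (w = 1), without
    revisiting earlier samples.
    """
    mean_x = 0.0  # running mean of the data, i.e. S_n
    mean_s = 0.0  # running mean of the series S_1..S_n
    m2_s = 0.0    # running sum of squared deviations of S_1..S_n
    S, sigma = [], []
    for n, value in enumerate(x, start=1):
        # One-pass update of the rolling summary S_n.
        mean_x += (value - mean_x) / n
        s_n = mean_x
        # Welford-type update of the spread of the S series.
        delta = s_n - mean_s
        mean_s += delta / n
        m2_s += delta * (s_n - mean_s)
        S.append(s_n)
        # Sample standard deviation; defined as 0 for a single sample.
        sigma.append(np.sqrt(m2_s / (n - 1)) if n > 1 else 0.0)
    return np.array(S), np.array(sigma)


# Hypothetical hourly 2 m temperature record (K) for one grid cell.
rng = np.random.default_rng(0)
temperature = rng.normal(loc=280.0, scale=5.0, size=744)
S_series, sigma_series = rolling_summary_std(temperature)
```
]]></preformat>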
      <p id="d2e3927">The outer axis in Fig. <xref ref-type="fig" rid="F6"/> shows the evolution of the running standard deviation, <inline-formula><mml:math id="M236" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="bold-italic">σ</mml:mi><mml:mi>n</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>, over time using the temperature data at four locations marked in the legend. As expected, Fig. <xref ref-type="fig" rid="F6"/> shows how the series <inline-formula><mml:math id="M237" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="bold-italic">σ</mml:mi><mml:mi>n</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> initially fluctuates before settling into a gradually stabilizing trajectory, resembling inverse exponential decay when plotted over time. This reflects how the summary statistic <inline-formula><mml:math id="M238" display="inline"><mml:mrow><mml:msub><mml:mi>S</mml:mi><mml:mi>n</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> (be it a mean or a distribution) becomes more stable as additional samples are incorporated. While the figure shows results for temperature, we observe similar behaviour for precipitation and wind speed data, differing only in peak values. We then use the classical definition of convergence rate <xref ref-type="bibr" rid="bib1.bibx17" id="paren.48"/>

          <disp-formula id="Ch1.E7" content-type="numbered"><label>7</label><mml:math id="M239" display="block"><mml:mrow><mml:munder><mml:mo movablelimits="false">lim⁡</mml:mo><mml:mrow><mml:mi>n</mml:mi><mml:mo>→</mml:mo><mml:mi mathvariant="normal">∞</mml:mi></mml:mrow></mml:munder><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mrow><mml:mo>|</mml:mo><mml:msub><mml:mi mathvariant="italic">σ</mml:mi><mml:mrow><mml:mi>n</mml:mi><mml:mo>+</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msub><mml:mo>-</mml:mo><mml:mi>L</mml:mi><mml:mo>|</mml:mo></mml:mrow><mml:mrow><mml:mo>|</mml:mo><mml:msub><mml:mi mathvariant="italic">σ</mml:mi><mml:mi>n</mml:mi></mml:msub><mml:mo>-</mml:mo><mml:mi>L</mml:mi><mml:msup><mml:mo>|</mml:mo><mml:mi>q</mml:mi></mml:msup></mml:mrow></mml:mfrac></mml:mstyle><mml:mo>=</mml:mo><mml:mi mathvariant="italic">μ</mml:mi><mml:mo>,</mml:mo></mml:mrow></mml:math></disp-formula>

        where <inline-formula><mml:math id="M240" display="inline"><mml:mi>q</mml:mi></mml:math></inline-formula> is the order of convergence, <inline-formula><mml:math id="M241" display="inline"><mml:mi>L</mml:mi></mml:math></inline-formula> is the convergence limit, <inline-formula><mml:math id="M242" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="italic">σ</mml:mi><mml:mi>n</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> is the standard deviation of the <inline-formula><mml:math id="M243" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="bold-italic">S</mml:mi><mml:mi>n</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> time series at time <inline-formula><mml:math id="M244" display="inline"><mml:mrow><mml:mi>t</mml:mi><mml:mo>=</mml:mo><mml:mi>n</mml:mi></mml:mrow></mml:math></inline-formula>, and <inline-formula><mml:math id="M245" display="inline"><mml:mi mathvariant="italic">μ</mml:mi></mml:math></inline-formula> is the convergence rate. Taking <inline-formula><mml:math id="M246" display="inline"><mml:mrow><mml:mi>q</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:math></inline-formula>, <inline-formula><mml:math id="M247" display="inline"><mml:mrow><mml:mi>L</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">0</mml:mn></mml:mrow></mml:math></inline-formula> (see Appendix <xref ref-type="sec" rid="App1.Ch1.S1"/> for details on these values), Eq. (<xref ref-type="disp-formula" rid="Ch1.E7"/>) could be reduced to

          <disp-formula id="Ch1.E8" content-type="numbered"><label>8</label><mml:math id="M248" display="block"><mml:mrow><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mrow><mml:mo>|</mml:mo><mml:msub><mml:mi mathvariant="italic">σ</mml:mi><mml:mrow><mml:mi>n</mml:mi><mml:mo>+</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msub><mml:mo>|</mml:mo></mml:mrow><mml:mrow><mml:mo>|</mml:mo><mml:msub><mml:mi mathvariant="italic">σ</mml:mi><mml:mi>n</mml:mi></mml:msub><mml:mo>|</mml:mo></mml:mrow></mml:mfrac></mml:mstyle><mml:mo>=</mml:mo><mml:mi mathvariant="italic">μ</mml:mi><mml:mo>+</mml:mo><mml:mi mathvariant="italic">ϵ</mml:mi><mml:mo>,</mml:mo></mml:mrow></mml:math></disp-formula>

        where we have added the small parameter <inline-formula><mml:math id="M249" display="inline"><mml:mrow><mml:mi mathvariant="italic">ϵ</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">0.005</mml:mn></mml:mrow></mml:math></inline-formula>, which defines the boundaries of convergence. The inner axis of Fig. <xref ref-type="fig" rid="F6"/> shows the time series of <inline-formula><mml:math id="M250" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="italic">σ</mml:mi><mml:mrow><mml:mi>n</mml:mi><mml:mo>+</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msub><mml:mo>/</mml:mo><mml:msub><mml:mi mathvariant="italic">σ</mml:mi><mml:mi>n</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> for the same locations, with <inline-formula><mml:math id="M251" display="inline"><mml:mi mathvariant="italic">ϵ</mml:mi></mml:math></inline-formula> marked by the dashed black lines. We use this standard deviation ratio to define when the time series has “converged” – that is, when a sufficient number of samples have accumulated such that <inline-formula><mml:math id="M252" display="inline"><mml:mrow><mml:msub><mml:mi>S</mml:mi><mml:mi>n</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> serves as a reliable summary of the data up to time <inline-formula><mml:math id="M253" display="inline"><mml:mi>n</mml:mi></mml:math></inline-formula>. We reiterate that we do not claim the statistic has reached a final or static value. Climate variables will exhibit temporal variability and long-term trends, and we do not assume that their statistical properties remain fixed over time. Rather, our focus is on when the variability in the estimate of <inline-formula><mml:math id="M254" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="italic">σ</mml:mi><mml:mi>n</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> itself has sufficiently diminished (i.e. when its fluctuations due to sample size limitations become negligible compared to inherent data variability). This distinction is critical when using one-pass algorithms; for example, we must determine when <inline-formula><mml:math id="M255" display="inline"><mml:mrow><mml:msub><mml:mi>S</mml:mi><mml:mi>n</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> is a reliable enough representation of the underlying distribution to apply Q–Q mapping for bias adjustment.</p>
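<p>One possible way to turn the criterion of Eq. (8) into a sample count is sketched below. Several choices here are assumptions made for illustration only: the default value of μ (set to 1), the rule that the ratio must remain inside the band μ ± ϵ for all later time steps, and the function name. The text above does not prescribe these details, so the sketch should not be read as the One_Pass implementation.</p>
<preformat preformat-type="code"><![CDATA[
```python
import numpy as np


def samples_to_converge(sigma, mu=1.0, eps=0.005):
    """Return the number of samples after which |sigma_{n+1}| / |sigma_n|
    stays within mu +/- eps for the rest of the series, or None if the
    ratio is still outside the band at the final step.

    sigma : sequence sigma_1, ..., sigma_n (rolling standard deviations)
    mu    : assumed convergence rate (hypothetical default)
    eps   : half-width of the convergence band
    """
    sigma = np.abs(np.asarray(sigma, dtype=float))
    if sigma.size < 2:
        return None
    with np.errstate(divide="ignore", invalid="ignore"):
        ratio = sigma[1:] / sigma[:-1]  # ratio[i] = sigma_{i+2} / sigma_{i+1}
        inside = np.abs(ratio - mu) <= eps  # inf and nan count as outside
    outside_idx = np.flatnonzero(~inside)
    if outside_idx.size == 0:
        first_stable = 0
    elif outside_idx[-1] == ratio.size - 1:
        return None  # never settles inside the band
    else:
        first_stable = outside_idx[-1] + 1
    # ratio[i] needs i + 2 samples to be evaluated.
    return int(first_stable + 2)


# Hypothetical sigma series: fluctuates first, then becomes steady.
example_sigma = [1.0, 2.0, 1.5, 1.5, 1.5, 1.5]
n_required = samples_to_converge(example_sigma)
```
]]></preformat>
<p>Requiring the ratio to stay inside the band, rather than taking its first entry into the band, is a deliberately conservative choice in this sketch; it avoids declaring convergence on a single chance crossing during the initial fluctuations.</p>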

      <fig id="F6"><label>Figure 6</label><caption><p id="d2e4247">The convergence of the series <inline-formula><mml:math id="M256" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="bold-italic">σ</mml:mi><mml:mi>n</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> for the temperature dataset. The outer axis shows four <inline-formula><mml:math id="M257" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="bold-italic">σ</mml:mi><mml:mi>n</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> series, where the location of the points is given in the legend. The inner axis shows the convergence rate given in Eq. (<xref ref-type="disp-formula" rid="Ch1.E8"/>).</p></caption>
        <graphic xlink:href="https://gmd.copernicus.org/articles/18/5873/2025/gmd-18-5873-2025-f06.png"/>

      </fig>

      <p id="d2e4280">In Fig. <xref ref-type="fig" rid="F7"/>a, c and e, we show the global standard deviation <inline-formula><mml:math id="M258" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="italic">σ</mml:mi><mml:mi>n</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> of <inline-formula><mml:math id="M259" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="bold-italic">S</mml:mi><mml:mi>n</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> at the end of the month for the air temperature, wind and precipitation data, respectively. Here <inline-formula><mml:math id="M260" display="inline"><mml:mrow><mml:mi>c</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">744</mml:mn></mml:mrow></mml:math></inline-formula> for the hourly temperature and wind speed data, while <inline-formula><mml:math id="M261" display="inline"><mml:mrow><mml:mi>c</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1488</mml:mn></mml:mrow></mml:math></inline-formula> for the half-hourly precipitation data. As expected, the standard deviations for all the variables are larger in areas that experience more climate variability. For example, the final standard deviation of the rolling temperature mean shown in (a) shows much larger values away from the Equator, where temperature averages will experience seasonal variation. Figure <xref ref-type="fig" rid="F7"/>b, d and f show the number of samples required for the standard deviation <inline-formula><mml:math id="M262" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="italic">σ</mml:mi><mml:mi>n</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> of these rolling summaries to converge, as defined in Eq. 
(<xref ref-type="disp-formula" rid="Ch1.E8"/>), when <inline-formula><mml:math id="M263" display="inline"><mml:mrow><mml:mo>|</mml:mo><mml:msub><mml:mi mathvariant="italic">σ</mml:mi><mml:mrow><mml:mi>n</mml:mi><mml:mo>+</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msub><mml:mo>|</mml:mo><mml:mo>/</mml:mo><mml:mo>|</mml:mo><mml:msub><mml:mi mathvariant="italic">σ</mml:mi><mml:mi>n</mml:mi></mml:msub><mml:mo>|</mml:mo></mml:mrow></mml:math></inline-formula> falls within the range <inline-formula><mml:math id="M264" display="inline"><mml:mrow><mml:mi mathvariant="italic">μ</mml:mi><mml:mo>±</mml:mo><mml:mi mathvariant="italic">ϵ</mml:mi></mml:mrow></mml:math></inline-formula>.</p>

      <fig id="F7" specific-use="star"><label>Figure 7</label><caption><p id="d2e4392"><bold>(a)</bold> The standard deviation of the full <inline-formula><mml:math id="M265" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="bold-italic">S</mml:mi><mml:mi>n</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> time series at the end of the IFS monthly temperature time series during March 2020. Here <inline-formula><mml:math id="M266" display="inline"><mml:mrow><mml:msub><mml:mi>S</mml:mi><mml:mi>n</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mover accent="true"><mml:mi>X</mml:mi><mml:mo mathvariant="normal">¯</mml:mo></mml:mover><mml:mi>n</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>. <bold>(b)</bold> The number of samples required (i.e. number of time steps) for the rolling standard deviation of the <inline-formula><mml:math id="M267" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="bold-italic">S</mml:mi><mml:mi>n</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> time series to converge. Convergence is defined in the text. <bold>(c)</bold> Same as <bold>(a)</bold> but for the IFS monthly wind speed time series in December 2020. <bold>(d)</bold> Same as <bold>(b)</bold> but for wind speed. <bold>(e)</bold> Same as <bold>(a)</bold> and <bold>(c)</bold>, but here <inline-formula><mml:math id="M268" display="inline"><mml:mrow><mml:msub><mml:mi>S</mml:mi><mml:mi>n</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> is the estimate of the 50th percentile using the ICON precipitation time series in August 2021. <bold>(f)</bold> Same as <bold>(b)</bold> and <bold>(d)</bold> but for precipitation.</p></caption>
        <graphic xlink:href="https://gmd.copernicus.org/articles/18/5873/2025/gmd-18-5873-2025-f07.png"/>

      </fig>

      <p id="d2e4492">Encouragingly, the convergence time does not have a strong correlation with the final standard deviation <inline-formula><mml:math id="M269" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="italic">σ</mml:mi><mml:mi>n</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> shown in the left column. This is reassuring, as it shows that convergence is largely independent of the actual spread of the data and allows conservative estimates of sample size to be used for global data. What is particularly noteworthy is that, across all these datasets, the number of samples required for convergence is extremely similar. The mean values of Fig. <xref ref-type="fig" rid="F7"/>b, d and f are 82, 77 and 81 samples, respectively. This is striking, especially given that Fig. <xref ref-type="fig" rid="F7"/>f represents the convergence of the 50th-percentile estimate, as opposed to the mean in Fig. <xref ref-type="fig" rid="F7"/>b and d. This indicates that, for hourly and half-hourly data, at most approximately 4 d of data (around 80 time steps) is required to accurately represent the month. Overall, in this section we present how the one-pass algorithms provide added value in the context of bias adjustment for streamed climate data. We present criteria for stabilization that we use to define how many samples must be added to a rolling one-pass summary <inline-formula><mml:math id="M270" display="inline"><mml:mrow><mml:msub><mml:mi>S</mml:mi><mml:mi>n</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> before that summary can be used as a representation of the whole distribution.</p>
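<p>As a rough check of how the mean sample counts quoted above translate into simulated time, the short sketch below converts them into days. It assumes that each sample is one consecutive model output step (hourly for temperature and wind speed, half-hourly for precipitation); the function and variable names are hypothetical.</p>
<preformat preformat-type="code"><![CDATA[
```python
def samples_to_days(n_samples, step_hours):
    """Convert a number of consecutive time steps into days."""
    return n_samples * step_hours / 24.0


# Mean sample counts from Fig. 7b, d and f with the output frequencies given in the text.
temperature_days = samples_to_days(82, step_hours=1.0)    # hourly temperature
wind_days = samples_to_days(77, step_hours=1.0)           # hourly wind speed
precipitation_days = samples_to_days(81, step_hours=0.5)  # half-hourly precipitation
```
]]></preformat>
<p>Under these assumptions the mean counts correspond to between roughly 1.7 and 3.4 d of model output, which lies within the approximate 4 d quoted above.</p>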
</sec>
<sec id="Ch1.S8" sec-type="conclusions">
  <label>8</label><title>Conclusions</title>
      <p id="d2e4532">Within the climate modelling community, the generation of increasingly large datasets from kilometre-scale GCMs is becoming almost inevitable. While there is a clear argument for the added value of these high-resolution models, new challenges of handling, storing and accessing the resultant data are emerging. One novel method being investigated within the DestinE project <xref ref-type="bibr" rid="bib1.bibx5 bib1.bibx20" id="paren.49"/> is data streaming, where climate variables at native model resolution are passed directly to downstream impact models in near-real model runtime. This article presents the algorithms behind the One_Pass (v0.8.0) package, designed to handle streamed climate data at the kilometre scale. The application of each algorithm has been demonstrated through relevant use cases in the context of climate change impact studies. We categorized the statistics into two groups: a first group that can be represented by a single floating-point value and a second group that requires a distribution estimate. For the first group (e.g. mean, standard deviation, minimum, maximum, threshold exceedance), we obtain accuracy on the order of machine precision, well beyond the accuracy required or indeed provided by climate models themselves. While providing the same result as the conventional method, these algorithms allow the user to keep only a few rolling summaries in memory at any moment in time. Unlike the conventional computation of a statistic, where the memory requirements scale linearly as the time series grows, the one-pass algorithms for these statistics provide an easily implemented, user-oriented method that bypasses these potentially infeasible memory requirements.</p>
      <p id="d2e4538">For the statistics that require a representation of the distribution (e.g. percentiles and histograms), we applied the <inline-formula><mml:math id="M271" display="inline"><mml:mi>t</mml:mi></mml:math></inline-formula>-digest method, framed within relevant use cases. In Sect. <xref ref-type="sec" rid="Ch1.S6.SS2"/> we focus on wind, a variable which requires an accurate representation of the full distribution, useful in the context of renewable energy. We recommend using a compression factor of <inline-formula><mml:math id="M272" display="inline"><mml:mrow><mml:mi mathvariant="italic">δ</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">60</mml:mn></mml:mrow></mml:math></inline-formula> (approximately 80 clusters), where the mean absolute percentile differences for global monthly wind speeds did not exceed 0.9 % of the estimate given by the conventional method. For precipitation, given in Sect. <xref ref-type="sec" rid="Ch1.S6.SS3"/>, we focus on the extremes of skewed precipitation distributions. Due to the presence of low-frequency extreme events, there was more discrepancy between the NumPy and <inline-formula><mml:math id="M273" display="inline"><mml:mi>t</mml:mi></mml:math></inline-formula>-digest estimates. In the cases of high differences between the two estimates, there were also large error bars from the different interpolation schemes of NumPy. Examining these distributions showed that these higher differences were due to the sparseness of data in the distribution as opposed to poor representation from the <inline-formula><mml:math id="M274" display="inline"><mml:mi>t</mml:mi></mml:math></inline-formula>-digest. In the cases of high percentile differences, these were unrealistic differences that occurred due to division by extremely small numbers generated from the NumPy estimate and also occurred when precipitation fell around 0 to 1 mm d<sup>−1</sup>, negligible values in terms of the user interested in extreme rainfall events. 
Overall, when averaging the differences globally, we obtained (for <inline-formula><mml:math id="M276" display="inline"><mml:mrow><mml:mi mathvariant="italic">δ</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">60</mml:mn></mml:mrow></mml:math></inline-formula>) 2.63 % or 0.91 mm d<sup>−1</sup> in actual terms. We noted three methods for improving the <inline-formula><mml:math id="M278" display="inline"><mml:mi>t</mml:mi></mml:math></inline-formula>-digest estimate for precipitation: (a) use a different scale function (not currently implemented in the One_Pass package); (b) set a threshold for the data (i.e. ignore all samples below 1 mm d<sup>−1</sup>); (c) examine the <inline-formula><mml:math id="M280" display="inline"><mml:mi>t</mml:mi></mml:math></inline-formula>-digest histograms rather than a single-percentile estimate. We also recommended a slightly higher compression factor of <inline-formula><mml:math id="M281" display="inline"><mml:mrow><mml:mi mathvariant="italic">δ</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">80</mml:mn></mml:mrow></mml:math></inline-formula>.</p>
      <p id="d2e4654">In Sect. <xref ref-type="sec" rid="Ch1.S7"/>, we present the concept of convergence for the one-pass statistical summaries <inline-formula><mml:math id="M282" display="inline"><mml:mrow><mml:msub><mml:mi>S</mml:mi><mml:mi>n</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>. While one-pass algorithms provide immediate access to evolving statistical summaries, their reliability for distribution-based applications like bias adjustment depends on the stability of the underlying estimates. Our convergence analysis offered a practical guideline for determining when these summaries can be considered representative of the full distribution. This does not imply that the climate variables themselves have stabilized but rather that the evolving summary has reached a usable approximation (i.e. a sufficiently representative estimate of the complete time series). By quantifying convergence in this way, we provide users with the tools to make informed decisions about when to trust and apply one-pass statistics in downstream statistical applications, such as quantile–quantile mapping.</p>
      <p id="d2e4670">Overall, we have demonstrated the effectiveness of one-pass algorithms on streamed climate data and provided their Python implementation in the One_Pass package, ready for use in data streaming workflows <xref ref-type="bibr" rid="bib1.bibx1" id="paren.50"/>. These algorithms not only provide accuracy well within the required limits of climate model variables but also empower users to harness the full potential of high-resolution data, in both space and time. Indeed, while some of the methods contain small errors (specifically the <inline-formula><mml:math id="M283" display="inline"><mml:mi>t</mml:mi></mml:math></inline-formula>-digest), we note that failing to harness the added value of high-resolution data because of storage limitations also incurs an error, and potentially a more significant one. Because only a few rolling summaries need to be kept in memory, the memory required by these statistics is independent of the length of the time series, allowing users dealing with high-resolution GCMs to select any variables at their native model resolution and process them in near-real model runtime. This has the potential to eliminate the constraints experienced by some users when relying on pre-defined archives of set climate variables, typically provided months to years after the simulations have been completed. With the continuing movement towards higher resolution, climate data streaming will become a fundamental paradigm of data processing and analysis. This paper showcases the features of one-pass algorithms across a range of relevant statistics that can be harnessed to work in the new era of data streaming.</p>
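      <p>As a minimal illustration of why only a few rolling summaries need to be held in memory, the following Python sketch updates a mean and a standard deviation one sample at a time using a Welford-style recurrence. It is a sketch under stated assumptions and not the One_Pass package API: the class name <monospace>RunningStats</monospace>, the sample (<italic>n</italic> − 1) normalization and the input values are all hypothetical choices made for this example.</p>
<preformat preformat-type="code" xml:space="preserve">
```python
import math


class RunningStats:
    """One-pass mean and standard deviation (Welford-style update).

    Illustrative only: this is not the One_Pass package API.
    """

    def __init__(self):
        self.n = 0        # number of samples seen so far
        self.mean = 0.0   # rolling mean
        self.m2 = 0.0     # rolling sum of squared deviations from the mean

    def update(self, x):
        """Fold one new sample into the rolling summaries, then discard it."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def std(self):
        """Sample standard deviation (n - 1 denominator, assumed here)."""
        if self.n >= 2:
            return math.sqrt(self.m2 / (self.n - 1))
        return float("nan")


# Simulate streaming by looping over the time dimension of one grid cell.
stream = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # hypothetical values
stats = RunningStats()
for value in stream:
    stats.update(value)

print(stats.n, stats.mean, stats.std())
```
</preformat>
      <p>Whatever the length of the stream, the object stores three numbers per grid cell, which is the property that allows processing to keep pace with the model output.</p>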
</sec>

      
      </body>
    <back><app-group>

<app id="App1.Ch1.S1">
  <label>Appendix A</label><title>Convergence order estimate</title>
      <p id="d2e4694">As both <inline-formula><mml:math id="M284" display="inline"><mml:mi>q</mml:mi></mml:math></inline-formula> and <inline-formula><mml:math id="M285" display="inline"><mml:mi mathvariant="italic">μ</mml:mi></mml:math></inline-formula> are unknown in Eq. (<xref ref-type="disp-formula" rid="Ch1.E7"/>), the order of convergence was estimated using

          <disp-formula id="App1.Ch1.S1.E9" content-type="numbered"><label>A1</label><mml:math id="M286" display="block"><mml:mrow><mml:mi>q</mml:mi><mml:mo>≈</mml:mo><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mrow><mml:mi>log⁡</mml:mi><mml:mo mathsize="1.1em">|</mml:mo><mml:mstyle displaystyle="false"><mml:mfrac style="text"><mml:mrow><mml:msub><mml:mi mathvariant="italic">σ</mml:mi><mml:mrow><mml:mi>n</mml:mi><mml:mo>+</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msub><mml:mo>-</mml:mo><mml:msub><mml:mi mathvariant="italic">σ</mml:mi><mml:mi>n</mml:mi></mml:msub></mml:mrow><mml:mrow><mml:msub><mml:mi mathvariant="italic">σ</mml:mi><mml:mi>n</mml:mi></mml:msub><mml:mo>-</mml:mo><mml:msub><mml:mi mathvariant="italic">σ</mml:mi><mml:mrow><mml:mi>n</mml:mi><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msub></mml:mrow></mml:mfrac></mml:mstyle><mml:mo mathsize="1.1em">|</mml:mo></mml:mrow><mml:mrow><mml:mi>log⁡</mml:mi><mml:mo mathsize="1.1em">|</mml:mo><mml:mstyle displaystyle="false"><mml:mfrac style="text"><mml:mrow><mml:msub><mml:mi mathvariant="italic">σ</mml:mi><mml:mi>n</mml:mi></mml:msub><mml:mo>-</mml:mo><mml:msub><mml:mi mathvariant="italic">σ</mml:mi><mml:mrow><mml:mi>n</mml:mi><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:msub><mml:mi mathvariant="italic">σ</mml:mi><mml:mrow><mml:mi>n</mml:mi><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msub><mml:mo>-</mml:mo><mml:msub><mml:mi mathvariant="italic">σ</mml:mi><mml:mrow><mml:mi>n</mml:mi><mml:mo>-</mml:mo><mml:mn mathvariant="normal">2</mml:mn></mml:mrow></mml:msub></mml:mrow></mml:mfrac></mml:mstyle><mml:mo mathsize="1.1em">|</mml:mo></mml:mrow></mml:mfrac></mml:mstyle><mml:mo>.</mml:mo></mml:mrow></mml:math></disp-formula>

        When Eq. (<xref ref-type="disp-formula" rid="App1.Ch1.S1.E9"/>) was calculated over <inline-formula><mml:math id="M287" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="bold-italic">σ</mml:mi><mml:mi>n</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> for a random selection of grid points, the approximation of <inline-formula><mml:math id="M288" display="inline"><mml:mi>q</mml:mi></mml:math></inline-formula> (for all variables) was centred around 1. This indicated a linear convergence rate for all the standard deviation time series over the global grid cells.</p>
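      <p>The estimate in Eq. (<xref ref-type="disp-formula" rid="App1.Ch1.S1.E9"/>) can be sketched in a few lines of Python. This is a minimal sketch rather than the code used for the analysis: the function name <monospace>convergence_orders</monospace> and the synthetic input sequence are hypothetical, and the sketch assumes that no two consecutive values are equal and that successive difference ratios differ from 1 (otherwise a logarithm or a division is undefined).</p>
<preformat preformat-type="code" xml:space="preserve">
```python
import math


def convergence_orders(sigma):
    """Return the Eq. (A1) estimate of q for every n that has the
    neighbours n - 2, n - 1 and n + 1 available in the sequence."""
    orders = []
    for n in range(2, len(sigma) - 1):
        d_next = sigma[n + 1] - sigma[n]      # sigma_{n+1} - sigma_n
        d_curr = sigma[n] - sigma[n - 1]      # sigma_n - sigma_{n-1}
        d_prev = sigma[n - 1] - sigma[n - 2]  # sigma_{n-1} - sigma_{n-2}
        numerator = math.log(abs(d_next / d_curr))
        denominator = math.log(abs(d_curr / d_prev))
        orders.append(numerator / denominator)
    return orders


# Synthetic, linearly convergent sequence (hypothetical data): the error
# halves at every step, so each estimate of q should be close to 1.
sigma = [1.0 + 0.5 ** k for k in range(10)]
orders = convergence_orders(sigma)
print(orders)
```
</preformat>
      <p>For real standard deviation time series, the individual estimates are noisy, which is why the distribution of the estimates over many grid points is examined rather than a single value.</p>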
</app>
  </app-group><notes notes-type="codeavailability"><title>Code availability</title>

      <p id="d2e4847">Version 0.5 of the repository containing the Jupyter notebooks that reproduce all the figures with the One_Pass package v0.8.0 is preserved under the DOI <ext-link xlink:href="https://doi.org/10.5281/zenodo.15439803" ext-link-type="DOI">10.5281/zenodo.15439803</ext-link> <xref ref-type="bibr" rid="bib1.bibx2" id="paren.51"/> and developed on GitHub at <uri>https://github.com/kat-grayson/one_pass_algorithms_paper/tree/main</uri> (last access: 12 June 2025). The source code of the One_Pass package v0.8.0, ready for integration into data streaming workflows, is preserved under the DOI <ext-link xlink:href="https://doi.org/10.5281/zenodo.15438184" ext-link-type="DOI">10.5281/zenodo.15438184</ext-link> <xref ref-type="bibr" rid="bib1.bibx1" id="paren.52"/>, can be found at <uri>https://github.com/DestinE-Climate-DT/one_pass</uri> (last access: 12 June 2025), and is licensed under the Apache License, version 2.0.</p>
  </notes><notes notes-type="dataavailability"><title>Data availability</title>

      <p id="d2e4872">Full data from nextGEMS Cycle 3 are openly accessible and can be found at <uri>https://nextgems-h2020.eu/data-sets/</uri> <xref ref-type="bibr" rid="bib1.bibx40" id="paren.53"/>, including output from the development Cycle 3 (<ext-link xlink:href="https://doi.org/10.26050/WDCC/nextGEMS_cyc3" ext-link-type="DOI">10.26050/WDCC/nextGEMS_cyc3</ext-link>, <xref ref-type="bibr" rid="bib1.bibx30" id="altparen.54"/>) and production runs (<uri>https://www.wdc-climate.de/ui/entry?acronym=nextGEMS_prod</uri>, <xref ref-type="bibr" rid="bib1.bibx55" id="altparen.55"/>) for both ICON and IFS. All the nextGEMS netCDF data used to demonstrate the one-pass algorithms described in this paper (e.g. by looping through the time dimension to simulate data streaming) and to create all the figures in the study are available in the Zenodo repository “nextGEMS cycle3 datasets: statistical summaries for streamed data from climate simulations v3” with the DOI <ext-link xlink:href="https://doi.org/10.5281/zenodo.12533197" ext-link-type="DOI">10.5281/zenodo.12533197</ext-link>, under the Creative Commons Attribution 4.0 International license <xref ref-type="bibr" rid="bib1.bibx18" id="paren.56"/>.</p>
  </notes><notes notes-type="authorcontribution"><title>Author contributions</title>

      <p id="d2e4904">LL conceptualized the ideas behind the study, while KG and ST developed the methodology. KG wrote the One_Pass package, conducted the formal analysis and wrote the original draft. ST and FDR both supervised. ALN and ES both helped validate the study. IAF helped with revisions, the continued maintenance of the package and reformatting of the notebooks. All authors contributed to the review and editing process.</p>
  </notes><notes notes-type="competinginterests"><title>Competing interests</title>

      <p id="d2e4910">The contact author has declared that none of the authors has any competing interests.</p>
  </notes><notes notes-type="disclaimer"><title>Disclaimer</title>

      <p id="d2e4916">Views and opinions expressed are those of the author(s) only and do not necessarily reflect those of the European Union or the European Commission. Neither the European Union nor the European Commission can be held responsible for them.  Publisher’s note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors.</p>
  </notes><ack><title>Acknowledgements</title><p id="d2e4925">We would first like to acknowledge Bruno Kinoshita for his help and advice on the <inline-formula><mml:math id="M289" display="inline"><mml:mi>t</mml:mi></mml:math></inline-formula>-digest algorithm. We would also like to acknowledge Paolo Davini and Jost Von Hardenberg for aiding with data retrieval. The work presented in this document has been produced in the context of the European Union Destination Earth Initiative and is related to the tasks assigned by the European Union to the European Centre for Medium-Range Weather Forecasts to implement part of this initiative, with funding from the European Union. We also acknowledge the EuroHPC Joint Undertaking (JU) for awarding this project access to the EuroHPC supercomputers LUMI and MareNostrum5 through a EuroHPC JU Special Access call.</p></ack><notes notes-type="financialsupport"><title>Financial support</title>

      <p id="d2e4937">This research has been supported by the Generalitat de Catalunya ClimCat programme with grant number ARD209/22/000001, the European Commission Horizon 2020 framework programme nextGEMS under grant number GA 101003470, and the GLORIA project with grant number TED2021-129543B-I00 funded by the Spanish MICIU/AEI and by the European Union NextGenerationEU/PRTR.</p>
  </notes><notes notes-type="reviewstatement"><title>Review statement</title>

      <p id="d2e4943">This paper was edited by Po-Lun Ma and reviewed by Lucas Harris and one anonymous referee.</p>
  </notes><ref-list>
    <title>References</title>

      <ref id="bib1.bibx1"><label>Alsina-Ferrer and Grayson(2025a)</label><mixed-citation>Alsina-Ferrer, I. and Grayson, K.: DestinE-Climate-DT/one_pass: v0.8.0, Zenodo [code], <ext-link xlink:href="https://doi.org/10.5281/zenodo.15438184" ext-link-type="DOI">10.5281/zenodo.15438184</ext-link>, 2025a.</mixed-citation></ref>
      <ref id="bib1.bibx2"><label>Alsina-Ferrer and Grayson(2025b)</label><mixed-citation>Alsina-Ferrer, I. and Grayson, K.: kat-grayson/one_pass_algorithms_paper: v0.5.0, Zenodo [code], <ext-link xlink:href="https://doi.org/10.5281/zenodo.15439803" ext-link-type="DOI">10.5281/zenodo.15439803</ext-link>, 2025b.</mixed-citation></ref>
      <ref id="bib1.bibx3"><label>Asadieh and Krakauer(2015)</label><mixed-citation>Asadieh, B. and Krakauer, N. Y.: Global trends in extreme precipitation: climate models versus observations, Hydrol. Earth Syst. Sci., 19, 877–891, <ext-link xlink:href="https://doi.org/10.5194/hess-19-877-2015" ext-link-type="DOI">10.5194/hess-19-877-2015</ext-link>, 2015.</mixed-citation></ref>
      <ref id="bib1.bibx4"><label>Bador et al.(2020)Bador, Boé, Terray, Alexander, Baker, Bellucci, Haarsma, Koenigk, Moine, Lohmann, Putrasahan, Roberts, Roberts, Scoccimarro, Schiemann, Seddon, Senan, Valcke, and Vanniere</label><mixed-citation>Bador, M., Boé, J., Terray, L., Alexander, L. V., Baker, A., Bellucci, A., Haarsma, R., Koenigk, T., Moine, M. P., Lohmann, K., Putrasahan, D. A., Roberts, C., Roberts, M., Scoccimarro, E., Schiemann, R., Seddon, J., Senan, R., Valcke, S., and Vanniere, B.: Impact of Higher Spatial Atmospheric Resolution on Precipitation Extremes Over Land in Global Climate Models, J. Geophys. Res.-Atmos., 125, <ext-link xlink:href="https://doi.org/10.1029/2019JD032184" ext-link-type="DOI">10.1029/2019JD032184</ext-link>, 2020.</mixed-citation></ref>
      <ref id="bib1.bibx5"><label>Bauer et al.(2021)Bauer, Stevens, and Hazeleger</label><mixed-citation>Bauer, P., Stevens, B., and Hazeleger, W.: A digital twin of Earth for the green transition, Nat. Clim. Change, 11, 80–83, <ext-link xlink:href="https://doi.org/10.1038/s41558-021-00986-y" ext-link-type="DOI">10.1038/s41558-021-00986-y</ext-link>, 2021.</mixed-citation></ref>
      <ref id="bib1.bibx6"><label>Close et al.(2020)Close, Penduff, Speich, and Molines</label><mixed-citation>Close, S., Penduff, T., Speich, S., and Molines, J. M.: A means of estimating the intrinsic and atmospherically-forced contributions to sea surface height variability applied to altimetric observations, Prog. Oceanogr., 184, <ext-link xlink:href="https://doi.org/10.1016/j.pocean.2020.102314" ext-link-type="DOI">10.1016/j.pocean.2020.102314</ext-link>, 2020.</mixed-citation></ref>
      <ref id="bib1.bibx7"><label>Crist-Harif(2023)</label><mixed-citation>Crist-Harif, J.: Crick, GitHub [code], <uri>https://github.com/dask/crick</uri> (last access: 10 May 2025), 2023.</mixed-citation></ref>
      <ref id="bib1.bibx8"><label>Dask(2024)</label><mixed-citation>Dask: Dask, GitHub [code], <uri>https://github.com/dask/dask</uri> (last access: 10 May 2025), 2024.</mixed-citation></ref>
      <ref id="bib1.bibx9"><label>DestinationEarth(2024)</label><mixed-citation>DestinationEarth: <uri>https://destination-earth.eu/</uri> (last access: 17 April 2025), 2024.</mixed-citation></ref>
      <ref id="bib1.bibx10"><label>Donat et al.(2016)Donat, Lowry, Alexander, O'Gorman, and Maher</label><mixed-citation>Donat, M. G., Lowry, A. L., Alexander, L. V., O'Gorman, P. A., and Maher, N.: More extreme precipitation in the world's dry and wet regions, Nat. Clim. Change, 6, 508–513, <ext-link xlink:href="https://doi.org/10.1038/nclimate2941" ext-link-type="DOI">10.1038/nclimate2941</ext-link>, 2016.</mixed-citation></ref>
      <ref id="bib1.bibx11"><label>Dunning(2021)</label><mixed-citation>Dunning, T.: The t-digest: Efficient estimates of distributions, Software Impacts, 7, 100049, <ext-link xlink:href="https://doi.org/10.1016/j.simpa.2020.100049" ext-link-type="DOI">10.1016/j.simpa.2020.100049</ext-link>, 2021.</mixed-citation></ref>
      <ref id="bib1.bibx12"><label>Dunning and Ertl(2019)</label><mixed-citation>Dunning, T. and Ertl, O.: Computing Extremely Accurate Quantiles Using t-Digests, arXiv [preprint], <ext-link xlink:href="https://doi.org/10.48550/arXiv.1902.04023" ext-link-type="DOI">10.48550/arXiv.1902.04023</ext-link>, 2019.</mixed-citation></ref>
      <ref id="bib1.bibx13"><label>NOAA-GFDL(2007)</label><mixed-citation>NOAA-GFDL: FMS, GitHub [code], <uri>https://github.com/NOAA-GFDL/FMS/tree/main?tab=License-1-ov-file</uri> (last access: 10 May 2025), 2007.</mixed-citation></ref>
      <ref id="bib1.bibx14"><label>Gimeno et al.(2022)Gimeno, Sorí, Vázquez, Stojanovic, Algarra, Eiras-Barca, Gimeno-Sotelo, and Nieto</label><mixed-citation>Gimeno, L., Sorí, R., Vázquez, M., Stojanovic, M., Algarra, I., Eiras-Barca, J., Gimeno-Sotelo, L., and Nieto, R.: Extreme precipitation events, Wires Water, 9, 1–21, <ext-link xlink:href="https://doi.org/10.1002/wat2.1611" ext-link-type="DOI">10.1002/wat2.1611</ext-link>, 2022. </mixed-citation></ref>
      <ref id="bib1.bibx15"><label>Gorski et al.(2005)Gorski, Hivon, Banday, Wandelt, Hansen, Reinecke, and Bartelmann</label><mixed-citation>Gorski, K. M., Hivon, E., Banday, A. J., Wandelt, B. D., Hansen, F. K., Reinecke, M., and Bartelmann, M.: HEALPix: A Framework for High‐Resolution Discretization and Fast Analysis of Data Distributed on the Sphere, Astrophys. J., 622, 759–771, <ext-link xlink:href="https://doi.org/10.1086/427976" ext-link-type="DOI">10.1086/427976</ext-link>, 2005.</mixed-citation></ref>
      <ref id="bib1.bibx16"><label>Grams et al.(2017)Grams, Beerli, Pfenninger, Staffell, and Wernli</label><mixed-citation>Grams, C. M., Beerli, R., Pfenninger, S., Staffell, I., and Wernli, H.: Balancing Europe's wind-power output through spatial deployment informed by weather regimes, Nat. Clim. Change, 7, 557–562, <ext-link xlink:href="https://doi.org/10.1038/NCLIMATE3338" ext-link-type="DOI">10.1038/NCLIMATE3338</ext-link>, 2017.</mixed-citation></ref>
      <ref id="bib1.bibx17"><label>Grau-Sánchez et al.(2010)Grau-Sánchez, Noguera, and Gutiérrez</label><mixed-citation>Grau-Sánchez, M., Noguera, M., and Gutiérrez, J. M.: On some computational orders of convergence, Appl. Math. Lett., 23, 472–478, <ext-link xlink:href="https://doi.org/10.1016/j.aml.2009.12.006" ext-link-type="DOI">10.1016/j.aml.2009.12.006</ext-link>, 2010.</mixed-citation></ref>
      <ref id="bib1.bibx18"><label>Grayson(2024)</label><mixed-citation>Grayson, K.: nextGEMS cycle3 datasets: statistical summaries for streamed data from climate simulations (Version v3), Zenodo [data set], <ext-link xlink:href="https://doi.org/10.5281/zenodo.12533197" ext-link-type="DOI">10.5281/zenodo.12533197</ext-link>, 2024.</mixed-citation></ref>
      <ref id="bib1.bibx19"><label>Harris et al.(2020)Harris, Millman, van der Walt, Gommers, Virtanen, Cournapeau, Wieser, Taylor, Berg, Smith, Kern, Picus, Hoyer, van Kerkwijk, Brett, Haldane, del Río, Wiebe, Peterson, Gérard-Marchant, Sheppard, Reddy, Weckesser, Abbasi, Gohlke, and Oliphant</label><mixed-citation>Harris, C. R., Millman, K. J., van der Walt, S. J., Gommers, R., Virtanen, P., Cournapeau, D., Wieser, E., Taylor, J., Berg, S., Smith, N. J., Kern, R., Picus, M., Hoyer, S., van Kerkwijk, M. H., Brett, M., Haldane, A., del Río, J. F., Wiebe, M., Peterson, P., Gérard-Marchant, P., Sheppard, K., Reddy, T., Weckesser, W., Abbasi, H., Gohlke, C., and Oliphant, T. E.: Array programming with NumPy, Nature, 585, 357–362, <ext-link xlink:href="https://doi.org/10.1038/s41586-020-2649-2" ext-link-type="DOI">10.1038/s41586-020-2649-2</ext-link>, 2020.</mixed-citation></ref>
      <ref id="bib1.bibx20"><label>Hoffmann et al.(2023)Hoffmann, Bauer, Sandu, Wedi, Geenen, and Thiemert</label><mixed-citation>Hoffmann, J., Bauer, P., Sandu, I., Wedi, N., Geenen, T., and Thiemert, D.: Destination Earth – A digital twin in support of climate services, Climate Services, 30, 100394, <ext-link xlink:href="https://doi.org/10.1016/j.cliser.2023.100394" ext-link-type="DOI">10.1016/j.cliser.2023.100394</ext-link>, 2023.</mixed-citation></ref>
      <ref id="bib1.bibx21"><label>Hohenegger et al.(2023)</label><mixed-citation>Hohenegger, C., Korn, P., Linardakis, L., Redler, R., Schnur, R., Adamidis, P., Bao, J., Bastin, S., Behravesh, M., Bergemann, M., Biercamp, J., Bockelmann, H., Brokopf, R., Brüggemann, N., Casaroli, L., Chegini, F., Datseris, G., Esch, M., George, G., Giorgetta, M., Gutjahr, O., Haak, H., Hanke, M., Ilyina, T., Jahns, T., Jungclaus, J., Kern, M., Klocke, D., Kluft, L., Kölling, T., Kornblueh, L., Kosukhin, S., Kroll, C., Lee, J., Mauritsen, T., Mehlmann, C., Mieslinger, T., Naumann, A. K., Paccini, L., Peinado, A., Praturi, D. S., Putrasahan, D., Rast, S., Riddick, T., Roeber, N., Schmidt, H., Schulzweida, U., Schütte, F., Segura, H., Shevchenko, R., Singh, V., Specht, M., Stephan, C. C., von Storch, J.-S., Vogel, R., Wengel, C., Winkler, M., Ziemen, F., Marotzke, J., and Stevens, B.: ICON-Sapphire: simulating the components of the Earth system and their interactions at kilometer and subkilometer scales, Geosci. Model Dev., 16, 779–811, <ext-link xlink:href="https://doi.org/10.5194/gmd-16-779-2023" ext-link-type="DOI">10.5194/gmd-16-779-2023</ext-link>, 2023.</mixed-citation></ref>
      <ref id="bib1.bibx22"><label>Hu et al.(2018)Hu, Yang, Schnase, Duffy, Xu, Bowen, Lee, and Song</label><mixed-citation>Hu, F., Yang, C., Schnase, J. L., Duffy, D. Q., Xu, M., Bowen, M. K., Lee, T., and Song, W.: ClimateSpark: An in-memory distributed computing framework for big climate data analytics, Comput. Geosci., 115, 154–166, <ext-link xlink:href="https://doi.org/10.1016/j.cageo.2018.03.011" ext-link-type="DOI">10.1016/j.cageo.2018.03.011</ext-link>, 2018.</mixed-citation></ref>
      <ref id="bib1.bibx23"><label>Hyndman and Fan(1996)</label><mixed-citation>Hyndman, R. J. and Fan, Y.: Sample Quantiles in Statistical Packages, Am. Stat., 50, 361–365, <ext-link xlink:href="https://doi.org/10.1080/00031305.1996.10473566" ext-link-type="DOI">10.1080/00031305.1996.10473566</ext-link>, 1996.</mixed-citation></ref>
      <ref id="bib1.bibx24"><label>Iles et al.(2020)Iles, Vautard, Strachan, Joussaume, Eggen, and Hewitt</label><mixed-citation>Iles, C. E., Vautard, R., Strachan, J., Joussaume, S., Eggen, B. R., and Hewitt, C. D.: The benefits of increasing resolution in global and regional climate simulations for European climate extremes, Geosci. Model Dev., 13, 5583–5607, <ext-link xlink:href="https://doi.org/10.5194/gmd-13-5583-2020" ext-link-type="DOI">10.5194/gmd-13-5583-2020</ext-link>, 2020.</mixed-citation></ref>
      <ref id="bib1.bibx25"><label>Jansen et al.(2020)Jansen, Staffell, Kitzing, Quoilin, Wiggelinkhuizen, Bulder, Riepin, and Müsgens</label><mixed-citation>Jansen, M., Staffell, I., Kitzing, L., Quoilin, S., Wiggelinkhuizen, E., Bulder, B., Riepin, I., and Müsgens, F.: Offshore wind competitiveness in mature markets without subsidy, Nat. Energy, 5, 614–622, <ext-link xlink:href="https://doi.org/10.1038/s41560-020-0661-2" ext-link-type="DOI">10.1038/s41560-020-0661-2</ext-link>, 2020.</mixed-citation></ref>
      <ref id="bib1.bibx26"><label>Jungclaus et al.(2022)</label><mixed-citation>Jungclaus, J. H., Lorenz, S. J., Schmidt, H., Brovkin, V., Brüggemann, N., Chegini, F., Crüger, T., De-Vrese, P., Gayler, V., Giorgetta, M. A., Gutjahr, O., Haak, H., Hagemann, S., Hanke, M., Ilyina, T., Korn, P., Kröger, J., Linardakis, L., Mehlmann, C., Mikolajewicz, U., Müller, W. A., Nabel, J. E., Notz, D., Pohlmann, H., Putrasahan, D. A., Raddatz, T., Ramme, L., Redler, R., Reick, C. H., Riddick, T., Sam, T., Schneck, R., Schnur, R., Schupfner, M., von Storch, J. S., Wachsmann, F., Wieners, K. H., Ziemen, F., Stevens, B., Marotzke, J., and Claussen, M.: The ICON Earth System Model Version 1.0, J. Adv. Model. Earth Sy., 14, <ext-link xlink:href="https://doi.org/10.1029/2021MS002813" ext-link-type="DOI">10.1029/2021MS002813</ext-link>, 2022.</mixed-citation></ref>
      <ref id="bib1.bibx27"><label>Jurasz et al.(2022)Jurasz, Kies, and De Felice</label><mixed-citation>Jurasz, J., Kies, A., and De Felice, M.: Complementary behavior of solar and wind energy based on the reported data on the European level – a country-level analysis, Complementarity of Variable Renewable Energy Sources, 197–214, <ext-link xlink:href="https://doi.org/10.1016/B978-0-323-85527-3.00023-6" ext-link-type="DOI">10.1016/B978-0-323-85527-3.00023-6</ext-link>, 2022.</mixed-citation></ref>
      <ref id="bib1.bibx28"><label>Katopodis et al.(2021)Katopodis, Markantonis, Vlachogiannis, Politi, and Sfetsos</label><mixed-citation>Katopodis, T., Markantonis, I., Vlachogiannis, D., Politi, N., and Sfetsos, A.: Assessing climate change impacts on wind characteristics in Greece through high resolution regional climate modelling, Renew. Energ., 179, 427–444, <ext-link xlink:href="https://doi.org/10.1016/j.renene.2021.07.061" ext-link-type="DOI">10.1016/j.renene.2021.07.061</ext-link>, 2021.</mixed-citation></ref>
      <ref id="bib1.bibx29"><label>Kolajo et al.(2019)Kolajo, Daramola, and Adebiyi</label><mixed-citation>Kolajo, T., Daramola, O., and Adebiyi, A.: Big data stream analysis: a systematic literature review, J. Big Data, 6, <ext-link xlink:href="https://doi.org/10.1186/s40537-019-0210-7" ext-link-type="DOI">10.1186/s40537-019-0210-7</ext-link>, 2019.</mixed-citation></ref>
      <ref id="bib1.bibx30"><label>Koldunov et al.(2023)Koldunov, Kölling, Pedruzo-Bagazgoitia, Rackow, Redler, Sidorenko, Wieners, and Ziemen</label><mixed-citation>Koldunov, N., Kölling, T., Pedruzo-Bagazgoitia, X., Rackow, T., Redler, R., Sidorenko, D., Wieners, K.-H., and Ziemen, F. A.: nextGEMS: output of the model development cycle 3 simulations for ICON and IFS. World Data Center for Climate (WDCC) at DKRZ [data set], <ext-link xlink:href="https://doi.org/10.26050/WDCC/nextGEMS_cyc3" ext-link-type="DOI">10.26050/WDCC/nextGEMS_cyc3</ext-link>, 2023.</mixed-citation></ref>
      <ref id="bib1.bibx31"><label>Lange(2019)</label><mixed-citation>Lange, S.: Trend-preserving bias adjustment and statistical downscaling with ISIMIP3BASD (v1.0), Geosci. Model Dev., 12, 3055–3070, <ext-link xlink:href="https://doi.org/10.5194/gmd-12-3055-2019" ext-link-type="DOI">10.5194/gmd-12-3055-2019</ext-link>, 2019.</mixed-citation></ref>
      <ref id="bib1.bibx32"><label>Lledó et al.(2019)</label><mixed-citation>Lledó, L., Torralba, V., Soret, A., Ramon, J., and Doblas-Reyes, F. J.: Seasonal forecasts of wind power generation, Renew. Energ., 143, 91–100, <ext-link xlink:href="https://doi.org/10.1016/j.renene.2019.04.135" ext-link-type="DOI">10.1016/j.renene.2019.04.135</ext-link>, 2019.</mixed-citation></ref>
      <ref id="bib1.bibx33"><label>Loveless et al.(2013)Loveless, Stoikov, and Waeber</label><mixed-citation>Loveless, J., Stoikov, S., and Waeber, R.: Online Algorithms in High-frequency Trading, Queue, 11, 30–41, <ext-link xlink:href="https://doi.org/10.1145/2523426.2534976" ext-link-type="DOI">10.1145/2523426.2534976</ext-link>, 2013.</mixed-citation></ref>
      <ref id="bib1.bibx34"><label>Manubens-Gil et al.(2016)Manubens-Gil, Vegas-Regidor, Prodhomme, Mula-Valls, and Doblas-Reyes</label><mixed-citation>Manubens-Gil, D., Vegas-Regidor, J., Prodhomme, C., Mula-Valls, O., and Doblas-Reyes, F. J.: Seamless management of ensemble climate prediction experiments on HPC platforms, in: 2016 International Conference on High Performance Computing &amp; Simulation (HPCS), Innsbruck, Austria, 895–900, <ext-link xlink:href="https://doi.org/10.1109/HPCSim.2016.7568429" ext-link-type="DOI">10.1109/HPCSim.2016.7568429</ext-link>, 2016.</mixed-citation></ref>
      <ref id="bib1.bibx35"><label>Marinescu(2023)</label><mixed-citation>Marinescu, D. C.: Chap. 12 – Big Data, data streaming, and the mobile cloud, in: Cloud Computing, edited by: Marinescu, D. C., Morgan Kaufmann, 3rd Edn., 453–500, <ext-link xlink:href="https://doi.org/10.1016/B978-0-32-385277-7.00019-1" ext-link-type="DOI">10.1016/B978-0-32-385277-7.00019-1</ext-link>, 2023.</mixed-citation></ref>
      <ref id="bib1.bibx36"><label>Mastelini and de Carvalho(2021)</label><mixed-citation>Mastelini, S. M. and de Carvalho, A. C. P. d. L. F.: Using dynamical quantization to perform split attempts in online tree regressors, Pattern Recogn. Lett., 145, 37–42, <ext-link xlink:href="https://doi.org/10.1016/j.patrec.2021.01.033" ext-link-type="DOI">10.1016/j.patrec.2021.01.033</ext-link>, 2021.</mixed-citation></ref>
      <ref id="bib1.bibx37"><label>Min et al.(2022)Min, Ahn, and Azizan</label><mixed-citation>Min, Y., Ahn, K., and Azizan, N.: One-Pass Learning via Bridging Orthogonal Gradient Descent and Recursive Least-Squares, 2022 IEEE 61st Conference on Decision and Control (CDC), Cancun, Mexico, 4720–4725, <ext-link xlink:href="https://doi.org/10.1109/CDC51059.2022.9992939" ext-link-type="DOI">10.1109/CDC51059.2022.9992939</ext-link>, 2022.</mixed-citation></ref>
      <ref id="bib1.bibx38"><label>Morgan et al.(2011)Morgan, Lackner, Vogel, and Baise</label><mixed-citation>Morgan, E. C., Lackner, M., Vogel, R. M., and Baise, L. G.: Probability distributions for offshore wind speeds, Energ. Convers. Manage., 52, 15–26, <ext-link xlink:href="https://doi.org/10.1016/j.enconman.2010.06.015" ext-link-type="DOI">10.1016/j.enconman.2010.06.015</ext-link>, 2011.</mixed-citation></ref>
      <ref id="bib1.bibx39"><label>Muthukrishnan(2005)</label><mixed-citation>Muthukrishnan, S.: Data streams: Algorithms and applications, Found. Trends Theor. Comp. Sci., 1, 117–236, <ext-link xlink:href="https://doi.org/10.1561/0400000002" ext-link-type="DOI">10.1561/0400000002</ext-link>, 2005.</mixed-citation></ref>
      <ref id="bib1.bibx40"><label>NextGEMS()</label><mixed-citation>NextGEMS: nextGEMS, <uri>https://nextgems-h2020.eu/data-sets/</uri>, last access: 4 December 2024.</mixed-citation></ref>
      <ref id="bib1.bibx41"><label>Orr et al.(2021)Orr, Ekström, Charlton, Peat, and Fowler</label><mixed-citation>Orr, H. G., Ekström, M., Charlton, M. B., Peat, K. L., and Fowler, H. J.: Using high-resolution climate change information in water management: A decision-makers' perspective, Philos. T. Roy. Soc. A-Math, 379, <ext-link xlink:href="https://doi.org/10.1098/rsta.2020.0219" ext-link-type="DOI">10.1098/rsta.2020.0219</ext-link>, 2021.</mixed-citation></ref>
      <ref id="bib1.bibx42"><label>Palmer(2014)</label><mixed-citation>Palmer, T.: Climate forecasting: Build high-resolution global climate models, Nature, 515, 338–339, <ext-link xlink:href="https://doi.org/10.1038/515338a" ext-link-type="DOI">10.1038/515338a</ext-link>, 2014.</mixed-citation></ref>
      <ref id="bib1.bibx43"><label>Pryor and Barthelmie(2010)</label><mixed-citation>Pryor, S. C. and Barthelmie, R. J.: Climate change impacts on wind energy: A review, Renew. Sust. Energ. Rev., 14, 430–437, <ext-link xlink:href="https://doi.org/10.1016/j.rser.2009.07.028" ext-link-type="DOI">10.1016/j.rser.2009.07.028</ext-link>, 2010.</mixed-citation></ref>
      <ref id="bib1.bibx44"><label>Pörtner et al.(2022)Pörtner, Roberts, Tignor, Poloczanska, Mintenbeck, Alegría, Craig, Langsdorf, Löschke, Möller, Okem, and Rama</label><mixed-citation>Pörtner, H.-O., Roberts, D., Tignor, M., Poloczanska, E., Mintenbeck, K., Alegría, A., Craig, M., Langsdorf, S., Löschke, S., Möller, V., Okem, A., and Rama, B. E.: IPCC, 2022: Climate Change 2022: Impacts, Adaptation and Vulnerability. Contribution of Working Group II to the Sixth Assessment Report of the Intergovernmental Panel on Climate Change, Technical Summary, Cambridge University Press, Cambridge, UK and New York, USA, <ext-link xlink:href="https://doi.org/10.1017/9781009325844" ext-link-type="DOI">10.1017/9781009325844</ext-link>, 2022.</mixed-citation></ref>
      <ref id="bib1.bibx45"><label>Rackow et al.(2025)</label><mixed-citation>Rackow, T., Pedruzo-Bagazgoitia, X., Becker, T., Milinski, S., Sandu, I., Aguridan, R., Bechtold, P., Beyer, S., Bidlot, J., Boussetta, S., Deconinck, W., Diamantakis, M., Dueben, P., Dutra, E., Forbes, R., Ghosh, R., Goessling, H. F., Hadade, I., Hegewald, J., Jung, T., Keeley, S., Kluft, L., Koldunov, N., Koldunov, A., Kölling, T., Kousal, J., Kühnlein, C., Maciel, P., Mogensen, K., Quintino, T., Polichtchouk, I., Reuter, B., Sármány, D., Scholz, P., Sidorenko, D., Streffing, J., Sützl, B., Takasuka, D., Tietsche, S., Valentini, M., Vannière, B., Wedi, N., Zampieri, L., and Ziemen, F.: Multi-year simulations at kilometre scale with the Integrated Forecasting System coupled to FESOM2.5 and NEMOv3.4, Geosci. Model Dev., 18, 33–69, <ext-link xlink:href="https://doi.org/10.5194/gmd-18-33-2025" ext-link-type="DOI">10.5194/gmd-18-33-2025</ext-link>, 2025.</mixed-citation></ref>
      <ref id="bib1.bibx46"><label>Samaniego et al.(2019)Samaniego, Thober, Wanders, Pan, Rakovec, Sheffield, Wood, Prudhomme, Rees, Houghton-Carr, Fry, Smith, Watts, Hisdal, Estrela, Buontempo, Marx, and Kumar</label><mixed-citation>Samaniego, L., Thober, S., Wanders, N., Pan, M., Rakovec, O., Sheffield, J., Wood, E. F., Prudhomme, C., Rees, G., Houghton-Carr, H., Fry, M., Smith, K., Watts, G., Hisdal, H., Estrela, T., Buontempo, C., Marx, A., and Kumar, R.: Hydrological forecasts and projections for improved decision-making in the water sector in Europe, B. Am. Meteorol. Soc., 100, 2451–2471, <ext-link xlink:href="https://doi.org/10.1175/BAMS-D-17-0274.1" ext-link-type="DOI">10.1175/BAMS-D-17-0274.1</ext-link>, 2019.</mixed-citation></ref>
      <ref id="bib1.bibx47"><label>Segura et al.(2025)</label><mixed-citation>Segura, H., Pedruzo-Bagazgoitia, X., Weiss, P., Müller, S. K., Rackow, T., Lee, J., Dolores-Tesillos, E., Benedict, I., Aengenheyster, M., Aguridan, R., Arduini, G., Baker, A. J., Bao, J., Bastin, S., Baulenas, E., Becker, T., Beyer, S., Bockelmann, H., Brüggemann, N., Brunner, L., Cheedela, S. K., Das, S., Denissen, J., Dragaud, I., Dziekan, P., Ekblom, M., Engels, J. F., Esch, M., Forbes, R., Frauen, C., Freischem, L., García-Maroto, D., Geier, P., Gierz, P., González-Cervera, Á., Grayson, K., Griffith, M., Gutjahr, O., Haak, H., Hadade, I., Haslehner, K., ul Hasson, S., Hegewald, J., Kluft, L., Koldunov, A., Koldunov, N., Kölling, T., Koseki, S., Kosukhin, S., Kousal, J., Kuma, P., Kumar, A. U., Li, R., Maury, N., Meindl, M., Milinski, S., Mogensen, K., Niraula, B., Nowak, J., Praturi, D. S., Proske, U., Putrasahan, D., Redler, R., Santuy, D., Sármány, D., Schnur, R., Scholz, P., Sidorenko, D., Spät, D., Sützl, B., Takasuka, D., Tompkins, A., Uribe, A., Valentini, M., Veerman, M., Voigt, A., Warnau, S., Wachsmann, F., Wacławczyk, M., Wedi, N., Wieners, K.-H., Wille, J., Winkler, M., Wu, Y., Ziemen, F., Zimmermann, J., Bender, F. A.-M., Bojovic, D., Bony, S., Bordoni, S., Brehmer, P., Dengler, M., Dutra, E., Faye, S., Fischer, E., van Heerwaarden, C., Hohenegger, C., Järvinen, H., Jochum, M., Jung, T., Jungclaus, J. H., Keenlyside, N. S., Klocke, D., Konow, H., Klose, M., Malinowski, S., Martius, O., Mauritsen, T., Mellado, J. P., Mieslinger, T., Mohino, E., Pawłowska, H., Peters-von Gehlen, K., Sarré, A., Sobhani, P., Stier, P., Tuppi, L., Vidale, P. L., Sandu, I., and Stevens, B.: nextGEMS: entering the era of kilometer-scale Earth system modeling, EGUsphere [preprint], <ext-link xlink:href="https://doi.org/10.5194/egusphere-2025-509" ext-link-type="DOI">10.5194/egusphere-2025-509</ext-link>, 2025.</mixed-citation></ref>
      <ref id="bib1.bibx48"><label>Shi et al.(2021)Shi, Dong, Xiao, and Huang</label><mixed-citation>Shi, H., Dong, Z., Xiao, N., and Huang, Q.: Wind Speed Distributions Used in Wind Energy Assessment: A Review, Front. Energ. Res., 9, 1–14, <ext-link xlink:href="https://doi.org/10.3389/fenrg.2021.769920" ext-link-type="DOI">10.3389/fenrg.2021.769920</ext-link>, 2021.</mixed-citation></ref>
      <ref id="bib1.bibx49"><label>Staffell and Pfenninger(2018)</label><mixed-citation>Staffell, I. and Pfenninger, S.: The increasing impact of weather on electricity supply and demand, Energy, 145, 65–78, <ext-link xlink:href="https://doi.org/10.1016/j.energy.2017.12.051" ext-link-type="DOI">10.1016/j.energy.2017.12.051</ext-link>, 2018.</mixed-citation></ref>
      <ref id="bib1.bibx50"><label>Stevens et al.(2020)</label><mixed-citation>Stevens, B., Acquistapace, C., Hansen, A., Heinze, R., Klinger, C., Klocke, D., Rybka, H., Schubotz, W., Windmiller, J., Adamidis, P., Arka, I., Barlakas, V., Biercamp, J., Brueck, M., Brune, S., Buehler, S. A., Burkhardt, U., Cioni, G., Costa-Surós, M., Crewell, S., Crüger, T., Deneke, H., Friederichs, P., Henken, C. C., Hohenegger, C., Jacob, M., Jakub, F., Kalthoff, N., Köhler, M., van Laar, T. W., Li, P., Löhnert, U., Macke, A., Madenach, N., Mayer, B., Nam, C., Naumann, A. K., Peters, K., Poll, S., Quaas, J., Röber, N., Rochetin, N., Scheck, L., Schemann, V., Schnitt, S., Seifert, A., Senf, F., Shapkalijevski, M., Simmer, C., Singh, S., Sourdeval, O., Spickermann, D., Strandgren, J., Tessiot, O., Vercauteren, N., Vial, J., Voigt, A., and Zängl, G.: The added value of large-eddy and storm-resolving models for simulating clouds and precipitation, J. Meteorol. Soc. Jpn., 98, 395–435, <ext-link xlink:href="https://doi.org/10.2151/jmsj.2020-021" ext-link-type="DOI">10.2151/jmsj.2020-021</ext-link>, 2020.</mixed-citation></ref>
      <ref id="bib1.bibx51"><label>Teutschbein and Seibert(2012)</label><mixed-citation>Teutschbein, C. and Seibert, J.: Bias correction of regional climate model simulations for hydrological climate-change impact studies: Review and evaluation of different methods, J. Hydrol., 456–457, 12–29, <ext-link xlink:href="https://doi.org/10.1016/j.jhydrol.2012.05.052" ext-link-type="DOI">10.1016/j.jhydrol.2012.05.052</ext-link>, 2012.</mixed-citation></ref>
      <ref id="bib1.bibx52"><label>Thober et al.(2018)Thober, Kumar, Wanders, Marx, Pan, Rakovec, Samaniego, Sheffield, Wood, and Zink</label><mixed-citation>Thober, S., Kumar, R., Wanders, N., Marx, A., Pan, M., Rakovec, O., Samaniego, L., Sheffield, J., Wood, E. F., and Zink, M.: Multi-model ensemble projections of European river floods and high flows at 1.5, 2, and 3 degrees global warming, Environ. Res. Lett., 13, <ext-link xlink:href="https://doi.org/10.1088/1748-9326/aa9e35" ext-link-type="DOI">10.1088/1748-9326/aa9e35</ext-link>, 2018.</mixed-citation></ref>
      <ref id="bib1.bibx53"><label>Wang et al.(2016)Wang, Hu, and Ma</label><mixed-citation>Wang, J., Hu, J., and Ma, K.: Wind speed probability distribution estimation and wind energy assessment, Renew. Sust. Energ. Rev., 60, 881–899, <ext-link xlink:href="https://doi.org/10.1016/j.rser.2016.01.057" ext-link-type="DOI">10.1016/j.rser.2016.01.057</ext-link>, 2016.</mixed-citation></ref>
      <ref id="bib1.bibx54"><label>Wedi et al.(2025)</label><mixed-citation>Wedi, N., Sandu, I., Bauer, P., Acosta, M., Carbuhn, R., Andrae, U., Auger, L., Balsamo, G., Baousis, V., Bennett, V., Bennett, A., Buontempo, C., Bretonnière, P.-A., Capell, R., Castrillo, M., Chantry, M., Chevallier, M., Correa, R., Davini, P., Denby, L., Doblas-Reyes, F., Dueben, P., Fischer, C., Frauen, C., Frogner, I.-L., Früh, B., Gascón, E., Gérard, E., Gorwits, O., Geenen, T., Grayson, K., Guenova-Rubio, N., Hadade, I., Hardenberg, J. V., Haus, U.-U., Hawkes, J., Hirtl, M., Hoffmann, J., Horvath, K., Järvinen, H., Jung, T., Kann, A., Klocke, D., Koldunov, N., Kontkanen, J., Sievi-Korte, O., Kristiansen, J., Kuwertz, E., Mäkelä, J., Maljutenko, I., Manninen, P., McKnight, U. S., Milinski, S., Mueller, A., McNally, A., Modigliani, U., Narayanappa, D., Nielsen, P., Nipen, T., Nortamo, H., Peuch, V.-H., and Polade, S.: Implementing digital twin technology of the earth system in Destination Earth, J. Eur. Meteorol. Soc., 3, 100015, <ext-link xlink:href="https://doi.org/10.1016/j.jemets.2025.100015" ext-link-type="DOI">10.1016/j.jemets.2025.100015</ext-link>, 2025.</mixed-citation></ref>
      <ref id="bib1.bibx55"><label>Wieners et al.(2024)</label><mixed-citation>Wieners, K.-H., Rackow, T., Aguridan, R., Becker, T., Beyer, S., Cheedela, S. K., Dreier, N.-A., Engels, J. F., Esch, M., Frauen, C., Klocke, D., Kölling, T., Pedruzo-Bagazgoitia, X., Putrasahan, D., Sidorenko, D., Schnur, R., Stevens, B., and Zimmermann, J.: nextGEMS: output of the production simulations for ICON and IFS, World Data Center for Climate (WDCC) at DKRZ [data set], <uri>https://www.wdc-climate.de/ui/entry?acronym=nextGEMS_prod</uri> (last access: 9 December 2024), 2024.</mixed-citation></ref>

  </ref-list></back>
    <!--<article-title-html>Statistical summaries for streamed data from climate simulations: one-pass algorithms</article-title-html>
<abstract-html/>
<ref-html id="bib1.bib1"><label>Alsina-Ferrer and Grayson(2025a)</label><mixed-citation>
      
Alsina-Ferrer, I. and Grayson, K.: DestinE-Climate-DT/one_pass: v0.8.0, Zenodo [code], <a href="https://doi.org/10.5281/zenodo.15438184" target="_blank">https://doi.org/10.5281/zenodo.15438184</a>, 2025a.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib2"><label>Alsina-Ferrer and Grayson(2025b)</label><mixed-citation>
      
Alsina-Ferrer, I. and Grayson, K.: kat-grayson/one_pass_algorithms_paper: v0.5.0, Zenodo [code], <a href="https://doi.org/10.5281/zenodo.15439803" target="_blank">https://doi.org/10.5281/zenodo.15439803</a>, 2025b.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib3"><label>Asadieh and Krakauer(2015)</label><mixed-citation>
      
Asadieh, B. and Krakauer, N. Y.: Global trends in extreme precipitation: climate models versus observations, Hydrol. Earth Syst. Sci., 19, 877–891, <a href="https://doi.org/10.5194/hess-19-877-2015" target="_blank">https://doi.org/10.5194/hess-19-877-2015</a>, 2015.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib4"><label>Bador et al.(2020)Bador, Boé, Terray, Alexander, Baker,
Bellucci, Haarsma, Koenigk, Moine, Lohmann, Putrasahan, Roberts, Roberts,
Scoccimarro, Schiemann, Seddon, Senan, Valcke, and Vanniere</label><mixed-citation>
      
Bador, M., Boé, J., Terray, L., Alexander, L. V., Baker, A., Bellucci,
A., Haarsma, R., Koenigk, T., Moine, M. P., Lohmann, K., Putrasahan, D. A.,
Roberts, C., Roberts, M., Scoccimarro, E., Schiemann, R., Seddon, J., Senan,
R., Valcke, S., and Vanniere, B.: Impact of Higher Spatial Atmospheric
Resolution on Precipitation Extremes Over Land in Global Climate Models,
J. Geophys. Res.-Atmos., 125,
<a href="https://doi.org/10.1029/2019JD032184" target="_blank">https://doi.org/10.1029/2019JD032184</a>, 2020.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib5"><label>Bauer et al.(2021)Bauer, Stevens, and Hazeleger</label><mixed-citation>
      
Bauer, P., Stevens, B., and Hazeleger, W.: A digital twin of Earth for the
green transition, Nat. Clim. Change, 11, 80–83,
<a href="https://doi.org/10.1038/s41558-021-00986-y" target="_blank">https://doi.org/10.1038/s41558-021-00986-y</a>, 2021.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib6"><label>Close et al.(2020)Close, Penduff, Speich, and Molines</label><mixed-citation>
      
Close, S., Penduff, T., Speich, S., and Molines, J. M.: A means of estimating
the intrinsic and atmospherically-forced contributions to sea surface height
variability applied to altimetric observations, Prog. Oceanogr.,
184, <a href="https://doi.org/10.1016/j.pocean.2020.102314" target="_blank">https://doi.org/10.1016/j.pocean.2020.102314</a>, 2020.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib7"><label>Crist-Harif(2023)</label><mixed-citation>
      
Crist-Harif, J.: Crick, GitHub [code], <a href="https://github.com/dask/crick" target="_blank">https://github.com/dask/crick</a> (last access: 10 May 2025), 2023.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib8"><label>Dask(2024)</label><mixed-citation>
      
Dask: Dask, GitHub [code], <a href="https://github.com/dask/dask" target="_blank">https://github.com/dask/dask</a> (last access: 10 May 2025), 2024.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib9"><label>DestinationEarth(2024)</label><mixed-citation>
      
DestinationEarth: <a href="https://destination-earth.eu/" target="_blank">https://destination-earth.eu/</a> (last access: 17 April 2025), 2024.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib10"><label>Donat et al.(2016)Donat, Lowry, Alexander, O'Gorman, and
Maher</label><mixed-citation>
      
Donat, M. G., Lowry, A. L., Alexander, L. V., O'Gorman, P. A., and Maher, N.:
More extreme precipitation in the world's dry and wet regions, Nat. Clim.
Change, 6, 508–513, <a href="https://doi.org/10.1038/nclimate2941" target="_blank">https://doi.org/10.1038/nclimate2941</a>, 2016.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib11"><label>Dunning(2021)</label><mixed-citation>
      
Dunning, T.: The t-digest: Efficient estimates of distributions, Software
Impacts, 7, 100049, <a href="https://doi.org/10.1016/j.simpa.2020.100049" target="_blank">https://doi.org/10.1016/j.simpa.2020.100049</a>, 2021.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib12"><label>Dunning and Ertl(2019)</label><mixed-citation>
      
Dunning, T. and Ertl, O.: Computing Extremely Accurate Quantiles Using
t-Digests, arXiv [preprint],
<a href="https://doi.org/10.48550/arXiv.1902.04023" target="_blank">https://doi.org/10.48550/arXiv.1902.04023</a>, 2019.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib13"><label>NOAA-GFDL(2007)</label><mixed-citation>
      
NOAA-GFDL: FMS, GitHub [code],
<a href="https://github.com/NOAA-GFDL/FMS/tree/main?tab=License-1-ov-file" target="_blank">https://github.com/NOAA-GFDL/FMS/tree/main?tab=License-1-ov-file</a> (last access: 10 May 2025), 2007.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib14"><label>Gimeno et al.(2022)Gimeno, Sorí, Vázquez, Stojanovic,
Algarra, Eiras-Barca, Gimeno-Sotelo, and Nieto</label><mixed-citation>
      
Gimeno, L., Sorí, R., Vázquez, M., Stojanovic, M., Algarra, I.,
Eiras-Barca, J., Gimeno-Sotelo, L., and Nieto, R.: Extreme precipitation
events, Wires Water, 9, 1–21,
<a href="https://doi.org/10.1002/wat2.1611" target="_blank">https://doi.org/10.1002/wat2.1611</a>, 2022.


    </mixed-citation></ref-html>
<ref-html id="bib1.bib15"><label>Gorski et al.(2005)Gorski, Hivon, Banday, Wandelt, Hansen, Reinecke,
and Bartelmann</label><mixed-citation>
      
Gorski, K. M., Hivon, E., Banday, A. J., Wandelt, B. D., Hansen, F. K.,
Reinecke, M., and Bartelmann, M.: HEALPix: A Framework for High‐Resolution
Discretization and Fast Analysis of Data Distributed on the Sphere,
Astrophys. J., 622, 759–771, <a href="https://doi.org/10.1086/427976" target="_blank">https://doi.org/10.1086/427976</a>, 2005.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib16"><label>Grams et al.(2017)Grams, Beerli, Pfenninger, Staffell, and
Wernli</label><mixed-citation>
      
Grams, C. M., Beerli, R., Pfenninger, S., Staffell, I., and Wernli, H.:
Balancing Europe's wind-power output through spatial deployment informed by
weather regimes, Nat. Clim. Change, 7, 557–562,
<a href="https://doi.org/10.1038/NCLIMATE3338" target="_blank">https://doi.org/10.1038/NCLIMATE3338</a>, 2017.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib17"><label>Grau-Sánchez et al.(2010)Grau-Sánchez, Noguera, and
Gutiérrez</label><mixed-citation>
      
Grau-Sánchez, M., Noguera, M., and Gutiérrez, J. M.: On some
computational orders of convergence, Appl. Math. Lett., 23,
472–478, <a href="https://doi.org/10.1016/j.aml.2009.12.006" target="_blank">https://doi.org/10.1016/j.aml.2009.12.006</a>, 2010.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib18"><label>Grayson(2024)</label><mixed-citation>
      
Grayson, K.: nextGEMS cycle3 datasets: statistical summaries for streamed data from climate simulations (Version v3), Zenodo [data set], <a href="https://doi.org/10.5281/zenodo.12533197" target="_blank">https://doi.org/10.5281/zenodo.12533197</a>, 2024.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib19"><label>Harris et al.(2020)Harris, Millman, van der Walt, Gommers, Virtanen,
Cournapeau, Wieser, Taylor, Berg, Smith, Kern, Picus, Hoyer, van Kerkwijk,
Brett, Haldane, del Río, Wiebe, Peterson, Gérard-Marchant,
Sheppard, Reddy, Weckesser, Abbasi, Gohlke, and Oliphant</label><mixed-citation>
      
Harris, C. R., Millman, K. J., van der Walt, S. J., Gommers, R., Virtanen, P.,
Cournapeau, D., Wieser, E., Taylor, J., Berg, S., Smith, N. J., Kern, R.,
Picus, M., Hoyer, S., van Kerkwijk, M. H., Brett, M., Haldane, A., del
Río, J. F., Wiebe, M., Peterson, P., Gérard-Marchant, P.,
Sheppard, K., Reddy, T., Weckesser, W., Abbasi, H., Gohlke, C., and Oliphant,
T. E.: Array programming with NumPy, Nature, 585, 357–362,
<a href="https://doi.org/10.1038/s41586-020-2649-2" target="_blank">https://doi.org/10.1038/s41586-020-2649-2</a>, 2020.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib20"><label>Hoffmann et al.(2023)Hoffmann, Bauer, Sandu, Wedi, Geenen, and
Thiemert</label><mixed-citation>
      
Hoffmann, J., Bauer, P., Sandu, I., Wedi, N., Geenen, T., and Thiemert, D.:
Destination Earth – A digital twin in support of climate services,
Climate Services, 30, 100394, <a href="https://doi.org/10.1016/j.cliser.2023.100394" target="_blank">https://doi.org/10.1016/j.cliser.2023.100394</a>, 2023.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib21"><label>Hohenegger et al.(2023)</label><mixed-citation>
      
Hohenegger, C., Korn, P., Linardakis, L., Redler, R., Schnur, R., Adamidis, P., Bao, J., Bastin, S., Behravesh, M., Bergemann, M., Biercamp, J., Bockelmann, H., Brokopf, R., Brüggemann, N., Casaroli, L., Chegini, F., Datseris, G., Esch, M., George, G., Giorgetta, M., Gutjahr, O., Haak, H., Hanke, M., Ilyina, T., Jahns, T., Jungclaus, J., Kern, M., Klocke, D., Kluft, L., Kölling, T., Kornblueh, L., Kosukhin, S., Kroll, C., Lee, J., Mauritsen, T., Mehlmann, C., Mieslinger, T., Naumann, A. K., Paccini, L., Peinado, A., Praturi, D. S., Putrasahan, D., Rast, S., Riddick, T., Roeber, N., Schmidt, H., Schulzweida, U., Schütte, F., Segura, H., Shevchenko, R., Singh, V., Specht, M., Stephan, C. C., von Storch, J.-S., Vogel, R., Wengel, C., Winkler, M., Ziemen, F., Marotzke, J., and Stevens, B.: ICON-Sapphire: simulating the components of the Earth system and their interactions at kilometer and subkilometer scales, Geosci. Model Dev., 16, 779–811, <a href="https://doi.org/10.5194/gmd-16-779-2023" target="_blank">https://doi.org/10.5194/gmd-16-779-2023</a>, 2023.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib22"><label>Hu et al.(2018)Hu, Yang, Schnase, Duffy, Xu, Bowen, Lee, and
Song</label><mixed-citation>
      
Hu, F., Yang, C., Schnase, J. L., Duffy, D. Q., Xu, M., Bowen, M. K., Lee, T.,
and Song, W.: ClimateSpark: An in-memory distributed computing framework for
big climate data analytics, Comput. Geosci., 115, 154–166,
<a href="https://doi.org/10.1016/j.cageo.2018.03.011" target="_blank">https://doi.org/10.1016/j.cageo.2018.03.011</a>, 2018.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib23"><label>Hyndman and Fan(1996)</label><mixed-citation>
      
Hyndman, R. J. and Fan, Y.: Sample Quantiles in Statistical Packages, Am.
Stat., 50, 361–365, <a href="https://doi.org/10.1080/00031305.1996.10473566" target="_blank">https://doi.org/10.1080/00031305.1996.10473566</a>, 1996.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib24"><label>Iles et al.(2020)Iles, Vautard, Strachan, Joussaume, Eggen, and
Hewitt</label><mixed-citation>
      
Iles, C. E., Vautard, R., Strachan, J., Joussaume, S., Eggen, B. R., and Hewitt, C. D.: The benefits of increasing resolution in global and regional climate simulations for European climate extremes, Geosci. Model Dev., 13, 5583–5607, <a href="https://doi.org/10.5194/gmd-13-5583-2020" target="_blank">https://doi.org/10.5194/gmd-13-5583-2020</a>, 2020.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib25"><label>Jansen et al.(2020)Jansen, Staffell, Kitzing, Quoilin,
Wiggelinkhuizen, Bulder, Riepin, and Müsgens</label><mixed-citation>
      
Jansen, M., Staffell, I., Kitzing, L., Quoilin, S., Wiggelinkhuizen, E.,
Bulder, B., Riepin, I., and Müsgens, F.: Offshore wind competitiveness
in mature markets without subsidy, Nat. Energy, 5, 614–622,
<a href="https://doi.org/10.1038/s41560-020-0661-2" target="_blank">https://doi.org/10.1038/s41560-020-0661-2</a>, 2020.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib26"><label>Jungclaus et al.(2022)</label><mixed-citation>
      
Jungclaus, J. H., Lorenz, S. J., Schmidt, H., Brovkin, V., Brüggemann,
N., Chegini, F., Crüger, T., De-Vrese, P., Gayler, V., Giorgetta,
M. A., Gutjahr, O., Haak, H., Hagemann, S., Hanke, M., Ilyina, T., Korn, P.,
Kröger, J., Linardakis, L., Mehlmann, C., Mikolajewicz, U.,
Müller, W. A., Nabel, J. E., Notz, D., Pohlmann, H., Putrasahan, D. A.,
Raddatz, T., Ramme, L., Redler, R., Reick, C. H., Riddick, T., Sam, T.,
Schneck, R., Schnur, R., Schupfner, M., von Storch, J. S., Wachsmann, F.,
Wieners, K. H., Ziemen, F., Stevens, B., Marotzke, J., and Claussen, M.: The
ICON Earth System Model Version 1.0, J. Adv. Model. Earth
Sy., 14, <a href="https://doi.org/10.1029/2021MS002813" target="_blank">https://doi.org/10.1029/2021MS002813</a>, 2022.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib27"><label>Jurasz et al.(2022)Jurasz, Kies, and De Felice</label><mixed-citation>
      
Jurasz, J., Kies, A., and De Felice, M.: Complementary behavior of solar and
wind energy based on the reported data on the European level – a
country-level analysis, Complementarity of Variable Renewable Energy Sources,
197–214, <a href="https://doi.org/10.1016/B978-0-323-85527-3.00023-6" target="_blank">https://doi.org/10.1016/B978-0-323-85527-3.00023-6</a>, 2022.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib28"><label>Katopodis et al.(2021)Katopodis, Markantonis, Vlachogiannis, Politi,
and Sfetsos</label><mixed-citation>
      
Katopodis, T., Markantonis, I., Vlachogiannis, D., Politi, N., and Sfetsos, A.:
Assessing climate change impacts on wind characteristics in Greece through
high resolution regional climate modelling, Renew. Energ., 179, 427–444,
<a href="https://doi.org/10.1016/j.renene.2021.07.061" target="_blank">https://doi.org/10.1016/j.renene.2021.07.061</a>, 2021.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib29"><label>Kolajo et al.(2019)Kolajo, Daramola, and Adebiyi</label><mixed-citation>
      
Kolajo, T., Daramola, O., and Adebiyi, A.: Big data stream analysis: a
systematic literature review, J. Big Data, 6,
<a href="https://doi.org/10.1186/s40537-019-0210-7" target="_blank">https://doi.org/10.1186/s40537-019-0210-7</a>, 2019.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib30"><label>Koldunov et al.(2023)Koldunov, Kölling, Pedruzo-Bagazgoitia,
Rackow, Redler, Sidorenko, Wieners, and Ziemen</label><mixed-citation>
      
Koldunov, N., Kölling, T., Pedruzo-Bagazgoitia, X., Rackow, T., Redler,
R., Sidorenko, D., Wieners, K.-H., and Ziemen, F. A.: nextGEMS: output of the
model development cycle 3 simulations for ICON and IFS, World Data Center for
Climate (WDCC) at DKRZ [data set], <a href="https://doi.org/10.26050/WDCC/nextGEMS_cyc3" target="_blank">https://doi.org/10.26050/WDCC/nextGEMS_cyc3</a>,
2023.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib31"><label>Lange(2019)</label><mixed-citation>
      
Lange, S.: Trend-preserving bias adjustment and statistical downscaling with ISIMIP3BASD (v1.0), Geosci. Model Dev., 12, 3055–3070, <a href="https://doi.org/10.5194/gmd-12-3055-2019" target="_blank">https://doi.org/10.5194/gmd-12-3055-2019</a>, 2019.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib32"><label>Lledó et al.(2019)</label><mixed-citation>
      
Lledó, L., Torralba, V., Soret, A., Ramon, J., and Doblas-Reyes, F. J.: Seasonal forecasts of wind power generation, Renew. Energ., 143, 91–100, <a href="https://doi.org/10.1016/j.renene.2019.04.135" target="_blank">https://doi.org/10.1016/j.renene.2019.04.135</a>, 2019.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib33"><label>Loveless et al.(2013)Loveless, Stoikov, and Waeber</label><mixed-citation>
      
Loveless, J., Stoikov, S., and Waeber, R.: Online Algorithms in High-frequency
Trading, Queue, 11, 30–41, <a href="https://doi.org/10.1145/2523426.2534976" target="_blank">https://doi.org/10.1145/2523426.2534976</a>, 2013.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib34"><label>Manubens-Gil et al.(2016)Manubens-Gil, Vegas-Regidor, Prodhomme,
Mula-Valls, and Doblas-Reyes</label><mixed-citation>
      
Manubens-Gil, D., Vegas-Regidor, J., Prodhomme, C., Mula-Valls, O., and
Doblas-Reyes, F. J.: Seamless management of ensemble climate prediction
experiments on HPC platforms, in: 2016 International Conference on High
Performance Computing &amp; Simulation (HPCS), Innsbruck, Austria, 895–900,
<a href="https://doi.org/10.1109/HPCSim.2016.7568429" target="_blank">https://doi.org/10.1109/HPCSim.2016.7568429</a>, 2016.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib35"><label>Marinescu(2023)</label><mixed-citation>
      
Marinescu, D. C.: Chap. 12 – Big Data, data streaming, and the mobile cloud,
in: Cloud Computing, edited by: Marinescu, D. C.,
Morgan Kaufmann, 3rd Edn., 453–500,
<a href="https://doi.org/10.1016/B978-0-32-385277-7.00019-1" target="_blank">https://doi.org/10.1016/B978-0-32-385277-7.00019-1</a>, 2023.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib36"><label>Mastelini and de Carvalho(2021)</label><mixed-citation>
      
Mastelini, S. M. and de Carvalho, A. C. P. d. L. F.: Using dynamical
quantization to perform split attempts in online tree regressors, Pattern
Recogn. Lett., 145, 37–42, <a href="https://doi.org/10.1016/j.patrec.2021.01.033" target="_blank">https://doi.org/10.1016/j.patrec.2021.01.033</a>, 2021.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib37"><label>Min et al.(2022)Min, Ahn, and Azizan</label><mixed-citation>
      
Min, Y., Ahn, K., and Azizan, N.: One-Pass Learning via Bridging Orthogonal
Gradient Descent and Recursive Least-Squares, 2022 IEEE 61st Conference on
Decision and Control (CDC), Cancun, Mexico, 4720–4725,
<a href="https://doi.org/10.1109/CDC51059.2022.9992939" target="_blank">https://doi.org/10.1109/CDC51059.2022.9992939</a>, 2022.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib38"><label>Morgan et al.(2011)Morgan, Lackner, Vogel, and Baise</label><mixed-citation>
      
Morgan, E. C., Lackner, M., Vogel, R. M., and Baise, L. G.: Probability
distributions for offshore wind speeds, Energ. Convers. Manage., 52,
15–26, <a href="https://doi.org/10.1016/j.enconman.2010.06.015" target="_blank">https://doi.org/10.1016/j.enconman.2010.06.015</a>, 2011.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib39"><label>Muthukrishnan(2005)</label><mixed-citation>
      
Muthukrishnan, S.: Data streams: Algorithms and applications, Found.
Trends Theor. Comp. Sci., 1, 117–236,
<a href="https://doi.org/10.1561/0400000002" target="_blank">https://doi.org/10.1561/0400000002</a>, 2005.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib40"><label>NextGEMS()</label><mixed-citation>
      
NextGEMS: nextGEMS, <a href="https://nextgems-h2020.eu/data-sets/" target="_blank">https://nextgems-h2020.eu/data-sets/</a>, last access: 4 December 2024.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib41"><label>Orr et al.(2021)Orr, Ekström, Charlton, Peat, and
Fowler</label><mixed-citation>
      
Orr, H. G., Ekström, M., Charlton, M. B., Peat, K. L., and Fowler, H. J.:
Using high-resolution climate change information in water management: A
decision-makers' perspective, Philos. T. Roy.
Soc. A-Math, 379, <a href="https://doi.org/10.1098/rsta.2020.0219" target="_blank">https://doi.org/10.1098/rsta.2020.0219</a>, 2021.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib42"><label>Palmer(2014)</label><mixed-citation>
      
Palmer, T.: Climate forecasting: Build high-resolution global climate models,
Nature, 515, 338–339, <a href="https://doi.org/10.1038/515338a" target="_blank">https://doi.org/10.1038/515338a</a>, 2014.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib43"><label>Pryor and Barthelmie(2010)</label><mixed-citation>
      
Pryor, S. C. and Barthelmie, R. J.: Climate change impacts on wind energy: A
review, Renew. Sust. Energ. Rev., 14, 430–437,
<a href="https://doi.org/10.1016/j.rser.2009.07.028" target="_blank">https://doi.org/10.1016/j.rser.2009.07.028</a>, 2010.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib44"><label>Pörtner et al.(2022)Pörtner, Roberts, Tignor, Poloczanska,
Mintenbeck, Alegría, Craig, Langsdorf, Löschke, Möller, Okem, and
Rama</label><mixed-citation>
      
Pörtner, H.-O., Roberts, D., Tignor, M., Poloczanska, E., Mintenbeck, K.,
Alegría, A., Craig, M., Langsdorf, S., Löschke, S., Möller, V., Okem, A.,
and Rama, B. E.: IPCC, 2022: Climate Change 2022: Impacts, Adaptation and
Vulnerability. Contribution of Working Group II to the Sixth Assessment
Report of the Intergovernmental Panel on Climate Change, Technical Summary,
Cambridge University Press, Cambridge, UK and New York, USA,
<a href="https://doi.org/10.1017/9781009325844" target="_blank">https://doi.org/10.1017/9781009325844</a>, 2022.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib45"><label>Rackow et al.(2025)</label><mixed-citation>
      
Rackow, T., Pedruzo-Bagazgoitia, X., Becker, T., Milinski, S., Sandu, I., Aguridan, R., Bechtold, P., Beyer, S., Bidlot, J., Boussetta, S., Deconinck, W., Diamantakis, M., Dueben, P., Dutra, E., Forbes, R., Ghosh, R., Goessling, H. F., Hadade, I., Hegewald, J., Jung, T., Keeley, S., Kluft, L., Koldunov, N., Koldunov, A., Kölling, T., Kousal, J., Kühnlein, C., Maciel, P., Mogensen, K., Quintino, T., Polichtchouk, I., Reuter, B., Sármány, D., Scholz, P., Sidorenko, D., Streffing, J., Sützl, B., Takasuka, D., Tietsche, S., Valentini, M., Vannière, B., Wedi, N., Zampieri, L., and Ziemen, F.: Multi-year simulations at kilometre scale with the Integrated Forecasting System coupled to FESOM2.5 and NEMOv3.4, Geosci. Model Dev., 18, 33–69, <a href="https://doi.org/10.5194/gmd-18-33-2025" target="_blank">https://doi.org/10.5194/gmd-18-33-2025</a>, 2025.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib46"><label>Samaniego et al.(2019)Samaniego, Thober, Wanders, Pan, Rakovec,
Sheffield, Wood, Prudhomme, Rees, Houghton-Carr, Fry, Smith, Watts, Hisdal,
Estrela, Buontempo, Marx, and Kumar</label><mixed-citation>
      
Samaniego, L., Thober, S., Wanders, N., Pan, M., Rakovec, O., Sheffield, J.,
Wood, E. F., Prudhomme, C., Rees, G., Houghton-Carr, H., Fry, M., Smith, K.,
Watts, G., Hisdal, H., Estrela, T., Buontempo, C., Marx, A., and Kumar, R.:
Hydrological forecasts and projections for improved decision-making in the
water sector in Europe, B. Am. Meteorol. Soc., 100,
2451–2471, <a href="https://doi.org/10.1175/BAMS-D-17-0274.1" target="_blank">https://doi.org/10.1175/BAMS-D-17-0274.1</a>, 2019.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib47"><label>Segura et al.(2025)</label><mixed-citation>
      
Segura, H., Pedruzo-Bagazgoitia, X., Weiss, P., Müller, S. K., Rackow, T., Lee, J., Dolores-Tesillos, E., Benedict, I., Aengenheyster, M., Aguridan, R., Arduini, G., Baker, A. J., Bao, J., Bastin, S., Baulenas, E., Becker, T., Beyer, S., Bockelmann, H., Brüggemann, N., Brunner, L., Cheedela, S. K., Das, S., Denissen, J., Dragaud, I., Dziekan, P., Ekblom, M., Engels, J. F., Esch, M., Forbes, R., Frauen, C., Freischem, L., García-Maroto, D., Geier, P., Gierz, P., González-Cervera, Á., Grayson, K., Griffith, M., Gutjahr, O., Haak, H., Hadade, I., Haslehner, K., ul Hasson, S., Hegewald, J., Kluft, L., Koldunov, A., Koldunov, N., Kölling, T., Koseki, S., Kosukhin, S., Kousal, J., Kuma, P., Kumar, A. U., Li, R., Maury, N., Meindl, M., Milinski, S., Mogensen, K., Niraula, B., Nowak, J., Praturi, D. S., Proske, U., Putrasahan, D., Redler, R., Santuy, D., Sármány, D., Schnur, R., Scholz, P., Sidorenko, D., Spät, D., Sützl, B., Takasuka, D., Tompkins, A., Uribe, A., Valentini, M., Veerman, M., Voigt, A., Warnau, S., Wachsmann, F., Wacławczyk, M., Wedi, N., Wieners, K.-H., Wille, J., Winkler, M., Wu, Y., Ziemen, F., Zimmermann, J., Bender, F. A.-M., Bojovic, D., Bony, S., Bordoni, S., Brehmer, P., Dengler, M., Dutra, E., Faye, S., Fischer, E., van Heerwaarden, C., Hohenegger, C., Järvinen, H., Jochum, M., Jung, T., Jungclaus, J. H., Keenlyside, N. S., Klocke, D., Konow, H., Klose, M., Malinowski, S., Martius, O., Mauritsen, T., Mellado, J. P., Mieslinger, T., Mohino, E., Pawłowska, H., Peters-von Gehlen, K., Sarré, A., Sobhani, P., Stier, P., Tuppi, L., Vidale, P. L., Sandu, I., and Stevens, B.: nextGEMS: entering the era of kilometer-scale Earth system modeling, EGUsphere [preprint], <a href="https://doi.org/10.5194/egusphere-2025-509" target="_blank">https://doi.org/10.5194/egusphere-2025-509</a>, 2025.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib48"><label>Shi et al.(2021)Shi, Dong, Xiao, and Huang</label><mixed-citation>
      
Shi, H., Dong, Z., Xiao, N., and Huang, Q.: Wind Speed Distributions Used in
Wind Energy Assessment: A Review, Front. Energ. Res., 9, 1–14,
<a href="https://doi.org/10.3389/fenrg.2021.769920" target="_blank">https://doi.org/10.3389/fenrg.2021.769920</a>, 2021.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib49"><label>Staffell and Pfenninger(2018)</label><mixed-citation>
      
Staffell, I. and Pfenninger, S.: The increasing impact of weather on
electricity supply and demand, Energy, 145, 65–78,
<a href="https://doi.org/10.1016/j.energy.2017.12.051" target="_blank">https://doi.org/10.1016/j.energy.2017.12.051</a>, 2018.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib50"><label>Stevens et al.(2020)</label><mixed-citation>
      
Stevens, B., Acquistapace, C., Hansen, A., Heinze, R., Klinger, C., Klocke, D.,
Rybka, H., Schubotz, W., Windmiller, J., Adamidis, P., Arka, I., Barlakas,
V., Biercamp, J., Brueck, M., Brune, S., Buehler, S. A., Burkhardt, U.,
Cioni, G., Costa-Surós, M., Crewell, S., Crüger, T., Deneke, H.,
Friederichs, P., Henken, C. C., Hohenegger, C., Jacob, M., Jakub, F.,
Kalthoff, N., Köhler, M., van Laar, T. W., Li, P., Löhnert, U.,
Macke, A., Madenach, N., Mayer, B., Nam, C., Naumann, A. K., Peters, K.,
Poll, S., Quaas, J., Röber, N., Rochetin, N., Scheck, L., Schemann, V.,
Schnitt, S., Seifert, A., Senf, F., Shapkalijevski, M., Simmer, C., Singh,
S., Sourdeval, O., Spickermann, D., Strandgren, J., Tessiot, O., Vercauteren,
N., Vial, J., Voigt, A., and Zängl, G.: The added value of large-eddy
and storm-resolving models for simulating clouds and precipitation, J.
Meteorol. Soc. Jpn., 98, 395–435,
<a href="https://doi.org/10.2151/jmsj.2020-021" target="_blank">https://doi.org/10.2151/jmsj.2020-021</a>, 2020.


    </mixed-citation></ref-html>
<ref-html id="bib1.bib51"><label>Teutschbein and Seibert(2012)</label><mixed-citation>
      
Teutschbein, C. and Seibert, J.: Bias correction of regional climate model
simulations for hydrological climate-change impact studies: Review and
evaluation of different methods, J. Hydrol., 456–457, 12–29,
<a href="https://doi.org/10.1016/j.jhydrol.2012.05.052" target="_blank">https://doi.org/10.1016/j.jhydrol.2012.05.052</a>, 2012.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib52"><label>Thober et al.(2018)Thober, Kumar, Wanders, Marx, Pan, Rakovec,
Samaniego, Sheffield, Wood, and Zink</label><mixed-citation>
      
Thober, S., Kumar, R., Wanders, N., Marx, A., Pan, M., Rakovec, O., Samaniego,
L., Sheffield, J., Wood, E. F., and Zink, M.: Multi-model ensemble
projections of European river floods and high flows at 1.5, 2, and 3 degrees
global warming, Environ. Res. Lett., 13,
<a href="https://doi.org/10.1088/1748-9326/aa9e35" target="_blank">https://doi.org/10.1088/1748-9326/aa9e35</a>, 2018.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib53"><label>Wang et al.(2016)Wang, Hu, and Ma</label><mixed-citation>
      
Wang, J., Hu, J., and Ma, K.: Wind speed probability distribution estimation
and wind energy assessment, Renew. Sust. Energ. Rev., 60,
881–899, <a href="https://doi.org/10.1016/j.rser.2016.01.057" target="_blank">https://doi.org/10.1016/j.rser.2016.01.057</a>, 2016.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib54"><label>Wedi et al.(2025)</label><mixed-citation>
      
Wedi, N., Sandu, I., Bauer, P., Acosta, M., Carbuhn, R., Andrae, U., Auger, L.,
Balsamo, G., Baousis, V., Bennett, V., Bennett, A., Buontempo, C.,
Bretonnière, P.-A., Capell, R., Castrillo, M., Chantry, M., Chevallier,
M., Correa, R., Davini, P., Denby, L., Doblas-Reyes, F., Dueben, P., Fischer,
C., Frauen, C., Frogner, I.-L., Früh, B., Gascón, E.,
Gérard, E., Gorwits, O., Geenen, T., Grayson, K., Guenova-Rubio, N.,
Hadade, I., Hardenberg, J. V., Haus, U.-U., Hawkes, J., Hirtl, M., Hoffmann,
J., Horvath, K., Järvinen, H., Jung, T., Kann, A., Klocke, D.,
Koldunov, N., Kontkanen, J., Sievi-Korte, O., Kristiansen, J., Kuwertz, E.,
Mäkelä, J., Maljutenko, I., Manninen, P., McKnight, U. S.,
Milinski, S., Mueller, A., McNally, A., Modigliani, U., Narayanappa, D.,
Nielsen, P., Nipen, T., Nortamo, H., Peuch, V.-H., and Polade, S.:
Implementing digital twin technology
of the Earth system in Destination Earth, J. Eur.
Meteorol. Soc., 3, 100015, <a href="https://doi.org/10.1016/j.jemets.2025.100015" target="_blank">https://doi.org/10.1016/j.jemets.2025.100015</a>,
2025.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib55"><label>Wieners et al.(2024)</label><mixed-citation>
      
Wieners, K.-H., Rackow, T., Aguridan, R., Becker, T., Beyer, S., Cheedela,
S. K., Dreier, N.-A., Engels, J. F., Esch, M., Frauen, C., Klocke, D.,
Kölling, T., Pedruzo-Bagazgoitia, X., Putrasahan, D., Sidorenko, D., Schnur,
R., Stevens, B., and Zimmermann, J.: nextGEMS: output of the production
simulations for ICON and IFS, World Data Center for Climate (WDCC) at DKRZ
[data set],
<a href="https://www.wdc-climate.de/ui/entry?acronym=nextGEMS_prod" target="_blank">https://www.wdc-climate.de/ui/entry?acronym=nextGEMS_prod</a> (last access: 9 December 2024), 2024.

    </mixed-citation></ref-html>--></article>
