<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing with OASIS Tables v3.0 20080202//EN" "journalpub-oasis3.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:oasis="http://docs.oasis-open.org/ns/oasis-exchange/table" dtd-version="3.0">
  <front>
    <journal-meta>
<journal-id journal-id-type="publisher">GMD</journal-id>
<journal-title-group>
<journal-title>Geoscientific Model Development</journal-title>
<abbrev-journal-title abbrev-type="publisher">GMD</abbrev-journal-title>
<abbrev-journal-title abbrev-type="nlm-ta">Geosci. Model Dev.</abbrev-journal-title>
</journal-title-group>
<issn pub-type="epub">1991-9603</issn>
<publisher><publisher-name>Copernicus Publications</publisher-name>
<publisher-loc>Göttingen, Germany</publisher-loc>
</publisher>
</journal-meta>

    <article-meta>
      <article-id pub-id-type="doi">10.5194/gmd-10-1751-2017</article-id><title-group><article-title>Accelerating volcanic ash data assimilation using a mask-state algorithm
based on an ensemble Kalman filter: a case study with the LOTOS-EUROS model
(version 1.10)</article-title>
      </title-group><?xmltex \runningtitle{A mask-state algorithm}?><?xmltex \runningauthor{G. Fu et al.}?>
      <contrib-group>
        <contrib contrib-type="author" corresp="yes" rid="aff1">
          <name><surname>Fu</surname><given-names>Guangliang</given-names></name>
          <email>g.fu@tudelft.nl</email>
        <ext-link>https://orcid.org/0000-0001-8916-0243</ext-link></contrib>
        <contrib contrib-type="author" corresp="no" rid="aff1">
          <name><surname>Lin</surname><given-names>Hai Xiang</given-names></name>
          
        <ext-link>https://orcid.org/0000-0002-1653-4854</ext-link></contrib>
        <contrib contrib-type="author" corresp="no" rid="aff1">
          <name><surname>Heemink</surname><given-names>Arnold</given-names></name>
          
        </contrib>
        <contrib contrib-type="author" corresp="no" rid="aff1">
          <name><surname>Lu</surname><given-names>Sha</given-names></name>
          
        <ext-link>https://orcid.org/0000-0002-5434-893X</ext-link></contrib>
        <contrib contrib-type="author" corresp="no" rid="aff2">
          <name><surname>Segers</surname><given-names>Arjo</given-names></name>
          
        </contrib>
        <contrib contrib-type="author" corresp="no" rid="aff1 aff3">
          <name><surname>van Velzen</surname><given-names>Nils</given-names></name>
          
        </contrib>
        <contrib contrib-type="author" corresp="no" rid="aff4">
          <name><surname>Lu</surname><given-names>Tongchao</given-names></name>
          
        </contrib>
        <contrib contrib-type="author" corresp="no" rid="aff5">
          <name><surname>Xu</surname><given-names>Shiming</given-names></name>
          
        <ext-link>https://orcid.org/0000-0003-4999-3499</ext-link></contrib>
        <aff id="aff1"><label>1</label><institution>Delft University of Technology, Delft Institute of Applied Mathematics,
Mekelweg 4, 2628 CD Delft, the Netherlands</institution>
        </aff>
        <aff id="aff2"><label>2</label><institution>TNO, Department of Climate, Air and Sustainability, P.O. Box 80015,
3508 TA Utrecht, the Netherlands</institution>
        </aff>
        <aff id="aff3"><label>3</label><institution>VORtech, P.O. Box 260, 2600 AG Delft, the Netherlands</institution>
        </aff>
        <aff id="aff4"><label>4</label><institution>School of Mathematics, Shandong University, Jinan, Shandong, China</institution>
        </aff>
        <aff id="aff5"><label>5</label><institution>Department of Earth System Science, Tsinghua University, Beijing,
China</institution>
        </aff>
      </contrib-group>
      <author-notes><corresp id="corr1">Guangliang Fu (g.fu@tudelft.nl)</corresp></author-notes><pub-date><day>24</day><month>April</month><year>2017</year></pub-date>
      
      <volume>10</volume>
      <issue>4</issue>
      <fpage>1751</fpage><lpage>1766</lpage>
      <history>
        <date date-type="received"><day>1</day><month>August</month><year>2016</year></date>
           <date date-type="rev-request"><day>24</day><month>August</month><year>2016</year></date>
           <date date-type="rev-recd"><day>7</day><month>February</month><year>2017</year></date>
           <date date-type="accepted"><day>3</day><month>April</month><year>2017</year></date>
      </history>
      <permissions>
<license license-type="open-access">
<license-p>This work is licensed under a Creative Commons Attribution 3.0 Unported License. To view a copy of this license, visit <ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/licenses/by/3.0/">http://creativecommons.org/licenses/by/3.0/</ext-link></license-p>
</license>
</permissions><self-uri xlink:href="https://gmd.copernicus.org/articles/10/1751/2017/gmd-10-1751-2017.html">This article is available from https://gmd.copernicus.org/articles/10/1751/2017/gmd-10-1751-2017.html</self-uri>
<self-uri xlink:href="https://gmd.copernicus.org/articles/10/1751/2017/gmd-10-1751-2017.pdf">The full text article is available as a PDF file from https://gmd.copernicus.org/articles/10/1751/2017/gmd-10-1751-2017.pdf</self-uri>


      <abstract>
    <p>In this study, we investigate a strategy to accelerate the data assimilation
(DA) algorithm for volcanic ash forecasts. Evaluations of the computational
time show that the analysis step of the assimilation is the most expensive
part. After studying the characteristics of the ensemble ash state, we propose a
mask-state algorithm that records the sparsity information of the full
ensemble state matrix and transforms the full matrix into a relatively small
one, which reduces the computational cost of the analysis step.
Experimental results show that the mask-state algorithm significantly speeds up
the analysis step. Consequently, the total computing time for
volcanic ash DA is reduced to an acceptable level. The mask-state algorithm
is generic and can thus be embedded in any ensemble-based DA framework.
Moreover, ensemble-based DA with the mask-state algorithm is promising and
flexible, because it implements exactly the standard DA without any
approximation and achieves satisfactory performance without any change
to the full model.</p>
  </abstract>
    </article-meta>
  </front>
<body>
      

<sec id="Ch1.S1" sec-type="intro">
  <title>Introduction</title>
      <p>Volcanic ash erupted into the atmosphere can severely affect
aviation <xref ref-type="bibr" rid="bib1.bibx14" id="paren.1"/>. Aircraft turbine engines are
seriously threatened by ash ingestion <xref ref-type="bibr" rid="bib1.bibx4" id="paren.2"/>. Thus,
accurate real-time aviation advice is urgently needed during an explosive
volcanic ash eruption <xref ref-type="bibr" rid="bib1.bibx7" id="paren.3"/>. Using data assimilation
(DA) to improve model forecast accuracy is a powerful approach
<xref ref-type="bibr" rid="bib1.bibx26" id="paren.4"/>. Recently, ensemble-based DA
<xref ref-type="bibr" rid="bib1.bibx8" id="paren.5"/> has proven very useful for improving
volcanic ash forecasts and regional aviation advice <xref ref-type="bibr" rid="bib1.bibx12" id="paren.6"/>.
It corrects volcanic ash concentrations by continuously assimilating
observations. In <xref ref-type="bibr" rid="bib1.bibx12" id="text.7"/>, real aircraft in situ measurements
were assimilated using the ensemble Kalman filter (EnKF), which is the best
known and most popular ensemble-based DA method. Validation with
independent data showed that ensemble-based DA is powerful for
improving the forecast accuracy.</p>
      <p>However, to make the methodology efficient in an operational (real-time)
sense as well, the computational effort must be acceptable. For volcanic ash DA
problems, no studies on the computational aspects have so far been reported
in the literature. In fact, when large amounts of volcanic ash are erupted into
the atmosphere, the computational speed of volcanic ash forecasts is just as
important as the forecast accuracy <xref ref-type="bibr" rid="bib1.bibx41" id="paren.8"/>. For example, due
to the lack of a fast and accurate forecast system, the sudden eruption of
the Eyjafjallajökull volcano in Iceland from 14 April to 23 May 2010 caused
an unprecedented closure of the European and North Atlantic airspace,
resulting in a huge global economic loss of USD 5 billion
<xref ref-type="bibr" rid="bib1.bibx31" id="paren.9"/>. Since then, research on fast and
accurate volcanic ash forecasts has gained much attention, because such
forecasts are needed to provide timely and accurate advice for commercial
aviation. It was shown in
<xref ref-type="bibr" rid="bib1.bibx12" id="text.10"/> that the accuracy of volcanic ash
transport forecasts can be significantly improved by a DA system. It is
therefore urgent to also consider the
computational aspect, i.e., to improve the computational speed of the volcanic
ash DA system as much as possible. This is the main focus of this study.</p>
      <p>Due to the computational complexity of ensemble-based algorithms and the
large scale of dynamical applications, applying these methods usually
incurs a large computational cost. This has been reported in the
literature for different applications. For example, for operational weather
forecasting with ensemble-based DA, <xref ref-type="bibr" rid="bib1.bibx18" id="text.11"/> reported
computational challenges at the Canadian Meteorological Center with an
operational EnKF featuring 192 ensemble members, using a large
600 <inline-formula><mml:math id="M1" display="inline"><mml:mo>×</mml:mo></mml:math></inline-formula> 300 global horizontal grid and 74 vertical levels. An
initialization requirement of over 7 <inline-formula><mml:math id="M2" display="inline"><mml:mo>×</mml:mo></mml:math></inline-formula> 10<inline-formula><mml:math id="M3" display="inline"><mml:msup><mml:mi/><mml:mn mathvariant="normal">10</mml:mn></mml:msup></mml:math></inline-formula> values to specify
each ensemble results in a large computational effort in the initialization
and forecast steps of weather forecasting. For oil reservoir history matching
<xref ref-type="bibr" rid="bib1.bibx38" id="paren.12"/>, the reservoir simulation model usually has a
large number of state variables; thus, the forecasts of an ensemble of
simulation models are often time-consuming. Moreover, when time-lapse seismic
or dense reservoir data are available, the analysis step that assimilates
these large observation sets becomes very time-consuming
<xref ref-type="bibr" rid="bib1.bibx23" id="paren.13"/>. Large computational requirements of
ensemble-based DA have also been reported in ocean circulation models
<xref ref-type="bibr" rid="bib1.bibx21 bib1.bibx22" id="paren.14"/>, tropospheric chemistry
assimilation <xref ref-type="bibr" rid="bib1.bibx29" id="paren.15"/>, and many other applications.</p>
      <p>To accelerate an ensemble-based DA system, the ensemble forecast step can
first be parallelized because the propagation of different ensemble members
is independent. Thus if a computer with a sufficiently large number of
parallel processors is available, all the ensemble members can be
simultaneously integrated. In the analysis stage, to calculate the Kalman
gain and the ensemble error covariance matrix, all ensemble states must be
combined. In weather forecasting and oceanography,
<xref ref-type="bibr" rid="bib1.bibx21" id="text.16"/>, <xref ref-type="bibr" rid="bib1.bibx22" id="text.17"/>, and
<xref ref-type="bibr" rid="bib1.bibx17" id="text.18"/> have reported using parallelization
approaches to accelerate the expensive analysis stage. In reservoir history
matching, a three-level parallelization has been proposed by
<xref ref-type="bibr" rid="bib1.bibx38 bib1.bibx23" id="text.19"/> in recent years
to significantly reduce the computational effort of both the forecast and analysis
steps, which is large because of massive dense observations and large simulation models. The
first parallelization level is to separately perform the ensemble simulations
on different processors during the forecast step. This approach is usually
quite efficient when a large ensemble size is used. However, the scale or
model size of one reservoir simulation is constrained by the memory of a
single processor. Thus, the second parallelization level is to perform one
ensemble member simulation using a parallel reservoir model. These two levels
do not deal with the analysis step, which collects all ensemble members to do
computations usually on a single processor. Therefore, a third level of
parallelization was implemented by <xref ref-type="bibr" rid="bib1.bibx38" id="text.20"/> and
<xref ref-type="bibr" rid="bib1.bibx23" id="text.21"/> by parallelizing matrix-vector
multiplications in the analysis steps. Furthermore, other approaches to
accelerating ensemble-based DA systems have also been reported, such as
GPU-based acceleration <xref ref-type="bibr" rid="bib1.bibx33" id="paren.22"/> in numerical weather prediction
(NWP) and domain decomposition in atmospheric chemistry assimilation
<xref ref-type="bibr" rid="bib1.bibx37 bib1.bibx29" id="paren.23"/>. The observations used in a
DA system can also be optimized with preprocessing procedures, as
reported by <xref ref-type="bibr" rid="bib1.bibx18" id="text.24"/>.</p>
      <p>Although many efforts have been made in other applications to deal with large
computational requirements in ensemble-based DA systems, most of them
cannot be directly used to accelerate volcanic ash DA. This is because the
acceleration algorithms are strongly dependent on specific problems, such as
model complexity (high or low resolution), observation type (dense or
sparse), or primary requirement (accuracy or speed). These factors determine,
for a specific application, which part is the most time-consuming, and which
part is intrinsically sequential. Thus, no unified approach for efficient
acceleration of all the applications can be found. Although the successful
approaches in other applications cannot be directly employed in volcanic ash
forecasts, their success does stress the importance of designing a proper
approach based on the computational analysis of a specific DA system.
Therefore, the computational cost of our volcanic ash DA system will first be
analyzed. Then, based on the computational analysis, we will investigate a
strategy to accelerate the ensemble-based DA system for volcanic ash
forecasts.</p>
      <p>This paper is organized as follows. Section <xref ref-type="sec" rid="Ch1.S2"/> introduces the
methodology of volcanic ash DA. Section <xref ref-type="sec" rid="Ch1.S3"/> analyzes the
computational cost of the conventional volcanic ash DA system. In
Sect. <xref ref-type="sec" rid="Ch1.S4"/>, the mask-state algorithm (MS) is developed for
acceleration. The comparison between MS and standard sparse matrix methods is
presented in Sect. <xref ref-type="sec" rid="Ch1.S5"/>. MS is discussed in
Sect. <xref ref-type="sec" rid="Ch1.S6"/>. Finally, the last section presents the
concluding remarks of our research.</p>
</sec>
<sec id="Ch1.S2">
  <title>Methodology of the volcanic ash DA system</title>
      <p>In this study, the EnKF <xref ref-type="bibr" rid="bib1.bibx8" id="paren.25"/> is employed to perform
ensemble-based DA. The EnKF is essentially a sequential Monte Carlo method
that represents the uncertain state estimate by <inline-formula><mml:math id="M4" display="inline"><mml:mi>N</mml:mi></mml:math></inline-formula> ensemble members,
<inline-formula><mml:math id="M5" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="bold-italic">ξ</mml:mi><mml:mn mathvariant="normal">1</mml:mn></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">ξ</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub><mml:mo>,</mml:mo><mml:mi mathvariant="normal">⋯</mml:mi><mml:mo>,</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">ξ</mml:mi><mml:mi>N</mml:mi></mml:msub><mml:mo>.</mml:mo></mml:mrow></mml:math></inline-formula> Each
member is assumed to be one sample from the distribution of the true state. It has
been suggested that for operational applications, the ensemble size can be
limited to 10–100 for cost effectiveness
<xref ref-type="bibr" rid="bib1.bibx30 bib1.bibx3" id="paren.26"/>. In this study, an
ensemble size of 100 is used because of the high accuracy required of
volcanic ash forecasts for aviation advice, as mentioned in
Sect. <xref ref-type="sec" rid="Ch1.S1"/>.</p>
      <p>To simulate a volcanic ash plume, an atmospheric transport model is needed.
In this paper, the LOTOS-EUROS (abbreviation of LOng Term Ozone Simulation –
EURopean Operational Smog) model is used <xref ref-type="bibr" rid="bib1.bibx36" id="paren.27"/> with model
version 1.10 (<uri>http://www.lotos-euros.nl/</uri>). The LOTOS-EUROS model
<xref ref-type="bibr" rid="bib1.bibx36" id="paren.28"/> is an operational model focusing on nitrogen oxides,
ozone, particulate matter, and volcanic ash. The model configurations for
volcanic ash were discussed in detail by <xref ref-type="bibr" rid="bib1.bibx12" id="text.29"/>. For
volcanic ash simulation, the model is configured with a state vector of size
180 <inline-formula><mml:math id="M6" display="inline"><mml:mo>×</mml:mo></mml:math></inline-formula> 200 <inline-formula><mml:math id="M7" display="inline"><mml:mo>×</mml:mo></mml:math></inline-formula> 18 <inline-formula><mml:math id="M8" display="inline"><mml:mo>×</mml:mo></mml:math></inline-formula> 6 (the dimensions correspond to
longitude, latitude, vertical level, and ash species), so the
model state has 3 888 000 elements (<inline-formula><mml:math id="M9" display="inline"><mml:mrow><mml:mo>∼</mml:mo><mml:mspace width="0.125em" linebreak="nobreak"/><mml:msup><mml:mn mathvariant="normal">10</mml:mn><mml:mn mathvariant="normal">6</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula>).</p>
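      <p>The arithmetic behind the state size can be checked with a short sketch.
The grid dimensions and the ensemble size of 100 are taken from the text; the
8-byte storage per value is an assumption made here for illustration only and
is not stated for the LOTOS-EUROS configuration.</p>
      <preformat>
```python
# Grid dimensions quoted in the text: longitude, latitude, vertical level, ash species.
n_lon, n_lat, n_lev, n_species = 180, 200, 18, 6
n_state = n_lon * n_lat * n_lev * n_species

# Ensemble size used in this study.
n_ens = 100

# Assumption (not from the paper): every value is stored as an 8-byte float.
bytes_per_value = 8
ensemble_bytes = n_state * n_ens * bytes_per_value

print(f"state size       : {n_state:,}")
print(f"ensemble entries : {n_state * n_ens:,}")
print(f"ensemble storage : {ensemble_bytes / 1e9:.2f} GB")
```
</preformat>
      <p>Under this storage assumption, the full ensemble state matrix alone
occupies roughly 3 GB, which gives a sense of the data volume handled whenever
all ensemble members are collected.</p>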

      <?xmltex \floatpos{t}?><fig id="Ch1.F1" specific-use="star"><caption><p>Methodology of ensemble-based DA.
<bold>(a)</bold> The initial volcanic ash state at 09:00 UTC.
<bold>(b)</bold> Flight route of measurement aircraft.
<bold>(c)</bold> Aircraft in situ measurements of PM<inline-formula><mml:math id="M10" display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">10</mml:mn></mml:msub></mml:math></inline-formula> and PM<inline-formula><mml:math id="M11" display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">2.5</mml:mn></mml:msub></mml:math></inline-formula> from 09:30 to 11:10 UTC, 18 May 2010.
<bold>(d)</bold> Volcanic ash assimilation result at 12:00 UTC.
</p></caption>
        <?xmltex \igopts{width=341.433071pt}?><graphic xlink:href="https://gmd.copernicus.org/articles/10/1751/2017/gmd-10-1751-2017-f01.png"/>

      </fig>

      <p>The experiment starts at <inline-formula><mml:math id="M12" display="inline"><mml:mrow><mml:msub><mml:mi>t</mml:mi><mml:mn mathvariant="normal">0</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula> (09:00 UTC, 18 May 2010)
from an initial condition taken from a previous conventional LOTOS-EUROS
model run (see Fig. <xref ref-type="fig" rid="Ch1.F1"/>a). In the
subsequent forecast step, the model propagates the ensemble members
from time <inline-formula><mml:math id="M13" display="inline"><mml:mrow><mml:msub><mml:mi>t</mml:mi><mml:mrow><mml:mi>k</mml:mi><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula> to <inline-formula><mml:math id="M14" display="inline"><mml:mrow><mml:msub><mml:mi>t</mml:mi><mml:mi>k</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> (<inline-formula><mml:math id="M15" display="inline"><mml:mrow><mml:mi>k</mml:mi><mml:mo>&gt;</mml:mo><mml:mn mathvariant="normal">0</mml:mn></mml:mrow></mml:math></inline-formula>; the time step is 10 min):

              <disp-formula id="Ch1.E1" content-type="numbered"><mml:math id="M16" display="block"><mml:mrow><mml:mstyle class="stylechange" displaystyle="true"/><mml:msubsup><mml:mi mathvariant="bold-italic">ξ</mml:mi><mml:mi>j</mml:mi><mml:mi mathvariant="normal">f</mml:mi></mml:msubsup><mml:mo>(</mml:mo><mml:mi>k</mml:mi><mml:mo>)</mml:mo><mml:mo>=</mml:mo><mml:msub><mml:mi>M</mml:mi><mml:mrow><mml:mi>k</mml:mi><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msub><mml:mo>(</mml:mo><mml:msubsup><mml:mi mathvariant="bold-italic">ξ</mml:mi><mml:mi>j</mml:mi><mml:mi mathvariant="normal">a</mml:mi></mml:msubsup><mml:mo>(</mml:mo><mml:mi>k</mml:mi><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mo>)</mml:mo><mml:mo>)</mml:mo><mml:mo>.</mml:mo></mml:mrow></mml:math></disp-formula>

        The operator <inline-formula><mml:math id="M17" display="inline"><mml:mrow><mml:msub><mml:mi>M</mml:mi><mml:mrow><mml:mi>k</mml:mi><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula> describes the time evolution of the state which
contains the ash concentrations in all model grid boxes. The state at the
time <inline-formula><mml:math id="M18" display="inline"><mml:mrow><mml:msub><mml:mi>t</mml:mi><mml:mi>k</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> has a distribution with the mean <inline-formula><mml:math id="M19" display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="bold-italic">x</mml:mi><mml:mi mathvariant="normal">f</mml:mi></mml:msup></mml:mrow></mml:math></inline-formula> and the
forecast error covariance matrix <inline-formula><mml:math id="M20" display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="bold">P</mml:mi><mml:mi mathvariant="normal">f</mml:mi></mml:msup></mml:mrow></mml:math></inline-formula> given by

              <disp-formula specific-use="align" content-type="numbered"><mml:math id="M21" display="block"><mml:mtable displaystyle="true"><mml:mlabeledtr id="Ch1.E2"><mml:mtd/><mml:mtd><mml:mstyle displaystyle="true" class="stylechange"/></mml:mtd><mml:mtd><mml:mrow><mml:mstyle displaystyle="true" class="stylechange"/><mml:msup><mml:mi mathvariant="bold-italic">x</mml:mi><mml:mi mathvariant="normal">f</mml:mi></mml:msup><mml:mo>(</mml:mo><mml:mi>k</mml:mi><mml:mo>)</mml:mo><mml:mo>=</mml:mo><mml:mo>[</mml:mo><mml:munderover><mml:mo movablelimits="false">∑</mml:mo><mml:mrow><mml:mi>j</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow><mml:mi>N</mml:mi></mml:munderover><mml:msubsup><mml:mi mathvariant="bold-italic">ξ</mml:mi><mml:mi>j</mml:mi><mml:mi mathvariant="normal">f</mml:mi></mml:msubsup><mml:mo>(</mml:mo><mml:mi>k</mml:mi><mml:mo>)</mml:mo><mml:mo>]</mml:mo><mml:mo>/</mml:mo><mml:mi>N</mml:mi><mml:mo>,</mml:mo></mml:mrow></mml:mtd></mml:mlabeledtr><mml:mlabeledtr id="Ch1.E3"><mml:mtd/><mml:mtd><mml:mstyle displaystyle="true" class="stylechange"/></mml:mtd><mml:mtd><mml:mrow><mml:mstyle displaystyle="true" class="stylechange"/><mml:msup><mml:mi mathvariant="bold">L</mml:mi><mml:mi mathvariant="normal">f</mml:mi></mml:msup><mml:mo>(</mml:mo><mml:mi>k</mml:mi><mml:mo>)</mml:mo><mml:mo>=</mml:mo><mml:mo>[</mml:mo><mml:msubsup><mml:mi mathvariant="bold-italic">ξ</mml:mi><mml:mn mathvariant="normal">1</mml:mn><mml:mi mathvariant="normal">f</mml:mi></mml:msubsup><mml:mo>(</mml:mo><mml:mi>k</mml:mi><mml:mo>)</mml:mo><mml:mo>-</mml:mo><mml:msup><mml:mi mathvariant="bold-italic">x</mml:mi><mml:mi mathvariant="normal">f</mml:mi></mml:msup><mml:mo>(</mml:mo><mml:mi>k</mml:mi><mml:mo>)</mml:mo><mml:mo>,</mml:mo><mml:mi mathvariant="normal">⋯</mml:mi><mml:mo>,</mml:mo><mml:msubsup><mml:mi mathvariant="bold-italic">ξ</mml:mi><mml:mi>N</mml:mi><mml:mi mathvariant="normal">f</mml:mi></mml:msubsup><mml:mo>(</mml:mo><mml:mi>k</mml:mi><mml:mo>)</mml:mo><mml:mo>-</mml:mo><mml:msup><mml:mi mathvariant="bold-italic">x</mml:mi><mml:mi mathvariant="normal">f</mml:mi></mml:msup><mml:mo>(</mml:mo><mml:mi>k</mml:mi><mml:mo>)</mml:mo><mml:mo>]</mml:mo><mml:mo>,</mml:mo></mml:mrow></mml:mtd></mml:mlabeledtr><mml:mlabeledtr id="Ch1.E4"><mml:mtd/><mml:mtd><mml:mstyle class="stylechange" displaystyle="true"/></mml:mtd><mml:mtd><mml:mrow><mml:mstyle class="stylechange" displaystyle="true"/><mml:msup><mml:mi mathvariant="bold">P</mml:mi><mml:mi mathvariant="normal">f</mml:mi></mml:msup><mml:mo>(</mml:mo><mml:mi>k</mml:mi><mml:mo>)</mml:mo><mml:mo>=</mml:mo><mml:mo>[</mml:mo><mml:msup><mml:mi mathvariant="bold">L</mml:mi><mml:mi mathvariant="normal">f</mml:mi></mml:msup><mml:mo>(</mml:mo><mml:mi>k</mml:mi><mml:mo>)</mml:mo><mml:msup><mml:mi mathvariant="bold">L</mml:mi><mml:mi mathvariant="normal">f</mml:mi></mml:msup><mml:mo>(</mml:mo><mml:mi>k</mml:mi><mml:msup><mml:mo>)</mml:mo><mml:mi>T</mml:mi></mml:msup><mml:mo>]</mml:mo><mml:mo>/</mml:mo><mml:mo>(</mml:mo><mml:mi>N</mml:mi><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mo>)</mml:mo><mml:mo>,</mml:mo></mml:mrow></mml:mtd></mml:mlabeledtr></mml:mtable></mml:math></disp-formula>

          where <inline-formula><mml:math id="M22" display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="bold">L</mml:mi><mml:mi mathvariant="normal">f</mml:mi></mml:msup></mml:mrow></mml:math></inline-formula> represents the ensemble perturbation matrix. In
this study, the forecast step is performed in parallel, exploiting the
natural parallelism of the independent ensemble propagation; this is a
straightforward and common approach in ensemble-based DA
<xref ref-type="bibr" rid="bib1.bibx24 bib1.bibx38 bib1.bibx23" id="paren.30"/>.</p>
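      <p>As an illustration of Eqs. (2)–(4), the following minimal NumPy sketch
computes the ensemble mean, the perturbation matrix, and the forecast error
covariance for a toy ensemble. The sizes, the random test data, and all
variable names are hypothetical and chosen only for readability; this is not
the LOTOS-EUROS implementation.</p>
      <preformat>
```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes (hypothetical); the study uses a state of order 1e6 and N = 100.
n_state, n_ens = 12, 5

# Columns are the forecast ensemble members xi_j^f (random stand-in data).
ensemble_f = rng.random((n_state, n_ens))

# Eq. (2): ensemble mean x^f.
x_f = ensemble_f.sum(axis=1) / n_ens

# Eq. (3): perturbation matrix L^f, whose j-th column is xi_j^f - x^f.
L_f = ensemble_f - x_f[:, None]

# Eq. (4): forecast error covariance P^f = L^f (L^f)^T / (N - 1).
P_f = L_f @ L_f.T / (n_ens - 1)

print(P_f.shape)
```
</preformat>
      <p>For a state of about 3.9 million elements, the explicit covariance
matrix would have on the order of 10^13 entries, so in practice one would
typically work with the perturbation matrix rather than form the covariance
matrix itself; the explicit form above is only feasible at toy size.</p>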
      <p>When the model reaches 09:40 UTC, 18 May 2010, the volcanic ash state
is sequentially analyzed by the DA process, which assimilates real aircraft in
situ measurements of PM<inline-formula><mml:math id="M23" display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">10</mml:mn></mml:msub></mml:math></inline-formula> and PM<inline-formula><mml:math id="M24" display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">2.5</mml:mn></mml:msub></mml:math></inline-formula> concentrations until
11:10 UTC. The measurement route and values are shown in
Fig. <xref ref-type="fig" rid="Ch1.F1"/>b, c and the details are described in
<xref ref-type="bibr" rid="bib1.bibx39" id="text.31"/> and <xref ref-type="bibr" rid="bib1.bibx12" id="text.32"/>. The observational
network at time <inline-formula><mml:math id="M25" display="inline"><mml:mrow><mml:msub><mml:mi>t</mml:mi><mml:mi>k</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> is defined by the operator <inline-formula><mml:math id="M26" display="inline"><mml:mrow><mml:msub><mml:mi>H</mml:mi><mml:mi>k</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> which maps the
state vector <inline-formula><mml:math id="M27" display="inline"><mml:mi mathvariant="bold-italic">x</mml:mi></mml:math></inline-formula> to the observational vector <inline-formula><mml:math id="M28" display="inline"><mml:mi mathvariant="bold-italic">y</mml:mi></mml:math></inline-formula> by

              <disp-formula id="Ch1.E5" content-type="numbered"><mml:math id="M29" display="block"><mml:mrow><mml:mstyle displaystyle="true" class="stylechange"/><mml:mi mathvariant="bold-italic">y</mml:mi><mml:mo>(</mml:mo><mml:mi>k</mml:mi><mml:mo>)</mml:mo><mml:mo>=</mml:mo><mml:msub><mml:mi>H</mml:mi><mml:mi>k</mml:mi></mml:msub><mml:mo>(</mml:mo><mml:mi mathvariant="bold-italic">x</mml:mi><mml:mo>(</mml:mo><mml:mi>k</mml:mi><mml:mo>)</mml:mo><mml:mo>)</mml:mo><mml:mo>+</mml:mo><mml:mi mathvariant="bold-italic">v</mml:mi><mml:mo>(</mml:mo><mml:mi>k</mml:mi><mml:mo>)</mml:mo><mml:mo>,</mml:mo></mml:mrow></mml:math></disp-formula>

        where <inline-formula><mml:math id="M30" display="inline"><mml:mi mathvariant="bold-italic">y</mml:mi></mml:math></inline-formula> contains the aircraft measurements and <inline-formula><mml:math id="M31" display="inline"><mml:mi mathvariant="bold-italic">v</mml:mi></mml:math></inline-formula> represents
the observational error. <inline-formula><mml:math id="M32" display="inline"><mml:mrow><mml:msub><mml:mi>H</mml:mi><mml:mi>k</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> selects the grid cells in <inline-formula><mml:math id="M33" display="inline"><mml:mrow><mml:mi mathvariant="bold-italic">x</mml:mi><mml:mo>(</mml:mo><mml:mi>k</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> that
correspond to the locations of the observations. When measurements are
available, the ensemble members are updated in the analysis step using

              <disp-formula specific-use="align" content-type="numbered"><mml:math id="M34" display="block"><mml:mtable displaystyle="true"><mml:mlabeledtr id="Ch1.E6"><mml:mtd/><mml:mtd><mml:mstyle class="stylechange" displaystyle="true"/></mml:mtd><mml:mtd><mml:mrow><mml:mstyle displaystyle="true" class="stylechange"/><mml:mi mathvariant="bold">K</mml:mi><mml:mo>(</mml:mo><mml:mi>k</mml:mi><mml:mo>)</mml:mo><mml:mo>=</mml:mo><mml:msup><mml:mi mathvariant="bold">P</mml:mi><mml:mi mathvariant="normal">f</mml:mi></mml:msup><mml:mo>(</mml:mo><mml:mi>k</mml:mi><mml:mo>)</mml:mo><mml:mi mathvariant="bold">H</mml:mi><mml:mo>(</mml:mo><mml:mi>k</mml:mi><mml:msup><mml:mo>)</mml:mo><mml:mi>T</mml:mi></mml:msup><mml:mo>[</mml:mo><mml:mi mathvariant="bold">H</mml:mi><mml:mo>(</mml:mo><mml:mi>k</mml:mi><mml:mo>)</mml:mo><mml:msup><mml:mi mathvariant="bold">P</mml:mi><mml:mi mathvariant="normal">f</mml:mi></mml:msup><mml:mo>(</mml:mo><mml:mi>k</mml:mi><mml:mo>)</mml:mo><mml:mi mathvariant="bold">H</mml:mi><mml:mo>(</mml:mo><mml:mi>k</mml:mi><mml:msup><mml:mo>)</mml:mo><mml:mi>T</mml:mi></mml:msup><mml:mo>+</mml:mo><mml:mi mathvariant="bold">R</mml:mi><mml:msup><mml:mo>]</mml:mo><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msup><mml:mo>,</mml:mo></mml:mrow></mml:mtd></mml:mlabeledtr><mml:mlabeledtr id="Ch1.E7"><mml:mtd/><mml:mtd><mml:mstyle displaystyle="true" class="stylechange"/></mml:mtd><mml:mtd><mml:mrow><mml:mstyle displaystyle="true" class="stylechange"/><mml:msubsup><mml:mi mathvariant="bold-italic">ξ</mml:mi><mml:mi>j</mml:mi><mml:mi mathvariant="normal">a</mml:mi></mml:msubsup><mml:mo>(</mml:mo><mml:mi>k</mml:mi><mml:mo>)</mml:mo><mml:mo>=</mml:mo><mml:msubsup><mml:mi mathvariant="bold-italic">ξ</mml:mi><mml:mi>j</mml:mi><mml:mi mathvariant="normal">f</mml:mi></mml:msubsup><mml:mo>(</mml:mo><mml:mi>k</mml:mi><mml:mo>)</mml:mo><mml:mo>+</mml:mo><mml:mi 
mathvariant="bold">K</mml:mi><mml:mo>(</mml:mo><mml:mi>k</mml:mi><mml:mo>)</mml:mo><mml:mo>[</mml:mo><mml:mi mathvariant="bold-italic">y</mml:mi><mml:mo>(</mml:mo><mml:mi>k</mml:mi><mml:mo>)</mml:mo><mml:mo>-</mml:mo><mml:mi mathvariant="bold">H</mml:mi><mml:mo>(</mml:mo><mml:mi>k</mml:mi><mml:mo>)</mml:mo><mml:msubsup><mml:mi mathvariant="bold-italic">ξ</mml:mi><mml:mi>j</mml:mi><mml:mi mathvariant="normal">f</mml:mi></mml:msubsup><mml:mo>(</mml:mo><mml:mi>k</mml:mi><mml:mo>)</mml:mo><mml:mo>+</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">v</mml:mi><mml:mi>j</mml:mi></mml:msub><mml:mo>(</mml:mo><mml:mi>k</mml:mi><mml:mo>)</mml:mo><mml:mo>]</mml:mo><mml:mo>,</mml:mo></mml:mrow></mml:mtd></mml:mlabeledtr></mml:mtable></mml:math></disp-formula>

          where <inline-formula><mml:math id="M35" display="inline"><mml:mi mathvariant="bold">K</mml:mi></mml:math></inline-formula> represents the Kalman gain, <inline-formula><mml:math id="M36" display="inline"><mml:mi mathvariant="bold">H</mml:mi></mml:math></inline-formula> is the
observational matrix formed by the observational operator <inline-formula><mml:math id="M37" display="inline"><mml:mi>H</mml:mi></mml:math></inline-formula>, <inline-formula><mml:math id="M38" display="inline"><mml:mi mathvariant="bold">R</mml:mi></mml:math></inline-formula>
represents the measurement error covariance matrix, and <inline-formula><mml:math id="M39" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="bold-italic">v</mml:mi><mml:mi>j</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>
represents a realization drawn from the observation error distribution
<inline-formula><mml:math id="M40" display="inline"><mml:mi mathvariant="bold-italic">v</mml:mi></mml:math></inline-formula>. After the continuous assimilation ending at 11:10 UTC, the
forecast at 12:00 UTC is shown in Fig. <xref ref-type="fig" rid="Ch1.F1"/>d;
its accuracy has been evaluated and found to be significantly
improved compared to the case without DA <xref ref-type="bibr" rid="bib1.bibx12" id="paren.33"/>.</p>
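      <p>As a minimal illustration of Eqs. (<xref ref-type="disp-formula" rid="Ch1.E6"/>) and (<xref ref-type="disp-formula" rid="Ch1.E7"/>), the following NumPy sketch performs one stochastic EnKF analysis on a toy problem. It is not the LOTOS-EUROS implementation: the function name <monospace>enkf_analysis</monospace>, the toy dimensions, the selection-type observation matrix, and the Gaussian sampling of the observation error realizations are assumptions made for illustration only.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import numpy as np


def enkf_analysis(A_f, H, y, R, rng):
    """One stochastic EnKF analysis step (illustrative sketch).

    A_f : (n, N) forecast ensemble, one member per column
    H   : (m, n) observation matrix
    y   : (m,)   observation vector
    R   : (m, m) observation error covariance
    rng : numpy.random.Generator used to draw the realizations v_j
    """
    n, N = A_f.shape
    m = H.shape[0]
    # Ensemble perturbations and their projection into observation space.
    L_f = A_f - A_f.mean(axis=1, keepdims=True)
    HL = H @ L_f
    # P^f H^T and H P^f H^T, computed without forming the n x n matrix P^f.
    PHt = L_f @ HL.T / (N - 1)
    HPHt = HL @ HL.T / (N - 1)
    # Kalman gain K = P^f H^T [H P^f H^T + R]^{-1}; the bracket is symmetric.
    K = np.linalg.solve(HPHt + R, PHt.T).T
    # Perturbed observations: one realization v_j per ensemble member.
    V = rng.multivariate_normal(np.zeros(m), R, size=N).T
    innovations = y[:, None] - H @ A_f + V
    return A_f + K @ innovations


# Toy usage (hypothetical sizes; the study uses n ~ 1e6, m = 2, N = 100).
rng = np.random.default_rng(0)
n, N, m = 50, 100, 2
A_f = rng.normal(size=(n, N))
H = np.zeros((m, n))
H[0, 3] = 1.0   # first measurement observes state variable 3
H[1, 17] = 1.0  # second measurement observes state variable 17
y = np.array([0.5, -0.2])
R = 1e-6 * np.eye(m)
A_a = enkf_analysis(A_f, H, y, R, rng)
print(A_a.shape)
```
]]></preformat>
      <p>In this sketch the gain is built from the ensemble perturbations, so only an <monospace>m</monospace>-by-<monospace>m</monospace> linear system has to be solved. With a very small observation error, the analyzed members at the two observed locations are pulled close to the observed values.</p>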
      <p>The EnKF with the above setup is referred to as the “conventional EnKF” and
is used in this study for the computational evaluation. Note that we do not
use covariance localization, as proposed by
<xref ref-type="bibr" rid="bib1.bibx15" id="text.34"/> for reducing spurious covariances.
Although localization is possible, the ideal case is to avoid it and instead
obtain the correct covariances from a large (converged) ensemble. For
localization to work, physical (correct) covariances must be well maintained
while unphysical (spurious) covariances are eliminated
<xref ref-type="bibr" rid="bib1.bibx32" id="paren.35"/>. If the “filtering length scale” for
localization is too long (i.e., all the dynamical covariances are allowed),
many of the spurious covariances may not be eliminated. If the length scale is too
short, important physical dynamical covariances may be lost together
with the spurious ones. Choosing an accurate localization is therefore a
challenging subject
<xref ref-type="bibr" rid="bib1.bibx34 bib1.bibx20" id="paren.36"/>, especially for
accuracy-demanding applications. For this reason, in this study we choose an
ensemble size of 100 to ensure accuracy and avoid large spurious
covariances.</p>
</sec>
<sec id="Ch1.S3">
  <title>Computational analysis for volcanic ash DA</title>
<sec id="Ch1.S3.SS1">
  <title>Computational analysis of the total runtime</title>
      <p>Ensemble-based DA is a useful approach to improve the forecast accuracy of
volcanic ash transport. However, if it is too time-consuming, it cannot be
considered efficient, because volcanic ash DA places high demands on speed (see
Sect. <xref ref-type="sec" rid="Ch1.S1"/>). We therefore analyze the
computational cost of a conventional volcanic ash DA system.</p>
      <p>As introduced in Sect. <xref ref-type="sec" rid="Ch1.S2"/>, the total execution time of
the conventional EnKF comprises four parts: initialization, forecast,
analysis, and other computational costs. The initialization time includes
reading meteorological data, initializing model geographical and grid
configurations, reading emission information, initializing stochastic
observers for reading and transforming observations to the model grid, and
initializing all the ensemble states and ensemble means. The forecast time is
the time spent on Eq. (<xref ref-type="disp-formula" rid="Ch1.E1"/>), while the analysis time is
the total time spent on Eqs. (<xref ref-type="disp-formula" rid="Ch1.E2"/>) to (<xref ref-type="disp-formula" rid="Ch1.E7"/>).
The other computational time includes script compiling, setting environment
variables, and starting and finalizing the DA algorithms.</p>

<?xmltex \floatpos{t}?><table-wrap id="Ch1.T1" specific-use="star"><caption><p>Comparison of the computational cost of conventional EnKF and
MS-EnKF. (The results are obtained from the bullx B720 thin nodes of the
Cartesius cluster, which is a computing facility of SURFsara, the Netherlands
Supercomputing Centre. Each node is configured with 2 <inline-formula><mml:math id="M41" display="inline"><mml:mo>×</mml:mo></mml:math></inline-formula> 12-core 2.6
GHz Intel Xeon E5-2690 v3 (Haswell) CPUs and 64 GB of memory.)
</p></caption><oasis:table frame="topbot"><oasis:tgroup cols="3">
     <oasis:colspec colnum="1" colname="col1" align="left"/>
     <oasis:colspec colnum="2" colname="col2" align="right"/>
     <oasis:colspec colnum="3" colname="col3" align="right"/>
     <oasis:thead>
       <oasis:row rowsep="1">  
         <oasis:entry colname="col1">Case</oasis:entry>  
         <oasis:entry colname="col2">Conventional EnKF</oasis:entry>  
         <oasis:entry colname="col3">MS-EnKF</oasis:entry>
       </oasis:row>
     </oasis:thead>
     <oasis:tbody>
       <oasis:row rowsep="1">  
         <oasis:entry colname="col1">Cores used</oasis:entry>  
         <oasis:entry colname="col2">102</oasis:entry>  
         <oasis:entry colname="col3">102</oasis:entry>
       </oasis:row>
       <oasis:row>  
         <oasis:entry colname="col1">Tracer number (n<inline-formula><mml:math id="M42" display="inline"><mml:msub><mml:mi/><mml:mrow><mml:mi>s</mml:mi><mml:mi>p</mml:mi><mml:mi>e</mml:mi><mml:mi>c</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula>)</oasis:entry>  
         <oasis:entry colname="col2">6</oasis:entry>  
         <oasis:entry colname="col3">6</oasis:entry>
       </oasis:row>
       <oasis:row>  
         <oasis:entry colname="col1">Measurements of tracers (m)</oasis:entry>  
         <oasis:entry colname="col2">2</oasis:entry>  
         <oasis:entry colname="col3">2</oasis:entry>
       </oasis:row>
       <oasis:row>  
         <oasis:entry colname="col1">Ensemble size (N)</oasis:entry>  
         <oasis:entry colname="col2">100</oasis:entry>  
         <oasis:entry colname="col3">100</oasis:entry>
       </oasis:row>
       <oasis:row>  
         <oasis:entry colname="col1">Parallel in forecast step</oasis:entry>  
         <oasis:entry colname="col2">Yes</oasis:entry>  
         <oasis:entry colname="col3">Yes</oasis:entry>
       </oasis:row>
       <oasis:row>  
         <oasis:entry colname="col1">Parallel in analysis step</oasis:entry>  
         <oasis:entry colname="col2">No</oasis:entry>  
         <oasis:entry colname="col3">No</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">  
         <oasis:entry colname="col1">Mask state in analysis step</oasis:entry>  
         <oasis:entry colname="col2"><bold>No</bold></oasis:entry>  
         <oasis:entry colname="col3"><bold>Yes</bold></oasis:entry>
       </oasis:row>
       <oasis:row>  
         <oasis:entry colname="col1">Initialization</oasis:entry>  
         <oasis:entry colname="col2">0.42 h</oasis:entry>  
         <oasis:entry colname="col3">0.42 h</oasis:entry>
       </oasis:row>
       <oasis:row>  
         <oasis:entry colname="col1">Forecast</oasis:entry>  
         <oasis:entry colname="col2">0.65 h</oasis:entry>  
         <oasis:entry colname="col3">0.65 h</oasis:entry>
       </oasis:row>
       <oasis:row>  
         <oasis:entry colname="col1">Analysis</oasis:entry>  
         <oasis:entry colname="col2"><bold>3.14</bold> h</oasis:entry>  
         <oasis:entry colname="col3"><bold>0.88</bold> h</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">  
         <oasis:entry colname="col1">Others</oasis:entry>  
         <oasis:entry colname="col2">0.15 h</oasis:entry>  
         <oasis:entry colname="col3">0.12 h</oasis:entry>
       </oasis:row>
       <oasis:row>  
         <oasis:entry colname="col1">Total runtime</oasis:entry>  
         <oasis:entry colname="col2"><bold>4.36</bold> h</oasis:entry>  
         <oasis:entry colname="col3"><bold>1.95</bold> h</oasis:entry>
       </oasis:row>
     </oasis:tbody>
   </oasis:tgroup></oasis:table><table-wrap-foot><p>h: hour; simulation window = <bold>3.0</bold> h; the time is wall clock time.</p></table-wrap-foot></table-wrap>

      <p>The evaluation result of the conventional EnKF is shown in the middle column of
Table <xref ref-type="table" rid="Ch1.T1"/>. The
total computational time (4.36 h) exceeds the
simulation window (3.0 h, i.e., from 09:00 to 12:00 UTC, 18 May 2010),
which is too slow for operational use. Therefore, in this study, we aim
to accelerate the computation to within an acceptable runtime (i.e.,
a runtime shorter than the time period covered by the DA application).</p>
      <p>Table <xref ref-type="table" rid="Ch1.T1"/> also shows that the main
contribution to the total execution time is the analysis step, which takes
72 % of the total runtime and far exceeds the initialization and forecast
times. Because the analysis step is so expensive, although some approaches
(such as MPI-parallel I/O <xref ref-type="bibr" rid="bib1.bibx9" id="altparen.37"/> and domain
decomposition <xref ref-type="bibr" rid="bib1.bibx37" id="altparen.38"/>) can potentially accelerate the
initialization and forecast steps, their effect on the
total computational cost is small. Therefore, to reach an acceptable
computational time, the cost of the analysis step must be reduced.
One may wonder why the analysis takes so much time when the number of
observations is small. The large state vector appears to be
responsible. To identify the exact cause, the detailed
computational cost of the analysis step must be evaluated.</p>
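      <p>The arithmetic behind this argument can be checked with a short script. The sketch below only uses the conventional EnKF timings from Table <xref ref-type="table" rid="Ch1.T1"/>; the variable names and the hypothetical best case (initialization and forecast reduced to zero cost) are illustrative assumptions, not part of the evaluated system.</p>
      <preformat preformat-type="code"><![CDATA[
```python
# Wall-clock times (h) of the conventional EnKF, taken from Table 1.
times = {"initialization": 0.42, "forecast": 0.65, "analysis": 3.14, "others": 0.15}
window = 3.0  # simulation window (h)

total = sum(times.values())
analysis_share = times["analysis"] / total

# Hypothetical best case: initialization and forecast cost nothing at all,
# while the analysis step is left untouched.
best_case = times["analysis"] + times["others"]

# Largest analysis time that keeps the total within the simulation window,
# with all other parts unchanged.
analysis_budget = window - (total - times["analysis"])

print(f"total = {total:.2f} h, analysis share = {analysis_share:.0%}")
print(f"best case without a faster analysis = {best_case:.2f} h")
print(f"analysis budget = {analysis_budget:.2f} h")
```
]]></preformat>
      <p>Even in the hypothetical best case the runtime is 3.29 h, which is still longer than the 3.0 h window. With the other parts unchanged, the analysis step would have to drop from 3.14 h to at most about 1.78 h.</p>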
</sec>
<sec id="Ch1.S3.SS2">
  <title>Cost estimation of all analysis procedures</title>
      <p>We start with the formulation of the analysis step, which is
given by Eq. (<xref ref-type="disp-formula" rid="Ch1.E7"/>) and can be written in full
matrix form as Eq. (<xref ref-type="disp-formula" rid="Ch1.E8"/>),

                <disp-formula id="Ch1.E8" content-type="numbered"><mml:math id="M43" display="block"><mml:mstyle class="stylechange" displaystyle="true"/><mml:mrow><mml:mstyle class="stylechange" displaystyle="true"/><mml:msubsup><mml:mi mathvariant="bold">A</mml:mi><mml:mrow><mml:mi>n</mml:mi><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mo>×</mml:mo><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mi>N</mml:mi></mml:mrow><mml:mi mathvariant="normal">a</mml:mi></mml:msubsup><mml:mo>=</mml:mo><mml:msubsup><mml:mi mathvariant="bold">A</mml:mi><mml:mrow><mml:mi>n</mml:mi><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mo>×</mml:mo><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mi>N</mml:mi></mml:mrow><mml:mi mathvariant="normal">f</mml:mi></mml:msubsup><mml:mo>+</mml:mo><mml:msub><mml:mi mathvariant="bold">K</mml:mi><mml:mrow><mml:mi>n</mml:mi><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mo>×</mml:mo><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mi>m</mml:mi></mml:mrow></mml:msub><mml:mo>(</mml:mo><mml:msub><mml:mi mathvariant="bold">Y</mml:mi><mml:mrow><mml:mi>m</mml:mi><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mo>×</mml:mo><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mi>N</mml:mi></mml:mrow></mml:msub><mml:mo>-</mml:mo><mml:msub><mml:mi mathvariant="bold">H</mml:mi><mml:mrow><mml:mi>m</mml:mi><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mo>×</mml:mo><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mi>n</mml:mi></mml:mrow></mml:msub><mml:msubsup><mml:mi mathvariant="bold">A</mml:mi><mml:mrow><mml:mi>n</mml:mi><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mo>×</mml:mo><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mi>N</mml:mi></mml:mrow><mml:mi mathvariant="normal">f</mml:mi></mml:msubsup><mml:mo>)</mml:mo><mml:mo>,</mml:mo></mml:mrow></mml:math></disp-formula>

          where the subscripts denote the matrix dimensions.
<inline-formula><mml:math id="M44" display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="bold">A</mml:mi><mml:mi mathvariant="normal">f</mml:mi></mml:msup></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M45" display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="bold">A</mml:mi><mml:mi mathvariant="normal">a</mml:mi></mml:msup></mml:mrow></mml:math></inline-formula> represent the forecasted
and analyzed ensemble state matrices, respectively, and are built up from
<inline-formula><mml:math id="M46" display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="bold-italic">ξ</mml:mi><mml:mi>f</mml:mi></mml:msup></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M47" display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="bold-italic">ξ</mml:mi><mml:mi mathvariant="normal">a</mml:mi></mml:msup></mml:mrow></mml:math></inline-formula> with <inline-formula><mml:math id="M48" display="inline"><mml:mi>N</mml:mi></mml:math></inline-formula> ensemble members.
The measurement ensemble matrix <inline-formula><mml:math id="M49" display="inline"><mml:mi mathvariant="bold">Y</mml:mi></mml:math></inline-formula> is formed by an ensemble of
<inline-formula><mml:math id="M50" display="inline"><mml:mrow><mml:mi mathvariant="bold-italic">y</mml:mi><mml:mo>+</mml:mo><mml:mi mathvariant="bold-italic">v</mml:mi></mml:mrow></mml:math></inline-formula> (see Eq. <xref ref-type="disp-formula" rid="Ch1.E7"/>). <inline-formula><mml:math id="M51" display="inline"><mml:mi mathvariant="bold">H</mml:mi></mml:math></inline-formula> is the
observational matrix, which is used to select state variables (at measurement
locations) in the full ensemble state matrix corresponding to the measurement
ensemble matrix <inline-formula><mml:math id="M52" display="inline"><mml:mi mathvariant="bold">Y</mml:mi></mml:math></inline-formula>. <inline-formula><mml:math id="M53" display="inline"><mml:mi>n</mml:mi></mml:math></inline-formula> is the number of model state variables in a
three-dimensional (3-D) domain, i.e., <inline-formula><mml:math id="M54" display="inline"><mml:mrow><mml:mo>∼</mml:mo><mml:mspace width="0.125em" linebreak="nobreak"/><mml:msup><mml:mn mathvariant="normal">10</mml:mn><mml:mn mathvariant="normal">6</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula> in this study (see
Sect. <xref ref-type="sec" rid="Ch1.S2"/>). <inline-formula><mml:math id="M55" display="inline"><mml:mi>m</mml:mi></mml:math></inline-formula> is the number of measurements at one
assimilation time, which depends on the measurement type. For the aircraft in
situ measurements used in this study (see
Fig. <xref ref-type="fig" rid="Ch1.F1"/>c), two measurements are made at each assimilation time
by one research flight, so that <inline-formula><mml:math id="M56" display="inline"><mml:mi>m</mml:mi></mml:math></inline-formula> is 2 here. <inline-formula><mml:math id="M57" display="inline"><mml:mi>N</mml:mi></mml:math></inline-formula> is the ensemble size and
is taken as 100 in this study. As described in Eq. (<xref ref-type="disp-formula" rid="Ch1.E3"/>),
the ensemble perturbation matrix <inline-formula><mml:math id="M58" display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="bold">L</mml:mi><mml:mi mathvariant="normal">f</mml:mi></mml:msup></mml:mrow></mml:math></inline-formula> in EnKF can be
re-written as

                <disp-formula specific-use="align" content-type="numbered"><mml:math id="M59" display="block"><mml:mtable displaystyle="true"><mml:mtr><mml:mtd><mml:mrow><mml:mstyle displaystyle="true" class="stylechange"/><mml:msubsup><mml:mi mathvariant="bold">L</mml:mi><mml:mrow><mml:mi>n</mml:mi><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mo>×</mml:mo><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mi>N</mml:mi></mml:mrow><mml:mi mathvariant="normal">f</mml:mi></mml:msubsup></mml:mrow></mml:mtd><mml:mtd><mml:mrow><mml:mstyle displaystyle="true" class="stylechange"/><mml:mo>=</mml:mo><mml:msubsup><mml:mi mathvariant="bold">A</mml:mi><mml:mrow><mml:mi>n</mml:mi><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mo>×</mml:mo><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mi>N</mml:mi></mml:mrow><mml:mi mathvariant="normal">f</mml:mi></mml:msubsup><mml:mo>-</mml:mo><mml:msubsup><mml:mover accent="true"><mml:mi mathvariant="bold">A</mml:mi><mml:mo mathvariant="normal">¯</mml:mo></mml:mover><mml:mrow><mml:mi>n</mml:mi><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mo>×</mml:mo><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mi>N</mml:mi></mml:mrow><mml:mi mathvariant="normal">f</mml:mi></mml:msubsup><mml:mo>=</mml:mo><mml:msubsup><mml:mi mathvariant="bold">A</mml:mi><mml:mrow><mml:mi>n</mml:mi><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mo>×</mml:mo><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mi>N</mml:mi></mml:mrow><mml:mi mathvariant="normal">f</mml:mi></mml:msubsup><mml:mo>(</mml:mo><mml:msub><mml:mi mathvariant="bold">I</mml:mi><mml:mrow><mml:mi>N</mml:mi><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mo>×</mml:mo><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mi>N</mml:mi></mml:mrow></mml:msub><mml:mo>-</mml:mo><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mn mathvariant="normal">1</mml:mn><mml:mi>N</mml:mi></mml:mfrac></mml:mstyle><mml:msub><mml:mn 
mathvariant="bold">1</mml:mn><mml:mrow><mml:mi>N</mml:mi><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mo>×</mml:mo><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mi>N</mml:mi></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow></mml:mtd></mml:mtr><mml:mlabeledtr id="Ch1.E9"><mml:mtd/><mml:mtd><mml:mstyle class="stylechange" displaystyle="true"/></mml:mtd><mml:mtd><mml:mrow><mml:mstyle class="stylechange" displaystyle="true"/><mml:mo>=</mml:mo><mml:msubsup><mml:mi mathvariant="bold">A</mml:mi><mml:mrow><mml:mi>n</mml:mi><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mo>×</mml:mo><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mi>N</mml:mi></mml:mrow><mml:mi mathvariant="normal">f</mml:mi></mml:msubsup><mml:msub><mml:mi mathvariant="bold">B</mml:mi><mml:mrow><mml:mi>N</mml:mi><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mo>×</mml:mo><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mi>N</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo></mml:mrow></mml:mtd></mml:mlabeledtr></mml:mtable></mml:math></disp-formula>

            where <inline-formula><mml:math id="M60" display="inline"><mml:mi mathvariant="bold">I</mml:mi></mml:math></inline-formula> is an <inline-formula><mml:math id="M61" display="inline"><mml:mrow><mml:mi>N</mml:mi><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mo>×</mml:mo><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mi>N</mml:mi></mml:mrow></mml:math></inline-formula> identity matrix and <inline-formula><mml:math id="M62" display="inline"><mml:mn mathvariant="bold">1</mml:mn></mml:math></inline-formula> is an
<inline-formula><mml:math id="M63" display="inline"><mml:mrow><mml:mi>N</mml:mi><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mo>×</mml:mo><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mi>N</mml:mi></mml:mrow></mml:math></inline-formula> matrix with all elements equal to 1. Thus,
<inline-formula><mml:math id="M64" display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="bold">L</mml:mi><mml:mi mathvariant="normal">f</mml:mi></mml:msup><mml:mo>=</mml:mo><mml:msup><mml:mi mathvariant="bold">A</mml:mi><mml:mi mathvariant="normal">f</mml:mi></mml:msup><mml:mi mathvariant="bold">B</mml:mi></mml:mrow></mml:math></inline-formula> where
<inline-formula><mml:math id="M65" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="bold">B</mml:mi><mml:mrow><mml:mi>N</mml:mi><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mo>×</mml:mo><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mi>N</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula> is introduced to represent
<inline-formula><mml:math id="M66" display="inline"><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi mathvariant="bold">I</mml:mi><mml:mrow><mml:mi>N</mml:mi><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mo>×</mml:mo><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mi>N</mml:mi></mml:mrow></mml:msub><mml:mo>-</mml:mo><mml:mstyle displaystyle="false"><mml:mfrac style="text"><mml:mn mathvariant="normal">1</mml:mn><mml:mi>N</mml:mi></mml:mfrac></mml:mstyle><mml:msub><mml:mn mathvariant="bold">1</mml:mn><mml:mrow><mml:mi>N</mml:mi><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mo>×</mml:mo><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mi>N</mml:mi></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula>, so that
<inline-formula><mml:math id="M67" display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="bold">HL</mml:mi><mml:mi mathvariant="normal">f</mml:mi></mml:msup><mml:mo>=</mml:mo><mml:msup><mml:mi mathvariant="bold">O</mml:mi><mml:mi mathvariant="normal">f</mml:mi></mml:msup><mml:mi mathvariant="bold">B</mml:mi></mml:mrow></mml:math></inline-formula>, where
<inline-formula><mml:math id="M68" display="inline"><mml:mrow><mml:msubsup><mml:mi mathvariant="bold">O</mml:mi><mml:mrow><mml:mi>m</mml:mi><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mo>×</mml:mo><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mi>N</mml:mi></mml:mrow><mml:mi mathvariant="normal">f</mml:mi></mml:msubsup></mml:mrow></mml:math></inline-formula> is used to represent (<inline-formula><mml:math id="M69" display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="bold">HA</mml:mi><mml:mi mathvariant="normal">f</mml:mi></mml:msup></mml:mrow></mml:math></inline-formula>). Here we explicitly express <inline-formula><mml:math id="M70" display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="bold">L</mml:mi><mml:mi mathvariant="normal">f</mml:mi></mml:msup></mml:mrow></mml:math></inline-formula>
and <inline-formula><mml:math id="M71" display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="bold">HL</mml:mi><mml:mi mathvariant="normal">f</mml:mi></mml:msup></mml:mrow></mml:math></inline-formula> in terms of <inline-formula><mml:math id="M72" display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="bold">A</mml:mi><mml:mi mathvariant="normal">f</mml:mi></mml:msup></mml:mrow></mml:math></inline-formula>
and <inline-formula><mml:math id="M73" display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="bold">O</mml:mi><mml:mi mathvariant="normal">f</mml:mi></mml:msup></mml:mrow></mml:math></inline-formula>, respectively, because in our volcanic
ash DA system <inline-formula><mml:math id="M74" display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="bold">A</mml:mi><mml:mi mathvariant="normal">f</mml:mi></mml:msup></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M75" display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="bold">O</mml:mi><mml:mi mathvariant="normal">f</mml:mi></mml:msup></mml:mrow></mml:math></inline-formula> are two of
the three inputs used for the actual computations in the analysis step; the
third is the measurement ensemble matrix <inline-formula><mml:math id="M76" display="inline"><mml:mi mathvariant="bold">Y</mml:mi></mml:math></inline-formula>.
As shown in
Fig. <xref ref-type="fig" rid="Ch1.F2"/>a, <inline-formula><mml:math id="M77" display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="bold">A</mml:mi><mml:mi mathvariant="normal">f</mml:mi></mml:msup></mml:mrow></mml:math></inline-formula> is obtained from the
forecast step, while <inline-formula><mml:math id="M78" display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="bold">O</mml:mi><mml:mi mathvariant="normal">f</mml:mi></mml:msup></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M79" display="inline"><mml:mi mathvariant="bold">Y</mml:mi></mml:math></inline-formula> are acquired from
our stochastic observer module, which
allows a volcanic ash transport model to integrate geophysical
measurements. With the input <inline-formula><mml:math id="M80" display="inline"><mml:mi mathvariant="bold">Y</mml:mi></mml:math></inline-formula>, the measurement error covariance
<inline-formula><mml:math id="M81" display="inline"><mml:mi mathvariant="bold">R</mml:mi></mml:math></inline-formula>, as introduced in Eq. (<xref ref-type="disp-formula" rid="Ch1.E6"/>), can then be computed
as

                <disp-formula specific-use="align" content-type="numbered"><mml:math id="M82" display="block"><mml:mtable displaystyle="true"><mml:mtr><mml:mtd><mml:mrow><mml:mstyle class="stylechange" displaystyle="true"/><mml:msub><mml:mi mathvariant="bold">R</mml:mi><mml:mrow><mml:mi>m</mml:mi><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mo>×</mml:mo><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mi>m</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:mtd><mml:mtd><mml:mrow><mml:mstyle displaystyle="true" class="stylechange"/><mml:mo>=</mml:mo><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mn mathvariant="normal">1</mml:mn><mml:mrow><mml:mi>N</mml:mi><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:mfrac></mml:mstyle><mml:mo>(</mml:mo><mml:msub><mml:mi mathvariant="bold">Y</mml:mi><mml:mrow><mml:mi>m</mml:mi><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mo>×</mml:mo><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mi>N</mml:mi></mml:mrow></mml:msub><mml:mo>-</mml:mo><mml:msub><mml:mover accent="true"><mml:mi mathvariant="bold">Y</mml:mi><mml:mo mathvariant="normal">¯</mml:mo></mml:mover><mml:mrow><mml:mi>m</mml:mi><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mo>×</mml:mo><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mi>N</mml:mi></mml:mrow></mml:msub><mml:mo>)</mml:mo><mml:mo>(</mml:mo><mml:msub><mml:mi mathvariant="bold">Y</mml:mi><mml:mrow><mml:mi>m</mml:mi><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mo>×</mml:mo><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mi>N</mml:mi></mml:mrow></mml:msub><mml:mo>-</mml:mo><mml:msub><mml:mover accent="true"><mml:mi mathvariant="bold">Y</mml:mi><mml:mo mathvariant="normal">¯</mml:mo></mml:mover><mml:mrow><mml:mi>m</mml:mi><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mo>×</mml:mo><mml:mspace width="0.125em" 
linebreak="nobreak"/><mml:mi>N</mml:mi></mml:mrow></mml:msub><mml:msup><mml:mo>)</mml:mo><mml:mo>′</mml:mo></mml:msup></mml:mrow></mml:mtd></mml:mtr><mml:mlabeledtr id="Ch1.E10"><mml:mtd/><mml:mtd><mml:mstyle class="stylechange" displaystyle="true"/></mml:mtd><mml:mtd><mml:mrow><mml:mstyle displaystyle="true" class="stylechange"/><mml:mo>=</mml:mo><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mn mathvariant="normal">1</mml:mn><mml:mrow><mml:mi>N</mml:mi><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:mfrac></mml:mstyle><mml:mo>(</mml:mo><mml:mi mathvariant="bold">YB</mml:mi><mml:mo>)</mml:mo><mml:mo>(</mml:mo><mml:mi mathvariant="bold">YB</mml:mi><mml:msup><mml:mo>)</mml:mo><mml:mo>′</mml:mo></mml:msup><mml:mo>.</mml:mo></mml:mrow></mml:mtd></mml:mlabeledtr></mml:mtable></mml:math></disp-formula></p>
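      <p>The role of the matrix <inline-formula><mml:math display="inline"><mml:mi mathvariant="bold">B</mml:mi></mml:math></inline-formula> in Eqs. (<xref ref-type="disp-formula" rid="Ch1.E9"/>) and (<xref ref-type="disp-formula" rid="Ch1.E10"/>) can be checked numerically. The following NumPy sketch uses random toy matrices (the sizes and random inputs are assumptions for illustration, not data from this study) and compares both expressions with a direct computation of the ensemble anomalies and the sample covariance.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import numpy as np

rng = np.random.default_rng(1)
n, m, N = 1000, 2, 100          # toy n; the study uses n ~ 1e6
A_f = rng.normal(size=(n, N))   # forecast ensemble, members as columns
Y = rng.normal(size=(m, N))     # measurement ensemble (y + v per member)

# Centering matrix of Eq. (9): B = I - (1/N) * (N x N matrix of ones).
B = np.eye(N) - np.ones((N, N)) / N

# L^f = A^f B is the ensemble with its mean removed from every member.
L_f = A_f @ B
L_direct = A_f - A_f.mean(axis=1, keepdims=True)

# Eq. (10): R estimated from the spread of the measurement ensemble.
R = (Y @ B) @ (Y @ B).T / (N - 1)
R_direct = np.cov(Y)            # rows are variables, normalized by N - 1

print(np.allclose(L_f, L_direct), np.allclose(R, R_direct))
print(np.allclose(B @ B, B))    # B is idempotent (a projection)
```
]]></preformat>
      <p>Both comparisons should agree up to rounding. Note that forming the anomalies as a product with <inline-formula><mml:math display="inline"><mml:mi mathvariant="bold">B</mml:mi></mml:math></inline-formula> costs on the order of <inline-formula><mml:math display="inline"><mml:mrow><mml:mi>n</mml:mi><mml:msup><mml:mi>N</mml:mi><mml:mn>2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula> operations, whereas subtracting the ensemble mean directly costs on the order of <inline-formula><mml:math display="inline"><mml:mrow><mml:mi>n</mml:mi><mml:mi>N</mml:mi></mml:mrow></mml:math></inline-formula>, so the matrix form is mainly convenient for the algebra.</p>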

      <?xmltex \floatpos{t}?><fig id="Ch1.F2" specific-use="star"><caption><p>Computational evaluation of the analysis step.
<bold>(a)</bold> Illustration of the analysis step.
<bold>(b)</bold> Computational cost of all sub-parts of the analysis step.

</p></caption>
          <?xmltex \igopts{width=369.885827pt}?><graphic xlink:href="https://gmd.copernicus.org/articles/10/1751/2017/gmd-10-1751-2017-f02.png"/>

        </fig>

      <p>Based on previous definitions and Eqs. (<xref ref-type="disp-formula" rid="Ch1.E2"/>) to
(<xref ref-type="disp-formula" rid="Ch1.E7"/>), the analysis step can be reformulated as follows:

                <disp-formula specific-use="align" content-type="numbered"><mml:math id="M83" display="block"><mml:mtable displaystyle="true"><mml:mtr><mml:mtd><mml:mrow><mml:mstyle displaystyle="true" class="stylechange"/><mml:msubsup><mml:mi mathvariant="bold">A</mml:mi><mml:mrow><mml:mi>n</mml:mi><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mo>×</mml:mo><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mi>N</mml:mi></mml:mrow><mml:mi mathvariant="normal">a</mml:mi></mml:msubsup></mml:mrow></mml:mtd><mml:mtd><mml:mrow><mml:mstyle displaystyle="true" class="stylechange"/><mml:mo>=</mml:mo><mml:msup><mml:mi mathvariant="bold">A</mml:mi><mml:mi mathvariant="normal">f</mml:mi></mml:msup><mml:mo>+</mml:mo><mml:mi mathvariant="bold">K</mml:mi><mml:mo>(</mml:mo><mml:mi mathvariant="bold">Y</mml:mi><mml:mo>-</mml:mo><mml:msup><mml:mi mathvariant="bold">HA</mml:mi><mml:mi mathvariant="normal">f</mml:mi></mml:msup><mml:mo>)</mml:mo></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mstyle class="stylechange" displaystyle="true"/></mml:mtd><mml:mtd><mml:mrow><mml:mstyle displaystyle="true" class="stylechange"/><mml:mo>=</mml:mo><mml:msup><mml:mi mathvariant="bold">A</mml:mi><mml:mi mathvariant="normal">f</mml:mi></mml:msup><mml:mo>+</mml:mo><mml:msup><mml:mi mathvariant="bold">P</mml:mi><mml:mi mathvariant="normal">f</mml:mi></mml:msup><mml:msup><mml:mi mathvariant="bold">H</mml:mi><mml:mo>′</mml:mo></mml:msup><mml:mo>(</mml:mo><mml:msup><mml:mi mathvariant="bold">HP</mml:mi><mml:mi mathvariant="normal">f</mml:mi></mml:msup><mml:msup><mml:mi mathvariant="bold">H</mml:mi><mml:mo>′</mml:mo></mml:msup><mml:mo>+</mml:mo><mml:mi mathvariant="bold">R</mml:mi><mml:msup><mml:mo>)</mml:mo><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msup><mml:mo>(</mml:mo><mml:mi mathvariant="bold">Y</mml:mi><mml:mo>-</mml:mo><mml:msup><mml:mi mathvariant="bold">HA</mml:mi><mml:mi 
mathvariant="normal">f</mml:mi></mml:msup><mml:mo>)</mml:mo></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mstyle displaystyle="true" class="stylechange"/></mml:mtd><mml:mtd><mml:mrow><mml:mstyle displaystyle="true" class="stylechange"/><mml:mo>=</mml:mo><mml:msup><mml:mi mathvariant="bold">A</mml:mi><mml:mi mathvariant="normal">f</mml:mi></mml:msup><mml:mo>+</mml:mo><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mn mathvariant="normal">1</mml:mn><mml:mrow><mml:mi>N</mml:mi><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:mfrac></mml:mstyle><mml:msup><mml:mi mathvariant="bold">L</mml:mi><mml:mi mathvariant="normal">f</mml:mi></mml:msup><mml:mo>(</mml:mo><mml:msup><mml:mi mathvariant="bold">HL</mml:mi><mml:mi mathvariant="normal">f</mml:mi></mml:msup><mml:msup><mml:mo>)</mml:mo><mml:mo>′</mml:mo></mml:msup><mml:mo>[</mml:mo><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mn mathvariant="normal">1</mml:mn><mml:mrow><mml:mi>N</mml:mi><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:mfrac></mml:mstyle><mml:mo>(</mml:mo><mml:msup><mml:mi mathvariant="bold">HL</mml:mi><mml:mi mathvariant="normal">f</mml:mi></mml:msup><mml:mo>)</mml:mo><mml:mo>(</mml:mo><mml:msup><mml:mi mathvariant="bold">HL</mml:mi><mml:mi mathvariant="normal">f</mml:mi></mml:msup><mml:msup><mml:mo>)</mml:mo><mml:mo>′</mml:mo></mml:msup></mml:mrow></mml:mtd></mml:mtr><mml:mlabeledtr id="Ch1.E11"><mml:mtd/><mml:mtd><mml:mstyle displaystyle="true" class="stylechange"/></mml:mtd><mml:mtd><mml:mrow><mml:mstyle class="stylechange" displaystyle="true"/><mml:mo>+</mml:mo><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mn mathvariant="normal">1</mml:mn><mml:mrow><mml:mi>N</mml:mi><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:mfrac></mml:mstyle><mml:mo>(</mml:mo><mml:mi mathvariant="bold">YB</mml:mi><mml:mo>)</mml:mo><mml:mo>(</mml:mo><mml:mi 
mathvariant="bold">YB</mml:mi><mml:msup><mml:mo>)</mml:mo><mml:mo>′</mml:mo></mml:msup><mml:msup><mml:mo>]</mml:mo><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msup><mml:mo>(</mml:mo><mml:mi mathvariant="bold">Y</mml:mi><mml:mo>-</mml:mo><mml:msup><mml:mi mathvariant="bold">HA</mml:mi><mml:mi mathvariant="normal">f</mml:mi></mml:msup><mml:mo>)</mml:mo></mml:mrow></mml:mtd></mml:mlabeledtr><mml:mtr><mml:mtd><mml:mstyle class="stylechange" displaystyle="true"/></mml:mtd><mml:mtd><mml:mrow><mml:mstyle displaystyle="true" class="stylechange"/><mml:mo>=</mml:mo><mml:msup><mml:mi mathvariant="bold">A</mml:mi><mml:mi mathvariant="normal">f</mml:mi></mml:msup><mml:mo>+</mml:mo><mml:msup><mml:mi mathvariant="bold">A</mml:mi><mml:mi mathvariant="normal">f</mml:mi></mml:msup><mml:mi mathvariant="bold">B</mml:mi><mml:mo>(</mml:mo><mml:msup><mml:mi mathvariant="bold">O</mml:mi><mml:mi mathvariant="normal">f</mml:mi></mml:msup><mml:mi mathvariant="bold">B</mml:mi><mml:msup><mml:mo>)</mml:mo><mml:mo>′</mml:mo></mml:msup><mml:mo>[</mml:mo><mml:mo>(</mml:mo><mml:msup><mml:mi mathvariant="bold">O</mml:mi><mml:mi mathvariant="normal">f</mml:mi></mml:msup><mml:mi mathvariant="bold">B</mml:mi><mml:mo>)</mml:mo><mml:mo>(</mml:mo><mml:msup><mml:mi mathvariant="bold">O</mml:mi><mml:mi mathvariant="normal">f</mml:mi></mml:msup><mml:mi mathvariant="bold">B</mml:mi><mml:msup><mml:mo>)</mml:mo><mml:mo>′</mml:mo></mml:msup></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mstyle displaystyle="true" class="stylechange"/></mml:mtd><mml:mtd><mml:mrow><mml:mstyle class="stylechange" displaystyle="true"/><mml:mo>+</mml:mo><mml:mo>(</mml:mo><mml:mi mathvariant="bold">YB</mml:mi><mml:mo>)</mml:mo><mml:mo>(</mml:mo><mml:mi mathvariant="bold">YB</mml:mi><mml:msup><mml:mo>)</mml:mo><mml:mo>′</mml:mo></mml:msup><mml:msup><mml:mo>]</mml:mo><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msup><mml:mo>(</mml:mo><mml:mi 
mathvariant="bold">Y</mml:mi><mml:mo>-</mml:mo><mml:msup><mml:mi mathvariant="bold">O</mml:mi><mml:mi mathvariant="normal">f</mml:mi></mml:msup><mml:mo>)</mml:mo></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mstyle class="stylechange" displaystyle="true"/></mml:mtd><mml:mtd><mml:mrow><mml:mstyle class="stylechange" displaystyle="true"/><mml:mo>=</mml:mo><mml:msup><mml:mi mathvariant="bold">A</mml:mi><mml:mi mathvariant="normal">f</mml:mi></mml:msup><mml:mo mathvariant="italic">{</mml:mo><mml:mi mathvariant="bold">I</mml:mi><mml:mo>+</mml:mo><mml:mi mathvariant="bold">B</mml:mi><mml:mo>(</mml:mo><mml:msup><mml:mi mathvariant="bold">O</mml:mi><mml:mi mathvariant="normal">f</mml:mi></mml:msup><mml:mi mathvariant="bold">B</mml:mi><mml:msup><mml:mo>)</mml:mo><mml:mo>′</mml:mo></mml:msup><mml:mo>[</mml:mo><mml:mo>(</mml:mo><mml:msup><mml:mi mathvariant="bold">O</mml:mi><mml:mi mathvariant="normal">f</mml:mi></mml:msup><mml:mi mathvariant="bold">B</mml:mi><mml:mo>)</mml:mo><mml:mo>(</mml:mo><mml:msup><mml:mi mathvariant="bold">O</mml:mi><mml:mi mathvariant="normal">f</mml:mi></mml:msup><mml:mi mathvariant="bold">B</mml:mi><mml:msup><mml:mo>)</mml:mo><mml:mo>′</mml:mo></mml:msup></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mstyle class="stylechange" displaystyle="true"/></mml:mtd><mml:mtd><mml:mrow><mml:mstyle displaystyle="true" class="stylechange"/><mml:mo>+</mml:mo><mml:mo>(</mml:mo><mml:mi mathvariant="bold">YB</mml:mi><mml:mo>)</mml:mo><mml:mo>(</mml:mo><mml:mi mathvariant="bold">YB</mml:mi><mml:msup><mml:mo>)</mml:mo><mml:mo>′</mml:mo></mml:msup><mml:msup><mml:mo>]</mml:mo><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msup><mml:mo>(</mml:mo><mml:mi mathvariant="bold">Y</mml:mi><mml:mo>-</mml:mo><mml:msup><mml:mi mathvariant="bold">O</mml:mi><mml:mi mathvariant="normal">f</mml:mi></mml:msup><mml:mo>)</mml:mo><mml:mo mathvariant="italic">}</mml:mo></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mstyle 
displaystyle="true" class="stylechange"/></mml:mtd><mml:mtd><mml:mrow><mml:mstyle displaystyle="true" class="stylechange"/><mml:mo>=</mml:mo><mml:msubsup><mml:mi mathvariant="bold">A</mml:mi><mml:mrow><mml:mi>n</mml:mi><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mo>×</mml:mo><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mi>N</mml:mi></mml:mrow><mml:mi mathvariant="normal">f</mml:mi></mml:msubsup><mml:msub><mml:mi mathvariant="bold">X</mml:mi><mml:mrow><mml:mi>N</mml:mi><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mo>×</mml:mo><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mi>N</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>

            where

                <disp-formula specific-use="align" content-type="numbered"><mml:math id="M84" display="block"><mml:mtable displaystyle="true"><mml:mtr><mml:mtd><mml:mrow><mml:mstyle displaystyle="true" class="stylechange"/><mml:msub><mml:mi mathvariant="bold">X</mml:mi><mml:mrow><mml:mi>N</mml:mi><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mo>×</mml:mo><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mi>N</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:mtd><mml:mtd><mml:mrow><mml:mstyle class="stylechange" displaystyle="true"/><mml:mo>=</mml:mo><mml:mo mathvariant="italic">{</mml:mo><mml:mi mathvariant="bold">I</mml:mi><mml:mo>+</mml:mo><mml:mi mathvariant="bold">B</mml:mi><mml:mo>(</mml:mo><mml:msup><mml:mi mathvariant="bold">O</mml:mi><mml:mi mathvariant="normal">f</mml:mi></mml:msup><mml:mi mathvariant="bold">B</mml:mi><mml:msup><mml:mo>)</mml:mo><mml:mo>′</mml:mo></mml:msup><mml:mo>[</mml:mo><mml:mo>(</mml:mo><mml:msup><mml:mi mathvariant="bold">O</mml:mi><mml:mi mathvariant="normal">f</mml:mi></mml:msup><mml:mi mathvariant="bold">B</mml:mi><mml:mo>)</mml:mo><mml:mo>(</mml:mo><mml:msup><mml:mi mathvariant="bold">O</mml:mi><mml:mi mathvariant="normal">f</mml:mi></mml:msup><mml:mi mathvariant="bold">B</mml:mi><mml:msup><mml:mo>)</mml:mo><mml:mo>′</mml:mo></mml:msup></mml:mrow></mml:mtd></mml:mtr><mml:mlabeledtr id="Ch1.E12"><mml:mtd/><mml:mtd><mml:mstyle displaystyle="true" class="stylechange"/></mml:mtd><mml:mtd><mml:mrow><mml:mstyle class="stylechange" displaystyle="true"/><mml:mo>+</mml:mo><mml:mo>(</mml:mo><mml:mi mathvariant="bold">YB</mml:mi><mml:mo>)</mml:mo><mml:mo>(</mml:mo><mml:mi mathvariant="bold">YB</mml:mi><mml:msup><mml:mo>)</mml:mo><mml:mo>′</mml:mo></mml:msup><mml:msup><mml:mo>]</mml:mo><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msup><mml:mo>(</mml:mo><mml:mi mathvariant="bold">Y</mml:mi><mml:mo>-</mml:mo><mml:msup><mml:mi mathvariant="bold">O</mml:mi><mml:mi 
mathvariant="normal">f</mml:mi></mml:msup><mml:mo>)</mml:mo><mml:mo mathvariant="italic">}</mml:mo><mml:mo>.</mml:mo></mml:mrow></mml:mtd></mml:mlabeledtr></mml:mtable></mml:math></disp-formula>

            Equation (<xref ref-type="disp-formula" rid="Ch1.E11"/>) shows how the analysis step is performed
in a volcanic ash DA system. To accelerate the analysis step, the cost of its
most time-consuming part must be reduced. Figure <xref ref-type="fig" rid="Ch1.F2"/>b
shows estimates of the computational cost of each procedure in the
analysis step. Considering that the state number <inline-formula><mml:math id="M85" display="inline"><mml:mi>n</mml:mi></mml:math></inline-formula> (<inline-formula><mml:math id="M86" display="inline"><mml:mrow><mml:mo>∼</mml:mo><mml:mspace width="0.125em" linebreak="nobreak"/><mml:msup><mml:mn mathvariant="normal">10</mml:mn><mml:mn mathvariant="normal">6</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula>) is
significantly larger than the measurement number <inline-formula><mml:math id="M87" display="inline"><mml:mi>m</mml:mi></mml:math></inline-formula> (<inline-formula><mml:math id="M88" display="inline"><mml:mrow><mml:mi>m</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">2</mml:mn></mml:mrow></mml:math></inline-formula> here) and the
ensemble size <inline-formula><mml:math id="M89" display="inline"><mml:mi>N</mml:mi></mml:math></inline-formula> (<inline-formula><mml:math id="M90" display="inline"><mml:mrow><mml:mi>N</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">100</mml:mn></mml:mrow></mml:math></inline-formula>), the most time-consuming procedure in the
analysis step is thus the last one, that is
<inline-formula><mml:math id="M91" display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="bold">A</mml:mi><mml:mi mathvariant="normal">a</mml:mi></mml:msup><mml:mo>=</mml:mo><mml:msup><mml:mi mathvariant="bold">A</mml:mi><mml:mi mathvariant="normal">f</mml:mi></mml:msup><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mi mathvariant="bold">X</mml:mi></mml:mrow></mml:math></inline-formula> with a
computational cost of O(<inline-formula><mml:math id="M92" display="inline"><mml:mrow><mml:mi>n</mml:mi><mml:mspace linebreak="nobreak" width="0.125em"/><mml:msup><mml:mi>N</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula>). Note that the
procedure of computing <inline-formula><mml:math id="M93" display="inline"><mml:mrow><mml:mo>[</mml:mo><mml:mo>(</mml:mo><mml:msup><mml:mi mathvariant="bold">O</mml:mi><mml:mi mathvariant="normal">f</mml:mi></mml:msup><mml:mi mathvariant="bold">B</mml:mi><mml:mo>)</mml:mo><mml:mo>(</mml:mo><mml:msup><mml:mi mathvariant="bold">O</mml:mi><mml:mi mathvariant="normal">f</mml:mi></mml:msup><mml:mi mathvariant="bold">B</mml:mi><mml:msup><mml:mo>)</mml:mo><mml:mo>′</mml:mo></mml:msup><mml:mo>+</mml:mo><mml:mo>(</mml:mo><mml:mi mathvariant="bold">YB</mml:mi><mml:mo>)</mml:mo><mml:mo>(</mml:mo><mml:mi mathvariant="bold">YB</mml:mi><mml:msup><mml:mo>)</mml:mo><mml:mo>′</mml:mo></mml:msup><mml:msup><mml:mo>]</mml:mo><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula>, which involves a singular value decomposition (SVD), is
not time-consuming in our study, in contrast to applications in
reservoir history matching
<xref ref-type="bibr" rid="bib1.bibx38 bib1.bibx23" id="paren.39"/>. This is because
the SVD procedure costs O(<inline-formula><mml:math id="M94" display="inline"><mml:mrow><mml:msup><mml:mi>m</mml:mi><mml:mn mathvariant="normal">3</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula>). In reservoir DA the number of measurements is on the
order of the size of the state, so the SVD procedure
requires a huge computational cost, whereas here the number of measurements is small.</p>
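<p>As a rough illustration of this scaling argument, the sketch below counts operations for the final product and for an O(m<sup>3</sup>) decomposition, using the values quoted above (n of order 10<sup>6</sup>, N = 100, m = 2). The function names and the unit constant factors are assumptions made for illustration and are not part of the DA system; the last line assumes a hypothetical reservoir-like case in which m is as large as n.</p>
<preformat><![CDATA[
```python
def update_cost(n, N):
    """Multiply-add count for the dense product A_f (n x N) times X (N x N)."""
    return n * N * N


def svd_cost(m):
    """Order-of-magnitude cost of an SVD of an m x m matrix (constant factor taken as 1)."""
    return m ** 3


n, N, m = 10**6, 100, 2          # values quoted in the text
print(update_cost(n, N))         # 10000000000
print(svd_cost(m))               # 8
print(svd_cost(n))               # hypothetical case m = n: 10**18 operations
```
]]></preformat>
<p>With these numbers the final product exceeds the decomposition by roughly nine orders of magnitude, which is why the acceleration effort targets the product only.</p>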
</sec>
</sec>
<sec id="Ch1.S4">
  <title>The mask-state algorithm (MS) for acceleration of the analysis step</title>
<sec id="Ch1.S4.SS1">
  <?xmltex \opttitle{Characteristic of ensemble state matrix $\mathbf{A}^{\mathrm{f}}$}?><title>Characteristic of ensemble state matrix <inline-formula><mml:math id="M95" display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="bold">A</mml:mi><mml:mi mathvariant="normal">f</mml:mi></mml:msup></mml:mrow></mml:math></inline-formula></title>

      <?xmltex \floatpos{t}?><fig id="Ch1.F3"><caption><p>Characteristics of a volcanic ash state.

</p></caption>
          <?xmltex \igopts{width=241.848425pt}?><graphic xlink:href="https://gmd.copernicus.org/articles/10/1751/2017/gmd-10-1751-2017-f03.png"/>

        </fig>

      <p>The analysis in the previous section shows that computing
<inline-formula><mml:math id="M96" display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="bold">A</mml:mi><mml:mi mathvariant="normal">a</mml:mi></mml:msup><mml:mo>=</mml:mo><mml:msup><mml:mi mathvariant="bold">A</mml:mi><mml:mi mathvariant="normal">f</mml:mi></mml:msup><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mi mathvariant="bold">X</mml:mi></mml:mrow></mml:math></inline-formula> is the most expensive operation
in the analysis step. Each column of <inline-formula><mml:math id="M97" display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="bold">A</mml:mi><mml:mi mathvariant="normal">f</mml:mi></mml:msup></mml:mrow></mml:math></inline-formula> is constructed
from a forecasted ensemble state; thus, the dimension of
<inline-formula><mml:math id="M98" display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="bold">A</mml:mi><mml:mi mathvariant="normal">f</mml:mi></mml:msup></mml:mrow></mml:math></inline-formula> is <inline-formula><mml:math id="M99" display="inline"><mml:mrow><mml:mi>n</mml:mi><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mo>×</mml:mo><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mi>N</mml:mi></mml:mrow></mml:math></inline-formula>. In each column, the element values
correspond to volcanic ash concentrations in a 3-D domain.
Figure <xref ref-type="fig" rid="Ch1.F3"/> shows the coverage of all ensemble
forecast states at a selected time, 10:00 UTC, 18 May 2010, without loss of
generality. A common feature can be observed: only part of the
3-D domain is filled with volcanic ash. The ash is concentrated in a
plume that is transported over time. This is because a volcanic eruption is a
fast and strong process: advection dominates the transport, and the
ash plume moves with the wind. This is a particular
characteristic of volcanic ash transport, in contrast to other
atmospheric-related applications such as ozone <xref ref-type="bibr" rid="bib1.bibx6" id="paren.40"/>,
SO<inline-formula><mml:math id="M100" display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:math></inline-formula> <xref ref-type="bibr" rid="bib1.bibx3" id="paren.41"/>, and CO<inline-formula><mml:math id="M101" display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:math></inline-formula>
<xref ref-type="bibr" rid="bib1.bibx5" id="paren.42"/>. In those applications, non-zero concentrations
occur throughout the domain, emission sources are distributed throughout it, and
observations are also available throughout the domain (especially from
satellite data). In volcanic ash transport, by contrast, the
only emission source is the volcano; thus, usually only a limited part of the domain
is polluted by ash. As shown in Fig. <xref ref-type="fig" rid="Ch1.F3"/>, in the
3-D domain with a grid size of 3.888 <inline-formula><mml:math id="M102" display="inline"><mml:mo>×</mml:mo></mml:math></inline-formula> 10<inline-formula><mml:math id="M103" display="inline"><mml:msup><mml:mi/><mml:mn mathvariant="normal">6</mml:mn></mml:msup></mml:math></inline-formula>, the number of grids
in the area with volcanic ash is counted as 1.528 <inline-formula><mml:math id="M104" display="inline"><mml:mo>×</mml:mo></mml:math></inline-formula> 10<inline-formula><mml:math id="M105" display="inline"><mml:msup><mml:mi/><mml:mn mathvariant="normal">6</mml:mn></mml:msup></mml:math></inline-formula>,
whereas the number of no-ash grids is 2.36 <inline-formula><mml:math id="M106" display="inline"><mml:mo>×</mml:mo></mml:math></inline-formula> 10<inline-formula><mml:math id="M107" display="inline"><mml:msup><mml:mi/><mml:mn mathvariant="normal">6</mml:mn></mml:msup></mml:math></inline-formula>. Note that
the figure shows the accumulated ash coverage of all ensemble states;
in the no-ash grids, therefore, none of the ensemble states contains ash.
Consequently, a very large number of rows in <inline-formula><mml:math id="M108" display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="bold">A</mml:mi><mml:mi mathvariant="normal">f</mml:mi></mml:msup></mml:mrow></mml:math></inline-formula> are zero,
corresponding to the no-ash grids. These zero rows in <inline-formula><mml:math id="M109" display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="bold">A</mml:mi><mml:mi mathvariant="normal">f</mml:mi></mml:msup></mml:mrow></mml:math></inline-formula>
have no contributions to <inline-formula><mml:math id="M110" display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="bold">A</mml:mi><mml:mi mathvariant="normal">a</mml:mi></mml:msup><mml:mo>=</mml:mo><mml:msup><mml:mi mathvariant="bold">A</mml:mi><mml:mi mathvariant="normal">f</mml:mi></mml:msup><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mi mathvariant="bold">X</mml:mi></mml:mrow></mml:math></inline-formula>, because a zero row in <inline-formula><mml:math id="M111" display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="bold">A</mml:mi><mml:mi mathvariant="normal">f</mml:mi></mml:msup></mml:mrow></mml:math></inline-formula> always results in
a zero row in <inline-formula><mml:math id="M112" display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="bold">A</mml:mi><mml:mi mathvariant="normal">a</mml:mi></mml:msup></mml:mrow></mml:math></inline-formula>. Therefore, for the case of
Fig. <xref ref-type="fig" rid="Ch1.F3"/>, nearly <inline-formula><mml:math id="M113" display="inline"><mml:mrow><mml:mn mathvariant="normal">2</mml:mn><mml:mo>/</mml:mo><mml:mn mathvariant="normal">3</mml:mn></mml:mrow></mml:math></inline-formula> of the computations (2.36 of 3.888 million rows, about 61 %) are
redundant and can be avoided. To achieve this, one might think of restricting the
domain for the entire assimilation, which would certainly
reduce the number of zero rows considerably. This approach is incorrect, however, because the
zero rows change with the transport of the ash clouds and are not
the same at every analysis step. The full domain must therefore be considered, and the selection of zero rows
must be adaptive (different zero rows are chosen according to the
<inline-formula><mml:math id="M114" display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="bold">A</mml:mi><mml:mi mathvariant="normal">f</mml:mi></mml:msup></mml:mrow></mml:math></inline-formula> at each analysis time).</p>
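<p>The observation that zero rows of <bold>A</bold><sup>f</sup> contribute nothing to the product with <bold>X</bold> can be checked on a toy example. The sketch below is a minimal illustration in plain Python, not the LOTOS-EUROS implementation; the helper names <monospace>matmul</monospace> and <monospace>nonzero_rows</monospace> and the small matrices are assumptions chosen for readability. Because the non-zero rows are recomputed from the current forecast matrix, the selection adapts automatically at each analysis time.</p>
<preformat><![CDATA[
```python
def matmul(A, X):
    """Dense product of A (rows x N) and X (N x N), both given as lists of lists."""
    return [[sum(a * x for a, x in zip(row, col)) for col in zip(*X)] for row in A]


def nonzero_rows(A):
    """Indices of the rows that contain at least one non-zero entry."""
    return [i for i, row in enumerate(A) if any(v != 0.0 for v in row)]


# Toy forecast ensemble: 4 grid cells, N = 2 members; rows 1 and 3 hold no ash.
A_f = [[1.0, 2.0],
       [0.0, 0.0],
       [3.0, 0.5],
       [0.0, 0.0]]
X = [[0.9, 0.1],
     [0.2, 0.8]]

full = matmul(A_f, X)                        # product over all rows

idx = nonzero_rows(A_f)                      # rows that actually carry ash
reduced = matmul([A_f[i] for i in idx], X)   # product over the ash rows only

# Scatter the reduced result back into a full-size matrix of zeros.
rebuilt = [[0.0] * len(X[0]) for _ in A_f]
for k, i in enumerate(idx):
    rebuilt[i] = reduced[k]

print(idx)              # [0, 2]
print(rebuilt == full)  # True
```
]]></preformat>
<p>In this toy case half of the row products are skipped without changing the result; in the case of Fig. <xref ref-type="fig" rid="Ch1.F3"/> the skipped rows are the 2.36 × 10<sup>6</sup> no-ash grids.</p>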

      <?xmltex \floatpos{t}?><fig id="Ch1.F4"><caption><p>Algorithms for CSR-based SDMM to compute the multiplication of
sparse matrix <inline-formula><mml:math id="M115" display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="bold">A</mml:mi><mml:mi mathvariant="normal">f</mml:mi></mml:msup></mml:mrow></mml:math></inline-formula> and dense matrix <inline-formula><mml:math id="M116" display="inline"><mml:mi mathvariant="bold">X</mml:mi></mml:math></inline-formula>.
<bold>(a)</bold> Multiplication of <inline-formula><mml:math id="M117" display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="bold">A</mml:mi><mml:mi mathvariant="normal">f</mml:mi></mml:msup></mml:mrow></mml:math></inline-formula> by a column vector <inline-formula><mml:math id="M118" display="inline"><mml:mi mathvariant="bold-italic">v</mml:mi></mml:math></inline-formula> (in
<inline-formula><mml:math id="M119" display="inline"><mml:mi mathvariant="bold">X</mml:mi></mml:math></inline-formula>) by
CSR-based sparse matrix vector multiplication (SpMV).
<inline-formula><mml:math id="M120" display="inline"><mml:mrow><mml:mi mathvariant="bold-italic">v</mml:mi><mml:mi mathvariant="bold-italic">a</mml:mi><mml:mi mathvariant="bold-italic">l</mml:mi></mml:mrow></mml:math></inline-formula>, <inline-formula><mml:math id="M121" display="inline"><mml:mrow><mml:mi mathvariant="bold-italic">c</mml:mi><mml:mi mathvariant="bold-italic">o</mml:mi><mml:mi mathvariant="bold-italic">l</mml:mi><mml:mi mathvariant="italic">_</mml:mi><mml:mi mathvariant="bold-italic">i</mml:mi><mml:mi mathvariant="bold-italic">d</mml:mi><mml:mi mathvariant="bold-italic">x</mml:mi></mml:mrow></mml:math></inline-formula>, and <inline-formula><mml:math id="M122" display="inline"><mml:mrow><mml:mi mathvariant="bold-italic">r</mml:mi><mml:mi mathvariant="bold-italic">o</mml:mi><mml:mi mathvariant="bold-italic">w</mml:mi><mml:mi mathvariant="italic">_</mml:mi><mml:mi mathvariant="bold-italic">p</mml:mi><mml:mi mathvariant="bold-italic">t</mml:mi><mml:mi mathvariant="bold-italic">r</mml:mi></mml:mrow></mml:math></inline-formula> are the three arrays to
represent <inline-formula><mml:math id="M123" display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="bold">A</mml:mi><mml:mi mathvariant="normal">f</mml:mi></mml:msup></mml:mrow></mml:math></inline-formula> in CSR format.
<bold>(b)</bold> Looping SpMV N times (each with one column of <inline-formula><mml:math id="M124" display="inline"><mml:mi mathvariant="bold">X</mml:mi></mml:math></inline-formula>) to obtain
<inline-formula><mml:math id="M125" display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="bold">A</mml:mi><mml:mi mathvariant="normal">a</mml:mi></mml:msup><mml:mo>=</mml:mo><mml:msup><mml:mi mathvariant="bold">A</mml:mi><mml:mi mathvariant="normal">f</mml:mi></mml:msup><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mi mathvariant="bold">X</mml:mi></mml:mrow></mml:math></inline-formula>.

</p></caption>
          <?xmltex \igopts{width=241.848425pt}?><graphic xlink:href="https://gmd.copernicus.org/articles/10/1751/2017/gmd-10-1751-2017-f04.png"/>

        </fig>
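<p>The two procedures named in the caption of Fig. <xref ref-type="fig" rid="Ch1.F4"/> can be sketched as follows. This is a minimal plain-Python illustration written from the caption's description, not a reproduction of the figure's pseudocode; the function names <monospace>csr_spmv</monospace> and <monospace>csr_sdmm</monospace>, the zero-based indexing, and the small test matrix are assumptions. The arrays <monospace>val</monospace>, <monospace>col_idx</monospace>, and <monospace>row_ptr</monospace> follow the caption.</p>
<preformat><![CDATA[
```python
def csr_spmv(val, col_idx, row_ptr, v):
    """Return y = A v for a matrix A stored in CSR form (val, col_idx, row_ptr)."""
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        # Entries of row i occupy positions row_ptr[i] .. row_ptr[i + 1] - 1.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += val[k] * v[col_idx[k]]
    return y


def csr_sdmm(val, col_idx, row_ptr, X):
    """Return A X by applying SpMV once per column of the dense matrix X."""
    cols = [csr_spmv(val, col_idx, row_ptr, list(c)) for c in zip(*X)]
    return [list(r) for r in zip(*cols)]   # convert back to row-major layout


# CSR form of the 3 x 2 matrix [[1, 2], [0, 0], [0, 3]] (zero-based indices).
val = [1.0, 2.0, 3.0]
col_idx = [0, 1, 1]
row_ptr = [0, 2, 2, 3]
X = [[1.0, 0.0],
     [0.5, 2.0]]

print(csr_sdmm(val, col_idx, row_ptr, X))  # [[2.0, 4.0], [0.0, 0.0], [1.5, 6.0]]
```
]]></preformat>
<p>The empty second row is represented only by two equal consecutive entries in <monospace>row_ptr</monospace>, so no arithmetic is spent on it.</p>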

</sec>
<sec id="Ch1.S4.SS2">
  <title>Derivation of the mask-state algorithm (MS)</title>
      <p>Here we introduce <inline-formula><mml:math id="M126" display="inline"><mml:mrow><mml:msub><mml:mi>n</mml:mi><mml:mi mathvariant="normal">noash</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> to denote the number of zero
rows in the ensemble state matrix <inline-formula><mml:math id="M127" display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="bold">A</mml:mi><mml:mi mathvariant="normal">f</mml:mi></mml:msup></mml:mrow></mml:math></inline-formula>, and
<inline-formula><mml:math id="M128" display="inline"><mml:mrow><mml:msub><mml:mi>n</mml:mi><mml:mi mathvariant="normal">ash</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> to denote the number of remaining rows (thus,
<inline-formula><mml:math id="M129" display="inline"><mml:mrow><mml:msub><mml:mi>n</mml:mi><mml:mi mathvariant="normal">ash</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> is also the number of grid cells in the ash plume). When computing
<inline-formula><mml:math id="M130" display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="bold">A</mml:mi><mml:mi mathvariant="normal">a</mml:mi></mml:msup><mml:mo>=</mml:mo><mml:msup><mml:mi mathvariant="bold">A</mml:mi><mml:mi mathvariant="normal">f</mml:mi></mml:msup><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mi mathvariant="bold">X</mml:mi></mml:mrow></mml:math></inline-formula>, to avoid all the
computations related to the <inline-formula><mml:math id="M131" display="inline"><mml:mrow><mml:msub><mml:mi>n</mml:mi><mml:mi mathvariant="normal">noash</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> zero rows, the indices
of the other <inline-formula><mml:math id="M132" display="inline"><mml:mrow><mml:msub><mml:mi>n</mml:mi><mml:mi mathvariant="normal">ash</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> rows must first be determined. These indices are used to
reduce the dimensions of <inline-formula><mml:math id="M133" display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="bold">A</mml:mi><mml:mi mathvariant="normal">f</mml:mi></mml:msup></mml:mrow></mml:math></inline-formula>. After obtaining a reduced
<inline-formula><mml:math id="M134" display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="bold">A</mml:mi><mml:mi mathvariant="normal">a</mml:mi></mml:msup></mml:mrow></mml:math></inline-formula> with dimensions <inline-formula><mml:math id="M135" display="inline"><mml:mrow><mml:msub><mml:mi>n</mml:mi><mml:mi mathvariant="normal">ash</mml:mi></mml:msub><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mo>×</mml:mo><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mi>N</mml:mi></mml:mrow></mml:math></inline-formula>, the
indices are used again to reconstruct the full matrix
<inline-formula><mml:math id="M136" display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="bold">A</mml:mi><mml:mi mathvariant="normal">a</mml:mi></mml:msup></mml:mrow></mml:math></inline-formula> with dimensions <inline-formula><mml:math id="M137" display="inline"><mml:mrow><mml:mi>n</mml:mi><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mo>×</mml:mo><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mi>N</mml:mi></mml:mrow></mml:math></inline-formula>. Based on this
idea, we propose a mask-state algorithm (MS) to accelerate the
time-consuming analysis update. MS consists of five steps.
<list list-type="custom"><list-item><label>i.</label><p>Compute ensemble mean state <inline-formula><mml:math id="M138" display="inline"><mml:mrow><mml:msup><mml:mover accent="true"><mml:mi mathvariant="bold">A</mml:mi><mml:mo mathvariant="normal">¯</mml:mo></mml:mover><mml:mi mathvariant="normal">f</mml:mi></mml:msup></mml:mrow></mml:math></inline-formula>. The mean state
<inline-formula><mml:math id="M139" display="inline"><mml:mrow><mml:msubsup><mml:mover accent="true"><mml:mi mathvariant="bold">A</mml:mi><mml:mo mathvariant="normal">¯</mml:mo></mml:mover><mml:mrow><mml:mi>n</mml:mi><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mo>×</mml:mo><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mn mathvariant="normal">1</mml:mn></mml:mrow><mml:mi mathvariant="normal">f</mml:mi></mml:msubsup></mml:mrow></mml:math></inline-formula> can be easily computed by
averaging <inline-formula><mml:math id="M140" display="inline"><mml:mrow><mml:msubsup><mml:mi mathvariant="bold">A</mml:mi><mml:mrow><mml:mi>n</mml:mi><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mo>×</mml:mo><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mi>N</mml:mi></mml:mrow><mml:mi mathvariant="normal">f</mml:mi></mml:msubsup></mml:mrow></mml:math></inline-formula> over its <inline-formula><mml:math id="M141" display="inline"><mml:mi>N</mml:mi></mml:math></inline-formula> columns. Because
all elements of <inline-formula><mml:math id="M142" display="inline"><mml:mrow><mml:msubsup><mml:mi mathvariant="bold">A</mml:mi><mml:mrow><mml:mi>n</mml:mi><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mo>×</mml:mo><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mi>N</mml:mi></mml:mrow><mml:mi mathvariant="normal">f</mml:mi></mml:msubsup></mml:mrow></mml:math></inline-formula> are ash
concentrations, all elements of <inline-formula><mml:math id="M143" display="inline"><mml:mrow><mml:msubsup><mml:mi mathvariant="bold">A</mml:mi><mml:mrow><mml:mi>n</mml:mi><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mo>×</mml:mo><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mi>N</mml:mi></mml:mrow><mml:mi mathvariant="normal">f</mml:mi></mml:msubsup></mml:mrow></mml:math></inline-formula> are
larger than or equal to zero, so a row mean is zero only if every element of
that row is zero. The set of non-zero rows of
<inline-formula><mml:math id="M144" display="inline"><mml:mrow><mml:msubsup><mml:mover accent="true"><mml:mi mathvariant="bold">A</mml:mi><mml:mo mathvariant="normal">¯</mml:mo></mml:mover><mml:mrow><mml:mi>n</mml:mi><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mo>×</mml:mo><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mn mathvariant="normal">1</mml:mn></mml:mrow><mml:mi mathvariant="normal">f</mml:mi></mml:msubsup></mml:mrow></mml:math></inline-formula> is thus identical to that of
<inline-formula><mml:math id="M145" display="inline"><mml:mrow><mml:msubsup><mml:mi mathvariant="bold">A</mml:mi><mml:mrow><mml:mi>n</mml:mi><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mo>×</mml:mo><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mi>N</mml:mi></mml:mrow><mml:mi mathvariant="normal">f</mml:mi></mml:msubsup></mml:mrow></mml:math></inline-formula>. The computational cost for this step
is O(<inline-formula><mml:math id="M146" display="inline"><mml:mrow><mml:mi>n</mml:mi><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mi>N</mml:mi></mml:mrow></mml:math></inline-formula>).</p></list-item><list-item><label>ii.</label><p>Construct mask array <inline-formula><mml:math id="M147" display="inline"><mml:mi mathvariant="bold">z</mml:mi></mml:math></inline-formula>. Based on previously obtained <inline-formula><mml:math id="M148" display="inline"><mml:mrow><mml:msubsup><mml:mover accent="true"><mml:mi mathvariant="bold">A</mml:mi><mml:mo mathvariant="normal">¯</mml:mo></mml:mover><mml:mrow><mml:mi>n</mml:mi><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mo>×</mml:mo><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mn mathvariant="normal">1</mml:mn></mml:mrow><mml:mi mathvariant="normal">f</mml:mi></mml:msubsup></mml:mrow></mml:math></inline-formula>,
we search for the non-zero elements of
<inline-formula><mml:math id="M149" display="inline"><mml:mrow><mml:msubsup><mml:mover accent="true"><mml:mi mathvariant="bold">A</mml:mi><mml:mo mathvariant="normal">¯</mml:mo></mml:mover><mml:mrow><mml:mi>n</mml:mi><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mo>×</mml:mo><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mn mathvariant="normal">1</mml:mn></mml:mrow><mml:mi mathvariant="normal">f</mml:mi></mml:msubsup></mml:mrow></mml:math></inline-formula> and record their indices in a mask
array <inline-formula><mml:math id="M150" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="bold">z</mml:mi><mml:mrow><mml:msub><mml:mi>n</mml:mi><mml:mi mathvariant="normal">ash</mml:mi></mml:msub><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mo>×</mml:mo><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula>. With this strategy, we
need neither search the full matrix <inline-formula><mml:math id="M151" display="inline"><mml:mrow><mml:msubsup><mml:mi mathvariant="bold">A</mml:mi><mml:mrow><mml:mi>n</mml:mi><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mo>×</mml:mo><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mi>N</mml:mi></mml:mrow><mml:mi mathvariant="normal">f</mml:mi></mml:msubsup></mml:mrow></mml:math></inline-formula> nor
build an index matrix for storage, which saves memory. The
computational cost for this step is O(<inline-formula><mml:math id="M152" display="inline"><mml:mi>n</mml:mi></mml:math></inline-formula>).</p></list-item><list-item><label>iii.</label><p>Construct masked ensemble state matrix <inline-formula><mml:math id="M153" display="inline"><mml:mrow><mml:msup><mml:mover accent="true"><mml:mi mathvariant="bold">A</mml:mi><mml:mo mathvariant="normal">̃</mml:mo></mml:mover><mml:mi mathvariant="normal">f</mml:mi></mml:msup></mml:mrow></mml:math></inline-formula>.
Using the mask array <inline-formula><mml:math id="M154" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="bold">z</mml:mi><mml:mrow><mml:msub><mml:mi>n</mml:mi><mml:mi mathvariant="normal">ash</mml:mi></mml:msub><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mo>×</mml:mo><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula> obtained from
step (ii), <inline-formula><mml:math id="M155" display="inline"><mml:mrow><mml:msubsup><mml:mover accent="true"><mml:mi mathvariant="bold">A</mml:mi><mml:mo mathvariant="normal">̃</mml:mo></mml:mover><mml:mrow><mml:msub><mml:mi>n</mml:mi><mml:mi mathvariant="normal">ash</mml:mi></mml:msub><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mo>×</mml:mo><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mi>N</mml:mi></mml:mrow><mml:mi mathvariant="normal">f</mml:mi></mml:msubsup></mml:mrow></mml:math></inline-formula> can be
constructed column by column according to Eq. (<xref ref-type="disp-formula" rid="Ch1.E13"/>), and
the computational cost (overhead) for this step is O(<inline-formula><mml:math id="M156" display="inline"><mml:mrow><mml:msub><mml:mi>n</mml:mi><mml:mi mathvariant="normal">ash</mml:mi></mml:msub><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mi>N</mml:mi></mml:mrow></mml:math></inline-formula>).<disp-formula id="Ch1.E13" content-type="numbered"><mml:math id="M157" display="block"><mml:mrow><mml:mstyle class="stylechange" displaystyle="true"/><mml:msup><mml:mover accent="true"><mml:mi mathvariant="bold">A</mml:mi><mml:mo mathvariant="normal">̃</mml:mo></mml:mover><mml:mi mathvariant="normal">f</mml:mi></mml:msup><mml:mo>(</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mo>:</mml:mo><mml:msub><mml:mi>n</mml:mi><mml:mi mathvariant="normal">ash</mml:mi></mml:msub><mml:mo>,</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mo>:</mml:mo><mml:mi>N</mml:mi><mml:mo>)</mml:mo><mml:mo>=</mml:mo><mml:msup><mml:mi mathvariant="bold">A</mml:mi><mml:mi mathvariant="normal">f</mml:mi></mml:msup><mml:mo>(</mml:mo><mml:mi mathvariant="bold">z</mml:mi><mml:mo>(</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mo>:</mml:mo><mml:msub><mml:mi>n</mml:mi><mml:mi mathvariant="normal">ash</mml:mi></mml:msub><mml:mo>)</mml:mo><mml:mo>,</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mo>:</mml:mo><mml:mi>N</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:math></disp-formula></p></list-item><list-item><label>iv.</label><p>Compute <inline-formula><mml:math id="M158" display="inline"><mml:mrow><mml:msup><mml:mover accent="true"><mml:mi mathvariant="bold">A</mml:mi><mml:mo mathvariant="normal">̃</mml:mo></mml:mover><mml:mi mathvariant="normal">a</mml:mi></mml:msup></mml:mrow></mml:math></inline-formula> by multiplying
<inline-formula><mml:math id="M159" display="inline"><mml:mrow><mml:msup><mml:mover accent="true"><mml:mi mathvariant="bold">A</mml:mi><mml:mo mathvariant="normal">̃</mml:mo></mml:mover><mml:mi mathvariant="normal">f</mml:mi></mml:msup></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M160" display="inline"><mml:mi mathvariant="bold">X</mml:mi></mml:math></inline-formula>. Perform matrix computation
<inline-formula><mml:math id="M161" display="inline"><mml:mrow><mml:msubsup><mml:mover accent="true"><mml:mi mathvariant="bold">A</mml:mi><mml:mo mathvariant="normal">̃</mml:mo></mml:mover><mml:mrow><mml:msub><mml:mi>n</mml:mi><mml:mi mathvariant="normal">ash</mml:mi></mml:msub><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mo>×</mml:mo><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mi>N</mml:mi></mml:mrow><mml:mi mathvariant="normal">a</mml:mi></mml:msubsup><mml:mo>=</mml:mo><mml:msubsup><mml:mover accent="true"><mml:mi mathvariant="bold">A</mml:mi><mml:mo mathvariant="normal">̃</mml:mo></mml:mover><mml:mrow><mml:msub><mml:mi>n</mml:mi><mml:mi mathvariant="normal">ash</mml:mi></mml:msub><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mo>×</mml:mo><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mi>N</mml:mi></mml:mrow><mml:mi mathvariant="normal">f</mml:mi></mml:msubsup><mml:msub><mml:mi mathvariant="bold">X</mml:mi><mml:mrow><mml:mi>N</mml:mi><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mo>×</mml:mo><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mi>N</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula>. This step is similar to
<inline-formula><mml:math id="M162" display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="bold">A</mml:mi><mml:mi mathvariant="normal">a</mml:mi></mml:msup><mml:mo>=</mml:mo><mml:msup><mml:mi mathvariant="bold">A</mml:mi><mml:mi mathvariant="normal">f</mml:mi></mml:msup><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mi mathvariant="bold">X</mml:mi></mml:mrow></mml:math></inline-formula>, as described in
Sect. <xref ref-type="sec" rid="Ch1.S3.SS2"/>, but the computational cost now becomes
O(<inline-formula><mml:math id="M163" display="inline"><mml:mrow><mml:msub><mml:mi>n</mml:mi><mml:mi mathvariant="normal">ash</mml:mi></mml:msub><mml:mspace width="0.125em" linebreak="nobreak"/><mml:msup><mml:mi>N</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula>) instead of O(<inline-formula><mml:math id="M164" display="inline"><mml:mrow><mml:mi>n</mml:mi><mml:mspace linebreak="nobreak" width="0.125em"/><mml:msup><mml:mi>N</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula>).</p></list-item><list-item><label>v.</label><p>Construct analyzed ensemble state matrix <inline-formula><mml:math id="M165" display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="bold">A</mml:mi><mml:mi mathvariant="normal">a</mml:mi></mml:msup></mml:mrow></mml:math></inline-formula>. With the
computed <inline-formula><mml:math id="M166" display="inline"><mml:mrow><mml:msup><mml:mover accent="true"><mml:mi mathvariant="bold">A</mml:mi><mml:mo mathvariant="normal">̃</mml:mo></mml:mover><mml:mi mathvariant="normal">a</mml:mi></mml:msup></mml:mrow></mml:math></inline-formula> from step (iv) and the mask array
<inline-formula><mml:math id="M167" display="inline"><mml:mi mathvariant="bold">z</mml:mi></mml:math></inline-formula> from step (ii), the final analyzed ensemble state matrix
<inline-formula><mml:math id="M168" display="inline"><mml:mrow><mml:msubsup><mml:mi mathvariant="bold">A</mml:mi><mml:mrow><mml:mi>n</mml:mi><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mo>×</mml:mo><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mi>N</mml:mi></mml:mrow><mml:mi mathvariant="normal">a</mml:mi></mml:msubsup></mml:mrow></mml:math></inline-formula> can be constructed based on
Eq. (<xref ref-type="disp-formula" rid="Ch1.E14"/>). The computational cost (overhead) for this step
is O(<inline-formula><mml:math id="M169" display="inline"><mml:mrow><mml:mi>n</mml:mi><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mi>N</mml:mi></mml:mrow></mml:math></inline-formula>).<disp-formula id="Ch1.E14" content-type="numbered"><mml:math id="M170" display="block"><mml:mstyle displaystyle="true" class="stylechange"/><mml:mrow><mml:mstyle class="stylechange" displaystyle="true"/><mml:msup><mml:mi mathvariant="bold">A</mml:mi><mml:mi mathvariant="normal">a</mml:mi></mml:msup><mml:mo>(</mml:mo><mml:mi mathvariant="bold">z</mml:mi><mml:mo>(</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mo>:</mml:mo><mml:msub><mml:mi>n</mml:mi><mml:mi mathvariant="normal">ash</mml:mi></mml:msub><mml:mo>)</mml:mo><mml:mo>,</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mo>:</mml:mo><mml:mi>N</mml:mi><mml:mo>)</mml:mo><mml:mo>=</mml:mo><mml:msup><mml:mover accent="true"><mml:mi mathvariant="bold">A</mml:mi><mml:mo mathvariant="normal">̃</mml:mo></mml:mover><mml:mi mathvariant="normal">a</mml:mi></mml:msup><mml:mo>(</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mo>:</mml:mo><mml:msub><mml:mi>n</mml:mi><mml:mi mathvariant="normal">ash</mml:mi></mml:msub><mml:mo>,</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mo>:</mml:mo><mml:mi>N</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:math></disp-formula></p></list-item></list></p>
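      <p>As an illustration, procedures (i)–(v) can be sketched in a few lines of Python with NumPy. This is a minimal sketch under stated assumptions, not the LOTOS-EUROS implementation: the function name <monospace>mask_state_analysis</monospace>, the toy matrices, and the rule used to build the mask (rows with a non-zero ensemble mean, which presumes non-negative concentrations) are assumptions made for the example.</p>
<preformat preformat-type="code"><![CDATA[
```python
import numpy as np


def mask_state_analysis(A_f, X):
    """Sketch of procedures (i)-(v): A_a = A_f @ X using only the ash rows."""
    n, N = A_f.shape
    # (i) Ensemble mean of every state element, cost O(n N).
    mean_f = A_f.mean(axis=1)
    # (ii) Mask array z: indices of rows with a non-zero mean, cost O(n).
    # Assumption: concentrations are non-negative, so a zero mean implies
    # that the row is zero in every ensemble member.
    z = np.flatnonzero(mean_f)
    # (iii) Masked forecast matrix of size n_ash x N, cost O(n_ash N).
    A_f_masked = A_f[z, :]
    # (iv) Reduced dense product, cost O(n_ash N^2).
    A_a_masked = A_f_masked @ X
    # (v) Scatter the result back into the full n x N matrix, cost O(n N).
    A_a = np.zeros((n, N), dtype=A_a_masked.dtype)
    A_a[z, :] = A_a_masked
    return A_a, z


# Toy example (hypothetical data): 6 grid cells, 4 ensemble members,
# ash present only in rows 1 and 4.
rng = np.random.default_rng(0)
A_f = np.zeros((6, 4))
A_f[[1, 4], :] = rng.random((2, 4)) + 0.1
X = rng.random((4, 4))
A_a, z = mask_state_analysis(A_f, X)
```
]]></preformat>
      <p>Rows that are not listed in the mask stay zero in the result, which agrees with the full product because a zero row of the forecast matrix multiplied by any matrix is again a zero row.</p>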
      <p>According to the derivations of MS, the computational costs related to zero
rows are avoided. Note that “zero rows” are not the same as “zero elements”. The
former correspond to grid cells where no ensemble member contains ash,
whereas the latter also include grid cells that are ash-free in only
some ensemble members. Exploiting all “zero elements” would
capture the full sparsity of the ensemble state matrix, but it would require extra
computation and memory for searching the full matrix
<inline-formula><mml:math id="M171" display="inline"><mml:mrow><mml:msubsup><mml:mi mathvariant="bold">A</mml:mi><mml:mrow><mml:mi>n</mml:mi><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mo>×</mml:mo><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mi>N</mml:mi></mml:mrow><mml:mi mathvariant="normal">f</mml:mi></mml:msubsup></mml:mrow></mml:math></inline-formula> with a computational cost of O(<inline-formula><mml:math id="M172" display="inline"><mml:mrow><mml:mi>n</mml:mi><mml:mi>N</mml:mi></mml:mrow></mml:math></inline-formula>)
and storing a mask-state matrix with dimensions of <inline-formula><mml:math id="M173" display="inline"><mml:mrow><mml:mi>n</mml:mi><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mo>×</mml:mo><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mi>N</mml:mi></mml:mrow></mml:math></inline-formula>. This is
expensive compared to constructing the mask array in procedure (ii).
Moreover, a careful check of the volcanic ash ensemble plumes shows
no outlying ensemble member that differs substantially from the others. Although the
concentration levels differ among ensemble members, the main transport direction and
the set of grid cells containing ash are more or less the same. This means that
the “zero rows” capture nearly all of the “zero elements”, while being much
cheaper to identify, which supports the suitability
and advantage of procedure (ii). When meteorological uncertainties are large,
the number of “zero elements” will probably be much larger than the number covered by “zero rows”.
How to exploit the sparsity information in the ensemble
state matrix in that case will be considered in future work.</p>
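      <p>The distinction between “zero rows” and “zero elements” can be made concrete with a small example. The sketch below uses a hypothetical 4 × 3 forecast matrix and an illustrative helper, <monospace>count_zero_rows_and_elements</monospace>; neither is taken from the DA system.</p>
<preformat preformat-type="code"><![CDATA[
```python
def count_zero_rows_and_elements(A_f):
    """Return (zero_rows, zero_elements) for a matrix given as a list of rows."""
    zero_rows = sum(1 for row in A_f if all(v == 0.0 for v in row))
    zero_elements = sum(1 for row in A_f for v in row if v == 0.0)
    return zero_rows, zero_elements


# Hypothetical forecast matrix: 4 grid cells (rows), 3 ensemble members (columns).
A_f = [
    [0.0, 0.0, 0.0],  # no ash in any member: a "zero row"
    [0.2, 0.5, 0.3],  # ash in every member
    [0.0, 0.1, 0.4],  # ash missing in one member only: an isolated "zero element"
    [0.0, 0.0, 0.0],  # another "zero row"
]
zero_rows, zero_elements = count_zero_rows_and_elements(A_f)
N = len(A_f[0])
# Number of zero elements that the row mask of procedure (ii) already covers.
covered_by_mask = zero_rows * N
```
]]></preformat>
      <p>When the ensemble plumes overlap closely, the number covered by the row mask is close to the total number of zero elements, so little sparsity is left unexploited.</p>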
      <p>Based on procedures of MS, the computational cost of
<inline-formula><mml:math id="M174" display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="bold">A</mml:mi><mml:mi mathvariant="normal">a</mml:mi></mml:msup><mml:mo>=</mml:mo><mml:msup><mml:mi mathvariant="bold">A</mml:mi><mml:mi mathvariant="normal">f</mml:mi></mml:msup><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mi mathvariant="bold">X</mml:mi></mml:mrow></mml:math></inline-formula> can be reduced.
However, without a careful evaluation we cannot conclude that MS is fast, because
the algorithm also involves additional procedures. Only if procedures (i), (ii),
(iii), and (v) are much cheaper than the main procedure (iv) will MS
speed up the analysis step. We now analyze MS's
computational cost, which can be summed as O(<inline-formula><mml:math id="M175" display="inline"><mml:mrow><mml:mi>n</mml:mi><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mi>N</mml:mi></mml:mrow></mml:math></inline-formula>) + O(<inline-formula><mml:math id="M176" display="inline"><mml:mi>n</mml:mi></mml:math></inline-formula>) +
O(<inline-formula><mml:math id="M177" display="inline"><mml:mrow><mml:msub><mml:mi>n</mml:mi><mml:mi mathvariant="normal">ash</mml:mi></mml:msub><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mi>N</mml:mi></mml:mrow></mml:math></inline-formula>) + O(<inline-formula><mml:math id="M178" display="inline"><mml:mrow><mml:msub><mml:mi>n</mml:mi><mml:mi mathvariant="normal">ash</mml:mi></mml:msub><mml:mspace linebreak="nobreak" width="0.125em"/><mml:msup><mml:mi>N</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula>) + O(<inline-formula><mml:math id="M179" display="inline"><mml:mrow><mml:mi>n</mml:mi><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mi>N</mml:mi></mml:mrow></mml:math></inline-formula>), i.e., O(<inline-formula><mml:math id="M180" display="inline"><mml:mrow><mml:mi>n</mml:mi><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mi>N</mml:mi><mml:mo>+</mml:mo><mml:msub><mml:mi>n</mml:mi><mml:mi mathvariant="normal">ash</mml:mi></mml:msub><mml:mspace linebreak="nobreak" width="0.125em"/><mml:msup><mml:mi>N</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula>). Thus, the computational overhead involved in
transforming the full matrix to a small one (i.e., O(<inline-formula><mml:math id="M181" display="inline"><mml:mrow><mml:msub><mml:mi>n</mml:mi><mml:mi mathvariant="normal">ash</mml:mi></mml:msub><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mi>N</mml:mi></mml:mrow></mml:math></inline-formula>) for
procedure (iii)) has little effect on the total computation cost of MS (i.e.,
O(<inline-formula><mml:math id="M182" display="inline"><mml:mrow><mml:mi>n</mml:mi><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mi>N</mml:mi><mml:mo>+</mml:mo><mml:msub><mml:mi>n</mml:mi><mml:mi mathvariant="normal">ash</mml:mi></mml:msub><mml:mspace linebreak="nobreak" width="0.125em"/><mml:msup><mml:mi>N</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula>)). However, the computational overhead of
transforming the small matrix to the full one (i.e., O(<inline-formula><mml:math id="M183" display="inline"><mml:mrow><mml:mi>n</mml:mi><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mi>N</mml:mi></mml:mrow></mml:math></inline-formula>) for procedure
(v)) makes a non-negligible contribution to the total MS
computational cost. The computational cost without MS is O(<inline-formula><mml:math id="M184" display="inline"><mml:mrow><mml:mi>n</mml:mi><mml:mspace linebreak="nobreak" width="0.125em"/><mml:msup><mml:mi>N</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula>).</p>
      <p>The comparison between both costs (with and without MS, i.e., O(<inline-formula><mml:math id="M185" display="inline"><mml:mrow><mml:mi>n</mml:mi><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mi>N</mml:mi><mml:mo>+</mml:mo><mml:msub><mml:mi>n</mml:mi><mml:mi mathvariant="normal">ash</mml:mi></mml:msub><mml:mspace width="0.125em" linebreak="nobreak"/><mml:msup><mml:mi>N</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula>) and O(<inline-formula><mml:math id="M186" display="inline"><mml:mrow><mml:mi>n</mml:mi><mml:mspace linebreak="nobreak" width="0.125em"/><mml:msup><mml:mi>N</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula>)) indicates that when the number of non-zero
rows (<inline-formula><mml:math id="M187" display="inline"><mml:mrow><mml:msub><mml:mi>n</mml:mi><mml:mi mathvariant="normal">ash</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>, i.e., the number of grid cells with ash) of the forecasted
ensemble state matrix satisfies <inline-formula><mml:math id="M188" display="inline"><mml:mrow><mml:msub><mml:mi>n</mml:mi><mml:mi mathvariant="normal">ash</mml:mi></mml:msub><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mo>&lt;</mml:mo><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mstyle displaystyle="false"><mml:mfrac style="text"><mml:mrow><mml:mi>N</mml:mi><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow><mml:mi>N</mml:mi></mml:mfrac></mml:mstyle><mml:mi>n</mml:mi></mml:mrow></mml:math></inline-formula>, MS
can accelerate <inline-formula><mml:math id="M189" display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="bold">A</mml:mi><mml:mi mathvariant="normal">a</mml:mi></mml:msup><mml:mo>=</mml:mo><mml:msup><mml:mi mathvariant="bold">A</mml:mi><mml:mi mathvariant="normal">f</mml:mi></mml:msup><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mi mathvariant="bold">X</mml:mi></mml:mrow></mml:math></inline-formula>.
Here, O(<inline-formula><mml:math id="M190" display="inline"><mml:mrow><mml:mi>n</mml:mi><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mi>N</mml:mi><mml:mo>+</mml:mo><mml:msub><mml:mi>n</mml:mi><mml:mi mathvariant="normal">ash</mml:mi></mml:msub><mml:mspace linebreak="nobreak" width="0.125em"/><mml:msup><mml:mi>N</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula>) and O(<inline-formula><mml:math id="M191" display="inline"><mml:mrow><mml:mi>n</mml:mi><mml:mspace width="0.125em" linebreak="nobreak"/><mml:msup><mml:mi>N</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula>) are on the same order
when <inline-formula><mml:math id="M192" display="inline"><mml:mrow><mml:msub><mml:mi>n</mml:mi><mml:mi mathvariant="normal">ash</mml:mi></mml:msub><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mo>&lt;</mml:mo><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mstyle displaystyle="false"><mml:mfrac style="text"><mml:mrow><mml:mi>N</mml:mi><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow><mml:mi>N</mml:mi></mml:mfrac></mml:mstyle><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mi>n</mml:mi></mml:mrow></mml:math></inline-formula>. The larger the difference between
<inline-formula><mml:math id="M193" display="inline"><mml:mrow><mml:msub><mml:mi>n</mml:mi><mml:mi mathvariant="normal">ash</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M194" display="inline"><mml:mrow><mml:mstyle displaystyle="false"><mml:mfrac style="text"><mml:mrow><mml:mi>N</mml:mi><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow><mml:mi>N</mml:mi></mml:mfrac></mml:mstyle><mml:mi>n</mml:mi></mml:mrow></mml:math></inline-formula>, the larger the speedup that can be
achieved. According to this analysis and the characteristics (e.g.,
<inline-formula><mml:math id="M195" display="inline"><mml:mstyle displaystyle="false"><mml:mfrac style="text"><mml:mrow><mml:msub><mml:mi>n</mml:mi><mml:mi mathvariant="normal">ash</mml:mi></mml:msub></mml:mrow><mml:mi>n</mml:mi></mml:mfrac></mml:mstyle></mml:math></inline-formula> approximately equals <inline-formula><mml:math id="M196" display="inline"><mml:mstyle displaystyle="false"><mml:mfrac style="text"><mml:mn mathvariant="normal">1</mml:mn><mml:mn mathvariant="normal">3</mml:mn></mml:mfrac></mml:mstyle></mml:math></inline-formula> in this case)
of volcanic ash transport as described in Sect. <xref ref-type="sec" rid="Ch1.S4.SS1"/>, the
relation is certainly satisfied and is actually <inline-formula><mml:math id="M197" display="inline"><mml:mrow><mml:msub><mml:mi>n</mml:mi><mml:mi mathvariant="normal">ash</mml:mi></mml:msub><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mo>≪</mml:mo><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mstyle displaystyle="false"><mml:mfrac style="text"><mml:mrow><mml:mi>N</mml:mi><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow><mml:mi>N</mml:mi></mml:mfrac></mml:mstyle><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mi>n</mml:mi></mml:mrow></mml:math></inline-formula> (significantly smaller) for our study.
Therefore, for our volcanic ash DA system, with MS, the computational cost
for the time-consuming part <inline-formula><mml:math id="M198" display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="bold">A</mml:mi><mml:mi mathvariant="normal">a</mml:mi></mml:msup><mml:mo>=</mml:mo><mml:msup><mml:mi mathvariant="bold">A</mml:mi><mml:mi mathvariant="normal">f</mml:mi></mml:msup><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mi mathvariant="bold">X</mml:mi></mml:mrow></mml:math></inline-formula> is O(<inline-formula><mml:math id="M199" display="inline"><mml:mrow><mml:msub><mml:mi>n</mml:mi><mml:mi mathvariant="normal">ash</mml:mi></mml:msub><mml:mspace width="0.125em" linebreak="nobreak"/><mml:msup><mml:mi>N</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula>), which is much reduced compared to
O(<inline-formula><mml:math id="M200" display="inline"><mml:mrow><mml:mi>n</mml:mi><mml:mspace linebreak="nobreak" width="0.125em"/><mml:msup><mml:mi>N</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula>) with conventional computations.</p>
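      <p>The break-even condition follows from comparing the two operation counts directly. The sketch below treats the O(·) terms as exact counts with unit constants, which is a simplifying assumption, and uses a hypothetical grid size; the function names are illustrative only.</p>
<preformat preformat-type="code"><![CDATA[
```python
def cost_without_ms(n, N):
    """Leading-order operation count of the full product A_f X."""
    return n * N**2


def cost_with_ms(n, n_ash, N):
    """Leading-order operation count of MS: overhead n N plus the reduced product."""
    return n * N + n_ash * N**2


def ms_is_faster(n, n_ash, N):
    """Break-even condition n_ash < (N - 1) / N * n, written with integers only."""
    return n_ash * N < (N - 1) * n


# Hypothetical grid size; N = 100 and n_ash / n = 1/3 follow the case study.
n, N = 300_000, 100
n_ash = n // 3
faster = ms_is_faster(n, n_ash, N)
```
]]></preformat>
      <p>Dividing the inequality between the two counts by N shows that both formulations of the condition are equivalent, so the integer form avoids any rounding in the comparison.</p>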
      <p>The relation <inline-formula><mml:math id="M201" display="inline"><mml:mrow><mml:msub><mml:mi>n</mml:mi><mml:mi mathvariant="normal">ash</mml:mi></mml:msub><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mo>&lt;</mml:mo><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mstyle displaystyle="false"><mml:mfrac style="text"><mml:mrow><mml:mi>N</mml:mi><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow><mml:mi>N</mml:mi></mml:mfrac></mml:mstyle><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mi>n</mml:mi></mml:mrow></mml:math></inline-formula> indicates whether we would
obtain a speedup with the MS method; it can be extended to
Eq. (<xref ref-type="disp-formula" rid="Ch1.E15"/>),

                <disp-formula id="Ch1.E15" content-type="numbered"><mml:math id="M202" display="block"><mml:mstyle class="stylechange" displaystyle="true"/><mml:mrow><mml:mstyle class="stylechange" displaystyle="true"/><mml:msub><mml:mi>S</mml:mi><mml:mi mathvariant="normal">ms</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mrow><mml:mi mathvariant="normal">O</mml:mi><mml:mo>(</mml:mo><mml:mi>n</mml:mi><mml:mspace width="0.125em" linebreak="nobreak"/><mml:msup><mml:mi>N</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup><mml:mo>)</mml:mo></mml:mrow><mml:mrow><mml:mi mathvariant="normal">O</mml:mi><mml:mo>(</mml:mo><mml:mi>n</mml:mi><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mi>N</mml:mi><mml:mo>+</mml:mo><mml:msub><mml:mi>n</mml:mi><mml:mi mathvariant="normal">ash</mml:mi></mml:msub><mml:mspace width="0.125em" linebreak="nobreak"/><mml:msup><mml:mi>N</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup><mml:mo>)</mml:mo></mml:mrow></mml:mfrac></mml:mstyle><mml:mo>=</mml:mo><mml:mi mathvariant="normal">O</mml:mi><mml:mfenced open="(" close=")"><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mi>n</mml:mi><mml:mrow><mml:msub><mml:mi>n</mml:mi><mml:mi mathvariant="normal">ash</mml:mi></mml:msub></mml:mrow></mml:mfrac></mml:mstyle></mml:mfenced><mml:mo>,</mml:mo></mml:mrow></mml:math></disp-formula>

          which explicitly specifies the expected amount of speedup (<inline-formula><mml:math id="M203" display="inline"><mml:mrow><mml:msub><mml:mi>S</mml:mi><mml:mi mathvariant="normal">ms</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>)
of <inline-formula><mml:math id="M204" display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="bold">A</mml:mi><mml:mi mathvariant="normal">a</mml:mi></mml:msup><mml:mo>=</mml:mo><mml:msup><mml:mi mathvariant="bold">A</mml:mi><mml:mi mathvariant="normal">f</mml:mi></mml:msup><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mi mathvariant="bold">X</mml:mi></mml:mrow></mml:math></inline-formula> by the MS
algorithm. In this case study, <inline-formula><mml:math id="M205" display="inline"><mml:mi>N</mml:mi></mml:math></inline-formula> is set to 100 and
<inline-formula><mml:math id="M206" display="inline"><mml:mrow><mml:mstyle displaystyle="false"><mml:mfrac style="text"><mml:mrow><mml:msub><mml:mi>n</mml:mi><mml:mi mathvariant="normal">ash</mml:mi></mml:msub></mml:mrow><mml:mi>n</mml:mi></mml:mfrac></mml:mstyle><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mo>≈</mml:mo><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mstyle displaystyle="false"><mml:mfrac style="text"><mml:mn mathvariant="normal">1</mml:mn><mml:mn mathvariant="normal">3</mml:mn></mml:mfrac></mml:mstyle></mml:mrow></mml:math></inline-formula>, so <inline-formula><mml:math id="M207" display="inline"><mml:mrow><mml:msub><mml:mi>S</mml:mi><mml:mi mathvariant="normal">ms</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> is
approximately 3.0.</p>
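      <p>The estimate from Eq. (<xref ref-type="disp-formula" rid="Ch1.E15"/>) can be checked numerically. Reading the O(·) terms as exact counts with unit constants (an assumption), the ratio reduces to 1 / (1/N + n_ash/n); the sketch below evaluates it with and without the overhead term 1/N. The function name <monospace>speedup_ms</monospace> is illustrative.</p>
<preformat preformat-type="code"><![CDATA[
```python
def speedup_ms(ash_fraction, N, include_overhead=True):
    """Estimated speedup of A_a = A_f X, with ash_fraction = n_ash / n.

    With overhead: n N^2 / (n N + n_ash N^2) = 1 / (1/N + n_ash/n).
    Without overhead: n / n_ash, the leading-order form of the estimate.
    """
    overhead = 1.0 / N if include_overhead else 0.0
    return 1.0 / (overhead + ash_fraction)


# Values from the case study: N = 100 and n_ash / n of about 1/3.
s_leading = speedup_ms(1.0 / 3.0, 100, include_overhead=False)
s_with_overhead = speedup_ms(1.0 / 3.0, 100)
```
]]></preformat>
      <p>For an ensemble size of 100 the overhead term changes the estimate only slightly, which is consistent with quoting the leading-order value.</p>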

<?xmltex \floatpos{t}?><table-wrap id="Ch1.T2" specific-use="star"><caption><p>Computational evaluation of all the steps of the mask-state
algorithm (MS) for <inline-formula><mml:math id="M208" display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="bold">A</mml:mi><mml:mi mathvariant="normal">a</mml:mi></mml:msup><mml:mo>=</mml:mo><mml:msup><mml:mi mathvariant="bold">A</mml:mi><mml:mi mathvariant="normal">f</mml:mi></mml:msup><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mi mathvariant="bold">X</mml:mi></mml:mrow></mml:math></inline-formula>. (See the details of each step of MS in
Sect. <xref ref-type="sec" rid="Ch1.S4.SS2"/>.) </p></caption><oasis:table frame="topbot"><oasis:tgroup cols="2">
     <oasis:colspec colnum="1" colname="col1" align="justify" colwidth="227.622047pt"/>
     <oasis:colspec colnum="2" colname="col2" align="right"/>
     <oasis:thead>
       <oasis:row rowsep="1">  
         <oasis:entry colname="col1">Sub-step</oasis:entry>  
         <oasis:entry colname="col2">Computational time</oasis:entry>
       </oasis:row>
     </oasis:thead>
     <oasis:tbody>
       <oasis:row>  
         <oasis:entry colname="col1">(i) Compute ensemble mean state <inline-formula><mml:math id="M209" display="inline"><mml:mrow><mml:msup><mml:mover accent="true"><mml:mi mathvariant="bold">A</mml:mi><mml:mo mathvariant="normal">¯</mml:mo></mml:mover><mml:mi mathvariant="normal">f</mml:mi></mml:msup></mml:mrow></mml:math></inline-formula></oasis:entry>  
         <oasis:entry colname="col2">0.0097 h</oasis:entry>
       </oasis:row>
       <oasis:row>  
         <oasis:entry colname="col1">(ii) Construct mask array <inline-formula><mml:math id="M210" display="inline"><mml:mi mathvariant="bold">z</mml:mi></mml:math></inline-formula></oasis:entry>  
         <oasis:entry colname="col2">0.0002 h</oasis:entry>
       </oasis:row>
       <oasis:row>  
         <oasis:entry colname="col1">(iii) Construct masked ensemble state matrix <inline-formula><mml:math id="M211" display="inline"><mml:mrow><mml:msup><mml:mover accent="true"><mml:mi mathvariant="bold">A</mml:mi><mml:mo mathvariant="normal">̃</mml:mo></mml:mover><mml:mi mathvariant="normal">f</mml:mi></mml:msup></mml:mrow></mml:math></inline-formula></oasis:entry>  
         <oasis:entry colname="col2">0.0057 h</oasis:entry>
       </oasis:row>
       <oasis:row>  
         <oasis:entry colname="col1">(iv) Compute <inline-formula><mml:math id="M212" display="inline"><mml:mrow><mml:msup><mml:mover accent="true"><mml:mi mathvariant="bold">A</mml:mi><mml:mo mathvariant="normal">̃</mml:mo></mml:mover><mml:mi mathvariant="normal">a</mml:mi></mml:msup></mml:mrow></mml:math></inline-formula> by multiplying <inline-formula><mml:math id="M213" display="inline"><mml:mrow><mml:msup><mml:mover accent="true"><mml:mi mathvariant="bold">A</mml:mi><mml:mo mathvariant="normal">̃</mml:mo></mml:mover><mml:mi mathvariant="normal">f</mml:mi></mml:msup></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M214" display="inline"><mml:mi mathvariant="bold">X</mml:mi></mml:math></inline-formula></oasis:entry>  
         <oasis:entry colname="col2"><bold>0.8474</bold> h</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">  
         <oasis:entry colname="col1">(v) Construct analyzed ensemble state matrix <inline-formula><mml:math id="M215" display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="bold">A</mml:mi><mml:mi mathvariant="normal">a</mml:mi></mml:msup></mml:mrow></mml:math></inline-formula></oasis:entry>  
         <oasis:entry colname="col2">0.0070 h</oasis:entry>
       </oasis:row>
       <oasis:row>  
         <oasis:entry colname="col1">Total</oasis:entry>  
         <oasis:entry colname="col2"><bold>0.87</bold> h</oasis:entry>
       </oasis:row>
     </oasis:tbody>
   </oasis:tgroup></oasis:table><table-wrap-foot><p>h: hour; the time is wall clock time.</p></table-wrap-foot></table-wrap>

      <p>According to Amdahl's law <xref ref-type="bibr" rid="bib1.bibx1" id="paren.43"/>, the total computational speedup
(<inline-formula><mml:math id="M216" display="inline"><mml:mrow><mml:msub><mml:mi>S</mml:mi><mml:mi mathvariant="normal">total</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>) by MS can be predicted by
Eq. (<xref ref-type="disp-formula" rid="Ch1.E16"/>),

                <disp-formula id="Ch1.E16" content-type="numbered"><mml:math id="M217" display="block"><mml:mstyle class="stylechange" displaystyle="true"/><mml:mrow><mml:mstyle class="stylechange" displaystyle="true"/><mml:msub><mml:mi>S</mml:mi><mml:mi mathvariant="normal">total</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mn mathvariant="normal">1</mml:mn><mml:mrow><mml:mo>(</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mo>-</mml:mo><mml:msub><mml:mi>p</mml:mi><mml:mi mathvariant="normal">ms</mml:mi></mml:msub><mml:mo>)</mml:mo><mml:mo>+</mml:mo><mml:mstyle displaystyle="false"><mml:mfrac style="text"><mml:mrow><mml:msub><mml:mi>p</mml:mi><mml:mi mathvariant="normal">ms</mml:mi></mml:msub></mml:mrow><mml:mrow><mml:msub><mml:mi>S</mml:mi><mml:mi mathvariant="normal">ms</mml:mi></mml:msub></mml:mrow></mml:mfrac></mml:mstyle></mml:mrow></mml:mfrac></mml:mstyle><mml:mo>,</mml:mo></mml:mrow></mml:math></disp-formula>

          where <inline-formula><mml:math id="M218" display="inline"><mml:mrow><mml:msub><mml:mi>p</mml:mi><mml:mi mathvariant="normal">ms</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> is the proportion of the computational cost of
<inline-formula><mml:math id="M219" display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="bold">A</mml:mi><mml:mi mathvariant="normal">a</mml:mi></mml:msup><mml:mo>=</mml:mo><mml:msup><mml:mi mathvariant="bold">A</mml:mi><mml:mi mathvariant="normal">f</mml:mi></mml:msup><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mi mathvariant="bold">X</mml:mi></mml:mrow></mml:math></inline-formula> in the overall DA
computations. It has been evaluated that the computational cost of
<inline-formula><mml:math id="M220" display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="bold">A</mml:mi><mml:mi mathvariant="normal">a</mml:mi></mml:msup><mml:mo>=</mml:mo><mml:msup><mml:mi mathvariant="bold">A</mml:mi><mml:mi mathvariant="normal">f</mml:mi></mml:msup><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mi mathvariant="bold">X</mml:mi></mml:mrow></mml:math></inline-formula> dominates the
analysis step (see Fig. <xref ref-type="fig" rid="Ch1.F2"/>b); thus, the proportion of
the computational cost of <inline-formula><mml:math id="M221" display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="bold">A</mml:mi><mml:mi mathvariant="normal">a</mml:mi></mml:msup><mml:mo>=</mml:mo><mml:msup><mml:mi mathvariant="bold">A</mml:mi><mml:mi mathvariant="normal">f</mml:mi></mml:msup><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mi mathvariant="bold">X</mml:mi></mml:mrow></mml:math></inline-formula> approximates the proportion of the analysis step in the total DA
computations (i.e., <inline-formula><mml:math id="M222" display="inline"><mml:mrow><mml:msub><mml:mi>p</mml:mi><mml:mi mathvariant="normal">ms</mml:mi></mml:msub><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mo>≈</mml:mo><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mn mathvariant="normal">72</mml:mn></mml:mrow></mml:math></inline-formula> % in this case, as
described in Sect. <xref ref-type="sec" rid="Ch1.S3.SS1"/>). Therefore, based on
Eq. (<xref ref-type="disp-formula" rid="Ch1.E16"/>), the maximum (“ideal”) computational speedup
can be predicted to be <inline-formula><mml:math id="M223" display="inline"><mml:mstyle displaystyle="false"><mml:mfrac style="text"><mml:mn mathvariant="normal">1</mml:mn><mml:mrow><mml:mn mathvariant="normal">1</mml:mn><mml:mo>-</mml:mo><mml:msub><mml:mi>p</mml:mi><mml:mi mathvariant="normal">ms</mml:mi></mml:msub></mml:mrow></mml:mfrac></mml:mstyle></mml:math></inline-formula> (i.e., <inline-formula><mml:math id="M224" display="inline"><mml:mrow><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mo>≈</mml:mo><mml:mspace linebreak="nobreak" width="0.125em"/></mml:mrow></mml:math></inline-formula>3.57
for this case study) when <inline-formula><mml:math id="M225" display="inline"><mml:mrow><mml:msub><mml:mi>S</mml:mi><mml:mi mathvariant="normal">ms</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> approaches infinity. However,
this is not the actual speedup because <inline-formula><mml:math id="M226" display="inline"><mml:mrow><mml:msub><mml:mi>S</mml:mi><mml:mi mathvariant="normal">ms</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> is in fact specified
by Eq. (<xref ref-type="disp-formula" rid="Ch1.E15"/>). Based on the discussions above,
<inline-formula><mml:math id="M227" display="inline"><mml:mrow><mml:msub><mml:mi>S</mml:mi><mml:mi mathvariant="normal">total</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> can therefore be estimated by
Eq. (<xref ref-type="disp-formula" rid="Ch1.E16"/>) at <inline-formula><mml:math id="M228" display="inline"><mml:mrow><mml:mo>≈</mml:mo><mml:mspace linebreak="nobreak" width="0.125em"/></mml:mrow></mml:math></inline-formula>2.0 in this case.</p>
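      <p>The two Amdahl estimates quoted above can be reproduced with a short calculation. The sketch below evaluates Eq. (<xref ref-type="disp-formula" rid="Ch1.E16"/>) with the values given in the text (a proportion of 72 % and a partial speedup of 3.0); the function and variable names are chosen for the example only.</p>
<preformat preformat-type="code"><![CDATA[
```python
def amdahl_speedup(p, s):
    """Total speedup when a fraction p of the run time is accelerated by a factor s."""
    return 1.0 / ((1.0 - p) + p / s)


p_ms = 0.72  # proportion of A_a = A_f X in the total DA computation (from the text)
s_ms = 3.0   # speedup of A_a = A_f X estimated from n / n_ash (from the text)

s_total = amdahl_speedup(p_ms, s_ms)
# Upper bound reached as s_ms grows without limit.
s_ideal = 1.0 / (1.0 - p_ms)
```
]]></preformat>
      <p>The gap between the two values shows how strongly the part of the DA computation that MS does not touch limits the overall gain.</p>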
</sec>
<sec id="Ch1.S4.SS3">
  <title>Experimental results</title>
      <p>Analysis of the algorithmic complexity of MS shows that MS is an efficient
approach to reduce the computational cost of the time-consuming
<inline-formula><mml:math id="M229" display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="bold">A</mml:mi><mml:mi mathvariant="normal">a</mml:mi></mml:msup><mml:mo>=</mml:mo><mml:msup><mml:mi mathvariant="bold">A</mml:mi><mml:mi mathvariant="normal">f</mml:mi></mml:msup><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mi mathvariant="bold">X</mml:mi></mml:mrow></mml:math></inline-formula>. Now MS will be
applied in the real volcanic ash DA system to investigate whether it
speeds up the analysis step in practice. We embed MS in the
conventional EnKF, so the initialization and forecast steps are
computed exactly as in the conventional EnKF. The only difference between MS-EnKF and
conventional EnKF is that the former employs MS in the analysis step,
whereas the latter uses the standard analysis step. The results and related
specifications are shown in Table <xref ref-type="table" rid="Ch1.T1"/>. As introduced in
Sect. <xref ref-type="sec" rid="Ch1.S2"/>, the forecast step has been configured with the
conventional parallelization; thus, <inline-formula><mml:math id="M230" display="inline"><mml:mi>N</mml:mi></mml:math></inline-formula>+2 (102 here) cores are actually used
(one core for the DA algorithm, the other <inline-formula><mml:math id="M231" display="inline"><mml:mi>N</mml:mi></mml:math></inline-formula>+1 cores for the parallel
forecast of <inline-formula><mml:math id="M232" display="inline"><mml:mi>N</mml:mi></mml:math></inline-formula> ensemble members and one ensemble mean). It can be seen from
Table <xref ref-type="table" rid="Ch1.T1"/> that MS indeed largely accelerates the analysis
step (as expected, by a factor of about 3.0 for this study), which confirms
the theoretical cost evaluation. The detailed experimental time for each step
of MS is shown in Table <xref ref-type="table" rid="Ch1.T2"/>. As expected, the
dense–dense matrix multiplication in step (iv) takes the largest part (i.e.,
0.8474 h for this case study) of the total computational time (0.87 h) of
MS. Nevertheless, step (iv) is a large improvement over the case
without MS (3.14 h; see Table <xref ref-type="table" rid="Ch1.T1"/>), and this gain is retained because the
computational time for the other steps (e.g., steps (i–iii) cost only
0.0156 h to reduce the size of the ensemble state matrix) is
negligible. Note that the total computational time of
<inline-formula><mml:math id="M233" display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="bold">A</mml:mi><mml:mi mathvariant="normal">a</mml:mi></mml:msup><mml:mo>=</mml:mo><mml:msup><mml:mi mathvariant="bold">A</mml:mi><mml:mi mathvariant="normal">f</mml:mi></mml:msup><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mi mathvariant="bold">X</mml:mi></mml:mrow></mml:math></inline-formula> with MS (i.e.,
0.87 h in Table <xref ref-type="table" rid="Ch1.T2"/>) is not exactly equal to the
computational time of the MS-EnKF analysis procedures (i.e., 0.88 h in
Table <xref ref-type="table" rid="Ch1.T1"/>). The difference (i.e., 0.01 h) corresponds
to the summed computational time of all the other analysis procedures (i.e.,
procedures 1–8) except for <inline-formula><mml:math id="M234" display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="bold">A</mml:mi><mml:mi mathvariant="normal">a</mml:mi></mml:msup><mml:mo>=</mml:mo><mml:msup><mml:mi mathvariant="bold">A</mml:mi><mml:mi mathvariant="normal">f</mml:mi></mml:msup><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mi mathvariant="bold">X</mml:mi></mml:mrow></mml:math></inline-formula> (see Fig. <xref ref-type="fig" rid="Ch1.F2"/>b and
Table <xref ref-type="table" rid="Ch1.T3"/>).</p>

<?xmltex \floatpos{t}?><table-wrap id="Ch1.T3" specific-use="star"><caption><p>Computational time for the analysis step of conventional EnKF,
MS-EnKF, and CSR-based-SDMM-EnKF. </p></caption><oasis:table frame="topbot"><oasis:tgroup cols="4">
     <oasis:colspec colnum="1" colname="col1" align="justify" colwidth="142.26378pt"/>
     <oasis:colspec colnum="2" colname="col2" align="right"/>
     <oasis:colspec colnum="3" colname="col3" align="right"/>
     <oasis:colspec colnum="4" colname="col4" align="right"/>
     <oasis:thead>
       <oasis:row rowsep="1">  
         <oasis:entry colname="col1">Analysis procedures (see Fig. <xref ref-type="fig" rid="Ch1.F2"/>b)</oasis:entry>  
         <oasis:entry colname="col2">Conventional EnKF</oasis:entry>  
         <oasis:entry colname="col3">MS-EnKF</oasis:entry>  
         <oasis:entry colname="col4">CSR-based-SDMM-EnKF</oasis:entry>
       </oasis:row>
     </oasis:thead>
     <oasis:tbody>
       <oasis:row>  
         <oasis:entry colname="col1">procedures 1–8</oasis:entry>  
         <oasis:entry colname="col2">0.01 h</oasis:entry>  
         <oasis:entry colname="col3">0.01 h</oasis:entry>  
         <oasis:entry colname="col4">0.01 h</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">  
         <oasis:entry colname="col1">procedure 9 (<inline-formula><mml:math id="M235" display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="bold">A</mml:mi><mml:mi mathvariant="normal">a</mml:mi></mml:msup><mml:mo>=</mml:mo><mml:msup><mml:mi mathvariant="bold">A</mml:mi><mml:mi mathvariant="normal">f</mml:mi></mml:msup><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mi mathvariant="bold">X</mml:mi></mml:mrow></mml:math></inline-formula>)</oasis:entry>  
         <oasis:entry colname="col2"><bold>3.13</bold> h</oasis:entry>  
         <oasis:entry colname="col3"><bold>0.87</bold> h</oasis:entry>  
         <oasis:entry colname="col4"><bold>1.21</bold> h</oasis:entry>
       </oasis:row>
       <oasis:row>  
         <oasis:entry colname="col1">  Total</oasis:entry>  
         <oasis:entry colname="col2"><bold>3.14</bold> h</oasis:entry>  
         <oasis:entry colname="col3"><bold>0.88</bold> h</oasis:entry>  
         <oasis:entry colname="col4"><bold>1.22</bold> h</oasis:entry>
       </oasis:row>
     </oasis:tbody>
   </oasis:tgroup></oasis:table><table-wrap-foot><p>h: hour; the time is wall clock time. </p></table-wrap-foot></table-wrap>

      <p>The experiments thus show that MS significantly reduces the
computational time for the analysis step during volcanic ash DA. Note also
that the computational time for the “other” parts in
Table <xref ref-type="table" rid="Ch1.T1"/> (such as operations for setting environmental
variables, starting and finalizing DA algorithms, as mentioned in
Sect. <xref ref-type="sec" rid="Ch1.S3.SS1"/>) is slightly reduced by the MS method (i.e.,
0.03 h in this case). This is because in the conventional EnKF, the ensemble
mean state <inline-formula><mml:math id="M236" display="inline"><mml:mrow><mml:msup><mml:mover accent="true"><mml:mi mathvariant="bold">A</mml:mi><mml:mo mathvariant="normal">¯</mml:mo></mml:mover><mml:mi mathvariant="normal">f</mml:mi></mml:msup></mml:mrow></mml:math></inline-formula> is calculated in the “other” parts
as an output to finalize the DA algorithms, while in MS-EnKF, the
calculations of <inline-formula><mml:math id="M237" display="inline"><mml:mrow><mml:msup><mml:mover accent="true"><mml:mi mathvariant="bold">A</mml:mi><mml:mo mathvariant="normal">¯</mml:mo></mml:mover><mml:mi mathvariant="normal">f</mml:mi></mml:msup></mml:mrow></mml:math></inline-formula> are needed and directly
involved in the “Analysis” part.</p>
      <p>The result shows that, thanks to the cheaper analysis
step, the overall computational cost is indeed significantly reduced. The
total execution time is 1.95 h, which is less than the simulation window of
3 h (09:00–12:00 UTC, 18 May 2010). This result satisfies our goal of
accelerating the computation to an acceptable runtime (i.e., one that is
shorter than the time period of the DA application). Therefore, aviation
advice based on the MS-EnKF can be not only accurate but also
sufficiently fast. Note that the result (1.95 h) is obtained after the
volcanic ash is transported to continental Europe. If the assimilation is
performed in the initial phase of the volcanic ash eruption (when aircraft
measurements are available), a more significant acceleration would be
obtained. This is because in this case the volcanic ash is only transported
within an area near the volcano; thus, the no-ash grid cells will
make up a larger proportion (much higher than the <inline-formula><mml:math id="M238" display="inline"><mml:mrow><mml:mn mathvariant="normal">2</mml:mn><mml:mo>/</mml:mo><mml:mn mathvariant="normal">3</mml:mn></mml:mrow></mml:math></inline-formula> of this case study) of the
full domain.</p>
      <p>Another point is worth noting. According to
Fig. <xref ref-type="fig" rid="Ch1.F3"/>, the ash grids comprise 39.3 % of the
total grids. If the cost scaled linearly with the number of ash grids, the
computing time with MS would therefore be <inline-formula><mml:math id="M239" display="inline"><mml:mrow><mml:mo>≈</mml:mo><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mn mathvariant="normal">1.234</mml:mn></mml:mrow></mml:math></inline-formula> h (i.e.,
0.393 <inline-formula><mml:math id="M240" display="inline"><mml:mo>×</mml:mo></mml:math></inline-formula> 3.14 h). However, the experimental result shows that the
computational time goes down to 0.88 h (see Table <xref ref-type="table" rid="Ch1.T1"/>).
One reason for this additional decrease is that when the size of the matrix is
reduced, the memory access cost also goes down (e.g., through better cache
usage). Another possible reason is that the number of ash grids actually
decreases with time (it does not always amount to 39.3 % of the total grid number),
due to ash sedimentation and deposition processes <xref ref-type="bibr" rid="bib1.bibx11" id="paren.44"/>.</p>
      <p>Note that in this study we only perform the commonly used ensemble
parallelization for the forecast step (already efficient compared to the
expensive analysis step) but do not choose model-based parallelization (e.g.,
tracer or domain decomposition). As specified in
Table <xref ref-type="table" rid="Ch1.T1"/>, no parallelization is implemented on the six
tracers. This is because of the important aggregation process
<xref ref-type="bibr" rid="bib1.bibx10" id="paren.45"/>, which creates strong dependencies between the different ash
components, so that parallelizing over them makes little sense. As for
domain-decomposed parallelization <xref ref-type="bibr" rid="bib1.bibx37" id="paren.46"/>, it is not efficient
for our application. This is because volcanic ash is special in the sense
that the model is only doing computations in a small part of the domain
(i.e., there are no data in a rather large part of domain), and this active
part is continuously changing. Thus, a fixed domain decomposition is not very
useful here because of the changing plume position. Instead, a more
advanced approach such as adaptive domain-decomposed parallelization
<xref ref-type="bibr" rid="bib1.bibx25" id="paren.47"/> should be adopted to achieve additional
acceleration of the volcanic ash forecast stage. This is an interesting
subject for future work: when a more complicated model is employed,
ensemble parallelization alone may not be enough for the forecast stage.</p>
</sec>
</sec>
<sec id="Ch1.S5">
  <title>Comparison between MS and standard sparse matrix methods</title>
<sec id="Ch1.S5.SS1">
  <title>Issues related to the generation of CSR-based arrays</title>
      <p>According to Sect. <xref ref-type="sec" rid="Ch1.S4"/>, MS has proven to be capable of solving the
computational issue of <inline-formula><mml:math id="M241" display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="bold">A</mml:mi><mml:mi mathvariant="normal">a</mml:mi></mml:msup><mml:mo>=</mml:mo><mml:msup><mml:mi mathvariant="bold">A</mml:mi><mml:mi mathvariant="normal">f</mml:mi></mml:msup><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mi mathvariant="bold">X</mml:mi></mml:mrow></mml:math></inline-formula>. Motivated by the model's characteristics, MS was proposed from
an application's perspective and achieved a good result by managing the
irregular sparsity in our complicated volcanic ash DA system. The main reason
why MS is efficient is that the sparsity of <inline-formula><mml:math id="M242" display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="bold">A</mml:mi><mml:mi mathvariant="normal">f</mml:mi></mml:msup></mml:mrow></mml:math></inline-formula> can be
well utilized by MS. In Sect. <xref ref-type="sec" rid="Ch1.S4"/>, we only compared
MS with the case of full-storage dense matrices. However, the problem
abstracted here (<inline-formula><mml:math id="M243" display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="bold">A</mml:mi><mml:mi mathvariant="normal">f</mml:mi></mml:msup><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mi mathvariant="bold">X</mml:mi></mml:mrow></mml:math></inline-formula>) is actually a
sparse–dense matrix multiplication (SDMM) problem, since
<inline-formula><mml:math id="M244" display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="bold">A</mml:mi><mml:mi mathvariant="normal">f</mml:mi></mml:msup></mml:mrow></mml:math></inline-formula> is sparse and <inline-formula><mml:math id="M245" display="inline"><mml:mi mathvariant="bold">X</mml:mi></mml:math></inline-formula> is dense (see
Eq. <xref ref-type="disp-formula" rid="Ch1.E12"/> for <inline-formula><mml:math id="M246" display="inline"><mml:mi mathvariant="bold">X</mml:mi></mml:math></inline-formula>). Thus, one may wonder how MS
compares with more standard sparse matrix methods,
such as compressed sparse row (CSR)-based methods
<xref ref-type="bibr" rid="bib1.bibx35 bib1.bibx2" id="paren.48"/>, which are commonly used for sparse
matrix–vector and sparse matrix–matrix multiplication.</p>
      <p>Before making the comparison, we first need to address an intrinsic problem
that arises when applying standard sparse matrix methods in EnKF to
<inline-formula><mml:math id="M247" display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="bold">A</mml:mi><mml:mi mathvariant="normal">a</mml:mi></mml:msup><mml:mo>=</mml:mo><mml:msup><mml:mi mathvariant="bold">A</mml:mi><mml:mi mathvariant="normal">f</mml:mi></mml:msup><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mi mathvariant="bold">X</mml:mi></mml:mrow></mml:math></inline-formula>. The issue is
that it is not possible to directly generate a sparse storage format (e.g.,
CSR) of <inline-formula><mml:math id="M248" display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="bold">A</mml:mi><mml:mi mathvariant="normal">f</mml:mi></mml:msup></mml:mrow></mml:math></inline-formula> without first generating the full matrix
<inline-formula><mml:math id="M249" display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="bold">A</mml:mi><mml:mi mathvariant="normal">f</mml:mi></mml:msup></mml:mrow></mml:math></inline-formula>. This is mainly because <inline-formula><mml:math id="M250" display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="bold">A</mml:mi><mml:mi mathvariant="normal">f</mml:mi></mml:msup></mml:mrow></mml:math></inline-formula> comes
from the model-driven ensemble forecast step, where each
<inline-formula><mml:math id="M251" display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="bold">A</mml:mi><mml:mi mathvariant="normal">f</mml:mi></mml:msup></mml:mrow></mml:math></inline-formula> column corresponds to one member of the ensemble.
During the model forecast, we know that there are indeed no-ash grids. However, it is
not known in advance exactly where the plume will be after one forecast time step. This
depends strongly on the weather conditions and the model processes (e.g.,
advection and diffusion for horizontal grids, sedimentation and deposition
for vertical grids). Thus, a fixed and wide domain is usually needed by the
model to avoid complications, resulting in the generation of the full storage
of <inline-formula><mml:math id="M252" display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="bold">A</mml:mi><mml:mi mathvariant="normal">f</mml:mi></mml:msup></mml:mrow></mml:math></inline-formula> (to be used in
<inline-formula><mml:math id="M253" display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="bold">A</mml:mi><mml:mi mathvariant="normal">a</mml:mi></mml:msup><mml:mo>=</mml:mo><mml:msup><mml:mi mathvariant="bold">A</mml:mi><mml:mi mathvariant="normal">f</mml:mi></mml:msup><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mi mathvariant="bold">X</mml:mi></mml:mrow></mml:math></inline-formula>). Therefore, if
we want to implement a CSR storage format for the sparse matrix
<inline-formula><mml:math id="M254" display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="bold">A</mml:mi><mml:mi mathvariant="normal">f</mml:mi></mml:msup></mml:mrow></mml:math></inline-formula>, we must first generate the full storage
<inline-formula><mml:math id="M255" display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="bold">A</mml:mi><mml:mi mathvariant="normal">f</mml:mi></mml:msup></mml:mrow></mml:math></inline-formula> from the ensemble forecast step, and then we generate
the three CSR arrays based on <inline-formula><mml:math id="M256" display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="bold">A</mml:mi><mml:mi mathvariant="normal">f</mml:mi></mml:msup></mml:mrow></mml:math></inline-formula>.</p>
      <p>Generating CSR arrays is usually much more expensive (computationally) than a
single sparse matrix–vector multiplication (SpMV). Thus, generating CSR
arrays to perform only a single SpMV would be pointless from an HPC
point of view. Fortunately, this is not the case for
<inline-formula><mml:math id="M257" display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="bold">A</mml:mi><mml:mi mathvariant="normal">a</mml:mi></mml:msup><mml:mo>=</mml:mo><mml:msup><mml:mi mathvariant="bold">A</mml:mi><mml:mi mathvariant="normal">f</mml:mi></mml:msup><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mi mathvariant="bold">X</mml:mi></mml:mrow></mml:math></inline-formula> (i.e., SDMM),
which can actually be considered as N-times SpMV. (Here, <inline-formula><mml:math id="M258" display="inline"><mml:mi mathvariant="bold">X</mml:mi></mml:math></inline-formula> has N
columns, and one SpMV means the multiplication of <inline-formula><mml:math id="M259" display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="bold">A</mml:mi><mml:mi mathvariant="normal">f</mml:mi></mml:msup></mml:mrow></mml:math></inline-formula> by
one column of <inline-formula><mml:math id="M260" display="inline"><mml:mi mathvariant="bold">X</mml:mi></mml:math></inline-formula>.) Thus, CSR-based SDMM might also be a candidate
for reducing the computation time of
<inline-formula><mml:math id="M261" display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="bold">A</mml:mi><mml:mi mathvariant="normal">a</mml:mi></mml:msup><mml:mo>=</mml:mo><mml:msup><mml:mi mathvariant="bold">A</mml:mi><mml:mi mathvariant="normal">f</mml:mi></mml:msup><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mi mathvariant="bold">X</mml:mi></mml:mrow></mml:math></inline-formula>. It remains
interesting to compare the performance of CSR-based SDMM and MS in dealing
with <inline-formula><mml:math id="M262" display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="bold">A</mml:mi><mml:mi mathvariant="normal">a</mml:mi></mml:msup><mml:mo>=</mml:mo><mml:msup><mml:mi mathvariant="bold">A</mml:mi><mml:mi mathvariant="normal">f</mml:mi></mml:msup><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mi mathvariant="bold">X</mml:mi></mml:mrow></mml:math></inline-formula> for our
case study.</p>
</sec>
<sec id="Ch1.S5.SS2">
  <title>Result of CSR-based SDMM</title>
      <p>To implement CSR-based SDMM for <inline-formula><mml:math id="M263" display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="bold">A</mml:mi><mml:mi mathvariant="normal">a</mml:mi></mml:msup><mml:mo>=</mml:mo><mml:msup><mml:mi mathvariant="bold">A</mml:mi><mml:mi mathvariant="normal">f</mml:mi></mml:msup><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mi mathvariant="bold">X</mml:mi></mml:mrow></mml:math></inline-formula>, the three CSR arrays for <inline-formula><mml:math id="M264" display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="bold">A</mml:mi><mml:mi mathvariant="normal">f</mml:mi></mml:msup></mml:mrow></mml:math></inline-formula> (denoted
<inline-formula><mml:math id="M265" display="inline"><mml:mrow><mml:mi mathvariant="bold-italic">v</mml:mi><mml:mi mathvariant="bold-italic">a</mml:mi><mml:mi mathvariant="bold-italic">l</mml:mi></mml:mrow></mml:math></inline-formula>, <inline-formula><mml:math id="M266" display="inline"><mml:mrow><mml:mi mathvariant="bold-italic">c</mml:mi><mml:mi mathvariant="bold-italic">o</mml:mi><mml:mi mathvariant="bold-italic">l</mml:mi><mml:mi mathvariant="italic">_</mml:mi><mml:mi mathvariant="bold-italic">i</mml:mi><mml:mi mathvariant="bold-italic">d</mml:mi><mml:mi mathvariant="bold-italic">x</mml:mi></mml:mrow></mml:math></inline-formula>, and <inline-formula><mml:math id="M267" display="inline"><mml:mrow><mml:mi mathvariant="bold-italic">r</mml:mi><mml:mi mathvariant="bold-italic">o</mml:mi><mml:mi mathvariant="bold-italic">w</mml:mi><mml:mi mathvariant="italic">_</mml:mi><mml:mi mathvariant="bold-italic">p</mml:mi><mml:mi mathvariant="bold-italic">t</mml:mi><mml:mi mathvariant="bold-italic">r</mml:mi></mml:mrow></mml:math></inline-formula> in this study) need first
to be generated. The array <inline-formula><mml:math id="M268" display="inline"><mml:mrow><mml:mi mathvariant="bold-italic">v</mml:mi><mml:mi mathvariant="bold-italic">a</mml:mi><mml:mi mathvariant="bold-italic">l</mml:mi></mml:mrow></mml:math></inline-formula> of size <inline-formula><mml:math id="M269" display="inline"><mml:mrow><mml:msub><mml:mi>n</mml:mi><mml:mrow><mml:mi mathvariant="bold-italic">v</mml:mi><mml:mi mathvariant="bold-italic">a</mml:mi><mml:mi mathvariant="bold-italic">l</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula> stores
non-zero values of <inline-formula><mml:math id="M270" display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="bold">A</mml:mi><mml:mi mathvariant="normal">f</mml:mi></mml:msup></mml:mrow></mml:math></inline-formula>, where
<inline-formula><mml:math id="M271" display="inline"><mml:mrow><mml:msub><mml:mi>n</mml:mi><mml:mrow><mml:mi mathvariant="bold-italic">v</mml:mi><mml:mi mathvariant="bold-italic">a</mml:mi><mml:mi mathvariant="bold-italic">l</mml:mi></mml:mrow></mml:msub><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mo>≈</mml:mo><mml:mspace linebreak="nobreak" width="0.125em"/><mml:msub><mml:mi>n</mml:mi><mml:mi mathvariant="normal">ash</mml:mi></mml:msub><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mi>N</mml:mi></mml:mrow></mml:math></inline-formula>. (In this study case,
<inline-formula><mml:math id="M272" display="inline"><mml:mi>n</mml:mi></mml:math></inline-formula> = 3.888 <inline-formula><mml:math id="M273" display="inline"><mml:mo>×</mml:mo></mml:math></inline-formula> 10<inline-formula><mml:math id="M274" display="inline"><mml:msup><mml:mi/><mml:mn mathvariant="normal">6</mml:mn></mml:msup></mml:math></inline-formula>, <inline-formula><mml:math id="M275" display="inline"><mml:mrow><mml:msub><mml:mi>n</mml:mi><mml:mi mathvariant="normal">ash</mml:mi></mml:msub><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mo>≈</mml:mo><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mstyle displaystyle="false"><mml:mfrac style="text"><mml:mn mathvariant="normal">1</mml:mn><mml:mn mathvariant="normal">3</mml:mn></mml:mfrac></mml:mstyle><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mi>n</mml:mi></mml:mrow></mml:math></inline-formula>,
and <inline-formula><mml:math id="M276" display="inline"><mml:mi>N</mml:mi></mml:math></inline-formula> = 100.) The array <inline-formula><mml:math id="M277" display="inline"><mml:mrow><mml:mi mathvariant="bold-italic">c</mml:mi><mml:mi mathvariant="bold-italic">o</mml:mi><mml:mi mathvariant="bold-italic">l</mml:mi><mml:mi mathvariant="italic">_</mml:mi><mml:mi mathvariant="bold-italic">i</mml:mi><mml:mi mathvariant="bold-italic">d</mml:mi><mml:mi mathvariant="bold-italic">x</mml:mi></mml:mrow></mml:math></inline-formula> of the same size <inline-formula><mml:math id="M278" display="inline"><mml:mrow><mml:msub><mml:mi>n</mml:mi><mml:mrow><mml:mi mathvariant="bold-italic">v</mml:mi><mml:mi mathvariant="bold-italic">a</mml:mi><mml:mi mathvariant="bold-italic">l</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula>
stores the column index of the non-zeros. The array <inline-formula><mml:math id="M279" display="inline"><mml:mrow><mml:mi mathvariant="bold-italic">r</mml:mi><mml:mi mathvariant="bold-italic">o</mml:mi><mml:mi mathvariant="bold-italic">w</mml:mi><mml:mi mathvariant="italic">_</mml:mi><mml:mi mathvariant="bold-italic">p</mml:mi><mml:mi mathvariant="bold-italic">t</mml:mi><mml:mi mathvariant="bold-italic">r</mml:mi></mml:mrow></mml:math></inline-formula> saves
the start and end pointers of the non-zeros of the rows in
<inline-formula><mml:math id="M280" display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="bold">A</mml:mi><mml:mi mathvariant="normal">f</mml:mi></mml:msup></mml:mrow></mml:math></inline-formula>. The size of <inline-formula><mml:math id="M281" display="inline"><mml:mrow><mml:mi mathvariant="bold-italic">r</mml:mi><mml:mi mathvariant="bold-italic">o</mml:mi><mml:mi mathvariant="bold-italic">w</mml:mi><mml:mi mathvariant="italic">_</mml:mi><mml:mi mathvariant="bold-italic">p</mml:mi><mml:mi mathvariant="bold-italic">t</mml:mi><mml:mi mathvariant="bold-italic">r</mml:mi></mml:mrow></mml:math></inline-formula> is <inline-formula><mml:math id="M282" display="inline"><mml:mi>n</mml:mi></mml:math></inline-formula> + 1.</p>
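      <p>The construction of the three arrays from a full-storage matrix can be
illustrated with the following minimal Python sketch. It is an illustration only, not the
implementation used in this study: the function name <monospace>dense_to_csr</monospace> and the toy
matrix are hypothetical, and indices are 0-based here, whereas the arrays in the experiments may use
a different convention.</p>
<preformat>
```python
def dense_to_csr(dense):
    """Build the CSR arrays (val, col_idx, row_ptr) from a dense row-major matrix."""
    val, col_idx, row_ptr = [], [], [0]
    for row in dense:
        for j, a in enumerate(row):
            if a != 0.0:
                val.append(a)      # non-zero value
                col_idx.append(j)  # its column index
        # row_ptr[i + 1] marks where row i ends (and row i + 1 starts) in val
        row_ptr.append(len(val))
    return val, col_idx, row_ptr


# Hypothetical toy ensemble state matrix: 4 grid cells (rows), 2 members (columns).
# Rows 1 and 3 are no-ash (all-zero) rows.
A_f = [
    [1.5, 2.0],
    [0.0, 0.0],
    [0.3, 0.0],
    [0.0, 0.0],
]
val, col_idx, row_ptr = dense_to_csr(A_f)
print(val)      # [1.5, 2.0, 0.3]
print(col_idx)  # [0, 1, 0]
print(row_ptr)  # [0, 2, 2, 3, 3]
```
</preformat>
      <p>Note that the sketch has to scan every entry of the full matrix, and that each no-ash row
still occupies one entry in <monospace>row_ptr</monospace>, which therefore has n + 1 entries.</p>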
      <p>After the above three CSR arrays are generated, CSR-based SpMV can be
performed for multiplying <inline-formula><mml:math id="M283" display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="bold">A</mml:mi><mml:mi mathvariant="normal">f</mml:mi></mml:msup></mml:mrow></mml:math></inline-formula> by a column vector
<inline-formula><mml:math id="M284" display="inline"><mml:mi mathvariant="bold-italic">v</mml:mi></mml:math></inline-formula> in <inline-formula><mml:math id="M285" display="inline"><mml:mi mathvariant="bold">X</mml:mi></mml:math></inline-formula> (see Algorithm 1 in Fig. <xref ref-type="fig" rid="Ch1.F4"/>a).
With that, Algorithm 2 (Fig. <xref ref-type="fig" rid="Ch1.F4"/>b) can be implemented by
looping Algorithm 1 for <inline-formula><mml:math id="M286" display="inline"><mml:mi>N</mml:mi></mml:math></inline-formula> times to obtain
<inline-formula><mml:math id="M287" display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="bold">A</mml:mi><mml:mi mathvariant="normal">a</mml:mi></mml:msup><mml:mo>=</mml:mo><mml:msup><mml:mi mathvariant="bold">A</mml:mi><mml:mi mathvariant="normal">f</mml:mi></mml:msup><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mi mathvariant="bold">X</mml:mi></mml:mrow></mml:math></inline-formula>. The experimental
result of CSR-based SDMM is shown in Table <xref ref-type="table" rid="Ch1.T4"/>,
where all the environmental conditions (such as the DA system and the
programming environment) are the same as for MS. This gives a fair
comparison between CSR-based SDMM and MS. In addition, for a purely algorithmic
comparison with the serial MS, the CSR-based SDMM is also performed
serially here.</p>
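      <p>The idea of obtaining the product as repeated CSR-based SpMV can be sketched in Python as
follows. This is a simplified, serial illustration under our own assumptions (0-based indices,
plain Python lists, hypothetical function names <monospace>csr_spmv</monospace> and
<monospace>csr_sdmm</monospace>); it is not a transcription of Algorithms 1 and 2.</p>
<preformat>
```python
def csr_spmv(val, col_idx, row_ptr, v):
    """Multiply a CSR matrix by a dense vector v; one result entry per row."""
    n = len(row_ptr) - 1
    y = [0.0] * n
    for i in range(n):  # the outer loop visits all n rows, including all-zero rows
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += val[k] * v[col_idx[k]]
    return y


def csr_sdmm(val, col_idx, row_ptr, X):
    """Sparse-dense product computed as one SpMV per column of X."""
    n_cols = len(X[0])
    columns = []
    for j in range(n_cols):
        v = [row[j] for row in X]  # j-th column of X
        columns.append(csr_spmv(val, col_idx, row_ptr, v))
    # Transpose the list of result columns back into row-major form.
    return [list(r) for r in zip(*columns)]


# CSR arrays of a hypothetical 4 x 2 matrix [[1.5, 2.0], [0, 0], [0.3, 0], [0, 0]]
val, col_idx, row_ptr = [1.5, 2.0, 0.3], [0, 1, 0], [0, 2, 2, 3, 3]
X = [[1.0, 0.0],
     [2.0, 1.0]]
A_a = csr_sdmm(val, col_idx, row_ptr, X)
print(A_a)  # [[5.5, 2.0], [0.0, 0.0], [0.3, 0.0], [0.0, 0.0]]
```
</preformat>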

<?xmltex \floatpos{t}?><table-wrap id="Ch1.T4" specific-use="star"><caption><p>Computational evaluation of the sub-steps of the sparse–dense
matrix multiplication with compressed sparse row storage (CSR-based SDMM) for
<inline-formula><mml:math id="M288" display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="bold">A</mml:mi><mml:mi mathvariant="normal">a</mml:mi></mml:msup><mml:mo>=</mml:mo><mml:msup><mml:mi mathvariant="bold">A</mml:mi><mml:mi mathvariant="normal">f</mml:mi></mml:msup><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mi mathvariant="bold">X</mml:mi></mml:mrow></mml:math></inline-formula>.
</p></caption><oasis:table frame="topbot"><oasis:tgroup cols="2">
     <oasis:colspec colnum="1" colname="col1" align="justify" colwidth="227.622047pt"/>
     <oasis:colspec colnum="2" colname="col2" align="right"/>
     <oasis:thead>
       <oasis:row rowsep="1">  
         <oasis:entry colname="col1">Sub-step</oasis:entry>  
         <oasis:entry colname="col2">Computational time</oasis:entry>
       </oasis:row>
     </oasis:thead>
     <oasis:tbody>
       <oasis:row>  
         <oasis:entry colname="col1">(i) Compute three arrays (in CSR format) of <inline-formula><mml:math id="M289" display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="bold">A</mml:mi><mml:mi mathvariant="normal">f</mml:mi></mml:msup></mml:mrow></mml:math></inline-formula></oasis:entry>  
         <oasis:entry colname="col2">0.0407 h</oasis:entry>
       </oasis:row>
       <oasis:row>  
         <oasis:entry colname="col1">(ii) Compute CSR-based SpMV for the first column of <inline-formula><mml:math id="M290" display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="bold">A</mml:mi><mml:mi mathvariant="normal">a</mml:mi></mml:msup></mml:mrow></mml:math></inline-formula></oasis:entry>  
         <oasis:entry colname="col2">0.0117 h</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">  
         <oasis:entry colname="col1">(iii) Repeat (ii) N − 1 times for the other N − 1 columns of <inline-formula><mml:math id="M291" display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="bold">A</mml:mi><mml:mi mathvariant="normal">a</mml:mi></mml:msup></mml:mrow></mml:math></inline-formula></oasis:entry>  
         <oasis:entry colname="col2"><bold>1.1576</bold> h</oasis:entry>
       </oasis:row>
       <oasis:row>  
         <oasis:entry colname="col1">Total</oasis:entry>  
         <oasis:entry colname="col2"><bold>1.21</bold> h</oasis:entry>
       </oasis:row>
     </oasis:tbody>
   </oasis:tgroup></oasis:table><table-wrap-foot><p>CSR-based SDMM is formed by
(ii) and (iii). h: hour; the time is wall clock time. </p></table-wrap-foot></table-wrap>

      <p>From Table <xref ref-type="table" rid="Ch1.T4"/>, we can first confirm that the
computational time (i.e., 0.0407 h) for the generation of the three
CSR-based arrays (<inline-formula><mml:math id="M292" display="inline"><mml:mrow><mml:mi mathvariant="bold-italic">v</mml:mi><mml:mi mathvariant="bold-italic">a</mml:mi><mml:mi mathvariant="bold-italic">l</mml:mi></mml:mrow></mml:math></inline-formula>, <inline-formula><mml:math id="M293" display="inline"><mml:mrow><mml:mi mathvariant="bold-italic">c</mml:mi><mml:mi mathvariant="bold-italic">o</mml:mi><mml:mi mathvariant="bold-italic">l</mml:mi><mml:mi mathvariant="italic">_</mml:mi><mml:mi mathvariant="bold-italic">i</mml:mi><mml:mi mathvariant="bold-italic">d</mml:mi><mml:mi mathvariant="bold-italic">x</mml:mi></mml:mrow></mml:math></inline-formula>, and <inline-formula><mml:math id="M294" display="inline"><mml:mrow><mml:mi mathvariant="bold-italic">r</mml:mi><mml:mi mathvariant="bold-italic">o</mml:mi><mml:mi mathvariant="bold-italic">w</mml:mi><mml:mi mathvariant="italic">_</mml:mi><mml:mi mathvariant="bold-italic">p</mml:mi><mml:mi mathvariant="bold-italic">t</mml:mi><mml:mi mathvariant="bold-italic">r</mml:mi></mml:mrow></mml:math></inline-formula> to
represent the sparse matrix <inline-formula><mml:math id="M295" display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="bold">A</mml:mi><mml:mi mathvariant="normal">f</mml:mi></mml:msup></mml:mrow></mml:math></inline-formula>) indeed takes more time
than the computational time of one CSR-based SpMV (i.e., 0.0117 h). Thus,
there is little value in performing sub-step (i) (see
Table <xref ref-type="table" rid="Ch1.T4"/>) if only one SpMV (i.e., sub-step (ii))
is needed. However, to get all <inline-formula><mml:math id="M296" display="inline"><mml:mi>N</mml:mi></mml:math></inline-formula> (i.e., 100) columns of
<inline-formula><mml:math id="M297" display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="bold">A</mml:mi><mml:mi mathvariant="normal">a</mml:mi></mml:msup></mml:mrow></mml:math></inline-formula>, the sub-step (ii) is looped for <inline-formula><mml:math id="M298" display="inline"><mml:mi>N</mml:mi></mml:math></inline-formula> times, resulting
in a negligible impact of sub-step (i) on the total computational time (i.e.,
1.21 h) of CSR-based SDMM.</p>
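      <p>The amortization argument can be checked with a few lines of Python, using only the
timings of Table <xref ref-type="table" rid="Ch1.T4"/>; the variable names are ours. With a single
SpMV, sub-step (i) would account for roughly three-quarters of the time, whereas over the full
SDMM its share drops to a few percent.</p>
<preformat>
```python
# Wall-clock times (hours) of the three sub-steps, copied from Table 4.
t_build_csr = 0.0407   # (i) generate val, col_idx, row_ptr
t_first_spmv = 0.0117  # (ii) one CSR-based SpMV
t_other_spmv = 1.1576  # (iii) the remaining N - 1 SpMVs

total = t_build_csr + t_first_spmv + t_other_spmv

# Share of sub-step (i) if only one SpMV were needed, and for the full SDMM.
share_one_spmv = t_build_csr / (t_build_csr + t_first_spmv)
share_full_sdmm = t_build_csr / total

print(f"total: {total:.2f} h")
print(f"share of (i) with a single SpMV: {share_one_spmv:.1%}")
print(f"share of (i) with all N SpMVs: {share_full_sdmm:.1%}")
```
</preformat>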
      <p>The result of CSR-based SDMM also shows that the standard sparse matrix
methods can reduce the computational time of
<inline-formula><mml:math id="M299" display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="bold">A</mml:mi><mml:mi mathvariant="normal">a</mml:mi></mml:msup><mml:mo>=</mml:mo><mml:msup><mml:mi mathvariant="bold">A</mml:mi><mml:mi mathvariant="normal">f</mml:mi></mml:msup><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mi mathvariant="bold">X</mml:mi></mml:mrow></mml:math></inline-formula>, by comparing
with the conventional approach in Table <xref ref-type="table" rid="Ch1.T3"/>. However, it
can also be observed that the computational time of CSR-based SDMM is larger
than that of MS (i.e., 1.21 h versus 0.87 h in
Table <xref ref-type="table" rid="Ch1.T3"/>). Thus, although applying standard sparse
matrix multiplication methods is beneficial, it is still slower than MS for our
problem.</p>
</sec>
<sec id="Ch1.S5.SS3">
  <title>Comparison between CSR-based SDMM and MS</title>
      <p>In the CSR-based SDMM, only non-zero elements in <inline-formula><mml:math id="M300" display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="bold">A</mml:mi><mml:mi mathvariant="normal">f</mml:mi></mml:msup></mml:mrow></mml:math></inline-formula>
participate in the multiplication between <inline-formula><mml:math id="M301" display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="bold">A</mml:mi><mml:mi mathvariant="normal">f</mml:mi></mml:msup></mml:mrow></mml:math></inline-formula> and
<inline-formula><mml:math id="M302" display="inline"><mml:mi mathvariant="bold">X</mml:mi></mml:math></inline-formula>; thus, redundant computation (related to zero elements in
<inline-formula><mml:math id="M303" display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="bold">A</mml:mi><mml:mi mathvariant="normal">f</mml:mi></mml:msup></mml:mrow></mml:math></inline-formula>) is avoided. So the computation time of
<inline-formula><mml:math id="M304" display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="bold">A</mml:mi><mml:mi mathvariant="normal">f</mml:mi></mml:msup><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mi mathvariant="bold">X</mml:mi></mml:mrow></mml:math></inline-formula> is reduced with CSR-based SDMM. In the
following, we analyze the performance difference between CSR-based SDMM and
MS.</p>
      <p>First, from a programming perspective, in CSR-based SDMM, the loop
over the rows of <inline-formula><mml:math id="M305" display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="bold">A</mml:mi><mml:mi mathvariant="normal">f</mml:mi></mml:msup></mml:mrow></mml:math></inline-formula> runs from 1 to <inline-formula><mml:math id="M306" display="inline"><mml:mi>n</mml:mi></mml:math></inline-formula> (see
Fig. <xref ref-type="fig" rid="Ch1.F4"/>a), while the corresponding loop in MS runs
from 1 to <inline-formula><mml:math id="M307" display="inline"><mml:mrow><mml:msub><mml:mi>n</mml:mi><mml:mi mathvariant="normal">ash</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> (see step (iv) of MS in
Sect. <xref ref-type="sec" rid="Ch1.S4.SS2"/>, <inline-formula><mml:math id="M308" display="inline"><mml:mrow><mml:msub><mml:mi>n</mml:mi><mml:mi mathvariant="normal">ash</mml:mi></mml:msub><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mo>≈</mml:mo><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mo>(</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mo>/</mml:mo><mml:mn mathvariant="normal">3</mml:mn><mml:mo>)</mml:mo><mml:mi>n</mml:mi></mml:mrow></mml:math></inline-formula>). Although
only non-zero elements are used in the multiplication in CSR-based SDMM, the
length of the outer loop is still <inline-formula><mml:math id="M309" display="inline"><mml:mi>n</mml:mi></mml:math></inline-formula> (much larger than <inline-formula><mml:math id="M310" display="inline"><mml:mrow><mml:msub><mml:mi>n</mml:mi><mml:mi mathvariant="normal">ash</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>),
which is the essential reason that MS is faster than CSR-based SDMM. Note
that as discussed in Sect. <xref ref-type="sec" rid="Ch1.S4.SS1"/>, there are many zero rows
in <inline-formula><mml:math id="M311" display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="bold">A</mml:mi><mml:mi mathvariant="normal">f</mml:mi></mml:msup></mml:mrow></mml:math></inline-formula>; thus, CSR-based SDMM actually does nothing when
it comes to a zero row, but still needs to execute the loop iteration. Within each
iteration, it has to check the information from <inline-formula><mml:math id="M312" display="inline"><mml:mrow><mml:mi mathvariant="bold-italic">r</mml:mi><mml:mi mathvariant="bold-italic">o</mml:mi><mml:mi mathvariant="bold-italic">w</mml:mi><mml:mi mathvariant="italic">_</mml:mi><mml:mi mathvariant="bold-italic">p</mml:mi><mml:mi mathvariant="bold-italic">t</mml:mi><mml:mi mathvariant="bold-italic">r</mml:mi></mml:mrow></mml:math></inline-formula> (size <inline-formula><mml:math id="M313" display="inline"><mml:mrow><mml:mi>n</mml:mi><mml:mo>+</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:math></inline-formula>),
where the value corresponding to a zero row is usually set to be the value in
<inline-formula><mml:math id="M314" display="inline"><mml:mrow><mml:mi mathvariant="bold-italic">r</mml:mi><mml:mi mathvariant="bold-italic">o</mml:mi><mml:mi mathvariant="bold-italic">w</mml:mi><mml:mi mathvariant="italic">_</mml:mi><mml:mi mathvariant="bold-italic">p</mml:mi><mml:mi mathvariant="bold-italic">t</mml:mi><mml:mi mathvariant="bold-italic">r</mml:mi></mml:mrow></mml:math></inline-formula> corresponding to the first subsequent non-zero row.</p>
      <p>Secondly, with respect to the algorithm, CSR-based SDMM utilizes the sparsity
of <inline-formula><mml:math id="M315" display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="bold">A</mml:mi><mml:mi mathvariant="normal">f</mml:mi></mml:msup></mml:mrow></mml:math></inline-formula> by its generation of three CSR arrays, while MS
not only utilizes the sparsity information of the sparse matrix
<inline-formula><mml:math id="M316" display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="bold">A</mml:mi><mml:mi mathvariant="normal">f</mml:mi></mml:msup></mml:mrow></mml:math></inline-formula>, but also utilizes the consistency of ensemble
forecasts; that is, ensemble forecasted states are not consistent in values
but usually consistent in non-zero locations. This is a typical property in
ensemble-based DA, resulting in <inline-formula><mml:math id="M317" display="inline"><mml:mi>N</mml:mi></mml:math></inline-formula> ensemble plumes being different in
concentration values but having similar transport directions/shapes (see
Sect. <xref ref-type="sec" rid="Ch1.S4.SS2"/>). Thus, most of the zero elements in
<inline-formula><mml:math id="M318" display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="bold">A</mml:mi><mml:mi mathvariant="normal">f</mml:mi></mml:msup></mml:mrow></mml:math></inline-formula> are actually in zero rows of <inline-formula><mml:math id="M319" display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="bold">A</mml:mi><mml:mi mathvariant="normal">f</mml:mi></mml:msup></mml:mrow></mml:math></inline-formula>
for an EnKF application, which leads to a small number of non-zero rows
(<inline-formula><mml:math id="M320" display="inline"><mml:mrow><mml:msub><mml:mi>n</mml:mi><mml:mi mathvariant="normal">ash</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>) compared to the full number of rows (<inline-formula><mml:math id="M321" display="inline"><mml:mi>n</mml:mi></mml:math></inline-formula>) of
<inline-formula><mml:math id="M322" display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="bold">A</mml:mi><mml:mi mathvariant="normal">f</mml:mi></mml:msup></mml:mrow></mml:math></inline-formula>. Therefore, only considering <inline-formula><mml:math id="M323" display="inline"><mml:mrow><mml:msub><mml:mi>n</mml:mi><mml:mi mathvariant="normal">ash</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> rows in
<inline-formula><mml:math id="M324" display="inline"><mml:mrow><mml:msubsup><mml:mover accent="true"><mml:mi mathvariant="bold">A</mml:mi><mml:mo mathvariant="normal">̃</mml:mo></mml:mover><mml:mrow><mml:msub><mml:mi>n</mml:mi><mml:mi mathvariant="normal">ash</mml:mi></mml:msub><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mo>×</mml:mo><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mi>N</mml:mi></mml:mrow><mml:mi mathvariant="normal">f</mml:mi></mml:msubsup><mml:msub><mml:mi mathvariant="bold">X</mml:mi><mml:mrow><mml:mi>N</mml:mi><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mo>×</mml:mo><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mi>N</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula> (see step (iv) of MS in
Sect. <xref ref-type="sec" rid="Ch1.S4.SS2"/>) is more advantageous for an EnKF application
than considering all <inline-formula><mml:math id="M325" display="inline"><mml:mi>n</mml:mi></mml:math></inline-formula> rows in CSR-based SDMM. Based on the above analysis,
MS can be considered a specific sparse matrix method, which typically works
for ensemble-based DA applications.</p>
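      <p>The difference in outer-loop length described above can be illustrated with a
minimal sketch. It is written in Python with NumPy for illustration only and is not the
implementation used in this study; the function names (to_csr, csr_sdmm, ms_multiply)
and the small test matrices are assumptions made for the example. The CSR-based product
visits all n rows, including zero rows, whereas the mask-state style product multiplies
only the non-zero rows and scatters the result back to the full domain.</p>
<preformat>
```python
import numpy as np

def to_csr(A):
    """Build the three CSR arrays (values, col_idx, row_ptr) of a dense matrix."""
    values, col_idx, row_ptr = [], [], [0]
    for row in A:
        nz = np.nonzero(row)[0]
        values.extend(row[nz])
        col_idx.extend(nz)
        row_ptr.append(len(values))      # unchanged for a zero row
    return np.array(values), np.array(col_idx, dtype=int), np.array(row_ptr)

def csr_sdmm(values, col_idx, row_ptr, X):
    """Sparse-dense product: the outer loop still visits all n rows."""
    n = len(row_ptr) - 1
    out = np.zeros((n, X.shape[1]))
    for i in range(n):                   # length n, zero rows included
        for k in range(row_ptr[i], row_ptr[i + 1]):   # empty for a zero row
            out[i] += values[k] * X[col_idx[k]]
    return out

def ms_multiply(A_f, X):
    """Mask-state style product: only the n_ash non-zero rows are multiplied."""
    mask = np.any(A_f != 0.0, axis=1)    # True where any member has ash
    out = np.zeros((A_f.shape[0], X.shape[1]))
    out[mask] = A_f[mask] @ X            # dense (n_ash x N) times (N x N)
    return out, int(mask.sum())

# Hypothetical toy problem: n = 9 grid cells, N = 4 members, 3 ash rows.
rng = np.random.default_rng(0)
n, N = 9, 4
A_f = np.zeros((n, N))
A_f[[1, 4, 5]] = rng.random((3, N)) + 0.1
X = rng.random((N, N))

A_a_csr = csr_sdmm(*to_csr(A_f), X)
A_a_ms, n_ash = ms_multiply(A_f, X)
```
</preformat>
      <p>In this sketch both routines return the same analysis matrix as the full dense
product; they differ only in how many rows the outer loop has to touch.</p>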
      <p>It is useful to apply standard sparse matrix methods (e.g., CSR-based SDMM)
for our assimilation application. The accelerated analysis step by CSR-based
SDMM (1.22 h; see Table <xref ref-type="table" rid="Ch1.T3"/>) also reduces the
total computational time (i.e., 2.29 h; see Table <xref ref-type="table" rid="Ch1.T1"/>
for the computational time of initialization, forecast, and others) to an
acceptable level (i.e., less than 3 h for our case study). In practice, because
MS performs better than CSR-based SDMM, we use MS for assimilation
applications. In addition, we intend not only to present MS, but also to
reveal which part is the most time-consuming in plume-type assimilation of
in situ observations.</p>
</sec>
</sec>
<sec id="Ch1.S6">
  <title>Discussions on MS</title>
<sec id="Ch1.S6.SS1">
  <title>Applicability</title>
      <p>For volcanic ash forecasts, only a relatively small part of the full 3-D
domain is polluted, so MS can work efficiently. MS is
also applicable to many other DA problems in which the domain is not fully
polluted by the species. It does not matter what the emission looks like or
whether the released species are short- or long-lived. For a given assimilation
problem, the only condition for MS to yield an acceleration is that the
domain is only partly, rather than fully, polluted. The assimilation problems
where MS can achieve the acceleration effect on the computations of
<inline-formula><mml:math id="M326" display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="bold">A</mml:mi><mml:mi mathvariant="normal">a</mml:mi></mml:msup><mml:mo>=</mml:mo><mml:msup><mml:mi mathvariant="bold">A</mml:mi><mml:mi mathvariant="normal">f</mml:mi></mml:msup><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mi mathvariant="bold">X</mml:mi></mml:mrow></mml:math></inline-formula> include all the
volcanic-related ash/gas assimilations, e.g., assimilation of satellite
data/LIDAR data/in situ data; (sand/desert) dust-storm-related assimilation;
tornado-related assimilation; assimilation for explosions at nuclear plants or
factories; assimilation of chemicals or oil leaking into seas; global (forecast) fire
assimilation; and assimilation of environmental pollutant transport, e.g.,
severe smog. In addition, for DA applications (e.g., ozone, SO<inline-formula><mml:math id="M327" display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:math></inline-formula>) where
pollutants spread over the whole domain, usually the focus is only on the
high concentrations, and a threshold can be set to ignore the very low values
without losing the necessary assimilation accuracy. In this case, MS can also
lead to a potential acceleration since many very low concentrations can be
explicitly truncated to be zeros.</p>
      <p>As analyzed earlier, when the number of non-zero rows (<inline-formula><mml:math id="M328" display="inline"><mml:mrow><mml:msub><mml:mi>n</mml:mi><mml:mi mathvariant="normal">ash</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>,
i.e., the number of ash grids in a 3-D domain) of <inline-formula><mml:math id="M329" display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="bold">A</mml:mi><mml:mi mathvariant="normal">f</mml:mi></mml:msup></mml:mrow></mml:math></inline-formula>
satisfies <inline-formula><mml:math id="M330" display="inline"><mml:mrow><mml:msub><mml:mi>n</mml:mi><mml:mi mathvariant="normal">ash</mml:mi></mml:msub><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mo>&lt;</mml:mo><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mi>n</mml:mi></mml:mrow></mml:math></inline-formula>, MS can work faster than standard EnKF. For
volcanic ash application, because <inline-formula><mml:math id="M331" display="inline"><mml:mrow><mml:msub><mml:mi>n</mml:mi><mml:mi mathvariant="normal">ash</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> is much less than <inline-formula><mml:math id="M332" display="inline"><mml:mi>n</mml:mi></mml:math></inline-formula>, the
acceleration is quite large. Hence, in this case, we propose embedding MS in
all ensemble-based DA methods, because it is fast and an implementation using
MS is exactly equivalent to the standard ensemble-based methods; i.e., the MS
procedure does not introduce any approximation. In fact, this proposal
can be extended to all real applications, even if the condition is not
satisfied. This is because in that case the computational cost of MS for
<inline-formula><mml:math id="M333" display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="bold">A</mml:mi><mml:mi mathvariant="normal">a</mml:mi></mml:msup><mml:mo>=</mml:mo><mml:msup><mml:mi mathvariant="bold">A</mml:mi><mml:mi mathvariant="normal">f</mml:mi></mml:msup><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mi mathvariant="bold">X</mml:mi></mml:mrow></mml:math></inline-formula> becomes
O(<inline-formula><mml:math id="M334" display="inline"><mml:mrow><mml:mi>n</mml:mi><mml:mspace linebreak="nobreak" width="0.125em"/><mml:msup><mml:mi>N</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula>), which is the same as that of using the standard assimilation
(shown in Fig. <xref ref-type="fig" rid="Ch1.F2"/>b). Therefore, if the number of non-zero states
is equal to or close to the total number of grid points in the domain, the
added computational cost of using MS is negligible, so that the
computational time with MS is almost the same as that of the
standard approach, whereas when the condition <inline-formula><mml:math id="M335" display="inline"><mml:mrow><mml:msub><mml:mi>n</mml:mi><mml:mi mathvariant="normal">ash</mml:mi></mml:msub><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mo>&lt;</mml:mo><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mi>n</mml:mi></mml:mrow></mml:math></inline-formula> is
satisfied, MS will accelerate the analysis step. Thus MS is generic and can
be directly used in any ensemble-based DA, and this acceleration can be
automatically realized for some potential applications, without spending time
investigating whether the condition is satisfied. In a real (or operational)
3-D DA system, MS can be easily included; i.e., we only need to invoke the MS
module when computing <inline-formula><mml:math id="M336" display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="bold">A</mml:mi><mml:mi mathvariant="normal">a</mml:mi></mml:msup><mml:mo>=</mml:mo><mml:msup><mml:mi mathvariant="bold">A</mml:mi><mml:mi mathvariant="normal">f</mml:mi></mml:msup><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mi mathvariant="bold">X</mml:mi></mml:mrow></mml:math></inline-formula>, without any other change to the current framework. Note that MS
is applicable in ensemble-based DA but not in variational-based DA. This is
because in a variational-based DA system, the minimization of the cost function
is mostly carried out over several or many consecutive time steps
<xref ref-type="bibr" rid="bib1.bibx27 bib1.bibx28" id="paren.49"/>; thus, it is convenient to always
use the full (i.e., non-masked) domain to represent different state matrices
(corresponding to different time steps in variational-based DA).</p>
      <p>As stated in Eq. (<xref ref-type="disp-formula" rid="Ch1.E15"/>), the speedup of the MS method is
approximately the inverse of <inline-formula><mml:math id="M337" display="inline"><mml:mstyle displaystyle="false"><mml:mfrac style="text"><mml:mrow><mml:msub><mml:mi>n</mml:mi><mml:mi mathvariant="normal">ash</mml:mi></mml:msub></mml:mrow><mml:mi>n</mml:mi></mml:mfrac></mml:mstyle></mml:math></inline-formula>. So far there are no
statistical data on the value of <inline-formula><mml:math id="M338" display="inline"><mml:mstyle displaystyle="false"><mml:mfrac style="text"><mml:mrow><mml:msub><mml:mi>n</mml:mi><mml:mi mathvariant="normal">ash</mml:mi></mml:msub></mml:mrow><mml:mi>n</mml:mi></mml:mfrac></mml:mstyle></mml:math></inline-formula>. Considering the
problem of volcanic ash transport, there is one emission point (at the
volcano); all ash in the atmosphere originates from this source point and is
transported by the wind. Thus, the volcanic ash cloud is
transported in the shape of a plume, which in general covers only a small
part of the full 3-D domain. At the start phase of a volcanic ash
eruption, <inline-formula><mml:math id="M339" display="inline"><mml:mstyle displaystyle="false"><mml:mfrac style="text"><mml:mrow><mml:msub><mml:mi>n</mml:mi><mml:mi mathvariant="normal">ash</mml:mi></mml:msub></mml:mrow><mml:mi>n</mml:mi></mml:mfrac></mml:mstyle></mml:math></inline-formula> is much smaller than 1.0 (started from
0). During transport over a long time (1.5 months for this case study),
<inline-formula><mml:math id="M340" display="inline"><mml:mstyle displaystyle="false"><mml:mfrac style="text"><mml:mrow><mml:msub><mml:mi>n</mml:mi><mml:mi mathvariant="normal">ash</mml:mi></mml:msub></mml:mrow><mml:mi>n</mml:mi></mml:mfrac></mml:mstyle></mml:math></inline-formula> increases to approximately <inline-formula><mml:math id="M341" display="inline"><mml:mrow><mml:mn mathvariant="normal">1</mml:mn><mml:mo>/</mml:mo><mml:mn mathvariant="normal">3</mml:mn></mml:mrow></mml:math></inline-formula>. Therefore, the
speedup of MS in ensemble-based volcanic ash DA will be significant.</p>
</sec>
<sec id="Ch1.S6.SS2">
  <title>MS and localization</title>
      <p>Based on the formulation of MS, one may think it can be taken as a
localization approach <xref ref-type="bibr" rid="bib1.bibx15" id="paren.50"/>. There is indeed a
similarity between MS and the localization approach, in the sense that when
computing <inline-formula><mml:math id="M342" display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="bold">A</mml:mi><mml:mi mathvariant="normal">a</mml:mi></mml:msup><mml:mo>=</mml:mo><mml:msup><mml:mi mathvariant="bold">A</mml:mi><mml:mi mathvariant="normal">f</mml:mi></mml:msup><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mi mathvariant="bold">X</mml:mi></mml:mrow></mml:math></inline-formula>, both
discard a large number of grid cells and only perform computations for the
selected cells. The two algorithms are, however, functionally different. The
localization approach is meant to reduce spurious
covariances outside a local region built around each measurement;
thus, the results with and without localization differ. MS, in contrast, is
developed purely for acceleration. The masked region is
discontinuous and independent of the measurement locations, but dependent on
the model domain. Thus, the assimilation results with and without MS are
the same. Therefore, based on this functional
difference, MS cannot be regarded as a localization approach.</p>
      <p>In this study, we do not employ the localization strategy in the analysis
step, because we use a rather large ensemble size of 100 to guarantee the
accuracy, as introduced in Sect. <xref ref-type="sec" rid="Ch1.S2"/>. But for some applications
(e.g., ozone, CO<inline-formula><mml:math id="M343" display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:math></inline-formula>, sulfur dioxide), especially when assimilating satellite
data, localization is a necessary approach and has been widely used in
reducing spurious covariances <xref ref-type="bibr" rid="bib1.bibx3 bib1.bibx5 bib1.bibx6" id="paren.51"/>. In these cases, because the
localization approach forces the analysis to update the state only within a
localization region, one may think that localization could replace MS and
that employing MS would bring no benefit. This is not
correct, for the following reason.</p>
      <p>The localization approach is usually realized in Eq. (<xref ref-type="disp-formula" rid="Ch1.E6"/>) by
employing a Schur product of a localization matrix and the forecast error
covariance matrix <xref ref-type="bibr" rid="bib1.bibx16 bib1.bibx17" id="paren.52"/> given
by

                <disp-formula id="Ch1.E17" content-type="numbered"><mml:math id="M344" display="block"><mml:mrow><mml:mstyle displaystyle="true" class="stylechange"/><?xmltex \hack{\hbox\bgroup\fontsize{9.5}{9.5}\selectfont$\displaystyle}?><mml:mi mathvariant="bold">K</mml:mi><mml:mo>(</mml:mo><mml:mi>k</mml:mi><mml:mo>)</mml:mo><mml:mo>=</mml:mo><mml:mo>(</mml:mo><mml:mi mathvariant="bold">f</mml:mi><mml:mo>∘</mml:mo><mml:msup><mml:mi mathvariant="bold">P</mml:mi><mml:mi mathvariant="normal">f</mml:mi></mml:msup><mml:mo>(</mml:mo><mml:mi>k</mml:mi><mml:mo>)</mml:mo><mml:mo>)</mml:mo><mml:mi mathvariant="bold">H</mml:mi><mml:mo>(</mml:mo><mml:mi>k</mml:mi><mml:msup><mml:mo>)</mml:mo><mml:mi>T</mml:mi></mml:msup><mml:mo>[</mml:mo><mml:mi mathvariant="bold">H</mml:mi><mml:mo>(</mml:mo><mml:mi>k</mml:mi><mml:mo>)</mml:mo><mml:mo>(</mml:mo><mml:mi mathvariant="bold">f</mml:mi><mml:mo>∘</mml:mo><mml:msup><mml:mi mathvariant="bold">P</mml:mi><mml:mi mathvariant="normal">f</mml:mi></mml:msup><mml:mo>(</mml:mo><mml:mi>k</mml:mi><mml:mo>)</mml:mo><mml:mo>)</mml:mo><mml:mi mathvariant="bold">H</mml:mi><mml:mo>(</mml:mo><mml:mi>k</mml:mi><mml:msup><mml:mo>)</mml:mo><mml:mi>T</mml:mi></mml:msup><mml:mo>+</mml:mo><mml:mi mathvariant="bold">R</mml:mi><mml:msup><mml:mo>]</mml:mo><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msup><mml:mo>.</mml:mo><?xmltex \hack{$\egroup}?></mml:mrow></mml:math></disp-formula>

          The Schur product <inline-formula><mml:math id="M345" display="inline"><mml:mrow><mml:mi mathvariant="bold">f</mml:mi><mml:mo>∘</mml:mo><mml:msup><mml:mi mathvariant="bold">P</mml:mi><mml:mi mathvariant="normal">f</mml:mi></mml:msup></mml:mrow></mml:math></inline-formula> in
Eq. (<xref ref-type="disp-formula" rid="Ch1.E17"/>) is defined by the element-wise
multiplication of the covariance matrix <inline-formula><mml:math id="M346" display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="bold">P</mml:mi><mml:mi mathvariant="normal">f</mml:mi></mml:msup></mml:mrow></mml:math></inline-formula> and a
localization matrix <inline-formula><mml:math id="M347" display="inline"><mml:mi mathvariant="bold">f</mml:mi></mml:math></inline-formula>. <inline-formula><mml:math id="M348" display="inline"><mml:mi mathvariant="bold">f</mml:mi></mml:math></inline-formula> is defined based on the
distance between two locations; thus, it is dependent on the domain and needs
information on the full ensemble state locations. In this way, <inline-formula><mml:math id="M349" display="inline"><mml:mrow><mml:mi mathvariant="bold">f</mml:mi><mml:mo>∘</mml:mo><mml:msup><mml:mi mathvariant="bold">P</mml:mi><mml:mi mathvariant="normal">f</mml:mi></mml:msup></mml:mrow></mml:math></inline-formula> can contain more zeros than
<inline-formula><mml:math id="M350" display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="bold">P</mml:mi><mml:mi mathvariant="normal">f</mml:mi></mml:msup></mml:mrow></mml:math></inline-formula>, but the dimensions are not changed, so that the
computations related to <inline-formula><mml:math id="M351" display="inline"><mml:mrow><mml:mi mathvariant="bold">f</mml:mi><mml:mo>∘</mml:mo><mml:msup><mml:mi mathvariant="bold">P</mml:mi><mml:mi mathvariant="normal">f</mml:mi></mml:msup></mml:mrow></mml:math></inline-formula> are actually
not reduced. In other words, in the analysis step with localization, the
states both within and outside a local region are
updated with increments; the increments outside the region are simply zero
(which looks like no update). This is also why the localization
approach is not meant for acceleration, but only for reducing spurious
covariances. It is therefore clear that localization cannot replace MS. In fact,
both can be applied together when dealing with the time-consuming part
<inline-formula><mml:math id="M352" display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="bold">A</mml:mi><mml:mi mathvariant="normal">a</mml:mi></mml:msup><mml:mo>=</mml:mo><mml:msup><mml:mi mathvariant="bold">A</mml:mi><mml:mi mathvariant="normal">f</mml:mi></mml:msup><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mi mathvariant="bold">X</mml:mi></mml:mrow></mml:math></inline-formula>. The localization
approach can first transform <inline-formula><mml:math id="M353" display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="bold">A</mml:mi><mml:mi mathvariant="normal">f</mml:mi></mml:msup></mml:mrow></mml:math></inline-formula> to a localized matrix
with more zero rows. Then MS can be used to accelerate the multiplication of
the localized matrix and <inline-formula><mml:math id="M354" display="inline"><mml:mi mathvariant="bold">X</mml:mi></mml:math></inline-formula>. In this way, MS is expected to
accelerate <inline-formula><mml:math id="M355" display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="bold">A</mml:mi><mml:mi mathvariant="normal">a</mml:mi></mml:msup><mml:mo>=</mml:mo><mml:msup><mml:mi mathvariant="bold">A</mml:mi><mml:mi mathvariant="normal">f</mml:mi></mml:msup><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mi mathvariant="bold">X</mml:mi></mml:mrow></mml:math></inline-formula> with a
high speedup rate, because the computational cost of more zero rows in the
localized ensemble state matrix is avoided.</p>
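      <p>The point that the Schur product in Eq. (<xref ref-type="disp-formula" rid="Ch1.E17"/>)
adds zeros without changing the matrix dimensions can be illustrated with a small sketch.
The 1-D grid, the linear taper with a hard cutoff, and the function name
localization_matrix are assumptions made purely for illustration; they are not the
localization function of any particular study.</p>
<preformat>
```python
import numpy as np

def localization_matrix(coords, radius):
    """Distance-based taper: 1 on the diagonal, 0 at and beyond `radius`."""
    dist = np.abs(coords[:, None] - coords[None, :])
    return np.clip(1.0 - dist / radius, 0.0, None)

# Hypothetical toy ensemble: n = 6 grid points on a line, N = 20 members.
rng = np.random.default_rng(1)
n, N = 6, 20
A = rng.standard_normal((n, N))
anomalies = A - A.mean(axis=1, keepdims=True)
P_f = anomalies @ anomalies.T / (N - 1)      # sample forecast covariance (n x n)

f = localization_matrix(np.arange(n, dtype=float), radius=2.0)
P_loc = f * P_f                              # Schur (element-wise) product

# Same dimensions as P_f; only the entries for distant pairs become zero.
same_shape = P_loc.shape == P_f.shape
```
</preformat>
      <p>Because the localized covariance keeps its full dimensions, any saving has to
come from exploiting the zeros explicitly, which is the role MS plays for zero rows.</p>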
</sec>
<sec id="Ch1.S6.SS3">
  <title>MS and parallelization</title>
      <p>Motivated by the model's physics, MS is currently implemented for the
serial case. This implementation has already reduced the computation time to an
acceptable level (i.e., the simulation time is less than the forecast
period in real-world time). It is nevertheless interesting to discuss the
potential of parallelization of the dense–dense matrix multiplication
(<inline-formula><mml:math id="M356" display="inline"><mml:mrow><mml:msubsup><mml:mover accent="true"><mml:mi mathvariant="bold">A</mml:mi><mml:mo mathvariant="normal">̃</mml:mo></mml:mover><mml:mrow><mml:msub><mml:mi>n</mml:mi><mml:mi mathvariant="normal">ash</mml:mi></mml:msub><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mo>×</mml:mo><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mi>N</mml:mi></mml:mrow><mml:mi mathvariant="normal">a</mml:mi></mml:msubsup><mml:mo>=</mml:mo><mml:msubsup><mml:mover accent="true"><mml:mi mathvariant="bold">A</mml:mi><mml:mo mathvariant="normal">̃</mml:mo></mml:mover><mml:mrow><mml:msub><mml:mi>n</mml:mi><mml:mi mathvariant="normal">ash</mml:mi></mml:msub><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mo>×</mml:mo><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mi>N</mml:mi></mml:mrow><mml:mi mathvariant="normal">f</mml:mi></mml:msubsup><mml:msub><mml:mi mathvariant="bold">X</mml:mi><mml:mrow><mml:mi>N</mml:mi><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mo>×</mml:mo><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mi>N</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula>) in step (iv) of the algorithm (see
Sect. <xref ref-type="sec" rid="Ch1.S4.SS2"/> and Table <xref ref-type="table" rid="Ch1.T2"/>). The
related matrix multiplication can be easily parallelized on multiple
processors. Optimization and evaluation of the parallelized MS will be
considered in future work. For the current case study, the computational time
(3.13 h; see Table <xref ref-type="table" rid="Ch1.T3"/>) for an “ideal” reduction
by parallelization of MS is not much larger than the acceleration (already)
gained by MS (2.26 h, subtraction between 3.13 h and 0.87 h; see
Table <xref ref-type="table" rid="Ch1.T3"/>). Therefore, from the application's
perspective, further acceleration by parallelization is not required.</p>
      <p>Alternatively, one may also consider (1) directly parallelizing the
expensive matrix multiplication of
<inline-formula><mml:math id="M357" display="inline"><mml:mrow><mml:msubsup><mml:mi mathvariant="bold">A</mml:mi><mml:mrow><mml:mi>n</mml:mi><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mo>×</mml:mo><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mi>N</mml:mi></mml:mrow><mml:mi mathvariant="normal">a</mml:mi></mml:msubsup><mml:mo>=</mml:mo><mml:msubsup><mml:mi mathvariant="bold">A</mml:mi><mml:mrow><mml:mi>n</mml:mi><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mo>×</mml:mo><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mi>N</mml:mi></mml:mrow><mml:mi mathvariant="normal">f</mml:mi></mml:msubsup><mml:msub><mml:mi mathvariant="bold">X</mml:mi><mml:mrow><mml:mi>N</mml:mi><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mo>×</mml:mo><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mi>N</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula>, without first performing MS, or (2) implementing
CSR-based SDMM (see Sect. <xref ref-type="sec" rid="Ch1.S5"/>) with parallelization. Both are
possible alternative approaches to accelerate the expensive matrix
multiplication. The first approach can be implemented with a user-designed
parallelization, or by utilizing ScaLAPACK
(<uri>https://www.netlib.org/scalapack/</uri>, where the main routine is
“pdgemm”). The second approach can be realized by using a general
parallel sparse–dense matrix multiplication method (e.g., sending each
column of X and three CSR arrays of <inline-formula><mml:math id="M358" display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="bold">A</mml:mi><mml:mi mathvariant="normal">f</mml:mi></mml:msup></mml:mrow></mml:math></inline-formula> to each processor
to calculate each column of <inline-formula><mml:math id="M359" display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="bold">A</mml:mi><mml:mi mathvariant="normal">a</mml:mi></mml:msup></mml:mrow></mml:math></inline-formula>) or using a good parallel
algebra library such as PETSc (<uri>https://www.mcs.anl.gov/petsc/</uri>), which
allows users to specify their own orderings and comes with machine-optimized
parallel matrix–matrix multiplication operations. However, given
that MS can also be parallelized in similar ways or with the same libraries, it
is fair to leave parallelization out of consideration in all cases (i.e., using MS, not
using MS, using CSR-based SDMM). In fact, the parallelization in MS could be
performed much more easily than other approaches in dealing with
<inline-formula><mml:math id="M360" display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="bold">A</mml:mi><mml:mi mathvariant="normal">a</mml:mi></mml:msup><mml:mo>=</mml:mo><mml:msup><mml:mi mathvariant="bold">A</mml:mi><mml:mi mathvariant="normal">f</mml:mi></mml:msup><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mi mathvariant="bold">X</mml:mi></mml:mrow></mml:math></inline-formula>, because the
dense–dense matrix multiplication (parallelization in step (iv) of MS) is
easier to parallelize than the sparse–dense matrix multiplication (direct
parallelization for <inline-formula><mml:math id="M361" display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="bold">A</mml:mi><mml:mi mathvariant="normal">a</mml:mi></mml:msup><mml:mo>=</mml:mo><mml:msup><mml:mi mathvariant="bold">A</mml:mi><mml:mi mathvariant="normal">f</mml:mi></mml:msup><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mi mathvariant="bold">X</mml:mi></mml:mrow></mml:math></inline-formula> or parallelized CSR-based SDMM).</p>
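      <p>As an illustration of why the dense–dense product in step (iv) is simple to
parallelize, the following sketch splits the masked state matrix into row blocks and
multiplies each block by X independently. It uses Python threads and NumPy for
illustration only; the function name parallel_row_block_matmul, the worker count, and
the matrix sizes are assumptions, and the sketch is not a substitute for ScaLAPACK or
PETSc.</p>
<preformat>
```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np

def parallel_row_block_matmul(A_masked, X, workers=4):
    """Multiply row blocks of the dense masked matrix by X in parallel."""
    # Each block of rows is independent of the others, so no communication
    # between workers is needed until the results are stacked.
    blocks = np.array_split(A_masked, workers, axis=0)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(lambda block: block @ X, blocks))
    return np.vstack(parts)

# Hypothetical sizes: n_ash = 10 masked rows, N = 5 ensemble members.
rng = np.random.default_rng(2)
A_masked = rng.random((10, 5))
X = rng.random((5, 5))
A_a_masked = parallel_row_block_matmul(A_masked, X, workers=3)
```
</preformat>
      <p>Because the masked matrix is dense, the row blocks carry equal work per row,
which is not guaranteed for the rows of a sparse matrix.</p>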
      <p>In this paper, for the current usage, we leave parallelization as an open possibility,
because the serial MS is already efficient.</p>
</sec>
</sec>
<sec id="Ch1.S7" sec-type="conclusions">
  <title>Conclusions</title>
      <p>In this study, based on evaluations of the computational cost of volcanic ash
DA, the analysis step turned out to be very expensive. Although some
potential approaches can accelerate the initialization and forecast steps,
there would be no notable improvement to the total computational time due to
the dominant analysis step. Therefore, to get an acceptable computational
cost, the key is to efficiently reduce the execution time of the analysis
step.</p>
      <p>After a detailed evaluation of various parts of the analysis stage, the most
time-consuming part was revealed. The mask-state algorithm (MS) was developed
based on a study of the characteristics of the ensemble ash states. The
algorithm transforms the full ensemble state matrix into a relatively small
matrix using a constructed mask array. Subsequently, the computation of the
analysis step was sufficiently reduced. MS is developed as a generic
approach; thus, it can be embedded in all ensemble-based DA implementations.
The extra computational cost of the algorithm is small and usually
negligible.</p>
      <p>The conventional ensemble-based DA with MS is shown to reduce
the total computational time to an acceptable level, i.e., less than the time
period of the assimilation application. Consequently, timely and accurate
volcanic ash forecasts can be provided for aviation advice. The approach is
flexible: it improves performance without relying on any model-based
parallelization such as domain or component decomposition. Thus, when a
parallel model is available, MS can easily be combined with it to gain
a further speedup. MS implements exactly the standard DA without any
approximation and is easy to configure, so it can be used to
accelerate the standard DA in a wide range of applications. <?xmltex \hack{\newpage}?>
In this case study with the LOTOS-EUROS model (version 1.10), after the
parallelization is performed for the forecast step of EnKF assimilation, the
analysis step takes 72 % of the total runtime, which means the analysis
step is the bottleneck. This may not hold for all ash forecasts,
as the computational cost of initialization and forecast depends strongly on
the forecast model that is used. For the current development, it makes sense
to use the LOTOS-EUROS model, because the model has been configured and
evaluated in <xref ref-type="bibr" rid="bib1.bibx11" id="paren.53"/> by comparison with other well-known
models (e.g., NAME, <xref ref-type="bibr" rid="bib1.bibx19" id="altparen.54"/>, and WRF-Chem,
<xref ref-type="bibr" rid="bib1.bibx40" id="altparen.55"/>) in simulating volcanic ash transport. However,
if a more expensive ash forecasting model is used, the bottleneck may shift
to the forecast step. In that case, the forecast step should be the target of
acceleration, and a parallel model or adaptive domain decomposition
(as discussed in Sect. <xref ref-type="sec" rid="Ch1.S4.SS3"/>) would probably need to be employed together
with the parallel ensemble forecasts.</p>
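      <p>The 72 % share quoted above can be read through Amdahl's law, as a rough worked example rather than a measured result. If the analysis step accounts for 72 % of the runtime, then making all other steps infinitely fast yields an overall speedup of at most 1/0.72, i.e., about 1.39. Accelerating the analysis step itself raises the achievable speedup well beyond that. The acceleration factors in the Python sketch below are hypothetical and are not values reported in this paper.</p>
<preformat>
```python
def overall_speedup(fraction, factor):
    """Amdahl's law: speedup of the whole run when only `fraction` of the
    runtime is accelerated by `factor`."""
    return 1.0 / ((1.0 - fraction) + fraction / factor)


ANALYSIS_SHARE = 0.72  # analysis share of total runtime reported in the text

# Upper bound when everything except the analysis step becomes free.
bound_without_analysis = 1.0 / ANALYSIS_SHARE

# Hypothetical acceleration factors for the analysis step (not measured values).
for factor in (2.0, 10.0, 100.0):
    print(factor, round(overall_speedup(ANALYSIS_SHARE, factor), 2))
```
</preformat>
      <p>The same function can be applied to the forecast step if a more expensive forecast model shifts the bottleneck there; only the runtime share passed as the first argument changes.</p>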
      <p>The use of in situ measurements is one important reason why MS works
so well here. For each analysis step, the number of measurements is quite
small, and the singular value decomposition (SVD) costs
little. However, in applications where many measurements are assimilated
(e.g., satellite-based data <xref ref-type="bibr" rid="bib1.bibx13" id="altparen.56"/> or seismic-based data
<xref ref-type="bibr" rid="bib1.bibx23" id="altparen.57"/>) and the number of measurements is
on the same order as the number of state variables, the most time-consuming
part will be the SVD. In these cases, the contribution of MS to reducing the
total computing time will be limited, and an effective acceleration algorithm
for the analysis step must first address the computationally expensive
SVD.</p>
</sec>

      
      </body>
    <back><notes notes-type="codedataavailability">

      <p>The averaged aircraft in situ data used in this study
are available from Fig. <xref ref-type="fig" rid="Ch1.F1"/>c. The continuous
aircraft data and the model output data used in this study are available on request
(G.Fu@tudelft.nl). The mask-state algorithm (MS) is implemented in OpenDA
(open-source software for DA, <uri>www.openda.com</uri>), which can
be downloaded from SourceForge
(<uri>https://sourceforge.net/projects/openda</uri>).</p>
  </notes><notes notes-type="authorcontribution">

      <p>Guangliang Fu, Sha Lu, and Arjo Segers simulated the volcanic ash
transport using the LOTOS-EUROS model. Guangliang Fu, Hai Xiang Lin, and
Tongchao Lu evaluated the computational efforts. Guangliang Fu, Hai Xiang Lin,
Arnold Heemink, and Shiming Xu developed the algorithms. Guangliang Fu,
Hai Xiang Lin, and Nils van Velzen carried out computer experiments and
analyzed the performance of the developed algorithm. Guangliang Fu and
Hai Xiang Lin wrote the paper.</p>
  </notes><notes notes-type="competinginterests">

      <p>The authors declare that they have no conflict of interest.</p>
  </notes><ack><title>Acknowledgements</title><p>We are very grateful to the editor and four anonymous reviewers for their
reviews. We thank the Netherlands Supercomputing Center for supporting us
with the Cartesius cluster for the experiments in our study. We are grateful
to Konradin Weber for providing the aircraft measurements. <?xmltex \hack{\newline}?><?xmltex \hack{\newline}?> Edited by: R. Sander<?xmltex \hack{\newline}?> Reviewed by: four
anonymous referees</p></ack><ref-list>
    <title>References</title>

      <ref id="bib1.bibx1"><label>Amdahl(1967)</label><mixed-citation>Amdahl, G. M.: Validity of the Single Processor Approach to Achieving Large
Scale Computing Capabilities, in: Proceedings of the April 18-20, 1967,
Spring Joint Computer Conference, AFIPS '67 (Spring), 483–485, ACM, New
York, NY, USA, <ext-link xlink:href="http://dx.doi.org/10.1145/1465482.1465560" ext-link-type="DOI">10.1145/1465482.1465560</ext-link>, 1967.</mixed-citation></ref>
      <ref id="bib1.bibx2"><label>Bank and Douglas(1993)</label><mixed-citation>Bank, R. and Douglas, C.: Sparse matrix multiplication package (SMMP), Adv.
Comput. Math., 1,
127–137, <ext-link xlink:href="http://dx.doi.org/10.1007/bf02070824" ext-link-type="DOI">10.1007/bf02070824</ext-link>, 1993.</mixed-citation></ref>
      <ref id="bib1.bibx3"><label>Barbu et al.(2009)Barbu, Segers, Schaap, Heemink, and
Builtjes</label><mixed-citation>Barbu, A. L., Segers, A. J., Schaap, M., Heemink, A. W., and Builtjes, P.
J. H.: A multi-component data assimilation experiment directed to sulphur
dioxide and sulphate over Europe, Atmos. Environ., 43, 1622–1631,
<ext-link xlink:href="http://dx.doi.org/10.1016/j.atmosenv.2008.12.005" ext-link-type="DOI">10.1016/j.atmosenv.2008.12.005</ext-link>, 2009.</mixed-citation></ref>
      <ref id="bib1.bibx4"><label>Casadevall(1994)</label><mixed-citation>Casadevall, T. J.: The 1989–1990 eruption of Redoubt Volcano, Alaska:
impacts on aircraft operations, J. Volcanol. Geoth.
Res., 62, 301–316, <ext-link xlink:href="http://dx.doi.org/10.1016/0377-0273(94)90038-8" ext-link-type="DOI">10.1016/0377-0273(94)90038-8</ext-link>, 1994.</mixed-citation></ref>
      <ref id="bib1.bibx5"><label>Chatterjee et al.(2012)Chatterjee, Michalak, Anderson, Mueller, and
Yadav</label><mixed-citation>Chatterjee, A., Michalak, A. M., Anderson, J. L., Mueller, K. L., and Yadav,
V.: Toward reliable ensemble Kalman filter estimates of CO<inline-formula><mml:math id="M362" display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:math></inline-formula> fluxes, J.
Geophys. Res., 117, D22306, <ext-link xlink:href="http://dx.doi.org/10.1029/2012jd018176" ext-link-type="DOI">10.1029/2012jd018176</ext-link>, 2012.</mixed-citation></ref>
      <ref id="bib1.bibx6"><label>Curier et al.(2012)Curier, Timmermans, Calabretta-Jongen, Eskes,
Segers, Swart, and Schaap</label><mixed-citation>Curier, R. L., Timmermans, R., Calabretta-Jongen, S., Eskes, H., Segers, A.,
Swart, D., and Schaap, M.: Improving ozone forecasts over Europe by
synergistic use of the LOTOS-EUROS chemical transport model and in-situ
measurements, Atmos. Environ., 60, 217–226,
<ext-link xlink:href="http://dx.doi.org/10.1016/j.atmosenv.2012.06.017" ext-link-type="DOI">10.1016/j.atmosenv.2012.06.017</ext-link>, 2012.</mixed-citation></ref>
      <ref id="bib1.bibx7"><label>Eliasson et al.(2011)Eliasson, Palsson, and
Weber</label><mixed-citation>Eliasson, J., Palsson, A., and Weber, K.: Monitoring ash clouds for aviation,
Nature, 475, p. 455, <ext-link xlink:href="http://dx.doi.org/10.1038/475455b" ext-link-type="DOI">10.1038/475455b</ext-link>, 2011.</mixed-citation></ref>
      <ref id="bib1.bibx8"><label>Evensen(2003)</label><mixed-citation>Evensen, G.: The Ensemble Kalman Filter: theoretical formulation and practical
implementation, Ocean Dynam., 53, 343–367,
<ext-link xlink:href="http://dx.doi.org/10.1007/s10236-003-0036-9" ext-link-type="DOI">10.1007/s10236-003-0036-9</ext-link>, 2003.</mixed-citation></ref>
      <ref id="bib1.bibx9"><label>Filgueira et al.(2014)Filgueira, Atkinson, Tanimura, and
Kojima</label><mixed-citation>Filgueira, R., Atkinson, M., Tanimura, Y., and Kojima, I.: Applying
Selectively Parallel I/O Compression to Parallel Storage Systems, in:
Euro-Par 2014 Parallel Processing, edited by: Silva, F., Dutra, I., and
Santos Costa, V., vol. 8632 of Lecture Notes in Computer Science,
Springer International Publishing, 282–293,
<ext-link xlink:href="http://dx.doi.org/10.1007/978-3-319-09873-9_24" ext-link-type="DOI">10.1007/978-3-319-09873-9_24</ext-link>, 2014.</mixed-citation></ref>
      <ref id="bib1.bibx10"><label>Folch et al.(2010)Folch, Costa, Durant, and
Macedonio</label><mixed-citation>Folch, A., Costa, A., Durant, A., and Macedonio, G.: A model for wet
aggregation of ash particles in volcanic plumes and clouds: 2. Model
application, J. Geophys. Res., 115, B09202, <ext-link xlink:href="http://dx.doi.org/10.1029/2009jb007176" ext-link-type="DOI">10.1029/2009jb007176</ext-link>,
2010.</mixed-citation></ref>
      <ref id="bib1.bibx11"><label>Fu et al.(2015)Fu, Lin, Heemink, Segers, Lu, and
Palsson</label><mixed-citation>Fu, G., Lin, H. X., Heemink, A. W., Segers, A. J., Lu, S., and Palsson, T.:
Assimilating aircraft-based measurements to improve forecast accuracy of
volcanic ash transport, Atmos. Environ., 115, 170–184,
<ext-link xlink:href="http://dx.doi.org/10.1016/j.atmosenv.2015.05.061" ext-link-type="DOI">10.1016/j.atmosenv.2015.05.061</ext-link>, 2015.</mixed-citation></ref>
      <ref id="bib1.bibx12"><label>Fu et al.(2016)Fu, Heemink, Lu, Segers, Weber, and
Lin</label><mixed-citation>Fu, G., Heemink, A., Lu, S., Segers, A., Weber, K., and Lin, H.-X.: Model-based aviation advice on distal volcanic ash clouds by assimilating aircraft
in situ measurements, Atmos. Chem. Phys., 16, 9189–9200, <ext-link xlink:href="http://dx.doi.org/10.5194/acp-16-9189-2016" ext-link-type="DOI">10.5194/acp-16-9189-2016</ext-link>, 2016.</mixed-citation></ref>
      <ref id="bib1.bibx13"><label>Fu et al.(2017)Fu, Prata, Lin, Heemink, Segers, and Lu</label><mixed-citation>Fu, G., Prata, F., Lin, H. X., Heemink, A., Segers, A., and Lu, S.: Data
assimilation for volcanic ash plumes using a satellite observational
operator: a case study on the 2010 Eyjafjallajökull volcanic eruption,
Atmos. Chem. Phys., 17, 1187–1205, <ext-link xlink:href="http://dx.doi.org/10.5194/acp-17-1187-2017" ext-link-type="DOI">10.5194/acp-17-1187-2017</ext-link>, 2017.</mixed-citation></ref>
      <ref id="bib1.bibx14"><label>Gudmundsson et al.(2012)Gudmundsson, Thordarson, Höskuldsson,
Larsen, Björnsson, Prata, Oddsson, Magnússon, Högnadóttir,
Petersen, Hayward, Stevenson, and Jónsdóttir</label><mixed-citation>Gudmundsson, M. T., Thordarson, T., Höskuldsson, A., Larsen, G.,
Björnsson, H., Prata, F. J., Oddsson, B., Magnússon, E.,
Högnadóttir, T., Petersen, G. N., Hayward, C. L., Stevenson, J. A.,
and Jónsdóttir, I.: Ash generation and distribution from the
April–May 2010 eruption of Eyjafjallajökull, Iceland, Scientific
Reports, 2, <ext-link xlink:href="http://dx.doi.org/10.1038/srep00572" ext-link-type="DOI">10.1038/srep00572</ext-link>, 2012.</mixed-citation></ref>
      <ref id="bib1.bibx15"><label>Hamill et al.(2001)Hamill, Whitaker, and
Snyder</label><mixed-citation>Hamill, T. M., Whitaker, J. S., and Snyder, C.: Distance-Dependent Filtering
of Background Error Covariance Estimates in an Ensemble Kalman Filter, Mon.
Weather Rev., 129, 2776–2790,
<ext-link xlink:href="http://dx.doi.org/10.1175/1520-0493(2001)129&lt;2776:ddfobe&gt;2.0.co;2" ext-link-type="DOI">10.1175/1520-0493(2001)129&lt;2776:ddfobe&gt;2.0.co;2</ext-link>, 2001.</mixed-citation></ref>
      <ref id="bib1.bibx16"><label>Houtekamer and Mitchell(1998)</label><mixed-citation>Houtekamer, P. L. and Mitchell, H. L.: Data Assimilation Using an Ensemble
Kalman Filter Technique, Mon. Weather Rev., 126, 796–811,
<ext-link xlink:href="http://dx.doi.org/10.1175/1520-0493(1998)126&lt;0796:dauaek&gt;2.0.co;2" ext-link-type="DOI">10.1175/1520-0493(1998)126&lt;0796:dauaek&gt;2.0.co;2</ext-link>, 1998.</mixed-citation></ref>
      <ref id="bib1.bibx17"><label>Houtekamer and Mitchell(2001)</label><mixed-citation>Houtekamer, P. L. and Mitchell, H. L.: A Sequential Ensemble Kalman Filter for
Atmospheric Data Assimilation, Mon. Weather Rev., 129, 123–137,
<ext-link xlink:href="http://dx.doi.org/10.1175/1520-0493(2001)129&lt;0123:asekff&gt;2.0.co;2" ext-link-type="DOI">10.1175/1520-0493(2001)129&lt;0123:asekff&gt;2.0.co;2</ext-link>, 2001.</mixed-citation></ref>
      <ref id="bib1.bibx18"><label>Houtekamer et al.(2014)Houtekamer, He, and
Mitchell</label><mixed-citation>Houtekamer, P. L., He, B., and Mitchell, H. L.: Parallel Implementation of an
Ensemble Kalman Filter, Mon. Weather Rev., 142, 1163–1182,
<ext-link xlink:href="http://dx.doi.org/10.1175/mwr-d-13-00011.1" ext-link-type="DOI">10.1175/mwr-d-13-00011.1</ext-link>, 2014.</mixed-citation></ref>
      <ref id="bib1.bibx19"><label>Jones et al.(2007)Jones, Thomson, Hort, and Devenish</label><mixed-citation>Jones, A., Thomson, D., Hort, M., and Devenish, B.: The U.K. Met Office's
Next-Generation Atmospheric Dispersion Model, NAME III, in: Air Pollution
Modeling and Its Application XVII, edited by: Borrego, C. and Norman, A.-L.,
Springer US, 580–589, <ext-link xlink:href="http://dx.doi.org/10.1007/978-0-387-68854-1_62" ext-link-type="DOI">10.1007/978-0-387-68854-1_62</ext-link>, 2007.</mixed-citation></ref>
      <ref id="bib1.bibx20"><label>Kalnay et al.(2012)Kalnay, Ota, Miyoshi, and Liu</label><mixed-citation>Kalnay, E., Ota, Y., Miyoshi, T., and Liu, J.: A simpler formulation of
forecast sensitivity to observations: application to ensemble Kalman
filters, Tellus A, 64, 18462, <ext-link xlink:href="http://dx.doi.org/10.3402/tellusa.v64i0.18462" ext-link-type="DOI">10.3402/tellusa.v64i0.18462</ext-link>,
2012.</mixed-citation></ref>
      <ref id="bib1.bibx21"><label>Keppenne(2000)</label><mixed-citation>Keppenne, C. L.: Data Assimilation into a Primitive-Equation Model with a
Parallel Ensemble Kalman Filter, Mon. Weather Rev., 128, 1971–1981,
<ext-link xlink:href="http://dx.doi.org/10.1175/1520-0493(2000)128&lt;1971:daiape&gt;2.0.co;2" ext-link-type="DOI">10.1175/1520-0493(2000)128&lt;1971:daiape&gt;2.0.co;2</ext-link>, 2000.</mixed-citation></ref>
      <ref id="bib1.bibx22"><label>Keppenne and Rienecker(2002)</label><mixed-citation>Keppenne, C. L. and Rienecker, M. M.: Initial Testing of a Massively Parallel
Ensemble Kalman Filter with the Poseidon Isopycnal Ocean General Circulation
Model, Mon. Weather Rev., 130, 2951–2965,
<ext-link xlink:href="http://dx.doi.org/10.1175/1520-0493(2002)130&lt;2951:itoamp&gt;2.0.co;2" ext-link-type="DOI">10.1175/1520-0493(2002)130&lt;2951:itoamp&gt;2.0.co;2</ext-link>, 2002.</mixed-citation></ref>
      <ref id="bib1.bibx23"><label>Khairullah et al.(2013)Khairullah, Lin, Hanea, and
Heemink</label><mixed-citation>Khairullah, M., Lin, H., Hanea, R. G., and Heemink, A. W.: Parallelization of
Ensemble Kalman Filter (EnKF) for Oil Reservoirs with Time-lapse Seismic
Data, International Journal of Mathematical, Computational Science and
Engineering, 7, <uri>http://waset.org/Publication/16317</uri>,  2013.</mixed-citation></ref>
      <ref id="bib1.bibx24"><label>Liang et al.(2009)Liang, Sepehrnoori, and
Delshad</label><mixed-citation>Liang, B., Sepehrnoori, K., and Delshad, M.: An Automatic History Matching
Module with Distributed and Parallel Computing, Petroleum Science and
Technology, 27, 1092–1108, <ext-link xlink:href="http://dx.doi.org/10.1080/10916460802455962" ext-link-type="DOI">10.1080/10916460802455962</ext-link>, 2009.</mixed-citation></ref>
      <ref id="bib1.bibx25"><label>Lin et al.(1998)Lin, Cosman, Heemink, Stijnen, and van
Beek</label><mixed-citation>Lin, H.-X., Cosman, A., Heemink, A., Stijnen, J., and van Beek, P.:
Parallelization of the Particle Model SIMPAR, in: Advances in Hydro-Science
and Engineering, edited by: Holz, K. P., Bechteler, W., Wang, S. S. Y., and
Kawahara, M., vol. 3, Center for Computational Hydroscience and Engineering,
available at: <uri>https://www.researchgate.net/publication/252671025_Parallelization_of_the_Particle_Model_SIMPAR</uri>
(last access: 3 April 2017), 1998.</mixed-citation></ref>
      <ref id="bib1.bibx26"><label>Lu et al.(2016a)Lu, Lin, Heemink, Segers, and
Fu</label><mixed-citation>Lu, S., Lin, H. X., Heemink, A., Segers, A., and Fu, G.: Estimation of
volcanic ash emissions through assimilating satellite data and ground-based
observations, J. Geophys. Res.-Atmos., 121, 10971–10994,
<ext-link xlink:href="http://dx.doi.org/10.1002/2016JD025131" ext-link-type="DOI">10.1002/2016JD025131</ext-link>, 2016a.</mixed-citation></ref>
      <ref id="bib1.bibx27"><label>Lu et al.(2016b)Lu, Lin, Heemink, Fu, and
Segers</label><mixed-citation>Lu, S., Lin, H. X., Heemink, A. W., Fu, G., and Segers, A. J.: Estimation of
Volcanic Ash Emissions Using Trajectory-Based 4D-Var Data Assimilation, Mon.
Weather Rev., 144, 575–589, <ext-link xlink:href="http://dx.doi.org/10.1175/mwr-d-15-0194.1" ext-link-type="DOI">10.1175/mwr-d-15-0194.1</ext-link>, 2016b.</mixed-citation></ref>
      <ref id="bib1.bibx28"><label>Lu et al.(2017)Lu, Heemink, Lin, Segers, and Fu</label><mixed-citation>Lu, S., Heemink, A., Lin, H. X., Segers, A., and Fu, G.: Evaluation criteria
on the design for assimilating remote sensing data using variational
approaches, Mon. Weather Rev., 0, 1–11, <ext-link xlink:href="http://dx.doi.org/10.1175/mwr-d-16-0289.1" ext-link-type="DOI">10.1175/mwr-d-16-0289.1</ext-link>,
2017.</mixed-citation></ref>
      <ref id="bib1.bibx29"><label>Miyazaki et al.(2015)Miyazaki, Eskes, and
Sudo</label><mixed-citation>Miyazaki, K., Eskes, H. J., and Sudo, K.: A tropospheric chemistry reanalysis
for the years 2005–2012 based on an assimilation of OMI, MLS, TES, and
MOPITT satellite data, Atmos. Chem. Phys., 15, 8315–8348,
<ext-link xlink:href="http://dx.doi.org/10.5194/acp-15-8315-2015" ext-link-type="DOI">10.5194/acp-15-8315-2015</ext-link>, 2015.</mixed-citation></ref>
      <ref id="bib1.bibx30"><label>Nerger and Hiller(2013)</label><mixed-citation>Nerger, L. and Hiller, W.: Software for ensemble-based data assimilation
systems – Implementation strategies and scalability, Comput.
Geosci., 55, 110–118, <ext-link xlink:href="http://dx.doi.org/10.1016/j.cageo.2012.03.026" ext-link-type="DOI">10.1016/j.cageo.2012.03.026</ext-link>, 2013.</mixed-citation></ref>
      <ref id="bib1.bibx31"><label>Oxford-Economics(2010)</label><mixed-citation>Oxford-Economics: The Economic Impacts of Air Travel Restrictions Due to
Volcanic Ash, Report for Airbus, Tech. rep.,
available at: <uri>http://www.oxfordeconomics.com/my-oxford/projects/129051</uri>
(last access: 3 April 2017),
2010.</mixed-citation></ref>
      <ref id="bib1.bibx32"><label>Petrie and Dance(2010)</label><mixed-citation>Petrie, R. E. and Dance, S. L.: Ensemble-based data assimilation and the
localisation problem, Weather, 65, 65–69, <ext-link xlink:href="http://dx.doi.org/10.1002/wea.505" ext-link-type="DOI">10.1002/wea.505</ext-link>, 2010.</mixed-citation></ref>
      <ref id="bib1.bibx33"><label>Quinn and Abarbanel(2011)</label><mixed-citation>Quinn, J. C. and Abarbanel, H. D. I.: Data assimilation using a GPU
accelerated path integral Monte Carlo approach, J. Comput.
Phys., 230, 8168–8178, <ext-link xlink:href="http://dx.doi.org/10.1016/j.jcp.2011.07.015" ext-link-type="DOI">10.1016/j.jcp.2011.07.015</ext-link>, 2011.</mixed-citation></ref>
      <ref id="bib1.bibx34"><label>Riishojgaard(1998)</label><mixed-citation>Riishojgaard, L. P.: A direct way of specifying flow-dependent background
error correlations for meteorological analysis systems, Tellus A, 50,
42–57, <ext-link xlink:href="http://dx.doi.org/10.1034/j.1600-0870.1998.00004.x" ext-link-type="DOI">10.1034/j.1600-0870.1998.00004.x</ext-link>, 1998.</mixed-citation></ref>
      <ref id="bib1.bibx35"><label>Saad(2003)</label><mixed-citation>Saad, Y.: Iterative Methods for Sparse Linear Systems, Society for Industrial
and Applied Mathematics, <ext-link xlink:href="http://dx.doi.org/10.1137/1.9780898718003" ext-link-type="DOI">10.1137/1.9780898718003</ext-link>, 2003.</mixed-citation></ref>
      <ref id="bib1.bibx36"><label>Schaap et al.(2008)Schaap, Timmermans, Roemer, Boersen, Builtjes,
Sauter, Velders, and Beck</label><mixed-citation>Schaap, M., Timmermans, R. M. A., Roemer, M., Boersen, G. A. C., Builtjes, P.
J. H., Sauter, F. J., Velders, G. J. M., and Beck, J. P.: The LOTOS EUROS
model: description, validation and latest developments, Int.
J. Environ. Pollut., 32, 270,
<ext-link xlink:href="http://dx.doi.org/10.1504/ijep.2008.017106" ext-link-type="DOI">10.1504/ijep.2008.017106</ext-link>, 2008.</mixed-citation></ref>
      <ref id="bib1.bibx37"><label>Segers(2002)</label><mixed-citation>Segers, A. J.: Data Assimilation in Atmospheric Chemistry Models Using Kalman
Filtering, Delft Univ Pr,
available at: <uri>http://repository.tudelft.nl/islandora/object/uuid:113b6229-c33a-4100-93be-22e1c8912672?collection=research</uri>
(last access: 3 April 2017),
2002.</mixed-citation></ref>
      <ref id="bib1.bibx38"><label>Tavakoli et al.(2013)Tavakoli, Pencheva, and
Wheeler</label><mixed-citation>Tavakoli, R., Pencheva, G., and Wheeler, M. F.: Multi-level Parallelization of
Ensemble Kalman Filter for Reservoir History Matching, in: SPE Reservoir
Simulation Symposium, Society of Petroleum Engineers,
<ext-link xlink:href="http://dx.doi.org/10.2118/141657-ms" ext-link-type="DOI">10.2118/141657-ms</ext-link>, 2013.</mixed-citation></ref>
      <ref id="bib1.bibx39"><label>Weber et al.(2012)Weber, Eliasson, Vogel, Fischer, Pohl, van Haren,
Meier, Grobéty, and Dahmann</label><mixed-citation>Weber, K., Eliasson, J., Vogel, A., Fischer, C., Pohl, T., van Haren, G.,
Meier, M., Grobéty, B., and Dahmann, D.: Airborne in-situ investigations
of the Eyjafjallajökull volcanic ash plume on Iceland and over
north-western Germany with light aircrafts and optical particle counters,
Atmos. Environ., 48, 9–21, <ext-link xlink:href="http://dx.doi.org/10.1016/j.atmosenv.2011.10.030" ext-link-type="DOI">10.1016/j.atmosenv.2011.10.030</ext-link>,
2012.
</mixed-citation></ref><?xmltex \hack{\newpage}?>
      <ref id="bib1.bibx40"><label>Webley et al.(2012)Webley, Steensen, Stuefer, Grell, Freitas, and
Pavolonis</label><mixed-citation>Webley, P. W., Steensen, T., Stuefer, M., Grell, G., Freitas, S., and
Pavolonis, M.: Analyzing the Eyjafjallajökull 2010 eruption using
satellite remote sensing, lidar and WRF-Chem dispersion and tracking model,
J. Geophys. Res., 117, D00U26, <ext-link xlink:href="http://dx.doi.org/10.1029/2011jd016817" ext-link-type="DOI">10.1029/2011jd016817</ext-link>, 2012.
</mixed-citation></ref><?xmltex \hack{\newpage}?>
      <ref id="bib1.bibx41"><label>Zehner(2010)</label><mixed-citation>Zehner, C. (Ed.): Monitoring Volcanic Ash From Space, ESA communication
Production Office, <ext-link xlink:href="http://dx.doi.org/10.5270/atmch-10-01" ext-link-type="DOI">10.5270/atmch-10-01</ext-link>, 2010.</mixed-citation></ref>

  </ref-list><app-group content-type="float"><app><title/>

    </app></app-group></back>
    <!--<article-title-html>Accelerating volcanic ash data assimilation using a mask-state algorithm based on an ensemble Kalman filter: a case study with the LOTOS-EUROS model (version 1.10) </article-title-html>
<ref-html id="bib1.bib1"><label>Amdahl(1967)</label><mixed-citation>
Amdahl, G. M.: Validity of the Single Processor Approach to Achieving Large
Scale Computing Capabilities, in: Proceedings of the April 18-20, 1967,
Spring Joint Computer Conference, AFIPS '67 (Spring), pp. 483–485, ACM, New
York, NY, USA, <a href="http://dx.doi.org/10.1145/1465482.1465560" target="_blank">doi:10.1145/1465482.1465560</a>, 1967.
</mixed-citation></ref-html>
<ref-html id="bib1.bib2"><label>Bank and Douglas(1993)</label><mixed-citation>
Bank, R. and Douglas, C.: Sparse matrix multiplication package (SMMP), Adv.
Comput. Math., 1,
127–137, <a href="http://dx.doi.org/10.1007/bf02070824" target="_blank">doi:10.1007/bf02070824</a>, 1993.
</mixed-citation></ref-html>
<ref-html id="bib1.bib3"><label>Barbu et al.(2009)Barbu, Segers, Schaap, Heemink, and
Builtjes</label><mixed-citation>
Barbu, A. L., Segers, A. J., Schaap, M., Heemink, A. W., and Builtjes, P.
J. H.: A multi-component data assimilation experiment directed to sulphur
dioxide and sulphate over Europe, Atmos. Environ., 43, 1622–1631,
<a href="http://dx.doi.org/10.1016/j.atmosenv.2008.12.005" target="_blank">doi:10.1016/j.atmosenv.2008.12.005</a>, 2009.
</mixed-citation></ref-html>
<ref-html id="bib1.bib4"><label>Casadevall(1994)</label><mixed-citation>
Casadevall, T. J.: The 1989–1990 eruption of Redoubt Volcano, Alaska:
impacts on aircraft operations, J. Volcanol. Geoth.
Res., 62, 301–316, <a href="http://dx.doi.org/10.1016/0377-0273(94)90038-8" target="_blank">doi:10.1016/0377-0273(94)90038-8</a>, 1994.
</mixed-citation></ref-html>
<ref-html id="bib1.bib5"><label>Chatterjee et al.(2012)Chatterjee, Michalak, Anderson, Mueller, and
Yadav</label><mixed-citation>
Chatterjee, A., Michalak, A. M., Anderson, J. L., Mueller, K. L., and Yadav,
V.: Toward reliable ensemble Kalman filter estimates of CO<sub>2</sub> fluxes, J.
Geophys. Res., 117, D22306, <a href="http://dx.doi.org/10.1029/2012jd018176" target="_blank">doi:10.1029/2012jd018176</a>, 2012.
</mixed-citation></ref-html>
<ref-html id="bib1.bib6"><label>Curier et al.(2012)Curier, Timmermans, Calabretta-Jongen, Eskes,
Segers, Swart, and Schaap</label><mixed-citation>
Curier, R. L., Timmermans, R., Calabretta-Jongen, S., Eskes, H., Segers, A.,
Swart, D., and Schaap, M.: Improving ozone forecasts over Europe by
synergistic use of the LOTOS-EUROS chemical transport model and in-situ
measurements, Atmos. Environ., 60, 217–226,
<a href="http://dx.doi.org/10.1016/j.atmosenv.2012.06.017" target="_blank">doi:10.1016/j.atmosenv.2012.06.017</a>, 2012.
</mixed-citation></ref-html>
<ref-html id="bib1.bib7"><label>Eliasson et al.(2011)Eliasson, Palsson, and
Weber</label><mixed-citation>
Eliasson, J., Palsson, A., and Weber, K.: Monitoring ash clouds for aviation,
Nature, 475, p. 455, <a href="http://dx.doi.org/10.1038/475455b" target="_blank">doi:10.1038/475455b</a>, 2011.
</mixed-citation></ref-html>
<ref-html id="bib1.bib8"><label>Evensen(2003)</label><mixed-citation>
Evensen, G.: The Ensemble Kalman Filter: theoretical formulation and practical
implementation, Ocean Dynam., 53, 343–367,
<a href="http://dx.doi.org/10.1007/s10236-003-0036-9" target="_blank">doi:10.1007/s10236-003-0036-9</a>, 2003.
</mixed-citation></ref-html>
<ref-html id="bib1.bib9"><label>Filgueira et al.(2014)Filgueira, Atkinson, Tanimura, and
Kojima</label><mixed-citation>
Filgueira, R., Atkinson, M., Tanimura, Y., and Kojima, I.: Applying
Selectively Parallel I/O Compression to Parallel Storage Systems, in:
Euro-Par 2014 Parallel Processing, edited by Silva, F., Dutra, I., and
Santos Costa, V., vol. 8632 of Lecture Notes in Computer Science,
Springer International Publishing, 282–293,
<a href="http://dx.doi.org/10.1007/978-3-319-09873-9_24" target="_blank">doi:10.1007/978-3-319-09873-9_24</a>, 2014.
</mixed-citation></ref-html>
<ref-html id="bib1.bib10"><label>Folch et al.(2010)Folch, Costa, Durant, and
Macedonio</label><mixed-citation>
Folch, A., Costa, A., Durant, A., and Macedonio, G.: A model for wet
aggregation of ash particles in volcanic plumes and clouds: 2. Model
application, J. Geophys. Res., 115, B09202, <a href="http://dx.doi.org/10.1029/2009jb007176" target="_blank">doi:10.1029/2009jb007176</a>,
2010.
</mixed-citation></ref-html>
<ref-html id="bib1.bib11"><label>Fu et al.(2015)Fu, Lin, Heemink, Segers, Lu, and
Palsson</label><mixed-citation>
Fu, G., Lin, H. X., Heemink, A. W., Segers, A. J., Lu, S., and Palsson, T.:
Assimilating aircraft-based measurements to improve Forecast Accuracy of
Volcanic Ash Transport, Atmos. Environ., 115, 170–184,
<a href="http://dx.doi.org/10.1016/j.atmosenv.2015.05.061" target="_blank">doi:10.1016/j.atmosenv.2015.05.061</a>, 2015.
</mixed-citation></ref-html>
<ref-html id="bib1.bib12"><label>Fu et al.(2016)Fu, Heemink, Lu, Segers, Weber, and
Lin</label><mixed-citation>
Fu, G., Heemink, A., Lu, S., Segers, A., Weber, K., and Lin, H.-X.: Model-based aviation advice on distal volcanic ash clouds by assimilating aircraft
in situ measurements, Atmos. Chem. Phys., 16, 9189–9200, <a href="http://dx.doi.org/10.5194/acp-16-9189-2016" target="_blank">doi:10.5194/acp-16-9189-2016</a>, 2016.
</mixed-citation></ref-html>
<ref-html id="bib1.bib13"><label>Fu et al.(2017)Fu, Prata, Lin, Heemink, Segers, and Lu</label><mixed-citation>
Fu, G., Prata, F., Lin, H. X., Heemink, A., Segers, A., and Lu, S.: Data
assimilation for volcanic ash plumes using a satellite observational
operator: a case study on the 2010 Eyjafjallajökull volcanic eruption,
Atmos. Chem. Phys., 17, 1187–1205, <a href="http://dx.doi.org/10.5194/acp-17-1187-2017" target="_blank">doi:10.5194/acp-17-1187-2017</a>, 2017.
</mixed-citation></ref-html>
<ref-html id="bib1.bib14"><label>Gudmundsson et al.(2012)Gudmundsson, Thordarson, Höskuldsson,
Larsen, Björnsson, Prata, Oddsson, Magnússon, Högnadóttir,
Petersen, Hayward, Stevenson, and Jónsdóttir</label><mixed-citation>
Gudmundsson, M. T., Thordarson, T., Höskuldsson, A., Larsen, G.,
Björnsson, H., Prata, F. J., Oddsson, B., Magnússon, E.,
Högnadóttir, T., Petersen, G. N., Hayward, C. L., Stevenson, J. A.,
and Jónsdóttir, I.: Ash generation and distribution from the
April–May 2010 eruption of Eyjafjallajökull, Iceland, Scientific
Reports, 2, <a href="http://dx.doi.org/10.1038/srep00572" target="_blank">doi:10.1038/srep00572</a>, 2012.
</mixed-citation></ref-html>
<ref-html id="bib1.bib15"><label>Hamill et al.(2001)Hamill, Whitaker, and
Snyder</label><mixed-citation>
Hamill, T. M., Whitaker, J. S., and Snyder, C.: Distance-Dependent Filtering
of Background Error Covariance Estimates in an Ensemble Kalman Filter, Mon.
Weather Rev., 129, 2776–2790,
<a href="http://dx.doi.org/10.1175/1520-0493(2001)129&lt;2776:ddfobe&gt;2.0.co;2" target="_blank">doi:10.1175/1520-0493(2001)129&lt;2776:ddfobe&gt;2.0.co;2</a>, 2001.
</mixed-citation></ref-html>
<ref-html id="bib1.bib16"><label>Houtekamer and Mitchell(1998)</label><mixed-citation>
Houtekamer, P. L. and Mitchell, H. L.: Data Assimilation Using an Ensemble
Kalman Filter Technique, Mon. Weather Rev., 126, 796–811,
<a href="http://dx.doi.org/10.1175/1520-0493(1998)126&lt;0796:dauaek&gt;2.0.co;2" target="_blank">doi:10.1175/1520-0493(1998)126&lt;0796:dauaek&gt;2.0.co;2</a>, 1998.
</mixed-citation></ref-html>
<ref-html id="bib1.bib17"><label>Houtekamer and Mitchell(2001)</label><mixed-citation>
Houtekamer, P. L. and Mitchell, H. L.: A Sequential Ensemble Kalman Filter for
Atmospheric Data Assimilation, Mon. Weather Rev., 129, 123–137,
<a href="http://dx.doi.org/10.1175/1520-0493(2001)129&lt;0123:asekff&gt;2.0.co;2" target="_blank">doi:10.1175/1520-0493(2001)129&lt;0123:asekff&gt;2.0.co;2</a>, 2001.
</mixed-citation></ref-html>
<ref-html id="bib1.bib18"><label>Houtekamer et al.(2014)Houtekamer, He, and
Mitchell</label><mixed-citation>
Houtekamer, P. L., He, B., and Mitchell, H. L.: Parallel Implementation of an
Ensemble Kalman Filter, Mon. Weather Rev., 142, 1163–1182,
<a href="http://dx.doi.org/10.1175/mwr-d-13-00011.1" target="_blank">doi:10.1175/mwr-d-13-00011.1</a>, 2014.
</mixed-citation></ref-html>
<ref-html id="bib1.bib19"><label>Jones et al.(2007)Jones, Thomson, Hort, and Devenish</label><mixed-citation>
Jones, A., Thomson, D., Hort, M., and Devenish, B.: The U.K. Met Office's
Next-Generation Atmospheric Dispersion Model, NAME III, in: Air Pollution
Modeling and Its Application XVII, edited by: Borrego, C. and Norman, A.-L.,
Springer US, 580–589, <a href="http://dx.doi.org/10.1007/978-0-387-68854-1_62" target="_blank">doi:10.1007/978-0-387-68854-1_62</a>, 2007.
</mixed-citation></ref-html>
<ref-html id="bib1.bib20"><label>Kalnay et al.(2012)Kalnay, Ota, Miyoshi, and Liu</label><mixed-citation>
Kalnay, E., Ota, Y., Miyoshi, T., and Liu, J.: A simpler formulation of
forecast sensitivity to observations: application to ensemble Kalman
filters, Tellus A, 64, 18462, <a href="http://dx.doi.org/10.3402/tellusa.v64i0.18462" target="_blank">doi:10.3402/tellusa.v64i0.18462</a>,
2012.
</mixed-citation></ref-html>
<ref-html id="bib1.bib21"><label>Keppenne(2000)</label><mixed-citation>
Keppenne, C. L.: Data Assimilation into a Primitive-Equation Model with a
Parallel Ensemble Kalman Filter, Mon. Weather Rev., 128, 1971–1981,
<a href="http://dx.doi.org/10.1175/1520-0493(2000)128&lt;1971:daiape&gt;2.0.co;2" target="_blank">doi:10.1175/1520-0493(2000)128&lt;1971:daiape&gt;2.0.co;2</a>, 2000.
</mixed-citation></ref-html>
<ref-html id="bib1.bib22"><label>Keppenne and Rienecker(2002)</label><mixed-citation>
Keppenne, C. L. and Rienecker, M. M.: Initial Testing of a Massively Parallel
Ensemble Kalman Filter with the Poseidon Isopycnal Ocean General Circulation
Model, Mon. Weather Rev., 130, 2951–2965,
<a href="http://dx.doi.org/10.1175/1520-0493(2002)130&lt;2951:itoamp&gt;2.0.co;2" target="_blank">doi:10.1175/1520-0493(2002)130&lt;2951:itoamp&gt;2.0.co;2</a>, 2002.
</mixed-citation></ref-html>
<ref-html id="bib1.bib23"><label>Khairullah et al.(2013)Khairullah, Lin, Hanea, and
Heemink</label><mixed-citation>
Khairullah, M., Lin, H., Hanea, R. G., and Heemink, A. W.: Parallelization of
Ensemble Kalman Filter (EnKF) for Oil Reservoirs with Time-lapse Seismic
Data, International Journal of Mathematical, Computational Science and
Engineering, 7, <a href="http://waset.org/Publication/16317" target="_blank">http://waset.org/Publication/16317</a>, 2013.
</mixed-citation></ref-html>
<ref-html id="bib1.bib24"><label>Liang et al.(2009)Liang, Sepehrnoori, and
Delshad</label><mixed-citation>
Liang, B., Sepehrnoori, K., and Delshad, M.: An Automatic History Matching
Module with Distributed and Parallel Computing, Petroleum Science and
Technology, 27, 1092–1108, <a href="http://dx.doi.org/10.1080/10916460802455962" target="_blank">doi:10.1080/10916460802455962</a>, 2009.
</mixed-citation></ref-html>
<ref-html id="bib1.bib25"><label>Lin et al.(1998)Lin, Cosman, Heemink, Stijnen, and van
Beek</label><mixed-citation>
Lin, H.-X., Cosman, A., Heemink, A., Stijnen, J., and van Beek, P.:
Parallelization of the Particle Model SIMPAR, in: Advances in Hydro-Science
and Engineering, edited by: Holz, K. P., Bechteler, W., Wang, S. S. Y., and
Kawahara, M., vol. 3, Center for Computational Hydroscience and Engineering,
available at: <a href="https://www.researchgate.net/publication/252671025_Parallelization_of_the_Particle_Model_SIMPAR" target="_blank">https://www.researchgate.net/publication/252671025_Parallelization_of_the_Particle_Model_SIMPAR</a>
(last access: 3 April 2017), 1998.
</mixed-citation></ref-html>
<ref-html id="bib1.bib26"><label>Lu et al.(2016a)Lu, Lin, Heemink, Segers, and
Fu</label><mixed-citation>
Lu, S., Lin, H. X., Heemink, A., Segers, A., and Fu, G.: Estimation of
volcanic ash emissions through assimilating satellite data and ground-based
observations, J. Geophys. Res.-Atmos., 121, 10971–10994,
<a href="http://dx.doi.org/10.1002/2016JD025131" target="_blank">doi:10.1002/2016JD025131</a>, 2016a.
</mixed-citation></ref-html>
<ref-html id="bib1.bib27"><label>Lu et al.(2016b)Lu, Lin, Heemink, Fu, and
Segers</label><mixed-citation>
Lu, S., Lin, H. X., Heemink, A. W., Fu, G., and Segers, A. J.: Estimation of
Volcanic Ash Emissions Using Trajectory-Based 4D-Var Data Assimilation, Mon.
Weather Rev., 144, 575–589, <a href="http://dx.doi.org/10.1175/mwr-d-15-0194.1" target="_blank">doi:10.1175/mwr-d-15-0194.1</a>, 2016b.
</mixed-citation></ref-html>
<ref-html id="bib1.bib28"><label>Lu et al.(2017)Lu, Heemink, Lin, Segers, and Fu</label><mixed-citation>
Lu, S., Heemink, A., Lin, H. X., Segers, A., and Fu, G.: Evaluation criteria
on the design for assimilating remote sensing data using variational
approaches, Mon. Weather Rev., 0, 1–11, <a href="http://dx.doi.org/10.1175/mwr-d-16-0289.1" target="_blank">doi:10.1175/mwr-d-16-0289.1</a>,
2017.
</mixed-citation></ref-html>
<ref-html id="bib1.bib29"><label>Miyazaki et al.(2015)Miyazaki, Eskes, and
Sudo</label><mixed-citation>
Miyazaki, K., Eskes, H. J., and Sudo, K.: A tropospheric chemistry reanalysis
for the years 2005–2012 based on an assimilation of OMI, MLS, TES, and
MOPITT satellite data, Atmos. Chem. Phys., 15, 8315–8348,
<a href="http://dx.doi.org/10.5194/acp-15-8315-2015" target="_blank">doi:10.5194/acp-15-8315-2015</a>, 2015.
</mixed-citation></ref-html>
<ref-html id="bib1.bib30"><label>Nerger and Hiller(2013)</label><mixed-citation>
Nerger, L. and Hiller, W.: Software for ensemble-based data assimilation
systems – Implementation strategies and scalability, Comput.
Geosci., 55, 110–118, <a href="http://dx.doi.org/10.1016/j.cageo.2012.03.026" target="_blank">doi:10.1016/j.cageo.2012.03.026</a>, 2013.
</mixed-citation></ref-html>
<ref-html id="bib1.bib31"><label>Oxford-Economics(2010)</label><mixed-citation>
Oxford-Economics: The Economic Impacts of Air Travel Restrictions Due to
Volcanic Ash, Report for Airbus, Tech. rep.,
available at: <a href="http://www.oxfordeconomics.com/my-oxford/projects/129051" target="_blank">http://www.oxfordeconomics.com/my-oxford/projects/129051</a>
(last access: 3 April 2017),
2010.
</mixed-citation></ref-html>
<ref-html id="bib1.bib32"><label>Petrie and Dance(2010)</label><mixed-citation>
Petrie, R. E. and Dance, S. L.: Ensemble-based data assimilation and the
localisation problem, Weather, 65, 65–69, <a href="http://dx.doi.org/10.1002/wea.505" target="_blank">doi:10.1002/wea.505</a>, 2010.
</mixed-citation></ref-html>
<ref-html id="bib1.bib33"><label>Quinn and Abarbanel(2011)</label><mixed-citation>
Quinn, J. C. and Abarbanel, H. D. I.: Data assimilation using a GPU
accelerated path integral Monte Carlo approach, J. Comput.
Phys., 230, 8168–8178, <a href="http://dx.doi.org/10.1016/j.jcp.2011.07.015" target="_blank">doi:10.1016/j.jcp.2011.07.015</a>, 2011.
</mixed-citation></ref-html>
<ref-html id="bib1.bib34"><label>Riishojgaard(1998)</label><mixed-citation>
Riishojgaard, L. P.: A direct way of specifying flow-dependent background
error correlations for meteorological analysis systems, Tellus A, 50,
42–57, <a href="http://dx.doi.org/10.1034/j.1600-0870.1998.00004.x" target="_blank">doi:10.1034/j.1600-0870.1998.00004.x</a>, 1998.
</mixed-citation></ref-html>
<ref-html id="bib1.bib35"><label>Saad(2003)</label><mixed-citation>
Saad, Y.: Iterative Methods for Sparse Linear Systems, Society for Industrial
and Applied Mathematics, <a href="http://dx.doi.org/10.1137/1.9780898718003" target="_blank">doi:10.1137/1.9780898718003</a>, 2003.
</mixed-citation></ref-html>
<ref-html id="bib1.bib36"><label>Schaap et al.(2008)Schaap, Timmermans, Roemer, Boersen, Builtjes,
Sauter, Velders, and Beck</label><mixed-citation>
Schaap, M., Timmermans, R. M. A., Roemer, M., Boersen, G. A. C., Builtjes, P.
J. H., Sauter, F. J., Velders, G. J. M., and Beck, J. P.: The LOTOS EUROS
model: description, validation and latest developments, Int.
J. Environ. Pollut., 32, 270,
<a href="http://dx.doi.org/10.1504/ijep.2008.017106" target="_blank">doi:10.1504/ijep.2008.017106</a>, 2008.
</mixed-citation></ref-html>
<ref-html id="bib1.bib37"><label>Segers(2002)</label><mixed-citation>
Segers, A. J.: Data Assimilation in Atmospheric Chemistry Models Using Kalman
Filtering, Delft University Press,
available at: <a href="http://repository.tudelft.nl/islandora/object/uuid:113b6229-c33a-4100-93be-22e1c8912672?collection=research" target="_blank">http://repository.tudelft.nl/islandora/object/uuid:113b6229-c33a-4100-93be-22e1c8912672?collection=research</a>
(last access: 3 April 2017),
2002.
</mixed-citation></ref-html>
<ref-html id="bib1.bib38"><label>Tavakoli et al.(2013)Tavakoli, Pencheva, and
Wheeler</label><mixed-citation>
Tavakoli, R., Pencheva, G., and Wheeler, M. F.: Multi-level Parallelization of
Ensemble Kalman Filter for Reservoir History Matching, in: SPE Reservoir
Simulation Symposium, Society of Petroleum Engineers,
<a href="http://dx.doi.org/10.2118/141657-ms" target="_blank">doi:10.2118/141657-ms</a>, 2013.
</mixed-citation></ref-html>
<ref-html id="bib1.bib39"><label>Weber et al.(2012)Weber, Eliasson, Vogel, Fischer, Pohl, van Haren,
Meier, Grobéty, and Dahmann</label><mixed-citation>
Weber, K., Eliasson, J., Vogel, A., Fischer, C., Pohl, T., van Haren, G.,
Meier, M., Grobéty, B., and Dahmann, D.: Airborne in-situ investigations
of the Eyjafjallajökull volcanic ash plume on Iceland and over
north-western Germany with light aircrafts and optical particle counters,
Atmos. Environ., 48, 9–21, <a href="http://dx.doi.org/10.1016/j.atmosenv.2011.10.030" target="_blank">doi:10.1016/j.atmosenv.2011.10.030</a>,
2012.
</mixed-citation></ref-html>
<ref-html id="bib1.bib40"><label>Webley et al.(2012)Webley, Steensen, Stuefer, Grell, Freitas, and
Pavolonis</label><mixed-citation>
Webley, P. W., Steensen, T., Stuefer, M., Grell, G., Freitas, S., and
Pavolonis, M.: Analyzing the Eyjafjallajökull 2010 eruption using
satellite remote sensing, lidar and WRF-Chem dispersion and tracking model,
J. Geophys. Res., 117, D00U26, <a href="http://dx.doi.org/10.1029/2011jd016817" target="_blank">doi:10.1029/2011jd016817</a>, 2012.
</mixed-citation></ref-html>
<ref-html id="bib1.bib41"><label>Zehner(2010)</label><mixed-citation>
Zehner, C. (Ed.): Monitoring Volcanic Ash From Space, ESA Communication
Production Office, <a href="http://dx.doi.org/10.5270/atmch-10-01" target="_blank">doi:10.5270/atmch-10-01</a>, 2010.
</mixed-citation></ref-html>--></article>
