In this study, we investigate a strategy to accelerate the data assimilation (DA) algorithm. Evaluations of the computational time show that the analysis step of the assimilation is the most expensive part. After studying the characteristics of the ensemble ash state, we propose a mask-state algorithm that records the sparsity information of the full ensemble state matrix and transforms the full matrix into a relatively small one, which reduces the computational cost of the analysis step. Experimental results show that the mask-state algorithm significantly speeds up the analysis step. As a result, the total computing time for volcanic ash DA is reduced to an acceptable level. The mask-state algorithm is generic and can thus be embedded in any ensemble-based DA framework. Moreover, ensemble-based DA with the mask-state algorithm is promising and flexible, because it implements exactly the standard DA without any approximation and achieves satisfactory performance without any change to the full model.

Volcanic ash erupted into the atmosphere can severely affect the
aviation community

However, to make the methodology efficient also in an operational (real-time)
sense, the computational efforts must be acceptable. For volcanic ash DA
problems, so far, no studies on the computational aspects have been reported
in the literature. In practice, when large amounts of volcanic ash are erupted into
the atmosphere, the computational speed of volcanic ash forecasts is just as
important as the forecast accuracy

Due to the computational complexity of ensemble-based algorithms and the
large scale of dynamical applications, applying these methods usually
introduces a large computational cost. This has been reported in the
literature for different applications. For example, for operational weather
forecasting with ensemble-based DA,

To accelerate an ensemble-based DA system, the ensemble forecast step can
first be parallelized because the propagation of different ensemble members
is independent. Thus if a computer with a sufficiently large number of
parallel processors is available, all the ensemble members can be
simultaneously integrated. In the analysis stage, to calculate the Kalman
gain and the ensemble error covariance matrix, all ensemble states must be
combined together. In weather forecasting and oceanography sciences,

Although for other applications there were many efforts in dealing with large computational requirements in an ensemble-based DA system, most of them cannot be directly used to accelerate volcanic ash DA. This is because the acceleration algorithms are strongly dependent on specific problems, such as model complexity (high or low resolution), observation type (dense or sparse), or primary requirement (accuracy or speed). These factors determine, for a specific application, which part is the most time-consuming, and which part is intrinsically sequential. Thus, no unified approach for efficient acceleration of all the applications can be found. Although the successful approaches in other applications cannot be directly employed in volcanic ash forecasts, their success does stress the importance of designing a proper approach based on the computational analysis of a specific DA system. Therefore, the computational cost of our volcanic ash DA system will first be analyzed. Then, based on the computational analysis, we will investigate a strategy to accelerate the ensemble-based DA system for volcanic ash forecasts.

This paper is organized as follows. Section

In this study, the EnKF

To simulate a volcanic ash plume, an atmospheric transport model is needed.
In this paper, the LOTOS-EUROS (abbreviation of LOng Term Ozone Simulation –
EURopean Operational Smog) model is used

Methodology of ensemble-based DA.

The experiment in this study starts at

When the model propagates to 09:40 UTC, 18 May 2010, the volcanic ash state
gets sequentially analyzed by the DA process by combining real aircraft in
situ measurements of PM

The EnKF with the above setups is abbreviated as “conventional EnKF” and
used in this study for the computational evaluation. Note that in the study
we do not use covariance localization as proposed by

Ensemble-based DA is a useful approach to improve the forecast accuracy of
volcanic ash transport. However, if it is too time-consuming, it cannot be
considered efficient, given the strict speed requirements of volcanic ash DA (see
Sect.

As introduced in Sect.

Comparison of the computational cost of conventional EnKF and
MS-EnKF. (The results are obtained from the bullx B720 thin nodes of the
Cartesius cluster, which is a computing facility of SURFsara, the Netherlands
Supercomputing Centre. Each node is configured with 2

h: hour; simulation window =

The evaluation result of the conventional EnKF is shown in
Table

It can also be observed from Table

We start with the formulations of the analysis step. The analysis step is
represented by Eq. (

Computational evaluation of the analysis step.

Based on previous definitions and Eqs. (

Characteristics of a volcanic ash state.

Analysis in the previous section shows that

Algorithms for CSR-based SDMM to compute the multiplication of
sparse matrix

Here we introduce item

Compute ensemble mean state

Construct mask array

Construct masked ensemble state matrix

Compute

Construct analyzed ensemble state matrix
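
The five steps listed above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in the study: the function names are hypothetical, `weights` stands in for whatever dense matrix multiplies the ensemble state matrix in the analysis step, concentrations are assumed to be non-negative, and the mask is assumed to be derived from the ensemble mean. Pure Python lists are used only to keep the sketch self-contained.

```python
def matmul(a, b):
    """Plain dense product of two row-major nested lists."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


def mask_state_product(ensemble, weights):
    """Compute ensemble @ weights, skipping rows that are zero in every member.

    ensemble: n x N nested list (n grid cells, N members), assumed non-negative.
    weights:  N x N nested list, a stand-in for the dense analysis matrix.
    """
    n_members = len(weights)
    n_out = len(weights[0])
    # Step 1: ensemble mean per grid cell.
    mean = [sum(row) / n_members for row in ensemble]
    # Step 2: mask array with the indices of cells where some member has ash.
    # For non-negative values, a zero mean implies an all-zero row.
    mask = [i for i, m in enumerate(mean) if m > 0.0]
    # Step 3: masked ensemble state matrix holding only the non-zero rows.
    masked = [ensemble[i] for i in mask]
    # Step 4: the expensive product, now carried out on the small matrix.
    masked_result = matmul(masked, weights)
    # Step 5: scatter back to full size; rows outside the mask stay zero.
    full = [[0.0] * n_out for _ in ensemble]
    for i, row in zip(mask, masked_result):
        full[i] = row
    return full, mask


# Hypothetical toy case: 4 grid cells, 3 ensemble members.
ensemble = [[0, 0, 0], [1, 0, 2], [0, 0, 0], [0, 3, 0]]
weights = [[1, 0, 1], [0, 2, 0], [1, 0, 1]]
analyzed, mask = mask_state_product(ensemble, weights)
```

Because only rows that are zero in every member are dropped, the result is identical to the full product. Cells that are zero in only some members (such as the middle entry of row 1 in this toy case) remain in the masked matrix.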

According to the derivation of MS, the computational costs related to zero
rows are avoided. Note that “zero rows” are not the same as “zero elements”. The
former correspond to regions where there is no ash in any ensemble
member, while the latter also include cells that are ash-free in only
some members. Considering all “zero elements” would capture
the complete sparsity information of the ensemble state matrix, but extra
computation and memory must be spent on searching the full matrix

Based on procedures of MS, the computational cost of

The comparison between both costs (with and without MS, i.e., O(

The relation

Computational evaluation of all the steps of the mask-state
algorithm (MS) for

h: hour; the time is wall clock time.

According to Amdahl's law

Analysis of the algorithmic complexity of MS shows that MS is an efficient
approach to reduce the computational cost of the time-consuming

Computational time for the analysis step of conventional EnKF, MS-EnKF, and CSR-based-SDMM-EnKF.

h: hour; the time is wall clock time.

MS has now been shown experimentally to significantly reduce the
computational time for the analysis step during volcanic ash DA. Note
also that the computational time for the “other” parts in
Table

The result shows that, thanks to the accelerated analysis
step, the overall computational cost is indeed significantly reduced. The
total execution time is 1.95 h, which is less than the simulation window of
3 h (09:00–12:00 UTC, 18 May 2010). This result meets our goal of
accelerating the computation to an acceptable runtime (i.e., one that is
shorter than the time period covered by the DA application). Therefore, aviation
advice based on the MS-EnKF can be not only accurate but also
sufficiently fast. Note that the result (1.95 h) is obtained after the
volcanic ash has been transported to continental Europe. If the assimilation is
performed in the starting phase of the eruption (when aircraft
measurements are available), a more significant acceleration would be
obtained. This is because in that case the volcanic ash has only been transported
within an area near the volcano; thus, the no-ash grid cells will
make up a large proportion (much higher than

There is another interesting point. According to
Fig.

Note that in this study we only perform the commonly used ensemble
parallelization for the forecast step (already efficient compared to the
expensive analysis step) but do not choose model-based parallelization (e.g.,
tracer or domain decomposition). As specified in
Table

According to Sect.

Before we make the comparison, we need to first address the intrinsic problem
when considering standard sparse matrix methods in EnKF for

Generating CSR arrays is usually much more expensive computationally than a
single sparse matrix-vector multiplication (SpMV). Thus, generating CSR
arrays to perform only a single SpMV would be pointless from an HPC
point of view. Fortunately, this is not the case for

To implement CSR-based SDMM for

After the above three CSR arrays are generated, CSR-based SpMV can be
performed for multiplying

Computational evaluation of the sub-steps of the sparse–dense
matrix multiplication with compressed sparse row storage (CSR-based SDMM) for

CSR-based SDMM is formed by (ii) and (iii). h: hour; the time is wall clock time.
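
For readers unfamiliar with the format, the following sketch shows how the three CSR arrays can be generated and then used for a sparse–dense product. It is an illustration under assumptions rather than the code evaluated in the study: the function names are hypothetical, the dense operand is a generic stand-in, and the product is written row by row in pure Python instead of calling an optimized sparse library.

```python
def to_csr(dense):
    """Build the three CSR arrays: values, column indices, row pointers."""
    values, col_idx, row_ptr = [], [], [0]
    for row in dense:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)
                col_idx.append(j)
        # row_ptr[i + 1] marks where row i ends in `values`.
        row_ptr.append(len(values))
    return values, col_idx, row_ptr


def csr_sdmm(values, col_idx, row_ptr, dense_b):
    """Multiply a CSR matrix by a dense matrix, one output row at a time."""
    n_out = len(dense_b[0])
    result = []
    # The outer loop visits every row, including rows with no stored entries.
    for i in range(len(row_ptr) - 1):
        out = [0.0] * n_out
        # Only the stored non-zero elements of row i contribute.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            v, b_row = values[k], dense_b[col_idx[k]]
            for j in range(n_out):
                out[j] += v * b_row[j]
        result.append(out)
    return result


# Hypothetical toy case: 4 x 3 sparse matrix times a 3 x 3 dense matrix.
sparse_a = [[0, 0, 0], [1, 0, 2], [0, 0, 0], [0, 3, 0]]
dense_b = [[1, 0, 1], [0, 2, 0], [1, 0, 1]]
values, col_idx, row_ptr = to_csr(sparse_a)
product = csr_sdmm(values, col_idx, row_ptr, dense_b)
```

In this sketch, building the arrays requires a pass over every element of the matrix, whereas the product itself touches only the stored non-zero elements.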

From Table

The result of CSR-based SDMM also shows that the standard sparse matrix
methods can reduce the computational time of

In the CSR-based SDMM, only non-zero elements in

Firstly, from a programming perspective, in CSR-based SDMM, the number of
loop iterations over the rows of

Secondly, with respect to the algorithm, CSR-based SDMM utilizes the sparsity
of

It is useful to apply standard sparse matrix methods (e.g., CSR-based SDMM)
for our assimilation application. The accelerated analysis step by CSR-based
SDMM (1.22 h; see Table

For volcanic ash forecasts, only a relatively small domain is polluted
compared to the full 3-D domain, so that MS can work efficiently. Using MS is
also applicable for many other DA problems, where the domain is not fully
polluted by the species. It does not matter what the emission looks like and
whether the releases are short- or long-lived species. Given an assimilation
problem, the only restriction for MS to gain an acceleration is whether the
whole domain is fully polluted or partly polluted. The assimilation problems
where MS can achieve the acceleration effect on the computations of

It has been analyzed that when the number of non-zero rows (

As stated in Eq. (

Based on the formulation of MS, one may think it can be taken as a
localization approach

In this study, we do not employ the localization strategy in the analysis
step, because we use a rather large ensemble size of 100 to guarantee the
accuracy, as introduced in Sect.

The localization approach is usually realized in Eq. (

Motivated by the model's physics, MS is currently implemented for the
serial case. This implementation has reduced the computation time to an
acceptable level (i.e., the simulation time is shorter than the forecast
period in real-world time). It is nevertheless interesting to discuss the
potential for parallelizing the dense–dense matrix multiplication
(

Alternatively, one may also consider to (1) directly parallelize the
expensive matrix multiplication of

In this paper, we leave parallelization open as a future option, because the serial MS is already sufficiently efficient for the current usage.

In this study, based on evaluations of the computational cost of volcanic ash DA, the analysis step turned out to be very expensive. Although some potential approaches can accelerate the initialization and forecast steps, there would be no notable improvement to the total computational time due to the dominant analysis step. Therefore, to get an acceptable computational cost, the key is to efficiently reduce the execution time of the analysis step.
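
The dominance of the analysis step can be made concrete with Amdahl's law. The sketch below is illustrative only: it uses the 72 % runtime share reported for the analysis step in this case study, while the function name and the speedup factor of 10 are arbitrary choices made for the example.

```python
def amdahl_speedup(fraction, factor):
    """Overall speedup when `fraction` of the runtime is accelerated by `factor`."""
    return 1.0 / ((1.0 - fraction) + fraction / factor)


# Share of the total runtime taken by the analysis step in this case study.
ANALYSIS_FRACTION = 0.72
# Hypothetical speedup factor, chosen only for illustration.
FACTOR = 10.0

# Accelerating everything except the analysis step (initialization, forecast, etc.).
other_only = amdahl_speedup(1.0 - ANALYSIS_FRACTION, FACTOR)
# Accelerating only the analysis step.
analysis_only = amdahl_speedup(ANALYSIS_FRACTION, FACTOR)

print(f"other parts x{FACTOR:.0f}: overall speedup {other_only:.2f}")
print(f"analysis step x{FACTOR:.0f}: overall speedup {analysis_only:.2f}")
```

Under these assumptions, even an unlimited acceleration of the non-analysis parts cannot push the overall speedup beyond 1/0.72, whereas accelerating the analysis step has a ceiling of 1/0.28.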

After a detailed evaluation of the various parts of the analysis stage, the most time-consuming part was identified. The mask-state algorithm (MS) was developed based on a study of the characteristics of the ensemble ash states. The algorithm transforms the full ensemble state matrix into a relatively small matrix using a constructed mask array. As a result, the computational cost of the analysis step was sufficiently reduced. MS is developed as a generic approach; thus, it can be embedded in any ensemble-based DA implementation. The extra computational cost of the algorithm is small and usually negligible.

The conventional ensemble-based DA with MS is shown to successfully reduce
the total computational time to an acceptable level, i.e., less than the time
period of the assimilation application. Consequently, timely and accurate
volcanic ash forecasts can be provided for aviation advice. This approach is
flexible. It boosts the performance without considering any model-based
parallelization such as domain or component decomposition. Thus, when a
parallel model is available, MS can easily be combined with the model to gain
a further speedup. It implements exactly the standard DA without any
approximation and with easy configurations, so that it can be used to
accelerate the standard DA in a wide range of applications.

In this case study with the LOTOS-EUROS model (version 1.10), after the
parallelization is performed for the forecast step of EnKF assimilation, the
analysis step takes 72 % of the total runtime, which means the analysis
step is the bottleneck. This case might not be general for all ash forecasts,
as the computational cost for initialization and forecast greatly depends on
the forecast model that is used. For the current development, it makes sense
to use the LOTOS-EUROS model, because the model has been configured and
evaluated in

The use of in situ measurements is one important reason why MS works
well. For each analysis step, the number of measurements is quite
small, and the singular value decomposition (SVD) costs
little. However, in some applications where many measurements are assimilated
(e.g., satellite-based data

The averaged aircraft in situ data used in this study
are available from Fig.

Guangliang Fu, Sha Lu, and Arjo Segers simulated the volcanic ash transport using the LOTOS-EUROS model. Guangliang Fu, Hai Xiang Lin, and Tongchao Lu evaluated the computational efforts. Guangliang Fu, Hai Xiang Lin, Arnold Heemink, and Shiming Xu developed the algorithms. Guangliang Fu, Hai Xiang Lin, and Nils van Velzen carried out computer experiments and analyzed the performance of the developed algorithm. Guangliang Fu and Hai Xiang Lin wrote the paper.

The authors declare that they have no conflict of interest.

We are very grateful to the editor and four anonymous reviewers for their reviews. We thank the Netherlands Supercomputing Center for supporting us with the Cartesius cluster for the experiments in our study. We are grateful to Konradin Weber for providing the aircraft measurements.

Edited by: R. Sander
Reviewed by: four anonymous referees