Articles | Volume 16, issue 12
Development and technical paper
27 Jun 2023
Development and technical paper |  | 27 Jun 2023

Reconstructing tephra fall deposits via ensemble-based data assimilation techniques

Leonardo Mingari, Antonio Costa, Giovanni Macedonio, and Arnau Folch

In recent years, there has been a growing interest in ensemble approaches for modelling the atmospheric transport of volcanic aerosol, ash, and lapilli (tephra). The development of such techniques enables the exploration of novel methods for incorporating real observations into tephra dispersal models. However, traditional data assimilation algorithms, including ensemble Kalman filter (EnKF) methods, can yield suboptimal state estimates for positive-definite variables such as those related to volcanic aerosols and tephra deposits. This study proposes two new ensemble-based data assimilation techniques for semi-positive-definite variables with highly skewed uncertainty distributions, including aerosol concentrations and tephra deposit mass loading: the Gaussian with non-negative constraints (GNC) and gamma inverse-gamma (GIG) methods. The proposed methods are applied to reconstruct the tephra fallout deposit resulting from the 2015 Calbuco eruption using an ensemble of 256 runs performed with the FALL3D dispersal model. An assessment of the methodologies is conducted considering two independent datasets of deposit thickness measurements: an assimilation dataset and a validation dataset. Different evaluation metrics (e.g. RMSE, MBE, and SMAPE) are computed for the validation dataset, and the results are compared to two references: the ensemble prior mean and the EnKF analysis. Results show that the assimilation leads to a significant improvement over the first-guess results obtained from the simple ensemble forecast. The evidence from this study suggests that the GNC method was the most skilful approach and represents a promising alternative for assimilation of volcanic fallout data. The spatial distributions of the tephra fallout deposit thickness and volume according to the GNC analysis are in good agreement with estimations based on field measurements and isopach maps reported in previous studies. On the other hand, although it is an interesting approach, the GIG method failed to improve the EnKF analysis.

1 Introduction

Multiple hazards are associated with volcanic eruptions including lava flows, pyroclastic density currents, lahars, volcanic plumes, and tephra fallout. Specifically, the dispersal of volcanic plumes poses a serious threat to flight safety (e.g. Clarkson et al.2016), and the subsequent fallout of tephra can cause structural damage to buildings and infrastructure due to excessive loading, as well as disrupting communication networks, airports, power plants, and water and energy distribution networks (Wilson et al.2014). Additionally, fresh fallout deposits may be resuspended by aeolian processes, affecting the air quality and prolonging the impacts of an eruption many years afterwards (Folch et al.2014; Dominguez et al.2020; Mingari et al.2020).

The characterisation and quantification of past eruptive events are also of paramount importance for volcano hazard and risk assessment studies, which infer the likelihood of future eruption scenarios based on past volcano behaviour. Explosive volcanic eruptions are often characterised and classified by means of tephra deposits (Bonadonna et al.2015), which provide critical information to infer eruption source parameters (ESPs) relevant to hazards, such as eruption column height, mass eruption rate, and total erupted volume (Martí et al.2016; Constantinescu et al.2022). Traditionally, volcanologists rely on simple field-based models to obtain certain ESPs (e.g. erupted volume) assuming an exponential-like decay with distance for some deposit-related variables such as deposit thickness (Pyle1989; Bonadonna and Costa2013). However, it is well-recognised that this simplistic approach is inappropriate for tephra fall deposits with complex distribution patterns (e.g. Bonadonna et al.1998; Martí et al.2016). In fact, many deposits exhibit abrupt thickness variations over short distances, display well-developed secondary maxima, show grain-size bimodality (Durant et al.2009), are stratified deposits with alternating layer characteristics, and include other complexities that make the reconstruction of tephra fallout deposits challenging (Scasso et al.1994).

In contrast, physics-based approaches, built upon volcanic ash transport and dispersal (VATD) models, include multiple physical parameterisations and are a much more powerful tool for representing the real distribution of tephra deposits. However, the accuracy of deterministic models is highly sensitive to uncertain model input parameters (e.g. eruption column height or physical properties of particles) and the underlying meteorological fields. Alternatively, probabilistic modelling approaches provide a framework to incorporate uncertainties associated with model input data. Specifically, ensemble-based modelling strategies allow one to characterise and quantify model uncertainties and have been proven to enhance VATD model skills (Bonadonna et al.2012; Madankan et al.2014; Stefanescu et al.2014). For example, several VATD models have been used to conduct ensemble simulations, including ASH3D (Denlinger et al.2012), COSMO-ART (Vogel et al.2014), HYSPLIT (Dare et al.2016; Zidikheri et al.2018), NAME (Dacre and Harvey2018; Beckett et al.2020), and FALL3D (Sandri et al.2016; Folch et al.2022b; Martinez et al.2022). Furthermore, different inversion modelling techniques based on ensemble approaches have been shown to produce improved volcanic ash forecasts consistent with observations by constraining ash emission estimates and model parameters (Pelley et al.2015; Zidikheri et al.2017; Harvey et al.2020).

The incorporation of ensemble capabilities in VATD models lays the foundation for developing and implementing ensemble-based data assimilation and inversion techniques (see Folch and Mingari2023, for a recent detailed review). Two main approaches have been explored in the literature to assimilate volcanic aerosol observations from satellites: ensemble Kalman filters (Fu et al.2016, 2017; Osores et al.2020; Pardini et al.2020; Mingari et al.2022b) and ensemble particle filter methods (Zidikheri and Lucas2021a, b). Specifically, ensemble Kalman filter (EnKF) methods, used for sequential data assimilation, are based on the Kalman filter (Kalman1960). They approximate the probability distributions by an ensemble of system states and assume that the prior model errors and the observation noise are Gaussian. However, lower-bounded variables such as water-vapour mixing ratio (Kliewer et al.2016), rainfall (Husak et al.2007) and aerosol concentrations (O'Neill et al.2000) frequently have skewed and near-zero distributions and are not well-described by Gaussian distributions. As a result, traditional EnKF methods in VATD models often yield suboptimal state estimates (Folch and Mingari2023).

This study explores two new ensemble-based data assimilation techniques for positive-definite variables and their implementation in VATD models, the Gaussian with non-negative constraints (GNC) method and the gamma, inverse-gamma, and Gaussian ensemble Kalman filter (GIGG-EnKF), a sequential method proposed by Bishop (2016) for highly skewed non-negative distributions. Posselt and Bishop (2018) applied this approach for the non-linear data assimilation of precipitation rate observations and compared the results with the analysis produced by a classical EnKF algorithm. It was concluded that the analysis ensemble of precipitation rates produced by the GIGG-EnKF bears a closer resemblance to the Bayesian posterior when the distribution is skewed.

This study aims to reconstruct the tephra fall deposit of the 2015 Calbuco eruption from a scattered set of observations. The rich existing dataset available for this eruption, consisting of deposit samples collected up to 500 km downwind from the volcano, provides an excellent test case to evaluate the proposed methodology. The Gaussian with non-negative constraints (GNC) method and the gamma inverse-gamma (GIG) method, based on the GIG equation set proposed by Bishop (2016), are used to assimilate deposit thickness data. Both methods are used here to reconstruct a complete map of the tephra fall deposit from a dataset of uncertain observations and an ensemble of model realisations based on numerical simulations performed with the FALL3D dispersal model. In addition, a technique for emission source inversion based on the GNC method is also presented and discussed. As an initial step, this paper is focused on the assimilation of tephra deposits, which is crucial for long-term tephra hazard assessment, leaving the assimilation of volcanic clouds and the potential use of these two methods in operational ash forecast contexts to future studies.

The paper is organised as follows. The ensemble-based data assimilation methods are introduced in Sect. 2. A brief description of the 2015 Calbuco eruption is outlined in Sect. 3 where details about the observational datasets are given. Subsequently, Sect. 3 describes the numerical experiments and shows the results obtained by both methods. In Sect. 4, the GNC method is used to invert the Calbuco source term. Section 5 focuses on potential implications of the proposed methodology, and possible future applications and limitations are further discussed. Finally, conclusions are drawn in Sect. 6.

2 Methodology

Data assimilation (DA) techniques have been widely used to study and forecast geophysical systems and have been applied in a variety of research and operational settings (Carrassi et al.2018). Data assimilation methods aim at obtaining an estimation of the state of a dynamical system (e.g. a component of the Earth system such as the atmosphere or the ocean) by exploiting information from numerical models and observations.

The ensemble Kalman filter (EnKF) is a remarkable example of a sequential data assimilation scheme based on the Kalman filter theory (Kalman1960) using a Monte Carlo approach (Evensen1994; Burgers et al.1998). Given a probability density function (PDF) of the model state (the so-called prior or forecast) and the observation likelihood, the goal is to estimate the updated PDF (the so-called posterior or analysis) taking into account the observation likelihood. Assume that the state of the physical system is represented by a model state vector x∈ℝn, where n is the system dimension, and that the observations are given by a vector yo∈ℝp, where p is the number of observations. EnKF uses an ensemble of model states to represent the distribution of the model state. Specifically, this ensemble-based data assimilation technique relies on a forward model, which is used to generate an ensemble of trajectories of the model dynamics, and the state estimate of the system is represented by an ensemble of m system state vectors xi∈ℝn, with m being the ensemble size. The average model state vector xRn can be approximated by the ensemble mean:

(1) x 1 m i = 1 m x i ,

whereas the ensemble-based error covariance matrix is used to approximate the covariance P according to

(2) P 1 m - 1 X X ,

where the matrix of ensemble perturbations XRn×m is given by [x1-x,,xm-x]. In the EnKF framework, all PDFs are assumed to be Gaussian, and consequently Eqs. (1) and (2) for mean and covariance fully define the multivariate normal distribution. The EnKF method decomposes into a forecast step and an analysis step. In the forecast step, an ensemble of model states is evolved up to the observation time using the forward model in order to estimate a prior ensemble of model states xif (i=1…m). No observation information is included in this forecast or prior state estimate. The analysis step is performed by updating each individual ensemble member according to

(3) x i a = x i f + K y i o - H x i f

in order to generate a posterior ensemble of model states xia (i=1…m). The p-dimensional vectors yio in Eq. (3) represent an ensemble of perturbed observations with ensemble mean equal to the actual observation vector (yio=yo) and error covariance R; the observation operator is denoted by H, and the matrix K is the ensemble-based Kalman gain:

(4) K = P f H ( HP f H + R ) - 1 .

The ensemble-based versions of the matrices P and measurement covariance matrix R are used here. Using Eq. (3), it is possible to obtain the EnKF update equations for the (ensemble-based) mean and covariance (Evensen1994; Burgers et al.1998):


2.1 Deposit reconstruction methods

In this work, the state of the physical system is fully determined by the two-dimensional tephra deposit load (in kg m−2), and the components of the model state vector x represent the mass load at the n grid points of the computational domain. The FALL3D model for atmospheric passive transport and deposition of volcanic tephra is used to produce the prior ensemble of volcanic deposit states xif by running multiple instances of the forward model. The eruption source parameters (ESPs) are perturbed around a first-guess configuration in order to define a model run for each ensemble member, as detailed in Sect. 3.3.

The observations vector yo represents a list of scattered deposit thickness observations (in cm). In consequence, the operator H that relates the model state to the measurements can be considered to be linear in the present case: the deposit mass load (model state) and the deposit thickness at the measurement site (observation) are related by a proportionality factor, i.e. the bulk density of the tephra deposit, which, for the sake of simplicity, is assumed to be constant and equal to 800 kg m−3. In addition, horizontal bi-linear interpolations are also required in order transform from the computational domain to the measurement sampling sites.

According to the analysis scheme in the EnKF, given the prior ensemble and the observations, the deposit can be reconstructed by means of Eq. (5a) since the analysis ensemble mean can be interpreted as the best estimate (e.g. see Burgers et al.1998) due to the underlying assumption in Kalman filters that errors are Gaussian. However, the EnKF analysis becomes suboptimal for non-Gaussian distributions. Specifically, when dealing with dispersal models of aerosols and volcanic tephra, the Gaussian assumption is critical and the EnKF analysis scheme can lead to unrealistic results in those cases, as will be shown in Sect. 3. In consequence, two new methods are proposed for the reconstruction of volcanic deposits. To this purpose, let us rewrite Bayes' theorem in terms of model state mapped in the observation space. Assume that a linear observation operator HRp×n exists, which translates the model state x into the observation space:

(6) y = H x ,

where y represent a p-dimensional vector. In a probabilistic framework, the PDF of the state y conditioned to the observation yo is relevant to the assimilation techniques and can be computed via Bayes' theorem (Jazwinski1970):

(7) P ( y | y o ) = P ( y o | y ) P ( y ) P ( y o ) ,

where P(y|yo) is the posterior distribution, P(y) is the prior PDF, P(yo|y) is the likelihood of the data conditioned on the state y, and P(yo) is the marginal distribution of the observation. In consequence, the determination of the posterior PDF requires the specification of both the prior and the observational PDFs. This paper proposes two ensemble-based assimilation strategies which rely on Eq. (7) and differ in the assumptions made about these PDFs. The GNC method (Sect. 2.2) uses an all-at-once assimilation approach looking for the model state that maximises the vectorial form of Eq. (7) and observations are assimilated all at once. In contrast, the GIG method (Sect. 2.3) uses a serial assimilation approach in which the univariate version of Eq. (7) is explicitly written for each single observation and the full dataset of p observations is assimilated in a sequential way.

2.2 The GNC method

The Gaussian with non-negative constraints (GNC) method assumes a multi-dimensional Gaussian probability distribution for y, defined in Eq. (6), given as

(8) P ( y ) exp - 1 2 ( y - y ) P - 1 ( y - y ) ,

where PRp×p is the error covariance matrix and yRp is the average model state vector in the observation space. Similarly, for the sake of simplicity, measurements are assumed to be normally distributed with observation error covariance matrix RRp×p, i.e.

(9) P ( y o | y ) exp - 1 2 ( y o - y ) R - 1 ( y o - y ) .

The most likely state is the one that maximises the posterior PDF, as in Eq. (7), or, equivalently, the one that minimises the GNC cost function J:

(10) J ( y ) ( y - y ) P - 1 ( y - y ) + ( y o - y ) R - 1 ( y o - y ) .

Note that the expression above is actually very similar to the cost function used in classical variational methods (e.g. 3D-Var, Carrassi et al.2018) with the difference that y plays the role of the model background state in VAR methods, and the first term in Eq. (10) is computed in the observation space rather than in the model space as usual. This is justified because expressing the functional J in the observation space is advantageous in cases in which observations are localised and/or nearly zero, i.e. circumscribed to a portion of the computational domain (this is what typically occurs for volcanic clouds and fallout deposits). Moreover, the functional in Eq. (10) yields a greatly reduced system when compared to classical VAR methods because the observation space normally has a much lower dimension (pn).

Given a prior ensemble of m state vectors xi representing m model realisations at the analysis time, the GNC method looks for the best linear estimate of the system state in the subspace spanned by the ensemble of vectors:

(11) x = w 1 x 1 + + w m x m ,

where wi≥0 (i=1…m) is a set of non-negative weight factors for each ensemble member. The important point here is that the non-negative constraints on wi relax the Gaussian hypotheses and avoid the occurrence of non-physical solutions. The linearity of the observation operator H allows the analysis to be expressed in the observation space as

(12) y = w 1 y 1 + + w m y m = Y w ,

where yi=Hxi and the matrix YRp×m is defined by [y1,,ym]. The ensemble mean is used to approximate the average model state vector yRp, i.e.

(13) y 1 m i = 1 m y i ,

whereas the ensemble-based error covariance matrix is used to approximate P according to

(14) P 1 m - 1 Y Y ,

where the matrix of ensemble perturbations YRp×m is given by [y1-y,,ym-y]. Replacing Eqs. (12)–(14) in Eq. (10), the GNC cost function J can be expressed as the equivalent quadratic form:

(15) J ( w ) = 1 2 w Q w + b w +



In order to find the optimal vector of weight factors w, the optimisation problem minw0J(w) must be solved. Then, the analysis vector state xa can be computed by replacing the optimal w in Eq. (11). The minimisation of the quadratic form in Eq. (15) subject to the constraints wi0i is a non-negative quadratic programming problem, and there is no analytical solution for the global minimum due to the non-negativity constraint. However, it can be solved using the iterative approach proposed by Sha et al. (2007) as

(17) w i w i - b i + b i 2 + a i c i a i

as long as the matrix Q is symmetric and positive semi-definite, as can be easily verified from Eq. (16a). Under the multiplicative updates in Eq. (17), the cost function decreases monotonically to the value of its global minimum as shown by Sha et al. (2007). The vectors a=A+w and c=A-w must be updated in each iterative step, where A+=|Q|+Q and A-=|Q|-Q. For illustrative purposes, Fig. 1 shows the decrease in the normalised cost function, defined as J/p, under the multiplicative updates in Eq. (17) for the case study presented later in Sect. 3. More than 104 iterative steps were required to get residuals low enough to satisfy the convergence criteria.

Figure 1Iterative approach to minimise the GNC cost function J subject to the non-negativity constraints. Under the multiplicative updates in Eq. (17) the cost function decreases monotonically. In this particular example, the convergence required more than 104 iterative steps.

2.3 The GIG method

Bishop (2016) introduced a variation of the ensemble Kalman filter (EnKF) that solves the univariate Bayes' theorem for non-negative variables with skewed (asymmetrical) probability distributions. The so-called GIGG-EnKF (with GIGG standing for gamma, inverse-gamma, and Gaussian) allows non-negative variables typically involving near-zero values (i.e. with right-skewed probability distributions), such as aerosol, water vapour, cloud, and precipitation concentrations, to be directly assimilated, thus avoiding the use of Gaussian anamorphosis non-linear transformations (e.g. Amezcua and Leeuwen2014). The GIGG-EnKF algorithm is based on the generalised two-stage multivariate ensemble filter described by Anderson (2003). The first stage involves the univariate GIGG-EnKF in which an ensemble-based estimate of the posterior distribution of the observed variable is generated from a single observation and a prior ensemble of state estimates. In the second stage, the univariate method is extended by propagating this information to the complete model state vector using a linear regression approximation.

According to the strategy proposed by Bishop (2016), in the first step, Bayes' theorem is solved in the univariate form of Eq. (7) assuming a distribution pair for the prior probability and the likelihood PDF of the error-prone observations given the truth, respectively. A single observation is assimilated using an appropriate equation set depending on three different cases: GIG, IGG, and G. The GIG equation set is aimed at situations in which the prior can be described by a gamma distribution and the observation likelihood can be represented by an inverse-gamma distribution. In addition, Bishop (2016) introduced the IGG equation set (inverse-gamma prior and gamma observation likelihood) and the G equation set (Gaussian distributions). Volcanic aerosols have been found to be well-approximated by gamma distributions (Mingari et al.2022b). Similarly, it is shown in this paper that the prior distributions of deposit mass loading can be well-represented to some extent by gamma distributions (see Sect. 3.4). Consequently, this paper will focus exclusively on the GIG case. In the GIG case, the posterior probability is given by a gamma PDF:

(18) P y j | y j o y j Π j r - 1 - 1 exp - y j Π j r y j a ,

where yj and yjo are the jth components of the vectors y and yo, respectively. The posterior univariate gamma PDF is characterised by two parameters, namely the analysis mean yja and the type 1 relative error variance of the analysis Πjr:=var(yja)/(yja)2, that in the GIG method are given by


where P̃jr and R̃jr are the type 2 relative error variance of the prior and observations, respectively (see Table A1 in the Appendix for details).

In order to generate an analysis ensemble yjia with the low-order moments of the posterior distribution being consistent with Eqs. (19a) and (19b), Bishop (2016) provides the following stochastic equation for the case of a univariate gamma prior and inverse-gamma observation-likelihood PDFs.

(20) y j i a - y j a y j a = y j i f - y j f y j f 2 + var y j i f + P ̃ j r P ̃ j r + R ̃ j r - 1 × × z j i - z j z j 2 - 2 var z j i - y j i f - y j f y j f 2 + var y j i f

Here, yja can be computed using Eq. (19a). Equation (20) ensures that the analysis ensemble yjia is consistent with the type 1 relative error variance of the posterior given by Eq. (19b) provided zji is randomly sampled from a gamma PDF with type 1 relative error variance Rjz and mean zj given by


In addition, the ensemble generated in this way is ensured to converge to the true posterior PDF for large ensembles.

The univariate case is extended according to the second-stage linear regression step proposed by Anderson (2003) in order to find the analysis ensemble for the complete model state vector. The update of the kth model state vector variable of the ith ensemble member due to the jth observation is computed according to

(22) x k i a = x k i f + cov x k f , y j f var y j f y j i a - y j i f .

The inverse-gamma PDFs assign non-zero probability densities only for positive observations, and, as a result, zero observations cannot be properly assimilated using the GIG equation set (see e.g. Eq. 19a). This problem is addressed here by redefining zero observation data according to

(23) y j o r × ϵ min ,

where r(0,1] is a random number and ϵmin is a typical error expected for zero-valued observations of deposit thickness (assumed to be ϵmin= 1 mm in this work since only visible tephra deposits are considered here).

The GIG method is a sequential procedure: a single observation is assimilated in order to update the prior ensemble forecast using the GIG equation set; subsequently, this procedure is repeated until all observations have been sequentially assimilated. In contrast, the GNC method described in Sec. 2.2 represents an all-at-once assimilation technique. Another important difference is that the GIG method is stochastic; i.e. different applications of the method will lead to different realisations of the analysis. To summarise, a pseudocode of the sequential procedure used to implement the GIG method is detailed in Algorithm 1.

Algorithm 1Pseudocode of the GIG method based on the Bishop (2016) algorithm for the case in which the prior is a gamma distribution and the observation likelihood is an inverse-gamma distribution.

List of observations {yjo} with their relative errors
Analysis ensemble xkia
get xkif
Generate the prior ensemble
procedure GIG Method
randomly shuffle observation list
for all yjo do
Iterate over observations
if yjo=0 then
rrandom number(0,1]
Redefine zero observations
end if
yja Eq. (19a)
yjia Eq. (20)
xkia Eq. (22)
Generate the analysis ensemble
Update the prior ensemble
end for
end procedure

Figure 2Location of the sampling sites corresponding to the dataset reported by Van Eaton et al. (2016) (blue diamond) and an independent dataset composed of 45 measurements (red circle) provided by Florencia Reckziegel (personal communication, September 2020). The map also shows the isopachs of fall deposit thickness in millimetres reported by Romero et al. (2016) used for validation purposes in this work.

3 Reconstruction of the 2015 Calbuco deposit

In this section, the procedures described in Sect. 2 are applied to the 2015 Calbuco eruption in order to obtain the analysed deposit thickness. With this in mind, a total of 204 field measurements of deposit thickness will be considered for assimilation and validation purposes. This dataset is composed of 160 measurements reported by Van Eaton et al. (2016) and an independent dataset of 44 measurements provided by Florencia Reckziegel (personal communication, September 2020). Figure 2 shows the location of the sampling sites for both datasets.

3.1 Fallout deposit and datasets

The 2015 eruption of the Calbuco stratovolcano (41.33 S, 72.61 W) in southern Chile involved two major eruptive pulses on 22–23 April along with a third minor pulse on 30 April (Romero et al.2016). During the most energetic phase on 23 April, stratospheric eruption columns higher than 15 km above the vent level ( 17 km above sea level) were developed. Regions over southern Chile and the Argentinian Patagonia were severely affected by tephra fall. According to different estimations based on field studies, deposit volume ranges between 0.27 and 0.58 km3 (Romero et al.2016; Van Eaton et al.2016).

The availability of independent and comprehensive datasets of field observations makes the Calbuco tephra deposit an excellent case study. Van Eaton et al. (2016) reported the thickness and stratigraphy of the fall deposits at 163 sampling sites within a 500 km radius from the volcano summit. In addition, a complementary dataset composed of 45 independent measurements of deposit thickness (Florencia Reckziegel, personal communication) distributed over a larger region is also available (Fig. 2). Finally, a hand-drawn isopach map built from a third independent dataset (Romero et al.2016) is also used to evaluate the tephra deposit distribution. The corresponding isopachs for 0.1, 0.5, 1.0, 2.0, and 4 mm are represented in Fig. 2. The three datasets are summarised in Table 1.

Van Eaton et al. (2016)Romero et al. (2016)

Table 1Calbuco deposit datasets considered in this study.

Download Print Version | Download XLSX

It is interesting to note the presence of a secondary thickness maximum  200 km downwind from the vent, located around two major cities of the Argentinian Patagonia, Junín de los Andes (39.95 S, 71.07 W) and San Martín de los Andes (40.16 S, 71.35 W), likely due to ash aggregation processes (e.g. Costa et al.2010). The emergence of this distal maximum indicates that complex plume dynamics were involved in the volcanic eruption.

The assimilation methods require a dataset of measurements along with the corresponding absolute or relative errors. Specifically, the GNC method requires the absolute error ϵj associated with the jth measurement yjo (the observation error covariance matrix R is assumed to be diagonal with elements ϵj2). On the other hand, the GIG method requires the relative error ϵjr=ϵj/yjt, where yjt is the true value of the jth observation.

The strategy adopted in this work to provide reasonable error estimates is based on a clustering algorithm, and observation error standard deviations are assumed to be dependent on the measured value. In summary, observational data are organised into groups with similar characteristics, and an absolute and relative error is assigned to each group or cluster. The error for the jth measurement yjo is approximated by the standard deviation associated with the corresponding cluster; the true value yjt, required to estimate relative errors, is approximated by the cluster mean value. A more detailed explanation of the strategy used to estimate errors can be found in the Supplement.

3.2 Validation metrics

As validation metrics, we consider the weighted versions of the mean bias error (MBE) and the root mean square error (RMSE) to measure the differences between observations and analyses, defined as


Notice that if the non-weighted (ωj=1) versions of Eq. (24b) are used, the MBE (in cm) quantifies the tendency to overestimate or underestimate observations for the overall dataset, whereas the RMSE (in cm) measures the average magnitude of the errors. These two metrics are suitable when a uniform distribution of errors is expected. However, the datasets in this work contain measurements spanning 4 orders of magnitude, and the assumption of a constant absolute error seems to be inappropriate in this case because only proximal data (i.e. the largest measurements of deposit thickness) contribute significantly to the non-weighted versions of MBE and RMSE. Instead, in order to treat the deviations more evenly, the observation uncertainties are used to define the weights according to ωj=1/ϵj. These weighted versions of MBE and RMSE represent dimensionless evaluation metrics, and the impact of the assimilation can be better characterised by means of these metrics. Ideally, the MBE should be close to zero and RMSE should be close to 1.

Another relative metric used to evaluate the deposit reconstruction is the symmetric mean absolute percentage error (SMAPE), defined as

(25) SMAPE = 100 × 2 p j = 1 p | y j o - y j | | y j o | + | y j | ,

and is expressed in percentages. Notice that this metric does not depend on the observation errors.

3.3 Ensemble modelling

Numerical simulations were carried out using the latest version release of FALL3D (v8.2), an open-source offline Eulerian model for atmospheric transport and deposition of aerosols and particles, including tephra species. FALL3D solves the so-called advection–diffusion–sedimentation (ADS) equation (Folch et al.2020; Prata et al.2021). The new FALL3D version has been designed to support increasingly larger scientific workloads and prepare the code for the transition to extreme-scale computing systems (Folch et al.2023). Specifically, the code version v8.x has been released with several improvements over previous versions, including improvements in the model physics, numerical algorithmic methods, and computational efficiency. In addition, from version v8.1 onwards, the FALL3D model enables ensemble simulations to be performed very efficiently by means of a single parallel task (Folch et al.2022b). Ensemble modelling allows one to characterise and quantify model uncertainties due to poorly constrained input parameters and errors in the model physics parameterisations or the underlying model-driving meteorological data. In addition, the ability to generate ensemble runs makes it possible to improve forecasts by incorporating observations using different ensemble-based data assimilation techniques.

The configuration of the FALL3D model used in this work is summarised in Table 2. A three-dimensional computational domain with a horizontal resolution of  4 km (0.05) and 180 × 160 × 45 grid points was defined. The total grain-size distribution (TGSD) of tephra injected into the atmosphere consists of the sum of two log-normal distributions (i.e. bi-Gaussian in Φ units) including 40 tephra bins. The modes and standard deviations of the bimodal distribution were computed using the parameterisation proposed by Costa et al. (2016), which estimates them from the eruption intensity and magma viscosity. The mode of the coarser population was located at 1.2Φ with a standard deviation of 1.71 Φ, while the mode of the finer population was 3.49 Φ with a standard deviation of 1.46 Φ. The weight of each subpopulation was set to pc=0.15 and pf=0.85 for the coarse and fine population, respectively. The vertical mass distribution of the source term depends on the eruptive column-top height (H) according to the following parameterisation (Pfeiffer et al.2005):

(26) d m d z 1 - z H exp A s z H - 1 λ s ,

where As and λs are the so-called Suzuki parameters (Suzuki1983; Pfeiffer et al.2005). Finally, the mass emission rate was computed from the eruptive column-top height using the relationship proposed by Degruyter and Bonadonna (2012).

(Pfeiffer et al.2005)(Degruyter and Bonadonna2012)

Table 2FALL3D model configuration parameters for the 2015 Calbuco runs.

Download Print Version | Download XLSX

A 256-member prior ensemble was generated by perturbing the eruption source parameters (ESPs) and the horizontal wind components around a reference value using either uniform or truncated normal distributions. Table 3 lists the perturbed model parameters along with the corresponding reference values and sampling uncertainty ranges. Latin hypercube sampling (LHS, McKay et al.1979) was used to efficiently sample the parameter space. Two eruptive phases were considered with reference values of column-top heights at H= 15 kma.v.l. (above vent level) for each phase. The column-top height H was independently perturbed for each phase assuming a sampling range of ± 25 %. The source start time for the central member was defined at 21:00 UTC on 22 April 2015 (first phase) and at 04:00 UTC on 23 April 2015 (second phase) assuming a duration of 1.6 and 6 h for each phase. Source start times and phase duration were also perturbed.

Table 3Ensemble configuration. The perturbed model parameters are eruption column height (H), eruption phase start time (Ti), phase duration (ΔT), parameters As and λs of the Suzuki vertical mass distribution, the fine mode of the bi-Gaussian TGSD, the density of aggregates, and the wind components.

a Two eruptive phases are considered. Heights are given in kilometres above the vent level. b Hours since 22 April 2015 at 00:00 UTC. c ECMWF atmospheric reanalysis (137 model levels).

Download Print Version | Download XLSX

3.4 Prior ensemble distribution

Before showing how the assimilation methods perform, it is worth characterising the prior ensemble distribution and checking whether the assumptions of the GIG method are fulfilled (i.e. the prior distribution can be approximated by a gamma PDF). To this purpose, the skewness μ̃3 (i.e. the third standardised moment, μ3/σ3) of the prior distribution was computed from the random samples yjif, i.e. the forecasted deposit thickness according to the ith ensemble member (i=1…m) at the sampling site of the jth observation (j=1…p). Figure 3 shows the results for each observation point by plotting μ̃3 as a function of the ratio of the standard deviation to the mean, i.e. Pjr (using the notation introduced in Sect. 2.3). For illustrative purposes, the skewness of three different theoretical distributions (Gaussian, log-normal, and gamma) is shown. As expected, the symmetric Gaussian distribution, characterised by μ̃3=0, does not reproduce the positively skewed prior distribution. The log-normal family of probability distributions represents an example of distributions for positive-definite variables with a lower bound. However, as shown in Fig. 3, the log-normal distribution cannot properly represent the prior distribution because the theoretical skewness is extremely large in this case. In contrast, the skewness computed from the prior distributions (blue dots) is well-approximated by the relationship μ̃3=2Pjr, which is the theoretical expression corresponding to the gamma distribution (solid black line).

Figure 3Skewness as a function of the ratio of the standard deviation to the mean for the prior distribution at the sampling locations (blue circles). Results for some theoretical distributions (Gaussian, log-normal, and gamma) are also shown for comparison.


Figure 4Histograms of sampled prior distributions along with the corresponding theoretical gamma distribution (solid line) at some selected observation sites.


In order to further understand the similarities between the gamma and prior distributions, Fig. 4 explicitly shows histograms of sampled prior distributions along with the corresponding theoretical gamma distribution for some observation sites. The theoretical gamma distributions were constructed using the sampled first and second moments. As observed, good agreement is found between the two distributions in almost all cases. Note that when the type 1 relative error variance is greater than 1 (Pjr>1) the gamma probability density decreases monotonically (Fig. 4a–c) and the mode becomes zero. In contrast, when Pjr<1 or, equivalently, when yj2>var(yj), the mode becomes positive (Fig. 4d–p). The results obtained is this section justify the suitability of the GIG method to deal with the assimilation of volcanic deposit data.

3.5 Analyses

In this section, we compare the tephra deposit field reconstructed according to four strategies: (i) forecast, i.e. the prior ensemble mean, (ii) the EnKF method, i.e. the analysis ensemble mean via Eq. (5a), (iii) the GNC method, and (iv) the GIG method. Notice that the GNC method gives a set of weight factors for each ensemble member wi (i=1,,m), and the best estimate of the system state is obtained by replacing the optimal weight factors in Eq. (11). On the other hand, the GIG method produces an ensemble of analysis states and the analysis ensemble mean is used for comparison purposes in this section. The full dataset of 204 measurements was assimilated to compute the analysis according to strategies (ii)–(iv).

The results of the tephra fall deposit reconstruction are shown in Fig. 5. For comparative purposes, the Romero et al. (2016) deposit contours (hand-drawn isopachs for 0.1, 0.5, 1, 2, and 4 mm) are superimposed in Fig. 5. These contours are based on an independent dataset of thickness measurements (different from the assimilation dataset). The presence of the distal secondary thickness maximum was reproduced by the reconstructed deposits taking into account the ash aggregation processes in the numerical simulation. To this purpose, the parameterisation proposed (Cornell et al.1983; Folch et al.2010) was used assuming an aggregate particle class having a diameter of 200 µm and a density sampled in a range centred around 450 kg m−3 (see Table 3).

Figure 5Reconstructed tephra fall deposit according to the forecast (a), EnKF method (b), GNC method (c), and GIG method (d). The Romero et al. (2016) deposit contours (isopachs for 0.1, 0.5, 1, 2, and 4 mm) are also superimposed for comparative purposes.

The forecast (Fig. 5a) indicates an excessively high deposited mass. In particular, the 2 and 4 mm contours are overestimated by almost an order of magnitude. This is corrected by the three methods (Fig. 5b and c), which yield analyses with reduced total mass on the ground.

The EnKF analysis shows a very noisy spatial distribution with large oscillations and negative values in some regions, leading to artificial spatial structures. On the other hand, the GNC shows smoother deposit thickness contours with a spatial distribution having a more physically plausible structure. The GIG method represents an intermediate case between the EnKF and GNC methods. Although this method gives unrealistic results as well, the fraction of negative values and the amplitude of oscillations are noticeably reduced compared to the EnKF method (negative data were remove and reassigned to zero).

3.6 Validation

In order to evaluate the performance of the analysis schemes, the full dataset of observations was split into two subsets: an assimilation dataset and a validation dataset. The assimilation dataset was used to produce new analyses, and the validation metrics defined in Sect. 3.2 were computed for the validation dataset. This methodology ensures that validation is done against non-assimilated observations. However, the validation metrics will be meaningless if the assimilation and validation datasets are strongly correlated (e.g. if measurement sites are very close to each other). The splitting procedure aims to reduce the correlation between the two subsets and is described in the Supplement.

Figure 6Comparison between analyses and observations for the forecast (a, e), EnKF (b, f), GNC (c, g), and GIG (d, h) methods. Panels in the left column (a–d) present the assimilation dataset (60 % of the full dataset), and the validation dataset (40 % of the full dataset) is shown in the right column (e–h).


Figure 6 compares the analysis results at each sampling site with observations from the assimilation dataset (Fig. 6a–d) and the validation dataset (Fig. 6e–h) considering a dataset partition of 60 % and 40 %, respectively. The forecast systematically overestimates observations (Fig. 6a and e), as already noted in Sect. 3.5. Since measurements span a range of 4 orders of magnitude between 10−2 and 102cm, good agreement for the entire range of data values turns out to be challenging. Nevertheless, results for the analyses corresponding to EnKF (Fig. 6b and f), GNC (Fig. 6c and g), and GIG (Fig. 6d and h) methods show that most of the data lie within the 1-to-10 ratio band (black dashed lines).

Figure 7Evaluation metrics for the prior ensemble mean (forecast) and the EnKF, GNC, and GIG analysis schemes. The metrics were computed from the assimilation and validation datasets considering a partition of the full dataset of 60 % and 40 %, respectively.


In order to quantify deviations from observations according to the prior ensemble mean (forecast) and the analysis schemes, the evaluation metrics computed for the assimilation dataset (60 %) and the validation dataset (40 %) are reported in Fig. 7. Since the GIG method is stochastic, six realisations of the analysis were computed, and the average over the realisations are reported for each metric. The EnKF, GNC, and GIG methods improve all metrics compared to the forecast. In particular, the RMSE computed from the assimilation dataset (Fig. 7a, green bar) is reduced from 7.1 (forecast) to 0.9 (EnKF) and 1.2 (GNC, GIG). Since the EnKF provides the best estimate in terms of RMSE, this is an expected result. However, when the RMSE is computed for the validation dataset (Fig. 7a, brown bar), the performance of EnKF (2.3) and GIG (5.2) methods degrades significantly, and the best performance is achieved by the GNC method (1.1). Note that this result is very close to the ideal value RMSE = 1, meaning that deviations are similar to the observation uncertainty (see Eq. 24b).

The bias is presented in Fig. 7b. The GNC MBE (0.3) is positive, meaning that measurements are underestimated, but this bias is within the observation uncertainty range since MBE < 1. In contrast, the analysis overestimates observations according to EnKF (MBE =0.5) and GIG (MBE =1.5) methods. The best SMAPE results were obtained for GNC (SMAPE = 31.8 %) and EnKF (SMAPE = 32.6 %) as shown in Fig. 7c. Finally, Fig. 7d presents the 1-to-3 ratio band score, defined as the percentage of data points within the 1-to-3 ratio band (blue dashed lines in Fig. 6). Again, the best result was obtained for the GNC method (84.1 %).

In conclusion, the GNC method outperforms the EnKF and GIG methods in term of all metrics computed for the validation dataset. In contrast, the EnKF and GIG analyses perform well over regions around the observation sites, but the analyses cannot fully capture all deposit features beyond these regions. In order to illustrate this point, the uncorrected EnKF analysis (i.e. negative data were not removed) and the location of the assimilated observations (60 % of the full dataset) are presented in Fig. 8, where colour-shaded regions represent positive data and negative data are masked. Large oscillations emerge in regions without observation data, leading to artificial structures in the deposit field. This example clearly highlights the problems arising when the EnKF is applied to this case and the importance of dealing with the assimilation of volcanic deposit data using alternative approaches.

Figure 8Tephra fall deposit according to the EnKF method and location of the assimilated observations (60 % of the full dataset). Colour-shaded regions represent positive data, and negative data are masked in order to illustrate the emergence of large oscillations and unphysical values in regions with scarce observational data.

To conclude this section, different partitions of the observational dataset are considered, and the RMSE is computed for the validation dataset. Results are shown in Fig. 9, expressed in terms of the percentage of assimilated observations for each partition. Again, the results show that the GNC outperforms the EnKF and GIG methods systematically, not just for a particular choice of the validation dataset. Unfortunately, it is not possible to conclude that the GIG method has improved the results of the EnKF analysis from the evidence presented in this paper.

Figure 9RMSE computed for the validation dataset for different partitions of the full dataset of observations expressed in terms of the percentage of assimilated observations.


Figure 10Profiles of emission rate and time series of eruption source parameters (ESPs) for the 2015 Calbuco eruption according to the GNC inverse modelling approach. The solid line represents the cumulative erupted volume (km3), and the dash-dotted line indicates the mass emission rate (kg s−1).


4 Source term inversion: application to the 2015 Calbuco eruption

A major advantage of the GNC method is that it allows estimating the eruption source parameters (ESPs) in a straightforward way, with inverse modelling coming at no extra computational cost. This is because FALL3D solves an almost linear problem with weak non-linearity effects (e.g. due to gravity current, wet deposition, or aggregation), and consequently, a rescaling of the emission source term si associated with the ith ensemble member according to siwisi leads to a deposit mass loading rescaled correspondingly, with wi being the weight factor provided by the GNC method. In this case, the best estimation of the total source term is given by

(27) s a = i w i s i ,

where {si} represents the emission source terms of the prior ensemble members (in kgm-3s-1). Figure 10 shows the emission rate profiles resulting from the source inversion, expressed in terms of the linear source emission strength (in kgm-1s-1), i.e. sa×dA, where dA is the area of the grid cell.

As most of the 256 weight factors converge to zero, wi→0, the profiles in Fig. 10 reflect only those ensemble members that effectively contribute to the analysed deposit mass loading. According to the GNC inverse modelling results, each eruptive phase is characterised by different vertical mass distribution and emission rates. The first phase results in higher cloud-top heights, reaching almost 20 kma.s.l., whereas the column heights during the second phase remain at around 16 kma.s.l. Also note that the prior ensemble was defined assuming the same emission source and sampling parameters for both eruptive phases. For instance, the ensemble configuration was defined assuming the same reference value for column heights, i.e. 15 km above vent level for both phases (Table 3) or approximately 17 kma.s.l. Therefore, the resulting asymmetry between the two pulses observed in Fig. 10 is a consequence of the GNC inversion procedure. In other words, the GNC method can discard inappropriate ensemble members and pick out those that are consistent with observations.

The time series for mass eruption rate and total erupted volume are also depicted in Fig. 10. Although the maximum emission rate is reached during the more intense first phase, most of the total erupted mass stems from the longer second phase. In order to estimate the total erupted volume, a bulk density of ρb= 800 kg m−3 was assumed (a typical value for fresh deposits). In particular, the final total erupted volume was around 0.25 km3. According to the inversion, the erupted volumes corresponding to the first and second phases were 0.078 km3 (31.6 %) and 0.169 km3 (68.4 %), respectively. These results are in good agreement with the estimations reported by Romero et al. (2016), which give a total bulk tephra fall deposit volume of 0.27 ± 0.007 km3 with 62 % of the total volume corresponding to the second phase.

5 Discussion

Traditional ensemble-based DA methods such as the ensemble Kalman filter (EnKF) are based on the Gaussian hypothesis. However, it is well-known that analyses produced by these methods are suboptimal when either the model state variables or the observation errors are not Gaussian-distributed. Volcanic aerosol concentrations and tephra deposit mass loading are two remarkable examples of non-Gaussian-distributed variables with highly skewed distributions (Mingari et al.2022b). This explains why the application of EnKF-like methods in VATD models often leads to non-physical results and oscillations (e.g. the occurrence of negative concentrations). The ensemble-based GNC method introduced in this paper and the GIG method (Bishop2016) represent potential alternatives for dealing with the assimilation of volcanic data. In addition, the proposed methodologies could be beneficial beyond volcanic tephra, for example in situations involving non-negative variables with right-skewed probability distributions, such as water-vapour mixing ratio, rainfall, or aerosol concentrations. The GNC and GIG methods differ in the assumptions made about the prior distribution and the likelihood of the observation conditioned on the true state. Both methods have been applied to the assimilation of volcanic deposit data, and the results were compared to the traditional EnKF analysis by means of different evaluation metrics. The GNC method produced physically plausible results and significantly outperformed the other methods and the prior ensemble mean. Unfortunately, it was not possible to conclude that the GIG method improved the EnKF analysis from the results presented here.

The GNC method assumes a multi-dimensional Gaussian distribution and solves an optimisation problem with non-negative constraints to ensure plausible physical solutions. The GNC method, constrained here to assimilate deposit observations, can be easily extended to other observables as long as the observation operator is linear. For example, VATD models could use it to assimilate column mass observations of volcanic aerosols, but the assimilation of other satellite-retrieved variables (e.g. aerosol optical depth) would require an alternative approach. The solution obtained through the minimisation process in Eq. (17) converges to an analysis state which, by construction, improves the prior ensemble mean and any individual ensemble member since the linear estimator Eq. (11) includes, among the possible solutions, the prior ensemble mean (wi=1/m) and any specific ensemble member (e.g. wi=δij is solution for the jth member). Furthermore, the GNC method ensures that the prior RMSE is reduced by the analysis state. This can be checked from Eq. (10) by noting that the weighted RMSE of the prior ensemble mean is the initial value of the normalised cost function (defined as J/p), provided that the iterative solving procedure is started from the uniform vector with components wi=1/m and the matrix R is diagonal. Figure 7a illustrates this property of the solution; e.g. the RMSE computed for the validation dataset was reduced from RMSE = 8.9 (prior ensemble mean) to RMSE = 1.1 (GNC). In contrast, the analysis RMSE is not ensured to be less than the RMSE associated with individual ensemble members. This is a desirable property of the solution for statistically non-significant members. In fact, note that the first term in Eq. (10) penalises deviations from the ensemble mean (i.e. statistically non-significant members), while the second term penalises deviations from the observations. As a result, the solution provided by the GNC method satisfies two properties: (i) the analysis is statistically significant, and (ii) deviations from the observations are small.

The GIG method is a sequential assimilation procedure proposed by Bishop (2016), in which single observations are sequentially assimilated. The GIG method is based on the GIG equation set for the special case in which the prior distribution can be described by a gamma PDF and the observation likelihood by an inverse-gamma PDF. This is a stochastic method providing an ensemble of analyses and does not require a linear observation operator in principle. These reasons make the GIG method a good candidate for implementation in VATD models as it would allow performing multiple assimilation cycles by restarting the model from the analysis ensembles. The GIG method enables near-zero semi-positive-definite variables with highly skewed uncertainty distributions to be assimilated and avoids the occurrence of negative mass loading at the observation site. However, the observation information is propagated to the extended model–observation state vector using the linear regression approximation in Eq. (22). This approach introduces artificial structures in the spatial distribution of the deposit beyond the observation sites, including a few negative values, and the validation metrics degrade significantly over regions with scarce observational data assimilated. As a conclusion, the linear regression approximation should be reformulated or corrected, e.g. exploring localisation techniques, in order to enhance the quality of the deposit reconstruction using the GIG method.

6 Conclusions

This paper has proposed two ensemble-based data assimilation methods for semi-positive-definite variables. The methods were applied to reconstruct the tephra fallout deposit of the 2015 Calbuco eruption in Chile by assimilating measurements of deposit thickness. An assessment based on an independent validation dataset was carried out for the GNC and GIG methods in terms of different evaluation metrics, and the results were compared to two references: the ensemble prior mean and the EnKF analysis.

The evidence from this study suggests that the GNC method was the most skilful approach and represents a promising alternative for assimilation of volcanic fallout data. The GNC method provides an ensemble of weight factors and can also be used for source term inversion in a straightforward way. Unlike the majority of source term inversion methods (e.g. Folch and Mingari2023), which focus on determining specific ESPs associated with oversimplified parameterisations of the source term, this approach reconstructs the overall space–time distribution of the source and it is not constrained by any specific parameterisation of the emission source term.

On the other hand, although it is an interesting approach, the GIG method failed to improve the EnKF analysis. Evidently, the linear regression used by the GIG method needs to be reformulated or corrected. The GIG method is a second-order method and provides an ensemble of analyses without the linear observation operator assumption. Consequently, it represents an attractive alternative for assimilation of volcanic aerosol observations from satellite retrievals. To this purpose, the analysis ensemble from the GIG method could be used to perform multiple assimilation cycles by restarting an ensemble forecast. This approach has the potential to improve the accuracy of operational forecasts of volcanic clouds. In its present form, the GNC method is not suitable for data assimilation of volcanic aerosol observations in the context of operational forecasting as it does not provide an analysis ensemble. To achieve this, future studies should focus on extending the method in order to formulate a second-order analysis scheme based on the GNC method.

Appendix A: List of symbols using the following convention: matrices in uppercase bold, vectors in lowercase bold, scalars in italics
Symbol Description
General definitions
m ensemble size
n dimension of model state vector
p number of observations
x∈ℝn Model state vector
yo∈ℝp Observations vector
y∈ℝp Model state vector in the observation space
HRp×n Observation operator
GNC method
PRp×p Model covariance matrix
RRp×p Observation error covariance matrix
w∈ℝm Vector of weight factors
yRp Average model state vector (obs. space)
YRp×m Ensemble model state matrix (obs. space)
YRp×m Ensemble perturbation matrix (
GIG method
yj jth component of y
yjo jth component of yo
yjf Mean of prior distribution of yj
yja Mean of analysis distribution of yj(Eq. 19a)
Pjr Type 1 relative error variance of prior
Πjr Type 1 relative error variance of analysis(Eq. 19b)
Rjr Type 1 relative error variance ofobservation
P̃jr Type 2 relative error variance of prior
R̃jr Type 2 relative error variance ofobservation
 Here, yjt is the true value of the jth observation.
Code and data availability

FALL3D-8.2.0 is available under version 3 of the GNU General Public License (GPL) at (Folch et al.2022a; Folch et al.2020; Prata et al.2021). Observational datasets, code used for the GNC and GIG methods, and the input model parameter file along with the pre- and post-processing scripts have been archived on Zenodo at (Mingari et al.2022a).


The supplement related to this article is available online at:

Author contributions

Conceptualisation: LM, AC. Methodology: LM. Software: LM, AF, GM. Resources: LM. Writing – original draft: LM. Writing – review and editing: LM, AC, AF, GM. Visualisation: LM. Supervision: AC, AF. Funding acquisition: AF. All authors have read and approved the final version of the paper.

Competing interests

The contact author has declared that none of the authors has any competing interests.


Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


We thank Alexa Van Eaton from USGS for providing us with the assimilation dataset and the digitalised isopach contours. We are also grateful to Florencia Reckziegel for kindly providing us with her dataset. We acknowledge the Partnership for Advanced Computing in Europe (PRACE) for granting us access to the Joliot–Curie supercomputer at the CEA's Very Large Computing Center (TGCC, France). Finally, we acknowledge the constructive reviews from Matthieu Plu and two anonymous reviewers.

Financial support

This work has been partially funded by the H2020 Center of Excellence for Exascale in Solid Earth (ChEESE) (grant no. 823844) and by the European Union's Horizon Europe Research and Innovation Programme (DT-GEO (grant no. 101058129)). The research leading to these results has received funding from EuroHPC (ChEESE-2P (grant no. 101093038)).

Review statement

This paper was edited by Yuefei Zeng and reviewed by Matthieu Plu and two anonymous referees.


Amezcua, J. and Leeuwen, P. J. V.: Gaussian anamorphosis in the analysis step of the EnKF: a joint state-variable/observation approach, Tellus A, 66, 23493,, 2014. a

Anderson, J. L.: A Local Least Squares Framework for Ensemble Filtering, Mon. Weather Rev., 131, 634–642,<0634:ALLSFF>2.0.CO;2, 2003. a, b

Beckett, F. M., Witham, C. S., Leadbetter, S. J., Crocker, R., Webster, H. N., Hort, M. C., Jones, A. R., Devenish, B. J., and Thomson, D. J.: Atmospheric Dispersion Modelling at the London VAAC: A Review of Developments since the 2010 Eyjafjallajökull Volcano Ash Cloud, Atmosphere, 11, 352,, 2020. a

Bishop, C. H.: The GIGG-EnKF: ensemble Kalman filtering for highly skewed non-negative uncertainty distributions, Q. J. Roy. Meteor. Soc., 142, 1395–1412,, 2016. a, b, c, d, e, f, g, h, i

Bonadonna, C. and Costa, A.: Plume height, volume, and classification of volcanic explosive eruptions based on the Weibull function, B. Volcanol., 75, 1–19,, 2013. a

Bonadonna, C., Ernst, G., and Sparks, R.: Thickness variations and volume estimates of tephra fall deposits: the importance of particle Reynolds number, J. Volcanol. Geoth. Res., 81, 173–187,, 1998. a

Bonadonna, C., Folch, A., Loughlin, S., and Puempel, H.: Future developments in modelling and monitoring of volcanic ash clouds: outcomes from the first IAVCEI-WMO workshop on Ash Dispersal Forecast and Civil Aviation, B. Volcanol., 74, 1–10, 2012. a

Bonadonna, C., Biass, S., and Costa, A.: Physical characterization of explosive volcanic eruptions based on tephra deposits: Propagation of uncertainties and sensitivity analysis, J. Volcanol. Geoth. Res., 296, 80–100,, 2015. a

Burgers, G., Van Leeuwen, P. J., and Evensen, G.: Analysis scheme in the ensemble Kalman filter, Mon. Weather Rev., 126, 1719–1724, 1998. a, b, c

Carrassi, A., Bocquet, M., Bertino, L., and Evensen, G.: Data assimilation in the geosciences: An overview of methods, issues, and perspectives, WIREs Clim. Change, 9, e535,, 2018. a, b

Clarkson, R. J., Majewicz, E. J., and Mack, P.: A re-evaluation of the 2010 quantitative understanding of the effects volcanic ash has on gas turbine engines, P. I. Mech. Eng. G-J. Aer., 230, 2274–2291,, 2016. a

Constantinescu, R., White, J. T., Connor, C. B., Hopulele-Gligor, A., Charbonnier, S., Thouret, J.-C., Lindsay, J. M., and Bertin, D.: Uncertainty Quantification of Eruption Source Parameters Estimated From Tephra Fall Deposits, Geophys. Res. Lett., 49, e2021GL097425,, 2022. a

Cornell, W., Carey, S., and Sigurdsson, H.: Computer simulation and transport of the Campanian Y-5 ash, J. Volcanol. Geoth. Res., 17, 89–109,, 1983. a

Costa, A., Folch, A., and Macedonio, G.: A model for wet aggregation of ash particles in volcanic plumes and clouds: I. Theoretical formulation, J. Geophys. Res.-Sol. Ea., 115, B09201,, 2010. a

Costa, A., Pioli, L., and Bonadonna, C.: Assessing tephra total grain-size distribution: Insights from field data analysis, Earth Planet. Sc. Lett., 443, 90–107,, 2016. a

Dacre, H. F. and Harvey, N. J.: Characterizing the Atmospheric Conditions Leading to Large Error Growth in Volcanic Ash Cloud Forecasts, J. Appl. Meteorol. Clim., 57, 1011–1019,, 2018. a

Dare, R. A., Smith, D. H., and Naughton, M. J.: Ensemble Prediction of the Dispersion of Volcanic Ash from the 13 February 2014 Eruption of Kelut, Indonesia, J. Appl. Meteorol. Clim., 55, 61–78,, 2016. a

Degruyter, W. and Bonadonna, C.: Improving on mass flow rate estimates of volcanic eruptions, Geophys. Res. Lett., 39, L16308,, 2012. a, b

Denlinger, R. P., Pavolonis, M., and Sieglaff, J.: A robust method to forecast volcanic ash clouds, J. Geophys. Res.-Atmos., 117, D13208,, 2012. a

Dominguez, L., Bonadonna, C., Forte, P., Jarvis, P. A., Cioni, R., Mingari, L., Bran, D., and Panebianco, J. E.: Aeolian remobilisation of the 2011-Cordón Caulle Tephra-Fallout Deposit: example of an important process in the life cycle of Volcanic Ash, Front. Earth Sci., 7, 343,, 2020. a

Durant, A. J., Rose, W. I., Sarna-Wojcicki, A. M., Carey, S., and Volentik, A. C. M.: Hydrometeor-enhanced tephra sedimentation: Constraints from the 18 May 1980 eruption of Mount St. Helens, J. Geophys. Res.-Sol. Ea., 114, B03204,, 2009. a

Evensen, G.: Sequential data assimilation with a nonlinear quasi-geostrophic model using Monte Carlo methods to forecast error statistics, J. Geophys. Res.-Oceans, 99, 10143–10162,, 1994. a, b

Folch, A. and Mingari, L.: Data Assimilation of Volcanic Clouds: Recent Advances and Implications on Operational Forecasts, in: Applications of Data Assimilation and Inverse Problems in the Earth Sciences (Special Publications of the International Union of Geodesy and Geophysics, edited by: Ismail-Zadeh, A., Castelli, F., Jones, D., and Sanchez, S., 144–156, Cambridge University Press, Cambridge,, 2023. a, b, c

Folch, A., Costa, A., Durant, A., and Macedonio, G.: A model for wet aggregation of ash particles in volcanic plumes and clouds: II. Model application, J. Geophys. Res.-Sol. Ea., 115, B09202,, 2010. a

Folch, A., Mingari, L., Osores, M. S., and Collini, E.: Modeling volcanic ash resuspension – application to the 14–18 October 2011 outbreak episode in central Patagonia, Argentina, Nat. Hazards Earth Syst. Sci., 14, 119–133,, 2014. a

Folch, A., Mingari, L., Gutierrez, N., Hanzich, M., Macedonio, G., and Costa, A.: FALL3D-8.0: a computational model for atmospheric transport and deposition of particles, aerosols and radionuclides – Part 1: Model physics and numerics, Geosci. Model Dev., 13, 1431–1458,, 2020. a, b

Folch, A., Costa, A., Macedonio, G. and Mingari, L.: FALL3D (8.2.0), Zenodo [code],, 2022a. a

Folch, A., Mingari, L., and Prata, A. T.: Ensemble-Based Forecast of Volcanic Clouds Using FALL3D-8.1, Front. Earth Sci., 9, 741841,, 2022b. a, b

Folch, A., Abril, C., Afanasiev, M., Amati, G., Bader, M., Badia, R. M., Bayraktar, H. B., Barsotti, S., Basili, R., Bernardi, F., Boehm, C., Brizuela, B., Brogi, F., Cabrera, E., Casarotti, E., Castro, M. J., Cerminara, M., Cirella, A., Cheptsov, A., Conejero, J., Costa, A., de la Asunción, M., de la Puente, J., Djuric, M., Dorozhinskii, R., Espinosa, G., Esposti-Ongaro, T., Farnós, J., Favretto-Cristini, N., Fichtner, A., Fournier, A., Gabriel, A.-A., Gallard, J.-M., Gibbons, S. J., Glimsdal, S., González-Vida, J. M., Gracia, J., Gregorio, R., Gutierrez, N., Halldorsson, B., Hamitou, O., Houzeaux, G., Jaure, S., Kessar, M., Krenz, L.,Krischer, L., Laforet, S., Lanucara, P., Li, B., Lorenzino, M. C., Lorito, S., Løvholt, F., Macedonio, G., Macías, J., Marín, G., Martínez Montesinos, B., Mingari, L., Moguilny, G., Montellier, V., Monterrubio-Velasco, M., Moulard, G. E., Nagaso, M., Nazaria, M., Niethammer, C., Pardini, F., Pienkowska, M., Pizzimenti, L., Poiata, N., Rannabauer, L., Rojas, O., Rodriguez, J. E., Romano, F., Rudyy, O., Ruggiero, V., Samfass, P., Sánchez-Linares, C., Sanchez, S., Sandri, L., Scala, A., Schaeffer, N., Schuchart, J., Selva, J., Sergeant, A., Stallone, A., Taroni, M., Thrastarson, S., Titos, M., Tonelllo, N., Tonini, R., Ulrich, T., Vilotte, J.-P., Vöge, M., Volpe, M., Wirp, S. A., and Wössner, U.: The EU Center of Excellence for Exascale in Solid Earth (ChEESE): Implementation, results, and roadmap for the second phase, Future Gener. Comp. Sy., 146, 47–61,, 2023. a

Fu, G., Heemink, A., Lu, S., Segers, A., Weber, K., and Lin, H.-X.: Model-based aviation advice on distal volcanic ash clouds by assimilating aircraft in situ measurements, Atmos. Chem. Phys., 16, 9189–9200,, 2016. a

Fu, G., Prata, F., Lin, H. X., Heemink, A., Segers, A., and Lu, S.: Data assimilation for volcanic ash plumes using a satellite observational operator: a case study on the 2010 Eyjafjallajökull volcanic eruption, Atmos. Chem. Phys., 17, 1187–1205,, 2017. a

Harvey, N. J., Dacre, H. F., Webster, H. N., Taylor, I. A., Khanal, S., Grainger, R. G., and Cooke, M. C.: The impact of ensemble meteorology on inverse modeling estimates of volcano emissions and ash dispersion forecasts: Grímsvötn 2011, Atmosphere, 11, 1022,, 2020. a

Husak, G. J., Michaelsen, J., and Funk, C.: Use of the gamma distribution to represent monthly rainfall in Africa for drought monitoring applications, Int. J. Climatol., 27, 935–944,, 2007. a

Jazwinski, A. H.: Stochastic processes and filtering theory, Academic Press, Inc., New York, ISBN 9780080960906, 1970. a

Kalman, R. E.: A New Approach to Linear Filtering and Prediction Problems, J. Basic Eng.-T. ASME, 82, 35–45,, 1960. a, b

Kliewer, A. J., Fletcher, S. J., Jones, A. S., and Forsythe, J. M.: Comparison of Gaussian, logarithmic transform and mixed Gaussian–log-normal distribution based 1DVAR microwave temperature–water-vapour mixing ratio retrievals, Q.+J. Roy. Meteor. Soc., 142, 274–286,, 2016. a

Madankan, R., Pouget, S., Singla, P., Bursik, M., Dehn, J., Jones, M., Patra, A., Pavolonis, M., Pitman, E., Singh, T., and Webley, P.: Computation of probabilistic hazard maps and source parameter estimation for volcanic ash transport and dispersion, J. Comput. Phys., 271, 39–59,, 2014. a

Martí, A., Folch, A., Costa, A., and Engwell, S.: Reconstructing the plinian and co-ignimbrite sources of large volcanic eruptions: a novel approach for the Campanian Ignimbrite, Sci. Rep., 6, 1–11,, 2016. a, b

Martinez, B. M., Luzón, M. T., Sandri, L., Rudyy, O., Cheptsov, A., Macedonio, G., Folch, A., Barsotti, S., Selva, J., and Costa, A.: On the feasibility and usefulness of high-performance computing in probabilistic volcanic hazard assessment: An application to tephra hazard from Campi Flegrei, Front. Earth Sci., 10, 941789,, 2022. a

McKay, M. D., Beckman, R. J., and Conover, W. J.: A Comparison of Three Methods for Selecting Values of Input Variables in the Analysis of Output from a Computer Code, Technometrics, 21, 239–245, 1979. a

Mingari, L., Folch, A., Dominguez, L., and Bonadonna, C.: Volcanic Ash Resuspension in Patagonia: Numerical Simulations and Observations, Atmosphere, 11, 977,, 2020. a

Mingari, L., Costa, A., Macedonio, G., and Folch, A.: gmd-2022-246 (2.0), Zenodo [data set],, 2022a. a

Mingari, L., Folch, A., Prata, A. T., Pardini, F., Macedonio, G., and Costa, A.: Data assimilation of volcanic aerosol observations using FALL3D+PDAF, Atmos. Chem. Phys., 22, 1773–1792,, 2022b. a, b, c

O'Neill, N. T., Ignatov, A., Holben, B. N., and Eck, T. F.: The lognormal distribution as a reference for reporting aerosol optical depth statistics; Empirical tests using multi-year, multi-site AERONET Sunphotometer data, Geophys. Res. Lett., 27, 3333–3336,, 2000. a

Osores, S., Ruiz, J., Folch, A., and Collini, E.: Volcanic ash forecast using ensemble-based data assimilation: an ensemble transform Kalman filter coupled with the FALL3D-7.2 model (ETKF–FALL3D version 1.0), Geosci. Model Dev., 13, 1–22,, 2020. a

Pardini, F., Corradini, S., Costa, A., Esposti Ongaro, T., Merucci, L., Neri, A., Stelitano, D., de' Michieli Vitturi, M.: Ensemble-Based Data Assimilation of Volcanic Ash Clouds from Satellite Observations: Application to the 24 December 2018 Mt. Etna Explosive Eruption, Atmosphere, 11, 359,, 2020. a

Pelley, R., Cooke, M., Manning, A., Thomson, D., Witham, C., and Hort, M.: Initial implementation of an inversion technique for estimating volcanic ash source parameters in near real time using satellite retrievals, Tech. rep., Forecasting Research Technical Report, Met Office, Exeter, 2015. a

Pfeiffer, T., Costa, A., and Macedonio, G.: A model for the numerical simulation of tephra fall deposits, J. Volcanol. Geoth. Res., 140, 273–294,, 2005. a, b, c

Posselt, D. J. and Bishop, C. H.: Nonlinear data assimilation for clouds and precipitation using a gamma inverse-gamma ensemble filter, Q. J. Roy. Meteor. Soc., 144, 2331–2349,, 2018. a

Prata, A. T., Mingari, L., Folch, A., Macedonio, G., and Costa, A.: FALL3D-8.0: a computational model for atmospheric transport and deposition of particles, aerosols and radionuclides – Part 2: Model validation, Geosci. Model Dev., 14, 409–436,, 2021. a, b

Pyle, D. M.: The thickness, volume and grainsize of tephra fall deposits, B. Volcanol., 51, 1–15, 1989. a

Romero, J., Morgavi, D., Arzilli, F., Daga, R., Caselli, A., Reckziegel, F., Viramonte, J., Díaz-Alvarado, J., Polacci, M., Burton, M., and Perugini, D.: Eruption dynamics of the 22–23 April 2015 Calbuco Volcano (Southern Chile): Analyses of tephra fall deposits, J. Volcanol. Geoth. Res., 317, 15–29,, 2016. a, b, c, d, e, f, g, h

Sandri, L., Costa, A., Selva, J., Tonini, R., Macedonio, G., Folch, A., and Sulpizio, R.: Beyond eruptive scenarios: Assessing tephra fallout hazard from Neapolitan volcanoes, Sci. Rep., 6, 1–13,, 2016. a

Scasso, R. A., Corbella, H., and Tiberi, P.: Sedimentological analysis of the tephra from the 12–15 August 1991 eruption of Hudson volcano, B. Volcanol., 56, 121–132, 1994. a

Sha, F., Lin, Y., Saul, L. K., and Lee, D. D.: Multiplicative Updates for Nonnegative Quadratic Programming, Neural Comput., 19, 2004–2031,, 2007. a, b

Stefanescu, E. R., Patra, A. K., Bursik, M. I., Madankan, R., Pouget, S., Jones, M., Singla, P., Singh, T., Pitman, E. B., Pavolonis, M., Morton, D., Webley, P., and Dehn, J.: Temporal, probabilistic mapping of ash clouds using wind field stochastic variability and uncertain eruption source parameters: Example of the 14 April 2010 Eyjafjallajökull eruption, J. Adv. Model. Earth Sy., 6, 1173–1184,, 2014. a

Suzuki, T.: A theoretical model for dispersion of tephra, in: Arc Volcanism: Physics and Tectonics, edited by: Shimozuru, D. and Yokoyama, I., Terra Scientific Publishing Company (TERRAPUB), Tokyo, 93–113, ISBN 90-277-1612-9, 1983. a

Van Eaton, A. R., Amigo, A., Bertin, D., Mastin, L. G., Giacosa, R. E., González, J., Valderrama, O., Fontijn, K., and Behnke, S. A.: Volcanic lightning and plume behavior reveal evolving hazards during the April 2015 eruption of Calbuco volcano, Chile, Geophys. Res. Lett., 43, 3563–3571,, 2016.  a, b, c, d, e

Vogel, H., Förstner, J., Vogel, B., Hanisch, T., Mühr, B., Schättler, U., and Schad, T.: Time-lagged ensemble simulations of the dispersion of the Eyjafjallajökull plume over Europe with COSMO-ART, Atmos. Chem. Phys., 14, 7837–7845,, 2014. a

Wilson, G., Wilson, T., Deligne, N., and Cole, J.: Volcanic hazard impacts to critical infrastructure: A review, J. Volcanol. Geoth. Res., 286, 148–182,, 2014. a

Zidikheri, M. J. and Lucas, C.: A Computationally Efficient Ensemble Filtering Scheme for Quantitative Volcanic Ash Forecasts, J. Geophys. Res.-Atmos., 126, e2020JD033094,, 2021a. a

Zidikheri, M. J. and Lucas, C.: Improving Ensemble Volcanic Ash Forecasts by Direct Insertion of Satellite Data and Ensemble Filtering, Atmosphere, 12, 1215,, 2021b. a

Zidikheri, M. J., Lucas, C., and Potts, R. J.: Estimation of optimal dispersion model source parameters using satellite detections of volcanic ash, J. Geophys. Res.-Atmos., 122, 8207–8232,, 2017. a

Zidikheri, M. J., Lucas, C., and Potts, R. J.: Quantitative Verification and Calibration of Volcanic Ash Ensemble Forecasts Using Satellite Data, J. Geophys. Res.-Atmos., 123, 4135–4156,, 2018. a

Short summary
Two novel techniques for ensemble-based data assimilation, suitable for semi-positive-definite variables with highly skewed uncertainty distributions such as tephra deposit mass loading, are applied to reconstruct the tephra fallout deposit resulting from the 2015 Calbuco eruption in Chile. The deposit spatial distribution and the ashfall volume according to the analyses are in good agreement with estimations based on field measurements and isopach maps reported in previous studies.