In recent years, there has been a growing interest in ensemble approaches for modelling the atmospheric transport of volcanic aerosol, ash, and lapilli (tephra). The development of such techniques enables the exploration of novel methods for incorporating real observations into tephra dispersal models. However, traditional data assimilation algorithms, including ensemble Kalman filter (EnKF) methods, can yield suboptimal state estimates for positive-definite variables such as those related to volcanic aerosols and tephra deposits. This study proposes two new ensemble-based data assimilation techniques for semi-positive-definite variables with highly skewed uncertainty distributions, including aerosol concentrations and tephra deposit mass loading: the Gaussian with non-negative constraints (GNC) and gamma inverse-gamma (GIG) methods. The proposed methods are applied to reconstruct the tephra fallout deposit resulting from the 2015 Calbuco eruption using an ensemble of 256 runs performed with the FALL3D dispersal model. An assessment of the methodologies is conducted considering two independent datasets of deposit thickness measurements: an assimilation dataset and a validation dataset. Different evaluation metrics (e.g. RMSE, MBE, and SMAPE) are computed for the validation dataset, and the results are compared to two references: the ensemble prior mean and the EnKF analysis. Results show that the assimilation leads to a significant improvement over the first-guess results obtained from the simple ensemble forecast. The evidence from this study suggests that the GNC method was the most skilful approach and represents a promising alternative for assimilation of volcanic fallout data. The spatial distributions of the tephra fallout deposit thickness and volume according to the GNC analysis are in good agreement with estimations based on field measurements and isopach maps reported in previous studies. 
On the other hand, although it is an interesting approach, the GIG method failed to improve on the EnKF analysis.

Multiple hazards are associated with volcanic eruptions including lava flows, pyroclastic density currents, lahars, volcanic plumes, and tephra
fallout. Specifically, the dispersal of volcanic plumes poses a serious threat to flight safety

The characterisation and quantification of past eruptive events are also of paramount importance for volcano hazard and risk assessment studies, which
infer the likelihood of future eruption scenarios based on past volcano behaviour. Explosive volcanic eruptions are often characterised and
classified by means of tephra deposits

In contrast, physics-based approaches, built upon volcanic ash transport and dispersal (VATD) models, include multiple physical parameterisations and
are a much more powerful tool for representing the real distribution of tephra deposits. However, the accuracy of deterministic models is highly
sensitive to uncertain model input parameters (e.g. eruption column height or physical properties of particles) and the underlying meteorological
fields. Alternatively, probabilistic modelling approaches provide a framework to incorporate uncertainties associated with model input
data. Specifically, ensemble-based modelling strategies allow one to characterise and quantify model uncertainties and have been shown to improve
VATD model skill

The incorporation of ensemble capabilities in VATD models lays the foundation for developing and implementing ensemble-based data assimilation and
inversion techniques

This study explores two new ensemble-based data assimilation techniques for positive-definite variables and their implementation in VATD models, the
Gaussian with non-negative constraints (GNC) method and the gamma, inverse-gamma, and Gaussian ensemble Kalman filter (GIGG-EnKF), a sequential method
proposed by

This study aims to reconstruct the tephra fall deposit of the 2015 Calbuco eruption from a scattered set of observations. The rich existing dataset
available for this eruption, consisting of deposit samples collected up to 500

The paper is organised as follows. The ensemble-based data assimilation methods are introduced in Sect.

Data assimilation (DA) techniques have been widely used to study and forecast geophysical systems and have been applied in a variety of research and
operational settings

The ensemble Kalman filter (EnKF) is a remarkable example of a sequential data assimilation scheme based on the Kalman filter theory

In this work, the state of the physical system is fully determined by the two-dimensional tephra deposit load (in

The observations vector

According to the analysis scheme in the EnKF, given the prior ensemble and the observations, the deposit can be reconstructed by means of
Eq. (

The Gaussian with non-negative constraints (GNC) method assumes a multi-dimensional Gaussian probability distribution for

The most likely state is the one that maximises the posterior PDF, as in Eq. (

Note that the expression above is actually very similar to the cost function used in classical variational methods

Given a prior ensemble of

Iterative approach to minimise the GNC cost function
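As a rough illustration of what an iterative minimisation under non-negative constraints can look like, the sketch below applies a generic multiplicative update to an error-normalised least-squares misfit. It is a toy under stated assumptions, not the scheme used in this study: it keeps only an observation-misfit term, assumes that the analysis is a non-negative linear combination of ensemble members seen through a linear observation operator, and requires non-negative ensemble values and observations. The names `misfit`, `nonneg_weights`, `Y`, `y`, and `sigma` are hypothetical.

```python
def misfit(Y, y, sigma, w):
    """Sum of squared residuals, each normalised by its observation error."""
    total = 0.0
    for row, obs, err in zip(Y, y, sigma):
        model = sum(value * weight for value, weight in zip(row, w))
        total += ((obs - model) / err) ** 2
    return total


def nonneg_weights(Y, y, sigma, n_iter=200, eps=1e-12):
    """Multiplicative updates for min ||(y - Y w) / sigma||^2 with w >= 0.

    Y[i][k] is ensemble member k sampled at observation site i (assumed >= 0),
    y[i] is the observation (assumed >= 0), sigma[i] its absolute error.
    """
    n_obs, n_mem = len(Y), len(Y[0])
    # Scale each row by its observation error.
    A = [[Y[i][k] / sigma[i] for k in range(n_mem)] for i in range(n_obs)]
    b = [y[i] / sigma[i] for i in range(n_obs)]
    Atb = [sum(A[i][k] * b[i] for i in range(n_obs)) for k in range(n_mem)]
    w = [1.0 / n_mem] * n_mem  # start from equal weights
    for _ in range(n_iter):
        Aw = [sum(A[i][k] * w[k] for k in range(n_mem)) for i in range(n_obs)]
        AtAw = [sum(A[i][k] * Aw[i] for i in range(n_obs)) for k in range(n_mem)]
        # The ratio of two non-negative numbers keeps every weight non-negative.
        w = [w[k] * Atb[k] / (AtAw[k] + eps) for k in range(n_mem)]
    return w


# Synthetic example: 3 observation sites, 3 ensemble members.
Y = [[1.0, 2.0, 4.0], [2.0, 1.0, 3.0], [0.5, 0.5, 1.0]]
y = [3.0, 3.0, 1.0]
sigma = [0.5, 0.5, 0.2]
w0 = [1.0 / 3.0] * 3
w = nonneg_weights(Y, y, sigma)
```

The update never changes the sign of a weight, so no explicit projection onto the feasible set is needed, and for non-negative inputs the misfit does not increase from one iteration to the next.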

According to the strategy proposed by

In order to generate an analysis ensemble

In addition, the ensemble generated in this way is ensured to converge to the true posterior PDF for large ensembles.

The univariate case is extended according to the second-stage linear regression step proposed by

The inverse-gamma PDFs assign non-zero probability densities only for positive observations, and, as a result, zero observations cannot be properly
assimilated using the GIG equation set (see e.g. Eq.

The GIG method is a sequential procedure: a single observation is assimilated in order to update the prior ensemble forecast using the GIG equation
set; subsequently, this procedure is repeated until all observations have been sequentially assimilated. In contrast, the GNC method described in
Sect.

Pseudocode of the GIG method based on the

List of observations

Analysis ensemble

randomly shuffle observation list

Location of the sampling sites corresponding to the dataset reported by

In this section, the procedures described in Sect.

The 2015 eruption of the Calbuco stratovolcano (41.33

The availability of independent and comprehensive datasets of field observations makes the Calbuco tephra deposit an excellent case
study.

Calbuco deposit datasets considered in this study.

It is interesting to note the presence of a secondary thickness maximum

The assimilation methods require a dataset of measurements along with the corresponding absolute or relative errors. Specifically, the GNC method
requires the absolute error

The strategy adopted in this work to provide reasonable error estimates is based on a clustering algorithm, and observation error standard deviations
are assumed to be dependent on the measured value. In summary, observational data are organised into groups with similar characteristics, and an
absolute and relative error is assigned to each group or cluster. The error for the

As validation metrics, we consider the weighted versions of the mean bias error (MBE) and the root mean square error (RMSE) to measure the differences
between observations and analyses, defined as

Notice that if the non-weighted (

Another relative metric used to evaluate the deposit reconstruction is the symmetric mean absolute percentage error (SMAPE), defined as

Numerical simulations were carried out using the latest release of FALL3D (v8.2), an open-source offline Eulerian model for atmospheric
transport and deposition of aerosols and particles, including tephra species. FALL3D solves the so-called advection–diffusion–sedimentation (ADS)
equation

The configuration of the FALL3D model used in this work is summarised in Table

FALL3D model configuration parameters for the 2015 Calbuco runs.

A 256-member prior ensemble was generated by perturbing the eruption source parameters (ESPs) and the horizontal wind components around a reference
value using either uniform or truncated normal distributions. Table

Ensemble configuration. The perturbed model parameters are eruption column height (

Before showing how the assimilation methods perform, it is worth characterising the prior ensemble distribution and checking whether the assumptions
of the GIG method are fulfilled (i.e. that the prior distribution can be approximated by a gamma PDF). To this end, the skewness

Skewness as a function of the ratio of the standard deviation to the mean for the prior distribution at the sampling locations (blue circles). Results for some theoretical distributions (Gaussian, log-normal, and gamma) are also shown for comparison.

Histograms of sampled prior distributions along with the corresponding theoretical gamma distribution (solid line) at some selected observation sites.
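The comparison between sampled and theoretical distributions can be sketched as follows. The snippet computes the ratio of standard deviation to mean and the skewness of a sample, together with the standard closed-form skewness of the gamma and log-normal distributions as a function of that ratio. The function names are hypothetical, the sample skewness uses the simple biased moment estimator, and the method-of-moments gamma fit is an assumption rather than the fitting procedure used for the figures.

```python
import math


def sample_cv_and_skewness(values):
    """Return (std / mean, skewness) using biased sample moments."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return math.sqrt(m2) / mean, m3 / m2 ** 1.5


def gaussian_skewness(cv):
    """A Gaussian is symmetric, whatever its std / mean ratio."""
    return 0.0


def gamma_skewness(cv):
    """Gamma with shape k: cv = 1 / sqrt(k) and skewness = 2 / sqrt(k)."""
    return 2.0 * cv


def lognormal_skewness(cv):
    """Log-normal: skewness = (cv^2 + 3) * cv."""
    return (cv ** 2 + 3.0) * cv


def gamma_from_moments(mean, std):
    """Method-of-moments gamma parameters (shape, scale)."""
    return (mean / std) ** 2, std ** 2 / mean
```

For a given ratio of standard deviation to mean, a sample whose skewness lies close to `gamma_skewness` is compatible with the gamma assumption, whereas a value close to `lognormal_skewness` indicates a heavier tail.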

In order to further understand the similarities between the gamma and prior distributions, Fig.

In this section, we compare the tephra deposit field reconstructed according to four strategies: (i) forecast, i.e. the prior ensemble mean, (ii) the EnKF
method, i.e. the analysis ensemble mean via Eq. (

The results of the tephra fall deposit reconstruction are shown in Fig.

Reconstructed tephra fall deposit according to the forecast

The forecast (Fig.

The EnKF analysis shows a very noisy spatial distribution with large oscillations and negative values in some regions, leading to artificial spatial structures. On the other hand, the GNC shows smoother deposit thickness contours with a spatial distribution having a more physically plausible structure. The GIG method represents an intermediate case between the EnKF and GNC methods. Although this method gives unrealistic results as well, the fraction of negative values and the amplitude of oscillations are noticeably reduced compared to the EnKF method (negative data were removed and reassigned to zero).
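A toy example may help to see how a Gaussian update can produce negative deposit values away from the observations. The sketch below applies the textbook EnKF update of the ensemble mean for a single scalar observation to a synthetic two-point state. It is not the configuration used in this study, and the function name and all numbers are hypothetical.

```python
def enkf_mean_update(ensemble, obs_index, y, obs_var):
    """Update the ensemble mean with one scalar observation.

    ensemble[m][j] is the value of member m at grid point j; the observation
    y samples grid point obs_index with error variance obs_var.
    """
    n_mem = len(ensemble)
    n_state = len(ensemble[0])
    mean = [sum(member[j] for member in ensemble) / n_mem for j in range(n_state)]
    hx = [member[obs_index] for member in ensemble]
    hx_mean = mean[obs_index]
    var_hx = sum((h - hx_mean) ** 2 for h in hx) / (n_mem - 1)
    innovation = y - hx_mean
    analysis = []
    for j in range(n_state):
        cov = sum((member[j] - mean[j]) * (h - hx_mean)
                  for member, h in zip(ensemble, hx)) / (n_mem - 1)
        gain = cov / (var_hx + obs_var)  # scalar Kalman gain for point j
        analysis.append(mean[j] + gain * innovation)
    return analysis


# Point 0 is observed; point 1 is an unobserved site with a thin deposit.
ensemble = [[2.0, 0.0], [4.0, 0.5], [6.0, 1.0], [8.0, 1.5]]
analysis = enkf_mean_update(ensemble, obs_index=0, y=0.5, obs_var=1.0)
clipped = [max(value, 0.0) for value in analysis]  # reassign negatives to zero
```

Because the two points are positively correlated and the observation lies well below the prior mean, the linear update pushes the unobserved point below zero. Clipping removes the unphysical value but does not repair the spatial structure of the analysis.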

In order to evaluate the performance of the analysis schemes, the full dataset of observations was split into two subsets: an assimilation dataset and
a validation dataset. The assimilation dataset was used to produce new analyses, and the validation metrics defined in Sect.

Comparison between analyses and observations for the forecast

Figure

Evaluation metrics for the prior ensemble mean (forecast) and the EnKF, GNC, and GIG analysis schemes. The metrics were computed from the assimilation and validation datasets considering a partition of the full dataset of 60 % and 40 %, respectively.
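For readers who wish to reproduce this kind of evaluation, the metrics can be sketched as follows. The definitions are assumptions: weights are normalised to sum to one, the bias is taken as analysis minus observation, and SMAPE follows one common convention (expressed in per cent), which may differ from the exact definitions adopted here. All function names are hypothetical.

```python
import math


def _normalise(weights, n):
    """Return weights summing to one (uniform if none are given)."""
    if weights is None:
        return [1.0 / n] * n
    total = sum(weights)
    return [w / total for w in weights]


def weighted_mbe(analysis, obs, weights=None):
    """Weighted mean bias error, analysis minus observation."""
    w = _normalise(weights, len(obs))
    return sum(wi * (a - o) for wi, a, o in zip(w, analysis, obs))


def weighted_rmse(analysis, obs, weights=None):
    """Weighted root mean square error."""
    w = _normalise(weights, len(obs))
    return math.sqrt(sum(wi * (a - o) ** 2 for wi, a, o in zip(w, analysis, obs)))


def smape(analysis, obs):
    """Symmetric mean absolute percentage error, in per cent (0 to 200)."""
    terms = []
    for a, o in zip(analysis, obs):
        denom = abs(a) + abs(o)
        # Treat a pair of zeros as a perfect match instead of dividing by zero.
        terms.append(0.0 if denom == 0.0 else 2.0 * abs(a - o) / denom)
    return 100.0 * sum(terms) / len(terms)
```

With uniform weights, `weighted_mbe` and `weighted_rmse` reduce to the usual non-weighted metrics.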

In order to quantify deviations from observations according to the prior ensemble mean (forecast) and the analysis schemes, the evaluation metrics
computed for the assimilation dataset (60 %) and the validation dataset (40 %) are reported in Fig.

The bias is presented in Fig.

In conclusion, the GNC method outperforms the EnKF and GIG methods in terms of all metrics computed for the validation dataset. In contrast, the EnKF
and GIG analyses perform well over regions around the observation sites, but the analyses cannot fully capture all deposit features beyond
these regions. In order to illustrate this point, the uncorrected EnKF analysis (i.e. negative data were not removed) and the location of the
assimilated observations (60 % of the full dataset) are presented in Fig.

Tephra fall deposit according to the EnKF method and location of the assimilated observations (60 % of the full dataset). Colour-shaded regions represent positive data, and negative data are masked in order to illustrate the emergence of large oscillations and unphysical values in regions with scarce observational data.

To conclude this section, different partitions of the observational dataset are considered, and the RMSE is computed for the validation dataset. Results
are shown in Fig.

RMSE computed for the validation dataset for different partitions of the full dataset of observations expressed in terms of the percentage of assimilated observations.
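A partition experiment of this kind can be set up with a few lines of code. The sketch below splits the observation indices into assimilation and validation subsets for a given percentage of assimilated observations. The random, seeded split is an assumption, since the selection procedure is not detailed here, and the function name is hypothetical.

```python
import random


def split_observations(n_obs, assimilated_fraction, seed=0):
    """Return (assimilation indices, validation indices) for n_obs observations."""
    indices = list(range(n_obs))
    random.Random(seed).shuffle(indices)  # reproducible shuffle
    n_assim = round(assimilated_fraction * n_obs)
    return sorted(indices[:n_assim]), sorted(indices[n_assim:])


# Partitions with 20 %, 40 %, 60 %, and 80 % of the observations assimilated.
partitions = {
    fraction: split_observations(100, fraction)
    for fraction in (0.2, 0.4, 0.6, 0.8)
}
```

Each partition would then be used to produce an analysis from the first subset and to compute the RMSE on the second one.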

Profiles of emission rate and time series of eruption source parameters (ESPs) for the 2015 Calbuco eruption according to the GNC inverse modelling approach. The solid line represents the cumulative erupted volume (

A major advantage of the GNC method is that it allows estimating the eruption source parameters (ESPs) in a straightforward way, with inverse modelling
coming at no extra computational cost. This is because FALL3D solves an almost linear problem with weak non-linearity effects (e.g. due to gravity
current, wet deposition, or aggregation), and consequently, a rescaling of the emission source term

As most of the 256 weight factors converge to zero,

The time series for mass eruption rate and total erupted volume are also depicted in Fig.

Traditional ensemble-based DA methods such as the ensemble Kalman filter (EnKF) are based on the Gaussian hypothesis. However, it is well-known that
analyses produced by these methods are suboptimal when either the model state variables or the observation errors are not Gaussian-distributed. Volcanic aerosol concentrations and tephra deposit mass loading are two notable examples of non-Gaussian variables with
highly skewed distributions

The GNC method assumes a multi-dimensional Gaussian distribution and solves an optimisation problem with non-negative constraints to ensure plausible
physical solutions. The GNC method, constrained here to assimilate deposit observations, can be easily extended to other observables as long as the
observation operator is linear. For example, VATD models could use it to assimilate column mass observations of volcanic aerosols, but the
assimilation of other satellite-retrieved variables (e.g. aerosol optical depth) would require an alternative approach. The solution obtained
through the minimisation process in Eq. (

The GIG method is a sequential assimilation procedure proposed by

This paper has proposed two ensemble-based data assimilation methods for semi-positive-definite variables. The methods were applied to reconstruct the tephra fallout deposit of the 2015 Calbuco eruption in Chile by assimilating measurements of deposit thickness. An assessment based on an independent validation dataset was carried out for the GNC and GIG methods in terms of different evaluation metrics, and the results were compared to two references: the ensemble prior mean and the EnKF analysis.

The evidence from this study suggests that the GNC method was the most skilful approach and represents a promising alternative for assimilation of
volcanic fallout data. The GNC method provides an ensemble of weight factors and can also be used for source term inversion in a straightforward
way. Unlike the majority of source term inversion methods

On the other hand, although it is an interesting approach, the GIG method failed to improve on the EnKF analysis. Evidently, the linear regression used by the GIG method needs to be reformulated or corrected. The GIG method is a second-order method and provides an ensemble of analyses without assuming a linear observation operator. Consequently, it represents an attractive alternative for the assimilation of volcanic aerosol observations from satellite retrievals. To this end, the analysis ensemble from the GIG method could be used to perform multiple assimilation cycles by restarting an ensemble forecast. This approach has the potential to improve the accuracy of operational forecasts of volcanic clouds. In its present form, the GNC method is not suitable for data assimilation of volcanic aerosol observations in the context of operational forecasting as it does not provide an analysis ensemble. To achieve this, future studies should focus on extending the GNC method into a second-order analysis scheme.

FALL3D-8.2.0 is available under version 3 of the GNU General Public License (GPL) at

The supplement related to this article is available online at:

Conceptualisation: LM, AC. Methodology: LM. Software: LM, AF, GM. Resources: LM. Writing – original draft: LM. Writing – review and editing: LM, AC, AF, GM. Visualisation: LM. Supervision: AC, AF. Funding acquisition: AF. All authors have read and approved the final version of the paper.

The contact author has declared that none of the authors has any competing interests.

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

We thank Alexa Van Eaton from USGS for providing us with the assimilation dataset and the digitised isopach contours. We are also grateful to Florencia Reckziegel for kindly providing us with her dataset. We acknowledge the Partnership for Advanced Computing in Europe (PRACE) for granting us access to the Joliot–Curie supercomputer at the CEA's Very Large Computing Center (TGCC, France). Finally, we acknowledge the constructive reviews from Matthieu Plu and two anonymous reviewers.

This work has been partially funded by the H2020 Center of Excellence for Exascale in Solid Earth (ChEESE; grant no. 823844) and by the European Union's Horizon Europe Research and Innovation Programme (DT-GEO; grant no. 101058129). The research leading to these results has received funding from EuroHPC (ChEESE-2P; grant no. 101093038).

This paper was edited by Yuefei Zeng and reviewed by Matthieu Plu and two anonymous referees.