Estimation of the temporal profile of an atmospheric release, also called the source term, is an important problem in environmental sciences. The problem can be formalized as a linear inverse problem wherein the unknown source term is optimized to minimize the difference between the measurements and the corresponding model predictions. The problem is typically ill-posed due to low sensor coverage of a release and due to uncertainties, e.g., in measurements or atmospheric transport modeling; hence, all state-of-the-art methods are based on some form of regularization of the problem using additional information. We consider two kinds of additional information: the prior source term, also known as the first guess, and regularization parameters for the shape of the source term. While the first guess is based on information independent of the measurements, such as the physics of the potential release or previous estimations, the regularization parameters are often selected by the designers of the optimization procedure. In this paper, we study the sensitivity of two inverse methodologies to the choice of the prior source term and the regularization parameters of the methods. The sensitivity is studied in two cases: data from the European Tracer Experiment (ETEX) using FLEXPART v8.1 and the caesium-134 and caesium-137 dataset from the Chernobyl accident using FLEXPART v10.3.

The source term describes the spatiotemporal distribution of an atmospheric
release, and it is of great interest in the case of an accidental
atmospheric release. The aim of inverse modeling is to reconstruct
the source term by maximizing the agreement between the ambient measurements
and the predictions of an atmospheric transport model in a so-called top-down
approach

We assume that the measurements can be explained by a linear model
using the concept of the source–receptor sensitivity (SRS) matrix
calculated from an atmospheric transport model (e.g.,

There are various assumptions on the level of knowledge of the prior
source term used in the inversion procedure. Assumption of the zero
prior source term

The aim of this paper is to explore the sensitivity of linear inversion
methods to the prior source term selection coupled with tuning of
the covariance matrix representing modeling error. We considered the
optimization method proposed by

We are concerned with linear models of atmospheric dispersion using
an SRS matrix

In the rest of this section, we will discuss the influence of the modeling
error and show how existing methods compensate for such
error. We will analyze in detail two methods for source term estimation:
(i) the optimization model proposed by

It is generally assumed that the SRS matrix

The L2 norm

The analysis can be generalized to a quadratic norm with arbitrary
kernel

A common attempt to minimize the influence of the linear term is
to define the prior source term

In the ideal situation, we would like to optimize the left-hand side
of Eqs. (

In the following sections, we will discuss methods that estimate

In

Minimization of Eq. (

The method is summarized as Algorithm 1 and will be
denoted as the optimization method in this study. The maximum number of
iterations is set to 50, which was enough for convergence in all our
experiments. To solve the minimization problem (

In

Specifically, it formulates a hierarchical probabilistic model:

To estimate the parameters of the prior model (

The variational Bayes' solution

The Gibbs sampler is a Markov chain Monte Carlo method for obtaining sequences
of samples from distributions for which direct sampling is difficult
or intractable
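As a generic illustration of the technique (not the LS-APC-G sampler itself), the following minimal sketch runs a Gibbs sampler on a toy target, a zero-mean, unit-variance bivariate normal distribution with correlation `rho`. The target, the function name `gibbs_bivariate_normal`, and all numerical settings are assumptions chosen for illustration only. Each coordinate is drawn in turn from its conditional distribution given the other one, and two chains with very different initial states are compared after burn-in.

```python
import numpy as np

def gibbs_bivariate_normal(rho, n_samples, x0=0.0, y0=0.0, burn_in=500, seed=0):
    """Gibbs sampling from a bivariate normal with zero mean, unit variances,
    and correlation rho (toy target, assumed for illustration)."""
    rng = np.random.default_rng(seed)
    cond_std = np.sqrt(1.0 - rho ** 2)  # std of each full conditional
    x, y = x0, y0
    samples = np.empty((n_samples, 2))
    for i in range(burn_in + n_samples):
        # Draw each coordinate from its conditional given the current other one.
        x = rng.normal(rho * y, cond_std)
        y = rng.normal(rho * x, cond_std)
        if i >= burn_in:
            samples[i - burn_in] = (x, y)
    return samples

# Two chains started from very different initial states.
samples_a = gibbs_bivariate_normal(0.8, 20000, x0=0.0, y0=0.0, seed=1)
samples_b = gibbs_bivariate_normal(0.8, 20000, x0=50.0, y0=-50.0, seed=2)

corr_a = np.corrcoef(samples_a.T)[0, 1]
corr_b = np.corrcoef(samples_b.T)[0, 1]
print(f"empirical correlation, chain a: {corr_a:.3f}")
print(f"empirical correlation, chain b: {corr_b:.3f}")
```

In this toy setting, both chains recover the target correlation after the burn-in samples are discarded, regardless of the starting point.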

All mentioned methods are sensitive to a certain extent to the selection of their parameters. Here, we will identify these tuning parameters and discuss their settings in the following experiments. Moreover, we will discuss the selection of the prior source term.

The optimization approach is summarized in Algorithm 1
wherein two key tuning parameters are needed: parameter

Selecting the prior source term seems to be an even more difficult problem, especially for releases with limited available information. Therefore, we will investigate various errors in the prior source term, which is possible thanks to controlled experiments in which the true source term is available. We consider a time shift of the prior source term relative to the true source term, different scales, and a blurred version of the true source term. These errors can be examined alone or in combination, which is probably more realistic.
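The three kinds of prior errors described above can be sketched as simple operations on a discretized temporal profile. The helper names (`shift_prior`, `scale_prior`, `blur_prior`), the rectangular profile, and the particular shift, scale, and blur width are hypothetical choices for illustration and are not the settings used in the experiments.

```python
import numpy as np

def shift_prior(x, steps):
    """Shift the profile by `steps` time bins (positive = later), padding with zeros."""
    out = np.zeros_like(x)
    if steps > 0:
        out[steps:] = x[:-steps]
    elif steps < 0:
        out[:steps] = x[-steps:]
    else:
        out[:] = x
    return out

def scale_prior(x, factor):
    """Multiply the whole profile by a constant factor."""
    return factor * x

def blur_prior(x, width):
    """Blur the profile with a moving average whose kernel sums to one."""
    kernel = np.ones(width) / width
    return np.convolve(x, kernel, mode="same")

# Hypothetical rectangular release over 4 of 24 time bins.
x_true = np.zeros(24)
x_true[8:12] = 1.0

shifted = shift_prior(x_true, 3)
scaled = scale_prior(x_true, 2.0)
blurred = blur_prior(x_true, 5)
# Errors can also be combined: shifted, underestimated, and blurred prior.
combined = blur_prior(scale_prior(shift_prior(x_true, 3), 0.5), 5)
```

The moving-average blur preserves the total released amount as long as the release lies far enough from the edges of the time window, so in this sketch only the scaling changes the total.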

While the tuning parameters discussed in the previous section are often
selected manually, statistical methods for their selection are also
available. One of the most popular is cross-validation
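As a rough illustration of how cross-validation can select a tuning parameter, the sketch below applies K-fold cross-validation over the measurements to a generic quadratic (Tikhonov-type) penalty toward a prior source term. This is a simplified stand-in and not the optimization method or the LS-APC model studied here; the synthetic SRS matrix, the profile, the noise level, and the names `solve_with_prior` and `cv_error` are assumptions made for the example.

```python
import numpy as np

def solve_with_prior(M, y, x_prior, lam):
    """Minimize ||y - M x||^2 + lam * ||x - x_prior||^2 in closed form."""
    n = M.shape[1]
    A = M.T @ M + lam * np.eye(n)
    b = M.T @ y + lam * x_prior
    return np.linalg.solve(A, b)

def cv_error(M, y, x_prior, lam, n_folds=5, seed=0):
    """Mean squared prediction error on held-out measurements."""
    rng = np.random.default_rng(seed)  # same folds for every lam
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    errors = []
    for test_idx in folds:
        train = np.ones(len(y), dtype=bool)
        train[test_idx] = False
        x_hat = solve_with_prior(M[train], y[train], x_prior, lam)
        errors.append(np.mean((y[test_idx] - M[test_idx] @ x_hat) ** 2))
    return float(np.mean(errors))

# Synthetic problem (all values hypothetical).
rng = np.random.default_rng(0)
M = rng.random((60, 10))
x_true = np.array([0, 0, 1, 3, 5, 4, 2, 1, 0, 0], dtype=float)
y = M @ x_true + 0.1 * rng.standard_normal(60)
x_prior = np.zeros(10)

lambdas = np.logspace(-4, 4, 9)
scores = [cv_error(M, y, x_prior, lam) for lam in lambdas]
best_lam = lambdas[int(np.argmin(scores))]
print(f"tuning parameter selected by cross-validation: {best_lam:g}")
```

The held-out error is computed on measurements only, so this selection does not require knowledge of the true source term.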

The European Tracer Experiment (ETEX) is one of a few large controlled
tracer experiments (see

The uppermost row of panels shows eight different
prior source terms

To calculate the SRS matrices, we used the Lagrangian particle dispersion
model FLEXPART

The tested methods will be compared in terms of the mean absolute error
(MAE) between the estimated and the true source term for different
tuning parameters

L-curve-type plots using the optimization algorithm
with

We observe that for all choices of the optimization method, the results
exhibit two notable modes of solution: the data mode for tuning parameters
with minimum impact on the loss function and the prior mode for tuning
parameter values that cause the prior to dominate the loss function.
This is notable for the results in the range of

For

An attempt to find the optimal tuning via the L-curve method (i.e., the
dependence between the norm of the solution and the norm of the residual
between measurement and reconstruction) is displayed and demonstrated
in two cases: ETEX ERA-40 with

The LS-APC-VB method also exhibits modes of solution; however, the
transition between the data mode and the prior source term mode seems
to be rather fast. Notably, no such transitions are observed in the
case of the LS-APC-G method. This is caused by the fact that the Gibbs
sampling is not sensitive to the selection of the initial state, as
discussed in Sect.

Here, we will discuss the behavior of the methods around the regions
of the tuning parameter with minimum MAE (sweet spots) observed in
the case of the optimization method. Note that no such regions are
observed in the case of the LS-APC-VB and LS-APC-G methods. The temporal
profiles of the estimated source term at different penalization coefficients
selected around two different sweet spots are displayed in Figs.

The uppermost panel shows mean absolute errors
between estimated and true source terms for the ETEX ERA-40 dataset with

Same as Fig.

Figure

Figure

We note that the two discussed sweet spots are selected as representative
cases and other observed sweet spots (see, e.g., Fig.

The top left panel

Since the LS-APC-VB and LS-APC-G methods provide rather stable
estimates of the source term, we will investigate the use of cross-validation
(CV) in optimization-based approaches. The results of CV for
the optimization method with

Same as Fig.

Same as Fig.

The results demonstrate significant differences between the prior
mode and the data mode of the solution, which can be seen in all cross-validation
box plots. This is also the case for

Box-and-whisker plots of the MAE averaged
over all explored prior source terms, with the tuning parameter settings
determined by CV for the optimization method (left) and the LS-APC-VB
method (middle), as well as for the LS-APC-VB method using a default

To evaluate the overall results, we compute the mean MAE over all
estimated source terms using the optimization method with

We demonstrate the sensitivity of the source term estimation to the tuning
of the methods for the Chernobyl accident, which, together
with the Fukushima accident, has become a widely discussed case in the inverse modeling
literature. Specifically, we study caesium-134 (Cs-134) and caesium-137
(Cs-137) releases from the Chernobyl nuclear power plant (ChNPP).
For this purpose, we use a recently published dataset

The Lagrangian particle dispersion model FLEXPART version 10.3

The exact temporal profile of the Chernobyl release is uncertain, although
certain features are commonly accepted

We follow the setup of

The uncertainties associated with measurements are relatively high
since both concentration and deposition measurements are used from
the dataset

In this case, direct comparison of the estimates with the true emission
profile is not possible since this remains unknown. Therefore, we
will provide results of the tested methods as the sensitivity of the total
estimated release activity to tuning parameters in the same way as
in Sect.

Estimated total released activities for both meteorological reanalyses (ERA-40 and ERA-Interim) and both nuclides (Cs-134 and Cs-137) using all tested methods; see the label bar on the right for a line description.

The resulting estimates of the total released activity are displayed
in Fig.

Similarly to the ETEX results, the results in Fig.

Notice that the estimated mass is higher in the data mode than in the prior mode. This means that the model constrained by the measurement data alone would support a higher total release than the a priori source term. The true source term is not known; however, it is likely that the data mode overestimates the true total release. This can happen when the SRS matrix is biased. For instance, if the modeled removal of radiocesium is too rapid, the air concentrations predicted with the correct source term would be too low, and the inversion would compensate for the bias by increasing the posterior source term (notice, though, that deposition values would in this case be overestimated, at least close to the source, leading to the opposite effect for the deposition data). Regardless, this effect shows that in the data mode, the resulting source term is heavily influenced by possible biases in the transport model.
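The compensation effect described above can be illustrated with a small synthetic example. It assumes the simplest possible bias, a transport model that underpredicts every sensitivity by the same factor of two, together with noise-free measurements and unregularized least squares; the matrix, the profile, and the factor are hypothetical and much simpler than a real transport-model bias.

```python
import numpy as np

rng = np.random.default_rng(0)
M_true = rng.random((40, 6))   # hypothetical "true" SRS matrix
x_true = np.array([0.0, 2.0, 5.0, 3.0, 1.0, 0.0])
y = M_true @ x_true            # noise-free synthetic measurements

# Biased model: predicted concentrations are half of the true ones,
# e.g., as if removal from the atmosphere were too rapid.
M_biased = 0.5 * M_true

# Unregularized least squares with the biased matrix (pure data mode).
x_hat, *_ = np.linalg.lstsq(M_biased, y, rcond=None)
ratio = x_hat.sum() / x_true.sum()
print(f"estimated / true total release: {ratio:.2f}")
```

Because the biased matrix is exactly half of the true one, the least-squares estimate doubles the source term to reproduce the measurements, so the total release is overestimated by the inverse of the bias factor.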

Cross-validation for Chernobyl Cs-134 (top panels) and Cs-137 (bottom panels) source terms using FLEXPART driven with ERA-40 meteorological reanalyses. Optima in the sense of cross-validation are denoted using red circles with total estimated releases reported in the legends.

The same cross-validation scheme as in the case of ETEX
(Sect.

In the case of Cs-134 (top row), the cross-validation was able to
determine optimal values of tuning parameters in the case of all tested
methods. The total estimated releases associated with these tuning
parameters are 87.1 PBq (LS-APC-VB with

The results suggest that a well-selected prior source term can bind the solution to acceptable values and prevent the occurrence of extreme outliers. On the other hand, we observed that the commonly used regularization terms are not able to compensate for the error caused by inaccurate SRS matrices. Further research is clearly needed to develop a more relevant method of regularization.

Methods for the determination of the source term of an atmospheric release
have to cope with inaccurate prediction models often represented by
the source–receptor sensitivity (SRS) matrix. Relying solely on the
SRS matrix using a best estimate of weather and dispersion parameters
may lead to highly inaccurate results. We have shown that various
regularization terms introduced by different inversion methods are
essentially coarse approximations of the error of the SRS matrix, and
thus we can evaluate their suitability using methods of statistical
model validation. We have performed sensitivity tests of inverse modeling
methods to the selection of the prior source term (first guess) and
other tuning parameters for two selected inversion methods: the optimization
method

We have observed that the results have two strong modes of solution: the data mode for minimal influence of the prior on the loss and the prior mode for the loss function with significant influence of the prior. The prior mode is naturally significantly influenced by the choice of the prior source term. However, the choice of the regularization has the dominant impact on the resulting estimate. In the case of the ETEX dataset, good estimates were obtained for every choice of the prior source term; however, the regularization has to be carefully tuned. For some choices of the prior source term, the error of the estimated source term was exceptionally low for a good selection of the tuning parameters. After analyzing these minima, we conjecture that they are coincidental. These minima are visible only in comparison with the ground truth; they have no visible impact on the common validation metrics such as the L-curve or cross-validation and thus cannot be objectively identified.
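The two modes of solution can be reproduced in a minimal sketch with a generic quadratic penalty toward the prior source term, which is a simplification and not the loss function of either tested method. The synthetic SRS matrix, the profiles, and the extreme values of the tuning parameter `lam` are assumptions for illustration.

```python
import numpy as np

def solve_with_prior(M, y, x_prior, lam):
    """Minimize ||y - M x||^2 + lam * ||x - x_prior||^2 in closed form."""
    n = M.shape[1]
    return np.linalg.solve(M.T @ M + lam * np.eye(n), M.T @ y + lam * x_prior)

# Synthetic problem (all values hypothetical).
rng = np.random.default_rng(0)
M = rng.random((30, 8))
x_true = np.array([0.0, 1.0, 4.0, 4.0, 2.0, 1.0, 0.0, 0.0])
y = M @ x_true + 0.01 * rng.standard_normal(30)

# Deliberately wrong prior: shifted by two time bins and scaled by one half.
x_prior = 0.5 * np.roll(x_true, 2)

# Reference: unregularized least squares, driven by the data alone.
x_ls, *_ = np.linalg.lstsq(M, y, rcond=None)

x_data_mode = solve_with_prior(M, y, x_prior, 1e-10)   # prior has minimal influence
x_prior_mode = solve_with_prior(M, y, x_prior, 1e10)   # prior dominates the loss

print("max |data mode - least squares|:", np.abs(x_data_mode - x_ls).max())
print("max |prior mode - prior|:       ", np.abs(x_prior_mode - x_prior).max())
```

For a vanishing penalty the estimate coincides with the least-squares solution, and for a very large penalty it collapses onto the prior source term, whatever its errors; intermediate values of `lam` interpolate between the two.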

We have tested the suitability of the cross-validation approach for selection
of the tuning parameters for both methods. In the case of the
ETEX release, we have observed that this approach tends to select
modes closer to the data mode than the prior mode of solution. However,
this is not the case for the Chernobyl Cs-134 release, for which cross-validation
selects solutions close to the prior-dominated mode. This may be caused
by the fact that the prior source term used here fits the measurements well,
and only small corrections by the inversion are needed.
An interesting question is whether it is beneficial to use a nonzero
prior source term at all. Considering ETEX, for which the
true release is known, one can see that the estimates in data modes
are often even better than the considered prior source terms. On the
other hand, when the prior source term used is close to the true release,
which is probably the case for the Chernobyl Cs-134 release, its use
seems beneficial. Also, the prior source term could be valuable in
cases when the release is not fully seen by the measurement network
and thus the measurements do not provide a good constraint for the
source term estimation. However, determining the reliability of the
prior source term is difficult or even impossible in real-world scenarios,
and the prior source term would probably be shifted, scaled, and/or
blurred. We recommend tackling this task using the cross-validation approach,
which provides a reasonable, although computationally expensive, tool for
deciding at least between a prior-dominated and a data-dominated
mode of solution. A more sophisticated approach is to design a different
regularization of the error term

From Eq. (

SRS matrices for ETEX are displayed in Fig.

ETEX SRS matrices computed using FLEXPART driven
by meteorological input data from the European Center for Medium-Range
Weather Forecasts (ECMWF).

All data used for the present publication can be freely downloaded from

OT designed and performed the experiments and wrote the paper. LU performed Gibbs sampling experiments and wrote parts of the paper. VŠ designed and supervised the study and wrote parts of the paper. NE prepared the Chernobyl dataset and commented on the paper. AS commented on the paper and wrote parts of the final version.

The authors declare that they have no conflict of interest.

This research has been supported by the Czech Science Foundation (grant no. GA20-27939S).

This paper was edited by Slimane Bekki and reviewed by two anonymous referees.