On tuning of atmospheric inverse methods: Comparison on ETEX and Chernobyl datasets using FLEXPART v8.1 and v10.3

Estimation of the temporal profile of an atmospheric release, also called the source term, is an important problem in environmental sciences. The problem can be formalized as a linear inverse problem where the unknown source term is optimized to minimize the difference between the measurements and the corresponding model predictions. The problem is typically ill-posed due to low sensor coverage of a release and due to uncertainties e.g. in measurements or atmospheric transport modeling, hence, all state-of-the-art methods are based on some form of regularization of the problem using additional infor5 mation. We consider two kinds of additional information, the prior source term, also known as the first guess, and regularization parameters for shape of the source term. While the first guess is based on information independent of the measurements, such as physics of the potential release or previous estimations, the regularization parameters are often selected by the designers of the optimization procedure. In this paper, we provide a sensitivity study of two inverse methodologies on the choice of the prior source term as well as regularization parameters of the methods. The sensitivity is studied in two cases: data from the European 10 Tracer Experiment (ETEX) using FLEXPART v8.1 and the caesium-134 and caesium-137 dataset from the Chernobyl accident using FLEXPART v10.3.

The aim of this paper is to explore the sensitivity of linear inversion methods to the prior source term selection coupled with tuning of the covariance matrix representing modeling error. We considered the optimization method proposed by Eckhardt et al. (2008), its probabilistic counterpart formulated as the hierarchical Bayesian model, extended here by non-zero prior source term, with variational Bayes inference (Tichý et al., 2016) and with Monte Carlo inference using Gibbs sampler (Ulrych and Šmídl, 2017). Two real cases will be examined: ETEX (Nodop et al., 1998) and Chernobyl (Evangeliou et al., 2016) 5 datasets. The ETEX experiment provides ideal data for a prior source term sensitivity study since the emission profile is exactly known. We propose various modifications of the known prior source term and study their influence on the results of the selected inversion methods. The Chernobyl dataset provides, on the other hand, a very demanding case where only consensus on the release is available and source term is more speculative.
2 Inverse modeling using prior source term 10 We are concerned with linear models of atmospheric dispersion using a SRS matrix (Seibert, 2001;Wotawa et al., 2003;Seibert and Frank, 2004) which has been used in inverse modeling (Evangeliou et al., 2017;Liu et al., 2017). Here, the atmospheric transport model calculates the linear relationship between potential sources and atmospheric concentrations. The source-receptor sensitivity is calculated as m ij = c i /x j where x j is the assumed release from the release site at time j and c i is the calculated concentration at a receptor c i at respective time period. The measurement y i at given time and location can be 15 explained as a sum of contributions from all elements of the source term weighted by m ij . In matrix notation where y ∈ p is a vector aggregating measurements from all locations and times (in arbitrary order), x ∈ n is a vector of all possible releases from given time period, and all possible source-receptor sensitivities form the SRS matrix M ∈ p×n .
The residual model, e ∈ p , is a sum of model and measurement errors. While the model looks trivial, its use in practical 20 applications poses significant challenges. The key reason is that all elements of the model are subject to uncertainty (Winiarek et al., 2012;Liu et al., 2017) and the problem is ill-posed.
In the rest of this Section, we will discuss an influence of the modeling error and show how existing methods approach compensation of such error. We will analyze in detail two methods for the source term estimation: (i) the optimization model proposed by Eckhardt et al. (2008) with a prior source term already considered and (ii) a Bayesian model (Tichý et al., 2016) 25 extended here by a prior source term information and solved using both the variational Bayes method or the Gibbs sampling method.

Influence of atmospheric model error
It is generally assumed that the SRS matrix M is correct and the true source term minimizes error of y = Mx. However, M is prone to errors due to a number of approximations in the formulation of the atmospheric transport model and use of uncertain 30 weather analysis data as input to the atmospheric transport model. Therefore, one should rather consider a hypothetical model where M is the available estimate of the sensitivity matrix from a numerical model, and the term ∆ M is the deviation of the estimate from the true generating matrix, M true = (M + ∆ M ). Exact estimation of ∆ M is not possible due to lack of data, however many existing regularization techniques can be interpreted as various simplified parametrizations of ∆ M . 5 The L2 norm 1 of the residuum between measurement and reconstruction for (3) would become The ideal optimization problem (right hand side of (3)) can be decomposed into the norm of residues of the estimated model ||y − Mx|| 2 2 , term linear in x (i.e., −2y T ∆ M x) and term quadratic in x (i.e.,x T Ξx). Both of the additional terms contribute 10 to incorrect estimation of x when ∆ M is significant.
An common attempt to minimize the influence of the linear term is to define the prior source term x a , and subtract Mx a from both sides of (2): Using substitutions x = x − x a and y = y − Mx a , yielding a new decomposition of (3): Terms independent of x do not cause estimation error, and we can neglect them. The decomposition is then which implies a different balance between the linear and quadratic terms than in the previous case.
In the ideal situation, we would like to optimize the left hand side of Eq. (3) and (5). However, due to unavailability of ∆ M 25 the linear term is typically ignored (assumed to be negligible) and the quadratic term is approximated using a parametric form of Ξ. The optimization criterion is then: Algorithm 1 Optimization algorithm for linear inverse problem.
Select and errx, then (possible extension to absolute/relative noise) while xneg xpos < 0.0001 and iter < 50 do iter = iter + 1 solve minimization problem with cost function (7) with (8) The estimation error caused by approximation (7) can be influenced by two choices of the user: (i) first guess x a , and (ii) regularization matrix Ξ. For a good choice of these values, the linear term should be close to zero (which would be the case for x a approaching x). The choice of parametric form of Ξ corresponds to choosing a model of the SRS matrix error, since Ξ = ∆ T M ∆ M . In the following sections, we will discuss methods which estimate Ξ from the data using parametrization of Ξ by tri-5 diagonal matrices with limited number of parameters. Specifically, we will investigate if the choice of x a has an impact on better estimation of Ξ.

Optimization approach
In Eckhardt et al. (2008), the source term inversion problem is formulated as (7) with choices 10 where matrix B is selected or estimated precision matrix, the matrix D a discrete representation of the second derivative with diagonal elements equal to −2 and equal to 1 on the first sub-diagonals and the scalar is the parameter weighting the smoothness of the solution x.
Minimization of (7) does not guarantee the non-negativity of the estimated source term x. To solve this issue, an iterative procedure is adopted (Eckhardt et al., 2008) where minimization of (7) is done repetitively with reduced diagonal elements 15 of B related to the negative parts of the solution, thus tightening the solution to the prior source term which is assumed to be 1 max(M T M ) , Υ (0) = errxIn, and L (0) = In.
2. Iterate from i = 1 until convergence or maximum number of iterations is reached: (a) Compute estimate of the source term x (i) using regularized least squares: and respective moment of the truncated normal distribution.
3. Report estimated source term x and its uncertainty Σx.
non-negative. The diagonal elements of B related to the positive parts of the solution can on the other hand be enlarged up to a selected constant. This can be iterated until the absolute value of the sum of negative source term elements is lower than 0.01% of the sum of positive source term elements. Formally, B j,j in the ithe iteration is calculated as We observed very low sensitivity to the choice of the recommended values of 0.5 and √ 1.5 in Eq. (9). In most cases, varying 5 these values does not lead to any serious differences in the resulting estimate. However, the selection of parameters x a , err x , and is crucial and it will be discussed in Sect. 2.4.
The method is summarized as Algorithm 1 and will be denoted as optimization method in this study. The maximum number of iterations is set to 50 which was enough for convergence in all our experiments. To solve the minimization problem (7), we use the CVX toolbox Boyd, 2008, 2018) for Matlab.

Bayesian approach
In Tichý et al. (2016), the problem was addressed using a Bayesian approach. The difference from the optimization approach is twofold. First, it has different approximation of the covariance matrix Ξ: where matrix L models smoothness and matrix Υ models closeness to the prior source term x a . Matrix Υ = diag (υ 1 , . . . , υ n ) is a diagonal matrix with positive entries while matrix L is a lower bi-diagonal matrix Second, the Bayesian approach allows to estimate the hyper-parameters Υ and L from the data.
To consider the prior vector x a is novel in the LS-APC model.
2. Until the preselected number of iterations is reached (a) Sweep through all variables x, υ, l, ψ1,...,n−1, ω and sample from their respective full conditionals.
(b) Update the parameters of full conditionals based on drawn samples.
3. Based on the histograms of samples, estimate the maximum, mean, median, variance or any other desired statistic.
The best possible approximation minimizes Kullback-Leibler divergence (Kullback and Leibler, 1951) between the estimated solution and hypothetical true posterior. This minimization uniquely determines the form of the posterior distributioñ p(l j |y) =N lj µ lj , Σ lj , ∀j = 1, . . . , n − 1, 5p (ψ j |y) =G ψj (ζ j , η j ) , ∀j = 1, . . . , n − 1, where the shaping parameters µ x , Σ x , α j , β j , µ lj , Σ lj , ζ j , η j , ϑ, ρ are derived in Appendix A. The shaping parameters are functions of standard moments of posterior distribution which are here denoted as x and means expected value with respect to the distribution on the variable in the argument. The standard moments together with shaping parameters form a set of implicit 10 equations which is solved iteratively, see Algorithm 2. Note, that only convergence to a local optimum is guaranteed, hence good initialization and iteration strategy is beneficial, see Algorithm 2 and discussion in (Tichý et al., 2016). The algorithm is denoted as LS-APC-VB algorithm.

Gibbs sampling solution
Gibbs sampler is a Monte Carlo Markov Chain method for obtaining sequences of samples from distributions from which 15 direct sampling is difficult or intractable (Casella and George, 1992). It is a special case of the Metropolis-Hastings algorithm with proposal distribution derived directly from the model (Chib and Greenberg, 1995). Given a joint probability density p(x, υ, l, ψ 1,...,n−1 , ω, y), a full conditional distribution needs to be derived for each variable or a block of variables, i.e. for x, distribution p(x|υ, l, ψ 1,...,n−1 , ω, y) has to be found. These full conditionals then serve as proposal generators and have the same form as (21)- (25). We use the original Gibbs sampler from (George and McCulloch, 1993). Having samples from the last 20 iteration, or a random initialization for the first iteration, the algorithm sweeps through all variables and draws samples from their respective full conditional distributions. It can be shown that samples generated in such a manner form a Markov chain whose stationary distribution, the distribution to which the chain converges, is the original joint probability density. Since the convergence of the algorithm can be very slow, it is common practice to discard the first few obtained samples. This is known as a burn-in period. The advantage of this algorithm is its indifference to the initial state from which sampling starts.
2.4 Tuning parameters and prior source term 5 All mentioned methods are sensitive to a certain extent to the selection of their parameters. Here, we will identify these tuning parameters and discuss their settings in the following experiments. Moreover, we will discuss the selection of the prior source term.
The optimization approach is summarized in Algorithm 1 where two key tuning parameters are needed, parameter err x which affects the closeness of solution to the prior source term through the matrix B and parameter which affects the 10 smoothness of solution. In the following experiments, we select the parameter by experience while it can be seen that the solution is similar for a relatively wide range of values (few orders of magnitude). The parameter err x seems to be crucial for the optimization method and sensitivity to the choice of this parameter will be studied while err x will be referred as the tuning parameter. Note that heuristic techniques such as the L-curve method (Hansen and O'Leary, 1993) can not be used here because of the modification of the matrix B within the algorithm. This will be demonstrated in Sect. 3, Fig. 2. The LS-APC-VB 15 method, summarized in Algorithm 2, also needs the selection of initial err x , however, relatively low sensitivity to this choice was reported (Tichý et al., 2016). The LS-APC-G method, summarized in Algorithm 3, is also initialized using err x while its sensitivity to this choice is negligible due to the Gibbs sampling mechanism.
To select the prior source term seems to be an even more difficult problem especially in cases of releases with limited available information. Therefore, we will investigate various errors in the prior source term which can be considered thanks 20 to controlled experiments where the true source term is available. We consider time-shift of the prior source term in contrast with the true source term, different scale, and a blurred version of the true source term. These errors can be examined alone or combined which will probably be more realistic.

Tuning by cross-validation
While the tuning parameters selected in previous Section are often selected manually, statistical methods for their selection 25 are also available. One of the most popular is cross-validation (?), which we will investigate in the context of source term determination. The methods is really simple, all available data are split in the training and testing data sets:, y train , M train , and y test , M test , respectively. The training dataset is then used for source term estimation while the test dataset is used for computation of the norm of the residue of the estimated source term, ||y test − M test x || 2 . Such estimate is known to be almost unbiased but with large variance. Therefore, the procedure is repeated several times and the tuning parameters are selected 30 based on statistical evaluation of the results. In this experiment, we repeat random selection of 80% of measurements as the training set and using the remaining 20% as the test set. For each tuning parameter err x , this is repeated 100 times in order to reach statistical significance of the selected tuning parameter. The true ETEX source term is repeated in every panel (red dashed line). The middle and the lowermost rows display mean absolute error between estimated and true source terms for ETEX ERA-40 and ETEX ERA-Interim datasets, respectively.

Sensitivity study on ETEX dataset
The European Tracer Experiment (ETEX) is one of a few large controlled tracer experiments (see https://rem.jrc.ec.europa.eu/etex/).
We use data from the first release in which a total amount of 340 kg of nearly inert perfluoromethylcyclohexane (PMCH) was released at a constant rate for nearly 12 hours at Monterfil in Brittany, France, on 23 October 1994 (Nodop et al., 1998).
Atmospheric concentrations of PMCH were monitored at 168 measurement stations across Europe with a sampling interval 5 of 3 hours and a total number of measurements 3102. The ETEX dataset has been used as a validation scenario for inverse modeling, see e.g. (Bocquet, 2007;Martinez-Camara et al., 2014;Tichý et al., 2016). The great benefit of this dataset is that the estimated source terms can be directly compared with the true release given in Fig. 1, first row, using dashed red lines.
To calculate the SRS matrices, we used the Lagrangian particle dispersion model FLEXPART (Stohl et al., 1998(Stohl et al., , 2005 version 8.1. We assume that the release period occurred during 120 hours period, thus, 120 forward calculations of 1 hour 10 hypothetical unit release were performed and SRS coefficients were calculated from simulated concentrations corresponding to the 3102 measurements. As a result, we obtained the SRS matrix M ∈ R 3102×120 . FLEXPART is driven by meteorological input data from the European Center for Medium-Range Weather Forecasts (ECMWF) where different datasets are available.
We used two: (i) data from the 40-year re-analysis (ERA-40) and (ii) data from the continuously updated ERA-Interim reanalysis. The tested method will be compared in terms of the mean absolute error (MAE) between the estimated and the true source term for different tuning parameters err x . We select a two representative values of the smoothing parameter for the optimization method. Specifically, we selected = 10 −5 and = 10 −4 , while higher values lead to overly smooth, and lower values to non-smooth solutions. We tested 8 different prior source term x a , see Fig. 1, top row, black lines: x a equal to (i) true source term, (ii) zeros for all elements, (iii) true source term right-shifted by 5 time steps, (iv) true source term scaled by factor 2.0,

5
(v) true source term blurred using convolution kernel of the size 5 times steps, left-shifted by 5 time-steps, and scaled by factor 1.3, (vi) true source term substantially right-shifted, (vii) true source term scaled by factor 10, and (viii) source term with constant activity. The resulting MAEs for all tested methods and for all 8 prior source terms are displayed in Fig. 1 for ETEX with ERA-40 dataset in the second column and for ETEX with ERA-Interim in the third row. The figures in the second and the third rows are accompanied by the MAE between the true source term and the used prior source term displayed using dashed 10 red lines.

Results
We observe that for all choices of the optimization method, the results exhibit two notable modes of solution: the data mode for tuning parameters with minimum impact on the loss function, and the prior mode for tuning parameter values that cause the prior to dominate the loss function. This is notable for the results on the range of err x in Fig. 1. For err x = 10 −10 the data term 15 dominates the loss function, and all methods converge to a similar answer (note that the data mode is different for different smoothing parameters in the optimization method).
For err x = 10 5 , the loss function is dominated by the prior and all estimates converge to x a . Although visibly stable modes of all the solutions are typically only two (data and prior mode), we also observe a third mode in the optimization solution, best seen e.g. in Fig. 1 in the second row and the fourth column or in the third row and the fifth column, where the error significantly 20 drops. These "sweet spots" are the desired locations that we hope to find by tuning of the hyper-parameters. While they are obvious when we know the ground truth, the challenge is to find them without this knowledge.
An attempt to find the optimal tuning via the L-curve method (i.e. dependence between the norm of the solution and the norm of the residuum between measurement and reconstruction) is displayed and demonstrated in two cases: ETEX ERA-40 with x a 2.0 scaled (Fig. 2, left) and ETEX ERA-Interim with x a shifted, scaled and blurred (Fig. 2, right) for optimization method with = 10 −5 . In these cases (and all others), L-curve shapes were not reached at all and thus an optimum regularization parameter can not be chosen from these plots. The red crosses denote the value corresponding to minima of MAEs. One can see that 5 the sweet spots are on the transition between the data mode and the prior mode of solutions with no specific feature in these measures. More detailed analysis is presented in the next Section.
The LS-APC-VB method also exhibits modes of solution; however, the transition between the data mode and the prior source term mode seems to be rather fast. Notably, no such transitions are observed in the case of the LS-APC-G method. This is caused by the fact that the Gibbs sampling is not sensitive to the selection of the initial state, as discussed in Sect. 2.3.2.

10
With the exception of x a as a constant activity (Fig. 1, the eighth column), the LS-APC-VB method performs better than the optimization method when approaching the data mode of a solution. The LS-APC-G method suffers from overestimating the source term in time steps when the true source term is zero and not enough evidence is available in data. This can be clearly seen in Fig. 3 and Fig. 4 where estimates from the LS-APC-G method are displayed using green lines, see especially time steps between 15 h and 40 h. and (c), the slow transition between these two modes allows to approach the true source term closely, since the chosen prior term is only scaled version of the true release and the sweet spot lies exactly between the two modes. Both the LS-APC-VB and the LS-APC-G methods are diverging from the "good" solution since they consider it to be very unlikely with respect to the observed data. Since no heuristics such as the L-curve can identify this tuning as providing good results, see Fig. 2, left, we we argue that choosing the optimal setting of the tuning parameter is not possible without knowledge of the true source term 30 and the occurrence of the sweet spot is only a coincidence. mode (a) contains also non-zero elements mainly in the first half of the source term. The transition can be seen in Fig. 4

(b) and
(c) where the non-zero activity at the beginning of the data mode is eliminated by using prior source term information while the non-zero elements are relatively close to the true release (b). In (c), the zero activity in the first half remains due to the prior source term, however, the estimated activity within the true release period moves toward the assumed prior source term. In (d), the estimation is already very close to the chosen prior source term. Once again, the improvement appears to be coincidental 5 rather than systematic.
We note that the two discussed sweet spots are selected as representative cases and other observed sweet spots (see e.g. Fig.   3, the second or the eighth columns) are very similar in nature. By analyzing the sweet spots, we conclude that they represent a transition from the data mode to the prior mode of solution. In some cases, the transition is very close to the true release, see e.g. Fig. 3, while in some cases, no point on the transition path approaches the true solution, see e.g. Fig. 4, and the data or prior mode is the closest.

Tuning by cross-validation
Since the LS-APC-VB and the LS-APC-G methods provide rather stable estimates of the source term, we will investigate the use of cross-validation (CV) on the optimization-based approaches. The results of CV for the optimization method with The results demonstrate significant differences between the prior mode and the data mode of the solution which can be seen on all cross-validation boxplots. This is also the case x a that are not displayed here. Notably, the minima of cross-validation  are not reached in positions of the sweet spots, indicating that the observed MAE minima are coincidental. In all tested cases, the minima of cross-validation are reached closer to the the data mode than to the prior mode. This is demonstrated on the extreme case of x a equal to the true source term in Fig. 7. Even for this case, the minimum of cross-validation is associated with the data mode rather than the prior mode.
To evaluate the overall results, we compute the mean MAE over all estimated source terms using optimization method with 5 = 10 −5 with the tuning parameter err x selected using cross-validation (CV) for each prior source term x a . This result is given in Fig. 8 using box-and-whiskers plot. The same box-and-whiskers plots are also computed for the LS-APC-VB method with the same scheme of selection of tuning parameters err x using cross-validation method (denoted by (CV) in Fig. 8) and for the LS-APC-VB algorithm with the tuning parameter err x set to 10 0 as recommended in (Tichý et al., 2016) (denoted by (default) in Fig. 8). These results suggest the LS-APC-VB method with fixed start (but weighted to data using selection of ω (0) ) slightly outperforms other methods in terms of the mean MAE on ETEX data with various assumed prior source terms without necessity of exhaustive tuning.

Sensitivity study on Chernobyl dataset
We demonstrate the sensitivity of the tuning methods on estimation of the source term of the Chernobyl accident which became, together with the Fukushima accident, a widely discussed case in inverse modeling literature. Specifically, we study caesium- Cs-137. We use the same experimental setup as in (Evangeliou et al., 2017) which will be now briefly described.

Atmospheric modeling
The Lagrangian particle dispersion model FLEXPART version 10.3 (Stohl et al., 199810.3 (Stohl et al., , 2005Pisso et al., 2019) was used 20 to simulate the transport of radiocesium and to calculate the SRS matrices. FLEXPART was driven by two meteorological reanalyses so that these can be compared. First, ERA-Interim (Dee et al., 2011), a European Center for Medium-range Weather Forecast (ECMWF) reanalyses, was used with temporal resolution of 3 hours and spatial resolution of circa 80 km on 60 vertical levels, from surface up to 0.1 hPa. Second, the ERA-40 (Uppala et al., 2005), ECMWF reanalysis, was used at a 125 km spatial resolution. The emissions from the ChNPP are discretized into six 0.5 km high vertical layers from 0 to 3 km. The assumed temporal resolution is 3 hours from 0000 UTC on April 26 to 2100 UTC on May 5, 1986, for which the forward 5 runs of FLEXPART are done. This period is selected according to the consensus that the activity declined by approximately six orders of magnitude after May 5 (De Cort et al., 1998). This discretized the temporal-spatial domain to 480 assumed releases (80 temporal elements times 6 vertical layers) for each nuclide. For each spatio-temporal element, concentrations and depositions sensitivities are computed using 300.000 particles. Following (Evangeliou et al., 2017), the aerosol tracers Cs-134 and Cs-137 are subject to wet (Grythe et al., 2017) and dry (Stohl et al., 2005) deposition depending on different particle sizes 10 with aerodynamic mean diameters of 0.4, 1.2, 1.8, and 5.0 µm. The distribution of mass is assumed as 15%, 30%, 40%, and 15% for 0.4, 1.2, 1.8, and 5.0 µm particle sizes respectively, following measurements of (Malá et al., 2013) and computation results of (Tichý et al., 2018).

Prior source term and measurements uncertainties
The exact temporal profile of the Chernobyl release is uncertain although certain features are commonly accepted (De Cort 15 et al., 1998). The first three days correspond to higher release with initial explosion and release of part of the fuel. The next four days correspond to lower releases and the last three days correspond to higher releases again due to fires and core meltdown.
After these ten day, the released activity dropped by several orders of magnitude (De Cort et al., 1998).
We follow the setup of Evangeliou et al. (2017) and consider six previously estimated Chernobyl source terms of Cs-134 and Cs-137 where the criterion of consideration was their sufficient temporal resolution and emission height information. The 20 source terms are taken from Brandt et al. (2002) (two cases with the same amount of release with slightly different release heights), Persson et al. (1987), Izrael et al. (1990), Abagyan et al. (1986), and Talerko (2005). See Evangeliou et al. (2017) for their detailed descriptions and profiles. The prior source term in our experiment is computed as their average emissions at a given time and height. In sum, the total prior releases of Cs-134 and Cs-137 are 54 PBq and 74 PBq, resprectively.
The uncertainties associated with measurements are relatively high since both concentration and deposition measurements 25 are used from the dataset (Evangeliou et al., 2016). As was pointed out by Gudiksen et al. (1989); Winiarek et al. (2014), deposition measurements may be biased by unknown mass of radiocesium already deposited over Europe from e.g. nuclear weapons tests. This mass has been, however, reported (De Cort et al., 1998) and already subtracted from the dataset. Still, similarly to Evangeliou et al. (2017), we consider relative measurement errors of 30% for concentration measurements and 60% for deposition measurements while the absolute measurement errors are handled in the same way as in Stohl et al. (2012).

Results
In this case, direct comparison of the estimates with the true emission profile is not possible since this remains unknown.
Therefore, we will provide results of the tested methods as sensitivity of the total estimated release activity to tuning parameters Optimization with x a , eps=10^(-2) Optimization with x a =0, eps=10^ (-2) Optimization with x a , eps=10^(-4) Optimization with x a =0, eps=10^(-4) Figure 9. Estimated total released activities for both meteorological reanalyses ERA-40 and ERA-Interim and both nuclides Cs-134 and Cs-137 using all tested methods, see label bar on the right for line description.
in the same way as in Sect. 3. Note that the total release activity is a sum of releases from all six vertical layers and all four aerosol size fractions. Due to this composition of the problem, the selection of the smoothness parameter in the case of the optimization approach is relatively difficult since specific selection may fit better for one vertical layer than for another. We will provide results for two settings of this parameter, = 10 −2 and = 10 −4 , leading to two different behavior of the optimization method.

5
The resulting estimates of the total released activity is displayed in Fig. 9 where the total of the used prior source term x a is displayed using dashed red line (same for all tested settings of the tuning parameter err x ). The estimated total release activity with the use of the prior source term x a is displayed using full lines with colors given in the legend in Fig. 9 while estimations without the use of this prior source term, i.e. with x a = 0, is given using dashed lines and respective colors.
Similarly to the ETEX results, the results in Fig. 9 suggest the occurrence of two main modes of solution, the data mode and 10 the prior mode, with a smooth transition between them in the cases of LS-APC-VB and optimization methods. The LS-APC-G method (evaluated only at four points denoted by green squares due to high computational costs) has, again, low sensitivity to the initialization of the tuning parameter. However, the results of the LS-APC-G method are close to the data mode of the remaining methods, or higher than those. Contrary to the previous results, the LS-APC-VB algorithm does not provide a stable solution and suffers from the need to select the tuning parameter. This signifies that the problem is ill conditioned even with the 15 proposed regularization term, thus VB is converging to various local minima. The optimization method with both settings of the smoothness parameter has also two modes of solution. In the prior mode of solution (higher values of the tuning parameter), both settings approach the same total release activity for both non-zero (full lines) and zero (dashed lines) prior source terms.
The prior mode is dominated by the used prior source term for an arbitrary smoothness parameter . The difference can be seen in the data mode where about one-third higher total released activity was estimated for smoothness parameter = 10 −4 20 than for smoothness parameter = 10 −2 on the same level of tuning parameter err x . This is caused by the penalization of high peaks of activity in the case of = 10 −2 . Thus, in the data mode of solution, the smoothness parameter is much more important than the used prior source term which plays almost no role here. Notice that the estimated mass is higher in the data mode than in the prior mode. This means that the model constrained by the measurement data alone would support a higher total release amount than the a priori source term. The true source term is not known, however, it is likely that the data mode overestimates the true total release. This can happen when the SRS matrix is biased. For instance, too rapid removal of radiocesium would lead to too low estimated air concentrations with the correct source term, and the inversion would compensate the bias by increasing the posterior source term (notice, though, 5 that deposition values would in this case be overestimated at least close to the source, leading to the contrary effect for the deposition data). Regardless, this effect shows that in the data mode, the resulting source term is heavily influenced by possible biases in the transport model.

Tuning by cross-validation
The same cross-validation scheme as in the case of the ETEX experiment, Sect. 3.3, was used here for the Chernobyl datasets. 10 The train/test spilt was once again 80%/20% and the CV was performed 50 times for each tuning parameter err x . The crossvalidation errors are displayed in Fig. 10 using box plots and associated mean values of the residue errors ||y test − M test x || 2 .
Here, the results are given for the datasets of Cs-134 (top row) and Cs-137 (bottom row) with FLEXPART driven with ERA-40 meteorological fields. We will investigate CV on tuning of parameters for the optimization as well as the LS-APC-VB method.
The results are presented for two x a configurations in Fig. 10, from the left: LS-APC-VB with x a , LS-APC-VB with x a = 0, 15 the optimization method with x a and with smoothness parameter = 10 −2 and the optimization method with x a = 0 and with smoothness parameter = 10 −2 . For these, boxplots are displayed together with mean residuals using the same types of lines as in Fig. 9. Moreover minimal mean residuals are identified and denoted using red circles in Fig. 10 for each graph and their associated total activities are displayed in the legend of each graph.
In the case of Cs-134 (top row), the cross-validation was able to determine optimal values of tuning parameters in cases of all tested methods. The total estimated releases associated with these tuning parameters are 87.1 PBq (LS-APC-VB with x a ), 5 56 PBq (LS-APC-VB with x a = 0), 69.7 PBq (the optimization method with x a ), and 43 PBq (the optimization method with x a = 0) which are in accordance with the mean of previously reported total activity 54 PBq used as a prior. Note that the prior dominated modes have lower residuals than the data dominated modes in all cases. This suggests that the used prior source term applied to the FLEXPART/ERA-40 simulation matches well with the measurements. On the other hand, this is not the case for Cs-137 where the prior dominated modes have, with the exception of LS-APC-VB with x a = 0, significantly higher 10 residuals than the data dominated modes. This may be caused by two factors. First, the prior source term is less adequate for interpretation of measurements of Cs-137 than those of Cs-134. Second, all methods assume quadratic loss function which may be less appropriate for this dataset and caused overestimation of the source term with the tuning parameter selected using cross-validation in comparison with previously reported 74 PBq used as a prior. We note that similar results were also observed with the ERA-Interim dataset. 15 The results suggest that a well selected prior source term can bind the solution to acceptable values and prevents the occurrence of extreme outliers. On the other hand, we observed that the regularization terms commonly used to compensate errors of the SRS matrices are not able to compensate the error caused by inaccurate SRS matrices. Further research is clearly needed to develop more relevant method of regularization.

20
Methods for determination of source term of an atmospheric release have to cope with inaccurate prediction model, represented often by the source-receptor sensitivity (SRS) matrix. Relying solely on the SRS matrix using best estimate of weather and dispersion parameters may lead to highly inaccurate results. We have shown that various regularization terms introduced by different inversion methods are essentially coarse approximation of the error of the SRS matrix and thus we can evaluate their suitability using methods of statistical model validation. We have performed sensitivity of inverse modeling methods to the 25 selection of the prior source term (first guess), and other tuning parameters for two selected inversion methods, the optimization method (Eckhardt et al., 2008), and the LS-APC method (Tichý et al., 2016), on datasets from the ETEX controlled release, and the Chernobyl releases of caesium-134 and caesium-137.
We have observed that the results have two strong modes of solution, the data mode for minimal influence of prior on the loss, and the prior mode for loss function with significant influence of the prior. The prior mode is naturally significantly 30 influenced by the choice of the prior source term. However, the dominant impact on the resulting estimate has the choice of the regularization. In the case of the ETEX dataset, good estimates were obtained for every choice of the prior source term, however, the regularization has to be carefully tuned. For some choices of the prior source term, the error of the estimated source term was exceptionally low for good selection of the tuning parameters. After analyzing these minima, we conjecture that they are caused by coincidence. These minima are visible only in comparison with the ground truth, they have no visible impact on the common validation metrics such as the L-curve or cross-validation and thus cannot be identified objectively.
We have tested suitability of the cross-validation approach for selection of the tuning parameters for both tested methods. In the case of the ETEX release, we have observed that this approach tends to select modes closer to the data mode than the prior 5 mode of solution. However, this is not the case of the Chernobyl Cs-134 release where cross-validation selects solutions close to the prior dominated mode. This may be caused by the fact that the used prior source term here well fits the measurements and only small corrections by the inversion are needed.
An interesting question is whether it is beneficial to use a non-zero prior source term at all. Considering the ETEX experiment, where the true release is known, one can see that the estimates in data modes are often even better than the considered 10 prior source terms. On the other hand, when the used prior source term is close to the true release, which is probably the case for the Chernobyl Cs-134 release, its use seems beneficial. Also, the prior source term could be valuable in cases when the release is not fully seen by the measurement network and thus the measurements do not provide a good constraint for the source term estimation. However, to determine the reliability of the prior source term is difficult and even impossible in real world scenarios and the prior source term would be probably shifted, scaled, and/or blurred. This task can be tackled using 15 cross-validation approach providing a reasonable although computationally expensive tool for determination at least between a prior dominated mode or a data dominated mode of solution.
Code and data availability. All data used for the present publication can be freely downloaded from https: