On the model uncertainties in Bayesian source reconstruction using the emission inverse modelling system FREARtool v1.0 and the Lagrangian transport and dispersion model Flexpart v9.0.2

Pieter De Meutter1,2,3, Ian Hoffman1, and Kurt Ungar1
1 Radiation Protection Bureau, Health Canada, 775 Brookfield Road, Ottawa, Canada
2 Belgian Nuclear Research Institute, Boeretang 200, Mol, Belgium
3 Royal Meteorological Institute of Belgium, Ringlaan 3, Brussels, Belgium
Correspondence: Pieter De Meutter (pieter.de.meutter@sckcen.be)


Activity concentration observations
Twelve 106Ru detections have been used from five different stations of the International Monitoring System, which is being commissioned to verify compliance with the Comprehensive Nuclear-Test-Ban Treaty once it enters into force: three consecutive detections at station RN40 (Kuwait City, Kuwait) taken between 4 and 6 October 2017; one detection at station RN56 (Peleduy, Russian Federation) taken on 6 October 2017; four consecutive detections at station RN59 (Zalesovo, Russian Federation) […]

The source-receptor-sensitivities have horizontal grid spacings of 0.5°. An activity concentration (in Bq/m3) can be related to a release (in Bq) by multiplying the latter with the source-receptor-sensitivity (in s), and dividing by the grid box volume and the output frequency. An ensemble of numerical weather predictions was used to create an ensemble of atmospheric transport and dispersion simulations; the transport and dispersion processes themselves were not perturbed. The Ensemble of Data Assimilations from the European Centre for Medium-Range Weather Forecasts (ECMWF) was used to run Flexpart. It consists of 26 independent lower-resolution 4D-Var assimilations: one using unperturbed observations and physics, and 25 using perturbed observations, sea-surface temperatures and model physics (Bonavita et al., 2016). By adding and subtracting perturbations from the ensemble mean, the number of perturbed members was doubled, so that 50 perturbed and one unperturbed member were obtained. The perturbations are created in such a way that each ensemble member represents a possible scenario for the true atmospheric state, and the spread between the members represents the uncertainty. For each weather ensemble member, Flexpart was then run, so that an atmospheric transport ensemble of 51 source-receptor-sensitivities was obtained.
3 Bayesian source reconstruction

De Meutter and Hoffman (2020) have developed and applied a Bayesian inference system to find the source parameters of an anomalous 75Se release based on airborne 75Se activity concentrations. The main components of the inference system are summarized here; details are given in De Meutter and Hoffman (2020) and references therein.

Source parameters
The unknown source is described by five source parameters: the source longitude and latitude, the accumulated release Q, and the release start and stop times t_start and t_stop. In practice, the release period is parameterized by fractions r_start and r_stop to ensure that t_start occurs before t_stop. The corresponding release period is calculated as follows:

https://doi.org/10.5194/gmd-2020-162 · Preprint. Discussion started: 23 September 2020 · © Author(s) 2020. CC BY 4.0 License.
with t_1 and t_m the first and last times for which source-receptor-sensitivities are available for the source reconstruction. The release rate is assumed to be constant during the release period. The vertical extent of the source is assumed to stretch from the surface to the top of the lowest model layer (at 100 m). Thus, the unknown source is parameterized as follows, with δ the Dirac delta function and H the Heaviside step function:
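The fraction-based release-period parameterization described above can be sketched as follows. The exact mapping used by FREARtool is not reproduced in the text, so the formula below (r_start positioning the start within [t_1, t_m], and r_stop positioning the stop within the remaining interval) is an assumed illustrative reading that guarantees t_start ≤ t_stop:

```python
from datetime import datetime

def release_period(t1, tm, r_start, r_stop):
    """Map fractions r_start, r_stop in [0, 1] to a release period with
    t_start guaranteed to occur before t_stop.

    Assumed mapping (one plausible reading of the text): r_start places
    the start within [t1, tm]; r_stop places the stop within [t_start, tm].
    """
    t_start = t1 + r_start * (tm - t1)
    t_stop = t_start + r_stop * (tm - t_start)
    return t_start, t_stop

# First and last times with available source-receptor-sensitivities
t1 = datetime(2017, 9, 25)
tm = datetime(2017, 9, 28)
t_start, t_stop = release_period(t1, tm, 0.5, 0.5)
```

By construction, any (r_start, r_stop) pair in the unit square yields a valid ordered release period, which keeps the prior simple (see the uniform priors below).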

Uninformative bounded uniform priors are used for the source parameters. The source longitude is assumed to be between 20° and 80°, and the source latitude is assumed to be between 40° and 70° (see Fig. 1 for a map showing the search domain).
The accumulated release is assumed to be between 10^10 Bq and 10^16 Bq. Since this spans many orders of magnitude, we take log10(Q) as source parameter in our implementation and simply impose a uniform prior between 10 and 16. Recall that the first observation considered for the inference was taken on 1 October 2017. Therefore, the release is assumed to have occurred between 25 September 2017 00:00 UTC and 28 September 2017 00:00 UTC. Generally, the upper limit on the release time will exclude solutions further downwind, while the lower limit will exclude solutions further upwind.
Uniform bounded priors between 0 and 1 are used for r_start and r_stop.
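The prior described above can be sketched as a simple sampler over the five bounded uniform parameters. The bounds are taken from the text; the parameter ordering and variable names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# Bounded uniform priors for the five source parameters (bounds from the
# text; ordering is illustrative).
PRIOR_BOUNDS = {
    "lon": (20.0, 80.0),      # degrees east
    "lat": (40.0, 70.0),      # degrees north
    "log10_Q": (10.0, 16.0),  # accumulated release, log10 of Bq
    "r_start": (0.0, 1.0),    # release-period fraction
    "r_stop": (0.0, 1.0),     # release-period fraction
}

def sample_prior(n):
    """Draw n source hypotheses from the bounded uniform priors."""
    lo = np.array([b[0] for b in PRIOR_BOUNDS.values()])
    hi = np.array([b[1] for b in PRIOR_BOUNDS.values()])
    return lo + rng.random((n, len(lo))) * (hi - lo)

samples = sample_prior(1000)
```

Working in log10(Q) makes the uniform prior uninformative across the six orders of magnitude of the release, rather than concentrating prior mass near the upper bound as a uniform prior on Q itself would.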

Likelihood
De Meutter and Hoffman (2020) proposed likelihood equations that can take into account detections, instrumental non-detections, misses and false alarms using Currie detection limits (Currie, 1968). Since non-detections are not used in this study, only the likelihood of detections is used here. The possibility of a false alarm, where the detector wrongly identifies a detection, is also considered. For simplicity, the observations c_det are assumed to be independent, thereby neglecting possible geotemporal correlations. As a result, the total likelihood is simply the product of the likelihoods associated with the individual observations.

First, let us define c_true, the true activity concentration, which will never be known. Next, we define c_det, the activity concentration as seen by the detector, which can differ from c_true (the observed net signal can even be negative due to the statistical nature of spectroscopic analysis). We use the Currie critical threshold L_C (Currie, 1968) to decide, given a net signal, whether a real detection took place (d_obs if c_det > L_C) or not (d̄_obs if c_det < L_C). One can then define a true non-detection, a miss, a true detection and a false alarm as in Table 1. In this application, the risk of a missed detection and a false alarm is set equal to 5 %.

Table 1. Definition of a true non-detection, a miss, a false alarm and a true detection based on the Currie critical level L_C, the detected activity concentration c_det and the "true" activity concentration c_true, which will never be known.

                          d̄_obs (c_det < L_C)    d_obs (c_det > L_C)
d̄_true (c_true < L_C)    true non-detection      false alarm
d_true (c_true > L_C)     miss                    true detection

c_true will never be known, but we assume that the modelled activity concentration c_mod corresponding to a source hypothesis Ξ and its associated uncertainty results in a distribution of true activity concentrations given by Eq. 5, with the index i denoting "corresponding to the i-th observation" (i = 1...12), Γ the gamma function, and s, ᾱ and β̄ the parameters of an inverse gamma distribution. Eq. 5 was used by Yee (2012) as the likelihood function for detections; it was obtained by starting from a Gaussian function, replacing the standard deviation σ by an inverse gamma distribution and integrating over all possible values of σ. However, in order to take into account the possibility of a false alarm, we instead propose the following likelihood for the i-th detection c_det,i given its corresponding modelled activity concentration c_mod,i associated with a source hypothesis Ξ:

p(c_det,i | c_mod,i) = p(c_det,i | c_mod,i, d_true) p(d_true | c_mod,i) + p(c_det,i | d̄_true) p(d̄_true | c_mod,i).   (6)
In the equation above, p(d_true | c_mod) is the probability of a true detection, given by

p(d_true | c_mod) = ∫_{L_C}^{∞} p(c_true | c_mod) dc_true,

and p(d̄_true | c_mod) the probability of a true non-detection, given by

p(d̄_true | c_mod) = ∫_{−∞}^{L_C} p(c_true | c_mod) dc_true.
In these equations, p(c_true | c_mod) is simply Eq. 5, but normalized (above, c_true is a dummy variable for integration). The term p(c_det,i | c_mod,i, d_true) gives the likelihood of detecting c_det,i given c_mod,i and assuming that the detection was not a false alarm; we assume it is given by Eq. 5. The term p(c_det,i | d̄_true) is the likelihood of detecting c_det,i assuming that the detection is a false alarm, with k_α = 1.645 for a false-alarm risk of 5 %. The likelihood and its different components are plotted for two hypothetical detections in Fig. 2. For a hypothetical detection c_det close to its L_C (Fig. 2, left), the likelihood is significantly increased for small c_mod because the possibility of a false alarm is considered. For a hypothetical detection c_det significantly larger than its L_C (Fig. 2, right), the consideration of a false alarm does not significantly add to the total likelihood. Indeed, for c_det,i ≫ L_C,i and c_mod,i ≫ L_C,i, Eq. 6 simplifies to:
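The mixture structure of Eq. 6 can be sketched as follows. For illustration, the heavy-tailed density of Eq. 5 is replaced here by a plain Gaussian around c_mod (an assumption; the paper uses an inverse-gamma mixture of Gaussians instead), and the false-alarm branch assumes the detector noise is Gaussian around zero with σ_obs = L_C/k_α, consistent with the Currie threshold definition:

```python
import math

def norm_pdf(x, mu, sigma):
    """Gaussian probability density."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def norm_cdf(x, mu, sigma):
    """Gaussian cumulative distribution."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

K_ALPHA = 1.645  # 5 % false-alarm risk

def detection_likelihood(c_det, c_mod, L_C, sigma_mod):
    """Mixture likelihood for one detection, following the structure of Eq. 6.

    Illustrative simplification: the true-detection branch uses a Gaussian
    around c_mod instead of the heavy-tailed Eq. 5; sigma_mod plays the
    role of the model error scale.
    """
    sigma_obs = L_C / K_ALPHA
    # p(d_true | c_mod): probability that the true concentration exceeds L_C
    p_true = 1.0 - norm_cdf(L_C, c_mod, sigma_mod)
    # True-detection branch: c_det scattered around c_mod
    lik_det = norm_pdf(c_det, c_mod, math.sqrt(sigma_obs**2 + sigma_mod**2))
    # False-alarm branch: c_det is pure detector noise around zero
    lik_fa = norm_pdf(c_det, 0.0, sigma_obs)
    return lik_det * p_true + lik_fa * (1.0 - p_true)
```

For a detection just above L_C paired with a near-zero c_mod, the false-alarm branch dominates the total likelihood, reproducing the behaviour shown in Fig. 2 (left).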

Sampling from the posterior
The general-purpose Markov chain Monte Carlo algorithm MT-DREAM(ZS) (Laloy and Vrugt, 2012) is used to sample the posterior distribution. A chain of source hypotheses is obtained by repeatedly creating a proposal source hypothesis based on the current source hypothesis, and then accepting the proposal or, if rejected, retaining the current source hypothesis. A key feature of a Markov chain Monte Carlo algorithm is its ability to construct proposals in such a way that the posterior distribution is efficiently explored and sampled. MT-DREAM(ZS) creates a new proposal by adding a perturbation to the current source hypothesis. Such perturbations are created by taking the difference of two randomly drawn states out of an archive of past states, so that the size and direction of the perturbation are adaptive and well-suited to the problem, without the need for prior tuning. Every 10th iteration, the current source hypothesis is added to the archive. Three chains are run simultaneously (sharing the same archive of past states) to diagnose convergence more easily. The algorithm is designed so that a snooker step occurs with a probability of 20 % to allow jumps between different posterior modes (ter Braak and Vrugt, 2008). To enhance efficiency and to obtain more accurate results, randomized subspace sampling is used (Vrugt et al., 2009). Furthermore, MT-DREAM(ZS) makes use of multiple-try Metropolis sampling (Liu et al., 2000) to enhance the mixing of the chains.
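The archive-based proposal mechanism can be sketched in a few lines. This is a simplified differential-evolution step in the spirit of MT-DREAM(ZS), not the full algorithm (it omits the snooker step, multiple-try sampling and randomized subspace sampling); the jump scale 2.38/√(2d) is the standard DE-MC choice:

```python
import numpy as np

rng = np.random.default_rng(42)

def de_proposal(current, archive, gamma=None, eps=1e-6):
    """Differential-evolution proposal: perturb the current state with the
    difference of two states drawn from an archive of past states, so the
    proposal scale and direction adapt to the posterior without tuning.
    Simplified sketch of the MT-DREAM(ZS) proposal mechanism."""
    d = len(current)
    if gamma is None:
        gamma = 2.38 / np.sqrt(2 * d)  # standard DE-MC jump scale
    i, j = rng.choice(len(archive), size=2, replace=False)
    jump = gamma * (archive[i] - archive[j])
    # Small random jitter keeps the chain ergodic
    return current + jump + eps * rng.standard_normal(d)

# Archive of past states for 5 source parameters (illustrative data)
archive = rng.standard_normal((100, 5))
proposal = de_proposal(archive[-1], archive)
```

Because the jump is built from past posterior samples, its covariance automatically shrinks or stretches along the directions the posterior itself occupies.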

Observation and model errors
The detected activity concentration c_det is assumed to be Gaussian distributed around the true activity concentration c_true, with a standard deviation σ_obs = L_C/k_α as in Eq. 14. The model error originates from errors in the source-receptor-sensitivities calculated by the atmospheric transport model. The main sources of error are simplifications in the turbulence parameterization of the atmospheric transport model and errors in the numerical weather prediction data used to run it. As it is very hard, if not impossible, to specify the model error, we assume Gaussian errors, but replace a fixed σ_mod by an inverse gamma distribution, as mentioned earlier in this section and following Yee (2012), with the subscript i denoting that these values can be observation-specific. The parameters s, ᾱ and β̄ of the inverse gamma distribution are respectively an estimate of σ, a scale parameter and a shape parameter. Yee (2012) proposed to use ᾱ = 1/π and β̄ = 1, in which case σ_mod,i = s_i and the variance of σ_mod,i becomes infinite (Eq. 16 will then be a very heavy-tailed distribution).

We now have to choose a value for s_i. Since the source-receptor-sensitivities typically span many orders of magnitude, it makes more sense to define a relative error rather than an absolute error. Furthermore, because the source-receptor-sensitivities are linearly proportional to the activity concentration, we propose Eq. 17 for s_i. As a consequence, the model uncertainty does not depend on the source parameters. Note that this proposal results in a larger relative model error if c_det,i < 16·L_C,i. This is desirable, since small detections are caused by a part of the plume of radionuclides that was subject to more atmospheric transport and dispersion, and thus should have a larger relative uncertainty than large detections (although the latter have a higher absolute uncertainty). In Eq. 13, s_i is replaced so as to take into account both the observation error and the model error. From now on, we will express the parameters s_i and σ as relative errors, thus as fractions of max(c_det,i, 16·L_C,i) (see Eq. 17).
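The per-observation error scales described above can be sketched as follows. σ_obs = L_C/k_α follows the text; the model error scale is taken proportional to max(c_det,i, 16·L_C,i) as in Eq. 17, and the combination of the two errors is assumed here to be a quadrature sum, which is one common choice (the exact combination in Eq. 18 is not reproduced in the text):

```python
import numpy as np

K_ALPHA = 1.645  # 5 % false-alarm risk

def error_scales(c_det, L_C, rel_err=0.5):
    """Combined per-observation error scale (illustrative sketch).

    sigma_obs = L_C / k_alpha (observation error, from the text);
    s_mod = rel_err * max(c_det, 16 * L_C) (model error, Eq. 17 proposal);
    the quadrature sum of both is an assumed combination rule.
    """
    c_det = np.asarray(c_det, dtype=float)
    L_C = np.asarray(L_C, dtype=float)
    sigma_obs = L_C / K_ALPHA
    s_mod = rel_err * np.maximum(c_det, 16.0 * L_C)
    return np.sqrt(sigma_obs**2 + s_mod**2)

# One small detection (2 Bq/m3-equivalent units) and one large one (50)
s = error_scales([2.0, 50.0], [1.0, 1.0])
```

The floor at 16·L_C makes the relative error of a small detection larger than that of a large detection, while the absolute error still grows with the detection size.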

Introduction to multipliers
In this and the next subsection, an alternative model error structure involving multipliers or scale factors is discussed.
Besides being an alternative model error structure, multipliers can also be used to take into account unknown errors (such as errors due to local atmospheric features not resolved by the model). Yee et al. (2014) used a small number of activity concentration measurements to retrieve the known source parameters of a major medical isotope production facility. The resulting source longitude, latitude and release term were compared with the true source parameters in order to evaluate the performance of the source reconstruction. The method they used is similar to the method employed here: a Lagrangian stochastic particle model was used in backward mode to generate the source-receptor-sensitivities, and the source parameters were obtained using Bayesian inference and the MT-DREAM(ZS) algorithm. Yee et al. (2014) showed that the main challenge in source reconstruction lies in the correct specification of the model uncertainty (both its scale and its structure) by using two different measurement models:
In Eq. 19, ε_i is the combined model and measurement error, which is assumed to be drawn from a Gaussian distribution with unknown standard deviation. Again, the unknown standard deviation is replaced by a distribution rather than a single fixed value. In Eq. 20, ε_i represents only the measurement error. The model error is now taken into account by so-called scale factors or multipliers m_i. These multipliers are parameters of the Bayesian inference and are allowed to vary between 0.1 and 10. The range is chosen so that the multipliers can compensate for model underpredictions or overpredictions of up to a factor of 10. Yee et al. (2014) found that using Eq. 19 resulted in a source reconstruction that did not include the correct source parameters. The multipliers (Eq. 20), on the other hand, resulted in a large shift in the posterior source location, which significantly improved the source reconstruction.

Multipliers as unknown model error
Here, multipliers are used to represent the unknown part of the model error. The multipliers are additional parameters that need to be estimated during the Bayesian inference. For the sampling, we work with log10(m_i) and assign a uniform prior on [−1, 1]. While the multipliers increase the run time of the Bayesian inference, the increase is small, especially considering the large increase in the number of unknown parameters (one additional parameter per observation).
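The multiplier construction can be sketched in a few lines: each observation gets a scale factor sampled in log10 space with a bounded uniform prior, so that a multiplier can compensate model under- or overprediction by up to a factor of 10 (function names are illustrative):

```python
import numpy as np

def apply_multipliers(c_mod, log10_m):
    """Scale modelled concentrations by per-observation multipliers m_i,
    parameterized in log10 space."""
    return np.asarray(c_mod, dtype=float) * 10.0 ** np.asarray(log10_m, dtype=float)

def log_prior_multipliers(log10_m):
    """Uniform prior on [-1, 1] for each log10 multiplier: constant
    log-density inside the bounds, -inf outside."""
    log10_m = np.asarray(log10_m, dtype=float)
    if np.any(np.abs(log10_m) > 1.0):
        return -np.inf
    return 0.0

scaled = apply_multipliers([1.0, 2.0], [1.0, -1.0])
```

Sampling log10(m_i) rather than m_i makes a factor-of-2 underprediction and a factor-of-2 overprediction symmetric under the prior.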

We apply the multipliers using three different values for the model uncertainty parameter s_i (0.3, 0.5 and 3), and show the results in Fig. 3. While the results with and without multipliers are significantly different for small values of s_i (Fig. 3, a and b), there is substantial agreement when using s_i = 3.0 (Fig. 3, e and f). This suggests that the effect of introducing multipliers and the effect of increasing the model uncertainty converge when a large model uncertainty is specified.
The results in this section lead to the question of which of the source location maps shown in Fig. 3 should be preferred. […]

The natural logarithm is applied to all SRS, since these span many orders of magnitude. If any ensemble member has an SRS equal to 0 for a specific grid box, all 51 SRS values for that grid box are omitted from the analysis.
As an example, the probability density function of the ensemble SRS members around their ensemble median is shown in Fig. 4 for an arbitrary observation and an arbitrary time. Most SRS values fall within a factor exp(3) ≈ 20 of the ensemble median. Of particular interest is how this density can be approximated by a statistical distribution. In Fig. 4, two distributions are fitted to the density: a Gaussian distribution, and a fit using Eq. 5. Interestingly, while the probability density function can roughly be represented by a Gaussian distribution (effectively a lognormal distribution with respect to the SRS, since the natural logarithm was applied), its tails are heavier than Gaussian. A much better fit is obtained using Eq. 5. This distribution describes how the (unknown) true activity concentration is distributed on average around the modelled activity concentration. Similarly, the ensemble gives a distribution of the true SRS around the model-predicted SRS. Note that the activity concentrations are simply linear combinations of SRS. This result shows that our choice for the model error structure is supported by the ensemble. This initial result will be elaborated in the next subsection.
1 Since the SRS files are output every three hours, the maximum value for the SRS is 10800 s.

Figure 4 (caption, partial): […] Also shown are two fits, one using Eq. 5 (blue dashed line) and one using a Gaussian distribution (orange dotted line). Note that the SRS are scaled by their ensemble median, and that the natural logarithm is applied.

Fitting model uncertainty
In this subsection, the SRS ensemble is used to determine the parameters s_i, ᾱ_i and β̄_i in Eq. 5, which describes how the true activity concentration c_true is believed to be distributed given the modelled activity concentration c_mod,i. The fitting can be performed in several ways, and four different cases are considered. A goodness-of-fit is calculated for each case. As a reference, the fitting procedure is repeated using a Gaussian distribution instead of Eq. 5. The mean of the Gaussian is fixed at 0 (corresponding to the median of the SRS ensemble), so that only the standard deviation is allowed to vary. Since our Gaussian fitting thus has only one degree of freedom (and given the suggestion of heavy tails in Fig. 4), it should not be surprising that the Gaussian fit is outperformed by the fit using Eq. 5.
The following cases are considered to obtain the uncertainty parameters: […]

The following goodness-of-fit is chosen to quantify how well the fitted probability density function p_fit resembles the ensemble probability density function p_ens; it involves the integration of the absolute difference of the two densities and is simply the fraction of overlap in density. If the overlap in density equals 1, p_fit equals p_ens; if it equals 0, there is no overlap between p_fit and p_ens.
The uncertainty parameters are obtained using the procedures outlined in cases 1 to 4. Then, for each case, the overlap in density is calculated by comparing p_fit with p_ens for each observation and each time separately, resulting in 288 data points (24 different times and 12 observations). The set of overlaps is used to construct box plots in Fig. 5. The overlap in density is poor when using the a priori values for the uncertainty parameters (labels "InvG" and "Gaus", case 1 in Fig. 5). While the outcome would have been entirely different if better a priori values had been chosen, the significance of this result should not be underestimated: (i) although seemingly realistic a priori uncertainty parameters were chosen, the agreement with the ensemble density is poor; (ii) for this case and this particular choice of uncertainty parameters, a Gaussian distribution resulted in better agreement; (iii) in the absence of an ensemble, using a priori chosen uncertainty parameters might be the only available option. When fitting the uncertainty parameters once for all observations and all times, the overlap in density is significantly improved (labels "InvG" and "Gaus", case 2 in Fig. 5), and the fit using Eq. 5 performs significantly better than the Gaussian fit. Additional yet smaller improvements are obtained when fitting for each observation separately (case 3 in Fig. 5) and when fitting for each observation and time separately (case 4 in Fig. 5).
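The overlap-in-density metric described verbally above can be written in closed form as the standard overlapping coefficient, overlap = 1 − ½ ∫ |p_fit − p_ens| dx, which is consistent with the description (1 for identical densities, 0 for disjoint ones). The discretized sketch below assumes both densities are evaluated on a common grid:

```python
import numpy as np

def overlap_in_density(p_fit, p_ens, dx):
    """Fraction of overlap between two probability densities evaluated on
    a common grid with spacing dx:

        overlap = 1 - 0.5 * integral |p_fit - p_ens| dx

    This closed form is the standard overlapping coefficient; the paper
    describes the metric verbally, so the exact formula is an assumption
    consistent with that description."""
    p_fit = np.asarray(p_fit)
    p_ens = np.asarray(p_ens)
    return 1.0 - 0.5 * np.sum(np.abs(p_fit - p_ens)) * dx

# Example: two unit-variance Gaussians shifted by one standard deviation
x = np.linspace(-5.0, 5.0, 1001)
dx = x[1] - x[0]
g1 = np.exp(-0.5 * x**2) / np.sqrt(2 * np.pi)
g2 = np.exp(-0.5 * (x - 1.0) ** 2) / np.sqrt(2 * np.pi)
ov = overlap_in_density(g1, g2, dx)
```

For the two shifted Gaussians the overlap is about 0.62, illustrating that even moderately displaced densities retain substantial overlap; values near 0 require nearly disjoint densities.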

Error dependency on the time and the observation
In this subsection, it is assessed how the fitted uncertainty parameters vary among different observations and different times.
The interplay of the three uncertainty parameters s, ᾱ and β̄ of the inverse gamma function makes it difficult to directly estimate any effect. Therefore, only the fitted standard deviations of the Gaussian distribution are considered here.
The fitted standard deviation for each observation (averaged over all times) is shown in Fig. 6 (left). The standard deviation varies significantly between the different observations, up to a factor of 2 between the third and the eleventh observation. This can alter the posterior, since the uncertainty scales how deviations between the simulated and observed activity concentrations are penalized. While it appears difficult to assign meaningful observation-specific a priori uncertainty parameters, the ensemble can readily provide such information.
The fitted standard deviation for each time (averaged over all observations) is shown in Fig. 6 (right). It can be seen that the model uncertainty grows when going backwards in time. The growth is about 30 % over a three-day period (note that such growth does not have to be constant over time, and an in-depth assessment is beyond the scope of this paper). Also interesting is the oscillatory behaviour with a period of eight time steps, corresponding to the diurnal cycle (since SRS fields were produced every three hours). The oscillations are likely associated with boundary layer processes, which often follow the diurnal cycle.

Figure 5. Box plots of the overlap between the ensemble densities and the fitted densities using uncertainty parameters obtained in four ways (cases 1 to 4, see Subsection 5.2 for details) and using the distribution in Eq. 5 ("InvG") and the Gaussian distribution ("Gaus") for fitting.

In this subsection, the Bayesian source reconstruction is applied using fitted observation-specific parameters s_i, ᾱ_i and β̄_i (this corresponds to case 3, "InvG" in Fig. 5). Furthermore, the inference is run using the SRS ensemble median for each geotemporal grid box; before, the unperturbed ensemble member was used. The resulting probability map is shown in Fig. 7 (a). While the source location probability map is distinct from (most of) the individual panels shown in Fig. 3, it is not too different from what Fig. 3 as a whole suggests. It best resembles case (f) of Fig. 3, which corresponds to the combination of multipliers with large model uncertainty. Note that simply changing the a priori uncertainty parameters might never yield a near-perfect correspondence with Fig. 7 (a), since the fitting resulted in different uncertainty parameters for each observation, which seems impossible to obtain without an ensemble.
For comparison, the average probability of the six panels in Fig. 3 is shown in Fig. 7 (b). The latter is perhaps the best approach one can take in the absence of an ensemble. While both maps roughly agree at first glance, there are still important differences. Foremost, the more south-westerly of the two modes in Fig. 7 (b) is absent in Fig. 7 (a). The other mode in Fig. 7 (b) is slightly shifted to the west in Fig. 7 (a), and extends further north-east.
In this section, it is assessed whether additional information can be acquired by considering each ensemble member as an independent scenario, thus performing the Bayesian source reconstruction for each ensemble member separately. No fitting of uncertainty parameters is applied here, so these need to be set a priori. The experiment is performed twice, once using s_i = 0.5 and once using s_i = 3.0. The other two uncertainty parameters remain fixed (ᾱ = 1/π and β̄ = 1).

Source location probability maps
The Bayesian inference is repeated using the SRS of each ensemble member, so that 51 different posteriors are obtained. These posteriors then need to be aggregated in some way. As before, we focus on the source location probability. While several aggregation methods are possible, here the grid box-wise mean and maximum probabilities are taken (normalisation is required to ensure that the probabilities sum to 1). Equal weights were assigned to each ensemble member, since our ensemble is constructed to yield equally likely scenarios. Note that in the case of multi-model ensembles the latter might not be true, so that a weighting based on the skill of each model should be applied.

Fig. 8 (a, b) shows the results using the unperturbed member only, with s_i = 0.5 and s_i = 3.0; hence, these are identical to panels (c) and (e) in Fig. 3. As discussed earlier in Section 4, the source location probability map differs significantly when changing the parameter s. […] Fig. 8 (e, f) shows the grid box-wise ensemble maximum. As one can expect, the resulting source location probability is slightly broader than the one obtained by taking the grid box-wise mean probability; however, the differences are minor.
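The grid box-wise aggregation with renormalisation can be sketched as follows (equal member weights, as for a single-model ensemble; the function name is illustrative):

```python
import numpy as np

def aggregate_maps(member_maps, how="mean"):
    """Aggregate per-member source location probability maps grid
    box-wise, then renormalize so the probabilities sum to 1."""
    stack = np.asarray(member_maps)  # shape (n_members, ny, nx)
    if how == "mean":
        agg = stack.mean(axis=0)
    elif how == "max":
        agg = stack.max(axis=0)
    else:
        raise ValueError(f"unknown aggregation: {how}")
    return agg / agg.sum()

# 51 illustrative member maps, each normalized to sum to 1
maps = np.random.default_rng(1).random((51, 10, 10))
maps /= maps.sum(axis=(1, 2), keepdims=True)
mean_map = aggregate_maps(maps, "mean")
max_map = aggregate_maps(maps, "max")
```

Renormalisation is only needed for the maximum: the mean of normalized maps already sums to 1, whereas the grid box-wise maximum generally sums to more than 1 before rescaling.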
Overall, a similar picture is obtained when running the Bayesian inference for each ensemble member separately, compared to the procedure explained in Section 5. This suggests that if we use the ensemble only (i) to fit the uncertainty parameters and (ii) to calculate the ensemble median SRS for running the inference, as was done to obtain Fig. 7, no crucial information from the ensemble is lost with respect to the source location.

Ensemble convergence
We finally perform a brief assessment of whether each ensemble member adds new information to the ensemble mean source location probability. For each perturbed member m ∈ [1, 50], the overlap in source location probability is calculated between that member and the mean of all previous members [0, m − 1] (with member 0 the unperturbed member). We thus start with the first perturbed member (which is compared with simply the unperturbed member) and end with the last perturbed member (which is compared with the mean of all other ensemble members). The results are shown in Fig. 9 for uncertainty parameters s = 0.5 and s = 3.0. Note that the overlap in density is calculated using Eq. 21 and has the same meaning as before, with 0 denoting no overlap (thus being fully informative) and 1 denoting full overlap (providing no new information).

Fig. 9 shows that most ensemble members have an overlap in density of less than 0.6. There is significant variance in how much new information is added by each ensemble member. A linear fit suggests that the added information from additional ensemble members slowly decreases, as expected. The effect is more pronounced for the case with higher model uncertainty (Fig. 9, right); the reason is that the source location probability is more spread out due to the higher model uncertainty, making an overlap more likely. We conclude that all ensemble members add new information. This is desirable, as it shows that the ensemble is well-constructed (if members generally did not add new information, they could be a waste of computational resources).
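The convergence assessment described above amounts to a simple loop. The sketch below applies the overlap metric to discrete probability maps (both normalized to sum to 1), comparing each perturbed member with the running mean of all previous members:

```python
import numpy as np

def map_overlap(p, q):
    """Overlap between two discrete probability maps that each sum to 1:
    1 - 0.5 * sum |p - q| (the discrete overlapping coefficient)."""
    return 1.0 - 0.5 * np.abs(p - q).sum()

def convergence_overlaps(member_maps):
    """For each perturbed member m, overlap between that member's map and
    the mean map of members 0..m-1 (member 0 being the unperturbed one)."""
    overlaps = []
    for m in range(1, len(member_maps)):
        prev_mean = np.mean(member_maps[:m], axis=0)
        prev_mean /= prev_mean.sum()
        overlaps.append(map_overlap(member_maps[m], prev_mean))
    return overlaps

# 51 illustrative member maps (member 0 = unperturbed), each summing to 1
rng = np.random.default_rng(2)
maps = rng.random((51, 20, 20))
maps /= maps.sum(axis=(1, 2), keepdims=True)
ov = convergence_overlaps(maps)
```

A slowly rising overlap sequence, as seen in Fig. 9, indicates that later members still contribute information but with diminishing returns.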

Conclusions
Model error has a huge impact on the posterior obtained through Bayesian source reconstruction, a conclusion in agreement with other studies (e.g., Yee et al., 2014). Specifically for the source location, an increase in the scale of the model error resulted in a non-uniform broadening and a shift in the source location probability.
Both the non-uniformity of the broadening and the shift in the source location probability imply that one cannot simply predict beforehand what the result will be when the Bayesian source reconstruction is repeated with different model error.
In the absence of a way to determine the model error, one could perform multiple Bayesian source reconstructions using different model error formulations, as shown in Fig. 3. The results could be aggregated by taking the average (as in Fig. 7, b) or by using more elaborate procedures (including a weighted mean).
Multipliers can be used to represent model error (as in Yee et al., 2014), or to represent the unknown part of model error as was done in Subsection 4.3. The multipliers result in a shift in the source location probability, but not in a broadening.
We found that the ensemble members of source-receptor-sensitivities are distributed around their ensemble median (Fig. 4) in a way that is well described by Eq. 5; the latter describes how the true activity concentration c_true is assumed to be distributed around the modelled activity concentration c_mod. Therefore, an ensemble of atmospheric transport and dispersion simulations can be employed to determine the parameters of the inverse gamma function in Eq. 16. Of course, the effectiveness of such an approach largely depends on whether the ensemble is capable of representing the model error.
The ensemble showed that the model error varies among observations (up to a factor of 2 in the standard deviation when fitting a Gaussian distribution). Therefore, it is expected that observation-specific model error information can improve the quality of the Bayesian source reconstruction. The model error is also shown to increase when going further backward in time (for this specific case, there was an increase of 30 % in the standard deviation over a three-day period when fitting a Gaussian distribution).
The source location probability using the fitted model error obtained from the ensemble (Fig. 7a) is distinct from the source location probabilities obtained using fixed uncertainty parameters (individual panels in Fig. 3); however, it is not too different from what Fig. 3 as a whole suggests, which demonstrates the usefulness of performing multiple Bayesian source reconstructions with different model error formulations as a sensitivity analysis in the absence of an ensemble.
A scenario-based approach (where each ensemble member is used as input for the Bayesian source reconstruction, instead of using the ensemble to fit the uncertainty parameters) gives results that are more robust against the choice of the uncertainty parameters, but is more costly than directly fitting the uncertainty parameters. No new information is obtained for the source location probability (or, stated differently, one does not lose information when using the ensemble only to fit the uncertainty parameters and to calculate the ensemble median for use in the Bayesian inference). The scenario-based approach might be best in the case of a small multi-model ensemble, since fitting the uncertainty parameters might be difficult due to the difference in skill of each ensemble member.

Code and data availability. The Flexpart model that was used to generate the SRS data is open source and is available for download (Flexpart, 2020). The SRS data from the Flexpart model are available on Zenodo (De Meutter and Delcloo, 2020). The Bayesian inference tool will be made available upon request.
Author contributions. All authors contributed to the conceptualization of the study. PDM conducted the simulations and performed the analysis. IH and KU supervised the research. All authors contributed to the manuscript. IH took care of the project administration.