A Robust Method for Inverse Transport Modeling of Atmospheric Emissions Using Blind Outlier Detection

Emissions of harmful substances into the atmosphere are a serious environmental concern. In order to understand and predict their effects, it is necessary to estimate the exact quantity and timing of the emissions from sensor measurements taken at different locations. There are a number of methods for solving this problem. However, these existing methods assume Gaussian additive errors, making them extremely sensitive to outlier measurements. We first show that the errors in real-world measurement datasets come from a heavy-tailed distribution, i.e., include outliers. Hence, we propose robustifying the existing inverse methods by adding a blind outlier-detection algorithm. The improved performance of our method is demonstrated on a real dataset and compared to previously proposed methods. For the blind outlier detection, we first use an existing algorithm, RANSAC, and then propose a modification called TRANSAC, which provides a further performance improvement.


Motivation
Emissions of harmful substances into the atmosphere occur all the time. Examples include nuclear power plant accidents, volcano eruptions, and releases of greenhouse gases. However, these emissions are difficult to quantify. Depending on the scenario, measurement networks on scales from local to global may be needed. A robust technical framework to estimate the emissions properly from such measurements is also necessary.
This technical framework consists of three elements: measurements, atmospheric dispersion models, and inverse methods tailored to this specific linear inverse problem.
There has been a clear effort in deploying more reliable, precise, and extended sensor networks (CTBTO, 2014). Also, there has been an evident development of precise atmospheric dispersion models (Holmes and Morawska, 2006).

However, inverse methods are still in a relatively early stage of development.
These inverse methods are technically complex and require a multidisciplinary approach; collaboration among researchers from different fields is necessary for further advances.

Related work
Atmospheric dispersion models such as Eulerian or Lagrangian particle dispersion models (LPDMs) (Zannetti, 1990) allow us to relate the source to the measurements in a linear way:

y = Āx + n,   (1)

where y is the measurement vector, x is the source term, Ā is the transport matrix, and n is the measurement error.
LPDMs have some advantages with respect to the Eulerian ones: they can have infinite temporal and spatial resolution; they avoid the artificial initial diffusion of a point source in the corresponding cell and the numerical errors of advection; and they are computationally more efficient (Zannetti, 1990).
There are only a few freely available, open-source implementations of LPDMs. FLEXPART (Stohl et al., 2005) is one of them. It has been used and validated in a large number of studies on long-range atmospheric transport (Stohl et al., 1998). Here we use it to derive A, which is an estimate of the true transport matrix Ā.
It is clear from (1) that estimating the source means solving a linear inverse problem. Most environmental scientists use a least-squares approach with the Tikhonov (ℓ2-norm) regularization, or variants of this method, to recover an estimate x̂ of the source:

x̂ = argmin_x ||Ax − y||_2^2 + λ||x||_2^2,   (2)

where λ ≥ 0 is the regularization parameter. For example, in Seibert (2001), the Tikhonov regularization is combined with a smooth first-derivative constraint:

x̂ = argmin_x ||Ax − y||_2^2 + λ||Dx||_2^2,   (3)

where D is a first-derivative (finite-difference) operator. Also, an a priori solution x_a can be introduced into the Tikhonov regularization, as in Stohl et al. (2012):

x̂ = argmin_x ||Ax − y||_2^2 + λ||x − x_a||_2^2.   (4)

In Winiarek et al. (2012), the Tikhonov regularization is used with a non-negativity constraint. A slightly different approach is the use of a sparsity constraint together with a non-negativity constraint, as in Martinez-Camara et al. (2013). Yet another point of view is given in Bocquet (2007), where both the source and the error distributions are estimated at the same time.
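As a concrete illustration, the Tikhonov estimates in (2) and (3) can be computed by solving an augmented least-squares system. The sketch below (Python with NumPy) uses a synthetic A, y, and λ; the helper names and all numerical values are illustrative choices, not taken from this paper:

```python
import numpy as np

def tikhonov(A, y, lam, D=None):
    """Sketch of (2)/(3): minimize ||A x - y||^2 + lam * ||D x||^2 by
    solving the equivalent augmented least-squares system
    [A; sqrt(lam) D] x ~ [y; 0]. D = I gives plain Tikhonov, eq. (2)."""
    n = A.shape[1]
    if D is None:
        D = np.eye(n)                        # eq. (2): penalize the energy of x
    A_aug = np.vstack([A, np.sqrt(lam) * D])
    y_aug = np.concatenate([y, np.zeros(D.shape[0])])
    x, *_ = np.linalg.lstsq(A_aug, y_aug, rcond=None)
    return x

def first_diff(n):
    """First-difference operator for the smoothness variant, eq. (3)."""
    return np.diff(np.eye(n), axis=0)        # (n-1) x n matrix, (Dx)_i = x_{i+1} - x_i

# Toy overdetermined system with mild Gaussian noise.
rng = np.random.default_rng(0)
A = rng.standard_normal((60, 12))
x_true = rng.standard_normal(12)
y = A @ x_true + 0.01 * rng.standard_normal(60)

x2 = tikhonov(A, y, lam=1e-3)                    # plain Tikhonov, eq. (2)
x3 = tikhonov(A, y, lam=1e-3, D=first_diff(12))  # smoothness prior, eq. (3)
print(np.linalg.norm(x2 - x_true), np.linalg.norm(x3 - x_true))
```

With a small λ and low noise, both variants recover the toy source closely; the augmented-system form is a standard way to reuse an ordinary least-squares solver for any quadratic regularizer.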
All these approaches minimize the energy of the disagreement between the model and the observations, while at the same time keeping the energy of the solution in check. While this is a reasonable approach, no metrics of real performance are (or can be) given in most of these studies, simply because no knowledge of the ground truth is available. This makes it impossible to evaluate the true performance of any of these approaches.
However, a few controlled tracer experiments have been performed, the most important ones in Europe and the US. These present exceptional opportunities to study model and measurement errors, as well as to develop and test the various source recovery algorithms.
The European Tracer EXperiment (ETEX) (Nodop et al., 1998) was established to evaluate the validity of long-range transport models. Perfluorocarbon (PFC) tracers were released into the atmosphere in Monterfil, Brittany, in 1994. Air samples were taken at 168 stations in 17 European countries for 72 hours after the release. The data collected in the ETEX experiment and the corresponding matrix estimated by FLEXPART are used for several purposes in this paper. We will refer to this data as the ETEX dataset. In every inverse problem, a time window must be defined, during which the activity of the source is to be recovered. In this particular case, we define a window of 5 days, or 5 × 24 = 120 hours (although we in fact know that the ETEX emissions took place over only 12 hours). Since the time resolution is 1 hour, we have 120 unknowns in the system.

Contributions
In this paper we show that the errors present in a source estimation problem come from a heavy-tailed distribution, which implies the presence of outliers in the measurement dataset. Typical source estimation algorithms like (2) assume Gaussian additive errors (Rousseeuw and Leroy, 1987). This incorrect assumption makes them highly sensitive to outliers. In fact, if the outliers are removed, the source estimation using (2) improves substantially.
Hence, we propose to combine (2) with algorithms that detect and remove outliers blindly, i.e., without any knowledge of the ground truth. First we use a well-known algorithm for this task, RANdom SAmple Consensus (RANSAC) (Fischler and Bolles, 1981), and study its performance. Next, we propose a new algorithm which overcomes some of the weaknesses of RANSAC, and test its performance. The efficiency of both algorithms is demonstrated on a real-world dataset, and their performance is evaluated and compared to other existing methods.
Our presented algorithm is generic, in the sense that it is suitable for all classes of input signals. Of the four key elements that constitute our algorithm (the least-squares term, the regularization, the outlier detection, and the voting), only the regularization is affected by the type of input signal. We chose to use the regularizations given in (2) and (3) because they are the most generic, and are known to apply relatively well to a broad range of realistic signals (impulse, continuous, piece-wise constant, sparse, etc.). As always, improved performance can be achieved when the structure of the signal is known, by using an appropriate, more specific regularization suited to that structure. Our approach is in fact independent of the regularization that is used, and is applicable to any regularization found in the literature.

Non-Gaussian noise
Given A, the estimate of the transport matrix produced by FLEXPART, the forward model (1) now becomes

y = Ax + e,

where e is an additive error term that encompasses both the model and measurement errors.
In the ETEX experiment we have access to the measurements y, the true source x, and the estimated transport matrix A. This permits us to study the errors e. Let us model the components e_i of the vector e as independent and identically distributed random variables. Some degree of correlation may exist among the errors, but this correlation is unknown; thus it cannot be considered in the problem. We can approximate the empirical probability distribution of e by plotting the histogram of the elements e_i.
Figure 1 shows graphically that the error has a heavy-tailed distribution. The distribution clearly deviates from a Gaussian one. This is confirmed by calculating the excess kurtosis of the sample distribution. The value of g = 123.64 indicates that the underlying distribution is strongly super-Gaussian.
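This excess-kurtosis check can be reproduced on synthetic errors. The sketch below is not the ETEX computation: it contrasts a Gaussian sample with a Student-t sample as a heavy-tailed stand-in, simply to show how the statistic separates the two cases:

```python
import numpy as np

def excess_kurtosis(e):
    """Sample excess kurtosis: E[(e - mu)^4] / sigma^4 - 3.
    Zero for a Gaussian; large and positive for heavy-tailed samples."""
    e = np.asarray(e, dtype=float)
    mu, sigma = e.mean(), e.std()
    return np.mean((e - mu) ** 4) / sigma ** 4 - 3.0

rng = np.random.default_rng(1)
gaussian = rng.standard_normal(100_000)
heavy = rng.standard_t(df=3, size=100_000)   # heavy-tailed stand-in for e

k_gauss = excess_kurtosis(gaussian)
k_heavy = excess_kurtosis(heavy)
print(k_gauss)   # close to zero
print(k_heavy)   # clearly positive
```

In the paper's setting, e would instead be the empirical error vector y − Ax computed from the ETEX ground truth, and the strongly positive statistic is what motivates the outlier-removal machinery that follows.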
Using the ℓ2 norm in the loss function in (2) is optimal when the additive errors are Gaussian, which is not our case. Even worse, this loss function is very sensitive to outliers, such as those present in the heavy-tailed distribution shown in Figure 1. Hence, the performance of (2) and its variants could be improved by additional processing aimed at removing and/or marginalizing the outliers. In the present paper we propose and demonstrate a novel scheme for this additional processing.

Outlier detection
Imagine that we have an oracle which reveals to us the measurements corresponding to the largest errors (i.e., the outliers). If we remove these measurements from the dataset, the performance of (2) in terms of the reconstruction error, or mean square error (MSE), improves significantly. In order to illustrate this, we remove the measurements associated with the largest errors (sorted by magnitude) and observe the effect on the MSE. Figure 2 shows how the MSE decreases as more and more outliers are removed. Some oscillations may occur due to outlier compensation effects.
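The oracle experiment can be mimicked on synthetic data: plant gross outliers, sort the measurements by true error magnitude, drop the worst k, and re-estimate. Everything below (sizes, noise level, outlier magnitude, the `tikhonov` helper) is an illustrative stand-in for the paper's ETEX setup:

```python
import numpy as np

def tikhonov(A, y, lam=1e-3):
    """Plain Tikhonov estimate via an augmented least-squares system."""
    n = A.shape[1]
    A_aug = np.vstack([A, np.sqrt(lam) * np.eye(n)])
    return np.linalg.lstsq(A_aug, np.concatenate([y, np.zeros(n)]), rcond=None)[0]

rng = np.random.default_rng(2)
m, n = 200, 20
A = rng.standard_normal((m, n))
x_true = rng.standard_normal(n)
e = 0.05 * rng.standard_normal(m)
e[:10] += rng.choice([-5.0, 5.0], size=10)   # plant 10 gross outliers
y = A @ x_true + e

# Oracle: sort measurements by |true error| and drop the worst k.
order = np.argsort(-np.abs(e))
mses = []
for k in (0, 5, 10):
    keep = np.ones(m, dtype=bool)
    keep[order[:k]] = False
    x_hat = tikhonov(A[keep], y[keep])
    mses.append(np.mean((x_hat - x_true) ** 2))
print(mses)   # the MSE shrinks as more true outliers are removed
```

Removing all ten planted outliers leaves a nearly noise-limited estimate, which mirrors the behaviour reported for Figure 2.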
However, in a real-world problem, we do not have such an oracle.The question becomes: how could one locate the outliers blindly?

RANSAC
One of the simplest and most popular algorithms to localize outliers blindly is RANSAC. RANSAC has been widely and successfully used, mainly by the computer vision community (Stewart, 1999). Figure 3 illustrates the operation of RANSAC, and Algorithm 1 describes it in pseudocode.
Given a dataset y with m measurements, select randomly a subset y′ containing p measurements. Typically, n < p < m, where n is the number of unknowns in the problem. In Figure 3, m = 8 and p = 2, and the subset is shown in red diamonds. Using (2) and y′, estimate x, and then compute the residual r = Ax − y. Now we can count how many of the original samples are inliers. For a given tolerance η, the set of inliers is defined as {i : |r_i| ≤ η}. Repeat this process N times and declare the final solution x* to be the estimate x that produced the most inliers. In Figure 3, N = 2.
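Since Algorithm 1 is given only in pseudocode, the loop above can be sketched in Python on synthetic data; the subset size p, the tolerance η, the number of iterations N, and the planted outliers are all illustrative choices rather than values from the paper:

```python
import numpy as np

def tikhonov(A, y, lam=1e-3):
    """Plain Tikhonov estimate via an augmented least-squares system."""
    n = A.shape[1]
    A_aug = np.vstack([A, np.sqrt(lam) * np.eye(n)])
    return np.linalg.lstsq(A_aug, np.concatenate([y, np.zeros(n)]), rcond=None)[0]

def ransac(A, y, p, eta, N, seed=0):
    """Fit on N random p-subsets; keep the candidate whose residual on the
    FULL dataset has the most inliers, i.e. components with |r_i| <= eta."""
    rng = np.random.default_rng(seed)
    m = len(y)
    best_x, best_inliers = None, -1
    for _ in range(N):
        idx = rng.choice(m, size=p, replace=False)   # random subset y'
        x = tikhonov(A[idx], y[idx])
        r = A @ x - y                                # residual on all measurements
        inliers = int(np.sum(np.abs(r) <= eta))
        if inliers > best_inliers:
            best_x, best_inliers = x, inliers
    return best_x, best_inliers

# Synthetic check: 10 gross outliers among 120 measurements.
rng = np.random.default_rng(5)
m, n = 120, 10
A = rng.standard_normal((m, n))
x_true = rng.standard_normal(n)
e = 0.05 * rng.standard_normal(m)
e[:10] += rng.choice([-5.0, 5.0], size=10)
y = A @ x_true + e
x_hat, n_inliers = ransac(A, y, p=20, eta=0.2, N=200)
print(np.linalg.norm(x_hat - x_true), n_inliers)
```

With enough iterations, at least one random subset is outlier-free; its fit flags the planted outliers as the only points outside the tolerance band, which is exactly the intuition of Figure 3.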
Note that other regularizations can be used instead of (2). Here we use the Tikhonov regularization because it is simple, general, and most other existing approaches are based on it. Nevertheless, if some properties of the source are known a priori (e.g., sparsity or smoothness), this step of the algorithm can be adapted accordingly.
At the stage where the N possible solutions x have been generated, what RANSAC actually tries to do is to select the solution x* with the smallest MSE. However, in a real-world problem the ground truth is unknown, so we do not have access to the MSE itself. As mentioned above, RANSAC overcomes this difficulty by using an indirect metric of the MSE: it assumes that the number of inliers is inversely proportional to the MSE. Figure 3 depicts the intuition behind this in a simple 1-D problem: the superior solution produces more inliers than the inferior one. Thus, RANSAC maximizes the number of inliers, in hopes that this also minimizes the estimation error.
As we will see in the following sections, if the optimal value for the threshold parameter η is known and used, using RANSAC as a pre-processing stage for outlier removal before applying (2) significantly improves the overall performance (compared to using only (2) with no outlier-removal pre-processing). Unfortunately, the performance of RANSAC depends strongly on the parameter η, and finding the optimal value of η is an open problem. Furthermore, the assumed inverse proportionality between the number of inliers and the MSE does not always hold in the presence of critical measurements. This is the case in the ETEX dataset, as we can see in Figure 4(a).

Critical measurements
We identify critical measurements as those which have the largest influence on the source estimation process. A quantitative measure of this influence is Cook's distance (Cook, 1977). Figure 5 shows the Cook's distance of the ETEX measurements. It is easy to observe the peak that identifies the critical measurements.
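Cook's distance for a least-squares fit can be computed from the residuals and the leverages (the diagonal of the hat matrix). The sketch below plants a single high-leverage, high-error measurement in synthetic data and checks that it dominates the distance; it is an illustration of the diagnostic, not the ETEX computation:

```python
import numpy as np

def cooks_distance(A, y):
    """Cook's distance D_i = (r_i^2 / (n s^2)) * h_i / (1 - h_i)^2 for each
    measurement of the least-squares fit y ~ A x, with leverages h_i from
    the hat matrix H = A (A^T A)^{-1} A^T and s^2 the residual variance."""
    m, n = A.shape
    H = A @ np.linalg.inv(A.T @ A) @ A.T   # hat (influence) matrix
    h = np.diag(H)                         # leverages
    r = y - H @ y                          # residuals
    s2 = r @ r / (m - n)                   # residual variance estimate
    return (r ** 2 / (n * s2)) * h / (1 - h) ** 2

rng = np.random.default_rng(3)
m, n = 100, 5
A = rng.standard_normal((m, n))
A[0] *= 10.0                               # give one row extreme leverage
y = A @ np.ones(n) + 0.1 * rng.standard_normal(m)
y[0] += 5.0                                # and a large error there
d = cooks_distance(A, y)
print(int(np.argmax(d)))                   # the critical measurement stands out
```

A pronounced peak in d, like the one visible in Figure 5 for ETEX, is the signature of a critical measurement: one whose removal would noticeably change the estimated source.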
Let us consider again the ETEX dataset, the set of N solutions x that RANSAC generates, and their corresponding residuals r. It is interesting to note that the solutions x with the most inliers (the superior solutions according to RANSAC) have high residuals at exactly the critical measurements. This is shown in Figure 6. In other words, by considering the critical measurements as outliers, these solutions achieve more inliers.
RANSAC assumes that all the measurements have the same influence: it just wants to maximize the number of inliers, and does not care about which exact measurements the inliers are. This is why it fails in this case and the inverse proportionality between the number of inliers and the MSE does not hold.
In summary, RANSAC operates reliably when all the measurements are of similar importance, because the inverse proportionality between the MSE and the number of inliers holds. However, when critical measurements are present, this proportionality does not hold, and RANSAC fails.

TRANSAC
In order to avoid this weakness of the standard RANSAC algorithm, we propose an alternative indirect metric to discriminate solutions with small MSE: the total residual ε = ||Ax − y||_2. By replacing the number of inliers with the total-residual metric, we create the first step of the Total residual RANdom SAmple Consensus (TRANSAC) algorithm. The second step consists of a voting stage. Both are described in Algorithm 2 in pseudocode.
The total residual is directly proportional to the MSE of reconstruction. Unlike the number of inliers, this proportionality is conserved also when critical measurements are present in the dataset (Figures 4(c) and 4(d)). In a real-life problem, where we do not have access to the ground truth, we do not know if critical measurements are present. Hence, we need a robust algorithm like TRANSAC. In addition, TRANSAC does not depend on the threshold η.
The proportionality between the total residual and the reconstruction error is not perfect, as we can see in the scatter plot of Figure 4(d). Even if a candidate solution has the smallest total residual, it is not guaranteed to be the solution with the smallest MSE. The intention of the voting stage is to use the candidate solutions with a total residual under a certain threshold to come up with the best possible final solution.
Intuitively, the solutions with the smallest total residual (i.e., smallest MSE) are generated using almost outlier-free random subsets of measurements y′. We refer to these as the good subsets. Outliers can appear sporadically in some of these good subsets, but the same outlier is extremely unlikely to appear in all of them. Hence, in the voting stage we select the measurements that all the good subsets have in common, or, in other words, exclude any measurements that appear very infrequently.
Thus, we first select the subsets y′ associated with candidate solutions with a total residual smaller than a certain threshold, ε < β. Then, for each measurement we count how many times it appears in these good subsets. Finally, we select the M measurements with the largest frequency of occurrence.
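The two TRANSAC stages described above (ranking candidate solutions by total residual, then voting over the good subsets) can be sketched as follows on synthetic data. One deviation from the text, flagged in the comments: instead of a hand-tuned absolute threshold β, the sketch takes β as a quantile of the observed total residuals, which is a practical stand-in:

```python
import numpy as np

def tikhonov(A, y, lam=1e-3):
    """Plain Tikhonov estimate via an augmented least-squares system."""
    n = A.shape[1]
    A_aug = np.vstack([A, np.sqrt(lam) * np.eye(n)])
    return np.linalg.lstsq(A_aug, np.concatenate([y, np.zeros(n)]), rcond=None)[0]

def transac(A, y, p, N, M, q=0.1, seed=0):
    """Stage 1: solve on N random p-subsets and rank the candidates by
    total residual eps = ||A x - y||_2 over ALL measurements.
    Stage 2 (voting): among the good subsets (eps below beta, here taken
    as the q-quantile of the eps values, a practical stand-in for the
    paper's hand-tuned beta), keep the M most frequent measurements."""
    rng = np.random.default_rng(seed)
    m = len(y)
    subsets, residuals = [], []
    for _ in range(N):
        idx = rng.choice(m, size=p, replace=False)
        x = tikhonov(A[idx], y[idx])
        subsets.append(idx)
        residuals.append(np.linalg.norm(A @ x - y))
    beta = np.quantile(residuals, q)
    counts = np.zeros(m)
    for idx, eps in zip(subsets, residuals):
        if eps <= beta:                    # a good (low total residual) subset
            counts[idx] += 1
    keep = np.argsort(-counts)[:M]         # M most frequently used measurements
    return tikhonov(A[keep], y[keep])

# Synthetic check: 10 gross outliers among 120 measurements.
rng = np.random.default_rng(4)
m, n = 120, 10
A = rng.standard_normal((m, n))
x_true = rng.standard_normal(n)
e = 0.05 * rng.standard_normal(m)
e[:10] += rng.choice([-8.0, 8.0], size=10)
y = A @ x_true + e
x_hat = transac(A, y, p=20, N=300, M=100)
print(np.linalg.norm(x_hat - x_true))
```

The planted outliers almost never appear in the low-residual subsets, so the vote pushes them out of the final M measurements and the last fit is essentially outlier-free.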

Experiments
We now perform two experiments to demonstrate various aspects of TRANSAC.

Sanity check
In Section 3.3 we confirmed the expected behaviour of the first stage of TRANSAC: we showed that the total residual is directly proportional to the MSE. Let us check now the second stage, the voting. To do so, let us suppose that during the voting we have access to the MSE of every candidate solution x. Then, we would of course select the solutions which in fact have the smallest MSE, and use them to build the histogram. We run this modified TRANSAC with the dataset without critical measurements; Figure 7(a) shows the results. We can observe that the MSE of the solution increases as M increases. This is to be expected: as M grows, more outliers are included in the dataset that is used to obtain x*, and its MSE increases. We note that the results curve is non-decreasing because, in this particular experiment, we have access to the MSE and the histogram h is built from the actual best candidate solutions.

Actual ETEX
In this subsection, the performance of the complete TRANSAC algorithm is examined. Let us consider first the dataset without critical measurements. As in the sanity check above, TRANSAC is run for different values of M. The results are shown in Figure 7(b). We observe that the MSE increases as M increases, as before, and the maximum MSE still occurs at M = m. This is reassuring: even if we do not find the optimal value for the parameter M, we will improve the solution (with respect to using only the Tikhonov regularization) by taking any n < M < m. Notice that the minimum MSE again occurs when M = n.
Let us examine now the whole dataset, including the critical measurements. Figure 7(c) shows the results. We can observe that, again, the maximum MSE occurs at M = m. On the other hand, the minimum MSE does not occur at M = n, but rather at M = 330. Also, although the exact performance of the algorithm varies with the value chosen for the parameter β, as shown in Figure 8, we note that for practically any value of β there is an improvement in performance. These results show that TRANSAC clearly improves the performance of the Tikhonov regularization in both cases: with and without critical measurements.

Outlier removal
As explained in Section 3, RANSAC and TRANSAC are blind outlier-detection algorithms that can be combined with different regularizations in order to improve their results. In this section we combine RANSAC and TRANSAC with two different regularizations previously used in the literature, (2) and (3), and study their performance. As before, we use the ETEX dataset with and without the critical measurements. The results are shown in Figure 9. It is important to note that all these results were generated using the optimal values of all the parameters (λ, η, β, M), which were found experimentally. Figure 10 gives a more qualitative assessment of the results.
First, with and without critical measurements, the outlier-removal stage improves the performance of both regularizations. Hence, our idea of removing outliers, outlined in Section 2, indeed leads to improved performance, regardless of critical measurements or the type of regularization. Next, in all cases TRANSAC performs better than RANSAC. Therefore, our proposed modification of the metric and the addition of the voting stage result in improved performance, as expected.

Conclusions
In this work we showed that the additive errors present in the ETEX dataset come from a heavy-tailed distribution. This implies the presence of outliers. Existing source estimation algorithms typically assume Gaussian additive errors. This assumption makes such existing algorithms highly sensitive to outliers. We showed that, if the outliers are removed from the dataset, the estimation given by these algorithms improves substantially.
However, in a real-life problem, we do not know which of the measurements are outliers. Hence, we have to remove them in a blind fashion. For this purpose we first used RANSAC, a well-known blind outlier-detection algorithm.
We then showed that RANSAC unfortunately depends strongly on the chosen tolerance parameter, and is sensitive to critical measurements. To overcome these difficulties, we created TRANSAC, a modification of RANSAC which also includes a voting stage.
To demonstrate the efficiency of these methods in a real-world problem, we used the ETEX tracer experiment dataset. The source was recovered first with two previously proposed source estimation algorithms that assume Gaussian additive errors, (2) and (3). Then it was recovered again with our algorithms that use RANSAC and TRANSAC. The results clearly display how the source estimation improves if an outlier detection algorithm is used. They also show that the performance of our proposed algorithm TRANSAC clearly exceeds the performance of RANSAC in every case.

Figure 3. Visual representation of the functioning of RANSAC. Subsets 1 and 2 represent two RANSAC iterations. The subset of measurements selected by RANSAC in each iteration is represented with red diamonds. Subset 1 contains one outlier. Hence, the solution corresponding to this subset generates fewer inliers than the one for subset 2, which is free of outliers.

Figure 10. Source reconstructions given by the different algorithms. The plots on the left were generated combining (2) with RANSAC and TRANSAC. The plots on the right were generated combining (3) with RANSAC and TRANSAC. The plots on the top were generated using the ETEX dataset without critical measurements. The plots on the bottom were generated using the whole ETEX dataset.
Figure 7(a) shows the MSE obtained for different values of the parameter M. The dashed line on the right indicates the maximum possible value of M, i.e., M = m, which corresponds to using the whole measurement dataset. The dashed line on the left indicates the minimum possible value, M = n, which corresponds to using as many measurements as unknowns. The red horizontal line indicates the MSE of the solution obtained by using just the Tikhonov regularization without TRANSAC, i.e., when M = m.

Figure 1. Histogram of the additive error e. For clarity, the zero-error bin has been omitted here.

Figure 4. Performance of RANSAC and TRANSAC. (a) and (b) show graphically the correlation between the MSE of reconstruction and the number of inliers; (c) and (d) show graphically the correlation between the MSE of reconstruction and the total residual. To build (a) and (c) the complete dataset was used; to build (b) and (d) the dataset without critical measurements was used. The diamond indicates the solution obtained by the traditional Tikhonov regularization in (2), the star indicates the solution chosen by TRANSAC before the voting stage, the square indicates the final solution of TRANSAC, and the hexagon the solution chosen by RANSAC.

Figure 6. Residuals of two different source estimations. The blue peaks correspond to the residual produced by the solution x with the largest number of inliers in Figure 4(a); the red peaks correspond to the residual produced by the solution x with the smallest MSE in Figure 4(a). The black arrows on the top indicate where the two most critical measurements are localised. Clearly, the residuals corresponding to these two measurements are much larger than the rest.

Figure 7. Performance of TRANSAC combined with the Tikhonov regularization. In the three plots, the red dashed line indicates the estimation error given by the typical Tikhonov regularization (2). The dashed line on the right indicates M = m; the one on the left indicates M = n. Plot (a) shows the results of the sanity check: as the selected number of measurements M increases, the MSE of the estimation increases, and the maximum MSE corresponds to M = m. Plot (b) shows the results of applying TRANSAC to the ETEX dataset without critical measurements. Again, the MSE generally increases with M, and the maximum MSE appears at M = m. Plot (c) shows the results of applying TRANSAC to the whole ETEX dataset, critical measurements included. In this case the MSE does not always increase with M, but the maximum MSE still corresponds to M = m.