Many applications in atmospheric science involve ill-posed inverse problems. A crucial component of many inverse problems is the proper formulation of a priori knowledge about the unknown parameters. In most cases, this knowledge is expressed as a Gaussian prior. This formulation performs well at capturing smooth, large-scale processes but is often ill equipped to capture localized structures such as large point sources or hot spots.

Over the last decade, scientists from a diverse array of applied mathematics
and engineering fields have developed sparse reconstruction techniques to
identify localized structures. In this study, we present a new regularization
approach for ill-posed inverse problems in atmospheric science. It is based
on Tikhonov regularization with sparsity constraint and allows bounds on the
parameters. We enforce sparsity using a dictionary representation system. We
analyze its performance in an atmospheric inverse modeling scenario by
estimating anthropogenic US methane (CH4) emissions.

Different measures indicate that our sparse reconstruction approach is better able to capture large point sources or localized hot spots than other methods commonly used in atmospheric inversions. It captures the overall signal as well as these methods do but adds detail at the grid scale. This feature can be of value for any inverse problem with point or spatially discrete sources. We show an example of source estimation for synthetic methane emissions from the Barnett shale formation.

Inverse problems are widespread in atmospheric sciences. The estimation of
greenhouse gas sources and sinks is a prime example. Numerous studies combine
observations of greenhouse gas concentrations in the atmosphere and inverse
modeling to infer sources and sinks at the Earth's surface. Existing studies
apply these techniques at municipal

In almost all cases, these parameter estimation problems are ill posed. “Ill posed” means that small noise on the measurements can be amplified by the inversion, leading to unrealistic estimates. Thus, special techniques are required for a stable inversion.

A Bayesian inversion is a common tool in atmospheric sciences that can handle
the ill-posed nature of these problems

A classical approach is the use of a Gaussian prior, which often allows rapid
calculations via analytical expressions. However, the Gaussian prior is known
to return a best estimate that is a smoothed version of the true solution

Other research areas solve inverse problems using Tikhonov regularization. Tikhonov regularization is formulated as an optimization problem. The functional to be minimized consists of a data fitting term and a penalty term that prevents overfitting. The classical choice of these terms is analogous to a Bayesian inversion with a Gaussian prior.
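The classical Tikhonov problem with a quadratic penalty has a closed-form solution via the normal equations. The following is a minimal sketch in Python/NumPy (the code accompanying this study is written in Matlab; the toy operator, noise level, and regularization parameter below are illustrative assumptions, not the setup of this paper). It shows the stabilizing effect on an ill-posed toy problem:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy ill-posed problem: operator with rapidly decaying singular values.
n = 50
U = np.linalg.qr(rng.standard_normal((n, n)))[0]
V = np.linalg.qr(rng.standard_normal((n, n)))[0]
s = 10.0 ** -np.linspace(0, 6, n)              # decaying spectrum -> ill posed
K = U @ np.diag(s) @ V.T

x_true = np.zeros(n)
x_true[[5, 20]] = 1.0                          # two localized sources
y = K @ x_true + 1e-4 * rng.standard_normal(n)

# Classical Tikhonov: minimize ||K x - y||^2 + lam * ||x||^2.
# The minimizer solves the normal equations (K^T K + lam I) x = K^T y.
lam = 1e-6
x_tik = np.linalg.solve(K.T @ K + lam * np.eye(n), K.T @ y)

# Without regularization, the noise is amplified by the small singular values.
x_naive = np.linalg.lstsq(K, y, rcond=None)[0]
err_tik = np.linalg.norm(x_tik - x_true)
err_naive = np.linalg.norm(x_naive - x_true)
```

The penalty damps the noise amplification at the cost of a bias toward zero; this bias is the smoothing effect discussed below.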

Recently, Tikhonov regularization with sparsity constraint has become a
popular alternative to these classical inverse methods within a number of
engineering fields. Several recent studies apply the approach to a variety of
applications, including medical imaging, signal analysis, and compressed sensing

Only a handful of studies apply these modern inversion techniques to
atmospheric sciences.

The goal of this paper is to show how sparse reconstruction techniques can
improve flux estimates in an atmospheric inverse modeling scenario. We use a
synthetic case study from

The present study is organized as follows: first, we briefly introduce the
atmospheric inverse modeling problem in Sect.

Additional graphics, source code, and pseudocode of the sparse dictionary reconstruction method are included in the Supplement.

Existing studies employ a number of different techniques to quantify
greenhouse gas surface fluxes

We use the WRF-STILT (Weather Research and Forecasting – Stochastic
Time-Inverted Lagrangian Transport) model

For a given emission field,

In this section, we provide some mathematical background on inverse problems and describe how the approach developed in this article relates to commonly used inverse methods. We formulate the atmospheric inverse modeling (AIM) problem as a parameter optimization problem, which is based on norm notation. Thus, we define

Inverse problems arise when the quantity of interest cannot be measured
directly. Instead, another quantity

The problem of finding parameters that best explain noisy measurements

The inversion using Eq. (

If more detailed noise characteristics are known, these can be introduced by adapting the data fitting term. In the case of Gaussian noise, penalized, weighted least squares

We solve the inverse problem using Tikhonov regularization,
Eq. (

Sometimes it can be useful to penalize the components of the parameter
vector

A large number of methods are available to solve optimization problems of the
type

Localized structures like point sources or edges in the true solution

The sparsity constraint has become very popular for regularization of inverse
problems over the last decade. The 1-norm is used to constrain parameters
instead of taking the 2-norm as a penalty function. This results in the
optimization problem

Flux fields with sporadic local hot spots are a prime example of sparse signals, and sparse reconstruction is well suited to identifying such sparse but nonsmooth signals. However, if the true solution

The field of signal and image processing offers a variety of transforms
designed for sparse representation of oscillations, localized signals, edges,
and the like. Options include regular basis transforms, Fourier transforms,
wavelets, shearlets, and curvelets

Illustration of 2-norm and 1-norm regularization for an
underdetermined problem

A dictionary is a collection of

Consider an example for

We assume that the estimated state in the AIM problem can be sparsely
represented in a given dictionary
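To make the dictionary idea concrete, the following hypothetical one-dimensional sketch (Python/NumPy) builds atoms analogous to those used later in this paper: pixel peaks, broader peaks extending into the direct neighbors, and a constant background, each normalized in the 2-norm. It shows how a flux field that is dense on the grid can still have a sparse coefficient vector:

```python
import numpy as np

n = 40
atoms = []

# Pixel atoms: one spike per grid cell.
for i in range(n):
    a = np.zeros(n)
    a[i] = 1.0
    atoms.append(a)

# Broader atoms: peaks that extend into the direct neighbors.
for i in range(n):
    a = np.zeros(n)
    a[i] = 1.0
    if i > 0:
        a[i - 1] = 0.5
    if i < n - 1:
        a[i + 1] = 0.5
    atoms.append(a)

# Constant background atom.
atoms.append(np.ones(n))

# Columns of D are the atoms, normalized in the 2-norm.
D = np.column_stack([a / np.linalg.norm(a) for a in atoms])  # shape (n, 2n + 1)

# A constant background plus one local hot spot: dense on the grid,
# but only two nonzero coefficients in the dictionary representation x = D c.
c = np.zeros(D.shape[1])
c[-1] = 5.0        # background level
c[n + 10] = 3.0    # broad peak around cell 10
x = D @ c
```

Here every grid value of x is nonzero, yet the coefficient vector c has only two active entries; the sparsity constraint is applied to c, not to x.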

The previous section formulates the AIM problem as an optimization problem
using Tikhonov functionals. In the following paragraphs, we focus on
efficient methods to solve problems (Eqs.

Henceforth, we only consider linear forward models

For linear forward models the classical Tikhonov functional,
Eq. (

For the sparse reconstruction problem, Eq. (

The sparse dictionary reconstruction problem
(Eq.

Some problems require a bound on the parameter space:
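One common way to impose such a bound is to project onto the feasible set after each thresholded gradient step. The sketch below (Python/NumPy, with an illustrative toy problem; it mirrors the projection idea described in this paper only in outline) enforces a nonnegativity bound:

```python
import numpy as np

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def projected_sparse_solve(K, y, lam, n_iter=1000):
    # Thresholded gradient iteration for 0.5*||K x - y||^2 + lam*||x||_1,
    # followed in every step by a projection onto the nonnegative orthant.
    L = np.linalg.norm(K, 2) ** 2
    x = np.zeros(K.shape[1])
    for _ in range(n_iter):
        x = soft_threshold(x - K.T @ (K @ x - y) / L, lam / L)
        x = np.maximum(x, 0.0)      # enforce the bound x >= 0
    return x

# Toy problem with a nonnegative sparse truth.
rng = np.random.default_rng(2)
K = rng.standard_normal((20, 60))
x_true = np.zeros(60)
x_true[[7, 33]] = [1.0, 2.0]
y = K @ x_true + 0.01 * rng.standard_normal(20)
x_hat = projected_sparse_solve(K, y, lam=0.1)
```

Every iterate, and hence the final estimate, satisfies the bound by construction.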

The iterative scheme of the sparse dictionary reconstruction creates a sparse
representation

The projection step for the dictionary is difficult because the
dictionary

The methods presented in this paper are formulated as Tikhonov regularizations. The inverse modeling community may be more familiar with the statistical formulation, namely Bayesian inverse modeling. In the following section, we briefly describe how both formulations overlap.

In a Bayesian inverse modeling setup, noise and unknown parameters are assumed to be realizations of known probability distributions. Given these distributions and the forward model, Bayes' theorem is used to infer the a posteriori distribution. The maximizer of the posterior probability density function, called the maximum a posteriori solution, is often presented as a best estimate. Further evaluation of the posterior distribution also yields uncertainty bounds for the estimate.

We previously explained that covariance matrices for the noise or prior
translate into weighting matrices for the norms in the Tikhonov formulation
(see Sects.

Inversions with non-Gaussian priors, like the Laplacian in
Eqs. (

To judge the quality of an estimate, it is necessary to know the uncertainty
associated with each estimated parameter. For Bayesian methods, these
uncertainties and the best estimate are deduced from samples of the posterior
distribution if no analytical expressions exist. For the Tikhonov methods
used in this work, uncertainty estimates are an extra calculation performed
after the retrieval of a best estimate. In this section, we present an
uncertainty analysis for Tikhonov methods based on

We call the true parameters

The exact total, smoothing, and measurement error can be calculated from the
true solution,

In real data problems, the error terms in Eqs. (

To find such bounds for the smoothing error, Bayesian methods make additional
assumptions about the true solution by applying so-called a priori knowledge.
Comparable source conditions also exist for Tikhonov methods

Without applying a priori knowledge, the best one can do is to analyze the
sensitivity matrix
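For a linear Tikhonov inversion, the sensitivity matrix can be written down explicitly, as in this hypothetical Python/NumPy sketch (the operator, column scaling, and regularization parameter are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)
m, n = 60, 40
# Columns of decreasing magnitude mimic grid cells with decreasing
# footprint information.
K = rng.standard_normal((m, n)) * np.linspace(1.0, 0.01, n)
lam = 1e-2

# For the estimate x_hat = (K^T K + lam I)^{-1} K^T y, the response to the
# true parameters is governed by the sensitivity matrix A = (K^T K + lam I)^{-1} K^T K.
A = np.linalg.solve(K.T @ K + lam * np.eye(n), K.T @ K)

# A perfect retrieval has A = I. The diagonal shows which fraction of a
# unit source in each cell is attributed back to that cell.
diag = np.diag(A)
```

Well-informed cells have diagonal entries close to one, while poorly informed cells are attenuated toward the (zero) prior.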

Even in real data problems, one often has access to the noise characteristics. Uncertainty bounds for the measurement error can be approximated by resampling the noise and recalculating the estimate under this noise for a sufficient number of samples. The distribution of estimates under different realizations of the noise yields the uncertainties, commonly expressed as standard deviations. If the noise characteristics are unknown, resampling can be achieved by bootstrapping the residual of the estimate
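For a linear Tikhonov estimator, this resampling procedure can be sketched in a few lines (Python/NumPy; the toy problem, noise level, and sample count are illustrative assumptions, and the spread can be checked against the analytic noise covariance of the linear estimator):

```python
import numpy as np

rng = np.random.default_rng(4)
m, n = 50, 30
K = rng.standard_normal((m, n))
sigma = 0.1                      # known noise standard deviation
lam = 1.0

# Linear Tikhonov estimator: x_hat = R y with R = (K^T K + lam I)^{-1} K^T.
R = np.linalg.solve(K.T @ K + lam * np.eye(n), K.T)

x_true = rng.random(n)
y = K @ x_true + sigma * rng.standard_normal(m)
x_hat = R @ y

# Resample the noise and recalculate the estimate for each sample; the
# spread of the estimates approximates the measurement uncertainty.
samples = np.array([R @ (y + sigma * rng.standard_normal(m))
                    for _ in range(1000)])
unc = 2.0 * samples.std(axis=0)  # two standard deviations per parameter
```

For nonlinear estimators such as the thresholding methods, the same resampling loop applies, but no analytic cross-check exists.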

We apply the sparse dictionary reconstruction method in an atmospheric
inverse modeling setup. We use anthropogenic methane emissions in the US as a
synthetic case study, the same case study used in

We estimate emissions for the North American mainland (25–55

Available in situ methane measurements for May to September 2008.

Synthetic methane emissions are generated from the Emission Database for
Global Atmospheric Research (EDGAR). We project anthropogenic methane
emissions from the EDGAR v3.2 FT2000 inventory

US methane emissions from EDGAR v3.2 FT2000 at the native 1

The simulated noisy measurements are calculated by applying the linear
WRF-STILT forward model to the EDGAR fluxes and adding Gaussian noise of
realistic magnitude. The noise vector is sampled from the multivariate
Gaussian distribution with a diagonal covariance matrix.

The fluxes are temporally constant in the inversion setup here, so each of
the 1469 land grid cells has only one unknown emission parameter
(

We start by comparing Tikhonov regularization with the classical 2-norm
penalty

For a fixed

We use FISTA, which is an accelerated version of ISTA (see
Sect.
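A minimal FISTA sketch in Python/NumPy, using the standard step-size and momentum rules of Beck and Teboulle (the toy problem is an illustrative assumption):

```python
import numpy as np

def fista(K, y, lam, n_iter=300):
    # FISTA for min 0.5*||K x - y||^2 + lam*||x||_1: ISTA's thresholded
    # gradient step, taken at an extrapolated point z instead of x.
    L = np.linalg.norm(K, 2) ** 2           # Lipschitz constant of the gradient
    x = z = np.zeros(K.shape[1])
    t = 1.0
    for _ in range(n_iter):
        g = z - K.T @ (K @ z - y) / L                              # gradient step at z
        x_new = np.sign(g) * np.maximum(np.abs(g) - lam / L, 0.0)  # soft threshold
        t_next = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
        z = x_new + ((t - 1.0) / t_next) * (x_new - x)             # momentum extrapolation
        x, t = x_new, t_next
    return x

# Illustrative toy problem with a sparse truth.
rng = np.random.default_rng(5)
K = rng.standard_normal((30, 80))
x_true = np.zeros(80)
x_true[[5, 40, 77]] = [1.5, -2.0, 1.0]
y = K @ x_true
x_hat = fista(K, y, lam=0.05)
```

The extrapolation improves the worst-case convergence rate from O(1/k) for ISTA to O(1/k^2) at essentially no extra cost per iteration.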

Emission estimates using Tikhonov regularization with the classical
2-norm penalty (see Eq. L2) and the sparsifying 1-norm penalty (see Eq. L1)
inverted from noisy simulated methane measurements. The true flux field is
shown in Fig.

Figure

The estimates differ (Fig.

Often, the sparse emission field better estimates large sources such as those
from major cities (see, e.g., Salt Lake City emissions at 111

The histogram of the EDGAR fluxes in Fig.

Based on these preliminary results, we conclude that the estimate using L2 is closer to the true EDGAR emissions than that using L1, but the estimate is not satisfactory for reconstructing large emitters due to the smoothing effect. Moreover, this methane emissions case study is not suitable for sparse reconstruction in the standard representation system.

Normalized histogram of randomly signed EDGAR fluxes. The histogram
data have been used to estimate the parameters of corresponding Gaussian and
Laplacian probability density functions. Note that only 10 grid cells have
emissions larger than 0.1

The preliminary results in the previous section show that the classical

We employ a dictionary to achieve this goal. We therefore need to select
atoms such that the dictionary can sparsely approximate all methane emission
patterns. Efficient dictionaries can be created using learning algorithms,
but a set of training data is required

The 1

A selection of atoms from the dictionary used for the sparse dictionary reconstruction method. These atoms are scaled to represent the state vector via linear combination. The left and middle elements are the basic shapes centered in each grid cell of the domain. At coasts and lakes, these shapes are limited to land grid cells. All atoms are normalized in the 2-norm. The dictionary chosen here also contains a constant background function.

Another option to sparsify the representation is to use atoms that cover a
large portion of the domain. A background is best represented by a constant
function. With the same argument we could add regional background functions.
We in fact find that a division into regions as shown in
Fig.

For our experiments, we decide not to include atoms that are constructed from
EDGAR or geostatistical data. We use a pixel basis, a basis with peaks that
extend into the direct neighbors (see Fig.

To estimate the flux parameters

For further analysis, we add a positivity constraint on the flux parameters,

By enforcing positive parameters, three different constraints determine the final estimate: positivity, data, and minimal norm. These constraints may pull the estimate in different directions, and the final estimate depends on the balance between them. Positivity is always enforced through the projection. The emissions will also explain the given data up to the noise as long as Morozov's discrepancy principle is fulfilled. The most flexible constraint in this setup is thus the smoothness or sparsity assumption defined through the penalty term, because it is the most uncertain of the three.

We carry out two types of analysis to measure the quality of the estimates.
First, we perform an uncertainty analysis based on knowledge about the noise
characteristics but without knowledge about the true fluxes, as would be the
case for many real data scenarios. As discussed in
Sect.

The smoothing error describes the error that results from stabilizing the
inversion. It can only be estimated if additional assumptions about the true
fluxes are made. Without such assumptions, the best one can do is to analyze
the sensitivity matrix (see Eq.

The measurement error shows the influence of the noise on the estimated
parameters. We estimate uncertainty bounds for the measurement error based on
1000 samples of noise as described in Sect.

We compare our approaches to state-of-the-art methods studied in

Standard inversion: this is a geostatistical approach following

Transform inversion: flux parameters are enforced to be positive by a power transformation

Lagrange multiplier method: positivity is enforced by formulating an
optimization problem with an inequality constraint, which is solved via the
Lagrangian function. As a deterministic method, no direct uncertainty
estimates are given, but they can be approximated using the approaches from
Sect.

Gibbs sampler: the Gibbs sampler belongs to the group of Markov chain Monte
Carlo (MCMC) methods. These methods can generate realizations of complicated
probability distributions such as the posterior distribution to non-Gaussian
priors in a Bayesian inversion framework. MCMC methods differ in the way
these realizations are calculated. One can estimate statistical quantities
such as mean and standard deviation given a sufficient number of
realizations. Positivity is formulated in the prior distribution. In theory,
one could implement one of several MCMC algorithms

In this section, we analyze the performance of our suggested sparse
dictionary reconstruction method L1 DIC POS in the AIM scenario described in
the previous section. First, we compare it with the methods L2 POS and L1 POS
and carry out an uncertainty analysis and an analysis of the exact errors.
Then, we include the methods from

Emission estimates from the methods L2 POS, L1 POS, and L1 DIC POS
inverted from noisy simulated methane measurements. The true flux field is
shown in Fig.

Figure

In contrast to L1 POS, the solution of L1 DIC POS does not look sparse, as
sparsity is enforced on the coefficients of the dictionary. The background
function in the dictionary is selected to represent a base level of small
emissions (not visible in the color map). It improves the inversion’s
ability to accurately estimate total US emissions. Regionally, other atoms
are added and subtracted from this background level. The smooth character of
the estimate in many regions is a result of the broader dictionary functions
(see Fig.

L1 DIC POS shows a significant improvement over L2 POS in the estimation of localized sources (e.g., West Coast emissions or Salt Lake City) but falls slightly short of L1 POS. Sometimes these emission peaks are misplaced (e.g., the San Francisco Bay area). Regions of significant emissions such as the Midwest are often reasonably well reconstructed, but the method still tends to spatially concentrate these sources. This localization property is a consequence of the sparsity constraint, because the flux field is represented by as few atoms as possible.

We observe that the locations of significant sources agree much better with both sparsity methods L1 POS and L1 DIC POS than with the classical L2 approach. This result can be explained by the fact that the sparse schemes look for the dominant sources. Even if the magnitude is not captured exactly, the method L1 might be used in applications to identify the center of source locations.

As described in Sects.

The sensitivity matrix gives the best insight into the smoothing error without knowledge of the true fluxes. Each column describes how an additional pixel source would change the flux estimate. A perfect sensitivity matrix is thus equal to the identity. The column sums indicate regions that are overestimated or underestimated. Such over- or underestimation is only observed in regions with little footprint information outside the main study area, namely Florida, Mexico, and central and eastern Canada. The locations are similar for all methods, but L1 POS is far more biased in those regions.
Table

The most valuable information is contained on the diagonal of the sensitivity
matrix, plotted in Fig.

The interpretation of the sensitivity matrix differs slightly between the two types of methods. If we excluded the positivity constraint, L2 (POS) would be a linear method, meaning that the sensitivity matrix is independent of the parameters and could be used to predict how additional sources would be reconstructed using Eq. (

The measurement uncertainties describe the uncertainties in the estimate from
the noise. We approximate these uncertainties by resampling the noise. Two
standard deviations are plotted in Fig.

The sparse reconstruction method, L1 POS, has much larger measurement uncertainties, particularly where large emitters are estimated. On the other hand, the estimated uncertainties are small or even zero in regions where the sensitivity is small. This is a consequence of the thresholding algorithm: the method responds to small perturbations in the data through its active, nonzero parameters, and the set of active parameters only changes in response to significant perturbations. Theoretically, uncertainties can be equal to zero, but it is likely that zero uncertainties are an artifact of the limited number of samples used for their calculation. It is important to keep in mind that these uncertainties only represent the effect of noise on the estimated parameters. Uncertainties for the smoothing error will be larger where the sensitivity is small.

L1 DIC POS looks more robust to noise than L1 POS. Similarly to L1 POS, there are large areas with negligible measurement uncertainties. The uncertainty correlates with the magnitude of the estimated emissions.

Figure

It is important to point out that it is misleading to look at either the smoothing or measurement error without the other. For ill-posed inverse problems, a small smoothing error comes at the expense of a larger measurement error and vice versa. A well-chosen regularization parameter balances both errors such that the total error is minimized. For our methods, we observe that the measurement error is always smaller, but the ratio is different for each method.

The measurement noise causes deviations in the estimated coefficients. This effect is described by the measurement error. For L2 POS, these deviations affect most parameters, whereas for L1 POS, the effect is larger but mostly limited to the active nonzero parameters. For L1 DIC POS, the noise affects the active atoms of the dictionary.

The smoothing error results from the stabilizing effect of the reconstruction methods. Because L2 POS aims at smooth emission fields, large pixel emissions are generally a combination of underestimation in that particular grid cell and overestimation in its vicinity. This smoothing effect does not manifest in L1 POS (e.g., for the Salt Lake City emissions). For L1 DIC POS, smoothing can occur when spatially larger dictionary elements are selected.

While overestimation and underestimation are approximately equal for the
measurement error, the smoothing error indicates whether a method
overestimates or underestimates in general. Because of the zero a priori, all
methods are expected to underestimate, but only L1 POS significantly
underestimates (see also Table

L1 POS reduces the smoothing effect in some locations, but smoothing and measurement error are larger than for the other methods because too many small sources are suppressed. Sparse dictionary reconstruction has significantly less smoothing error and thus gives the best estimate of the EDGAR fluxes. These results do not rule out the L1 POS method in general, but they suggest that this particular case study is not naturally sparse. However, the dictionary representation is able to sparsify the signal and is thus well suited to these types of problems.

Total

In this section, we evaluate the estimates of our Tikhonov-based methods by
comparing them to the estimates of the methods studied in

We examine the reconstruction quality using several measures, each of which focuses on a different aspect of the estimate. These measures are the following.

The results for all estimates are listed in Table

In the local error measure, which compares estimates grid cell by grid cell, L1 DIC POS and the transform inversion perform best. These methods come closest to addressing questions at the grid cell level, but errors are still too high for accurate answers. Reasonable estimates can only be made on a coarser scale by spatially integrating grid cells. The regional measure suggests that L2 POS and the Lagrange multiplier method also perform well on a coarser grid.

From a modeling perspective, the standard inversion is comparable to our
method L2, whereas the Lagrange multiplier method and Gibbs sampler include
positivity constraints and compare to L2 POS. The estimates show similar
features to our estimates for L2 and L2 POS (see the Supplement), namely
rather smooth emission estimates. The spatial correlation between parameters
used by

Our sparse dictionary reconstruction method and the transform inversion both estimate parameters in a different space, but the transforms are fundamentally different. For L1 DIC POS, the sparsity constraint and the dictionary with the pixel elements promote the estimation of pixel sources. For the transform inversion, the nonlinear mapping between coefficient space and parameter space allows larger pixel emissions than the smoothing methods.

Reconstruction errors measured on local, regional, and total scales
(see Eqs.

Regional EDGAR emissions and emission estimates for the methods
studied in

A common task is to determine the total emissions for a political or
geographic region. Thus, we divide the domain into 10 regions, mainly along
political borders. The flux estimates for these regions are shown in
Fig.

In a final scenario, we test the reconstruction quality of our methods for
methane emissions from unconventional gas wells. We choose the Barnett shale
formation in Texas because it had the highest production of any US reservoir
in the summer of 2008. We add a small synthetic source on top of the EDGAR
fluxes and simulate noisy measurements. The synthetic emissions are inspired
by the location of the formation and a recent map of well distribution

The plots in Fig.

We should add that this scenario is not designed to favor one of these methods. The source distribution cannot be represented by a single atom in the dictionary. However, if potential source shapes like the distribution of wells were available, the sparse dictionary method would benefit from such knowledge.

Additional methane sources in the Barnett shale gas reservoir (upper
left panel) are added to the EDGAR emissions (see
Fig.

Results for the Barnett scenario: estimated emissions in the Barnett
region (red boxes in Fig.

This study analyzes different methods for solving inverse problems. We introduce Tikhonov regularization with the commonly used 2-norm and the sparsifying 1-norm penalty function. We show how these approaches translate to a Gaussian and a Laplacian prior, respectively, in a Bayesian inversion framework. We present a new sparse reconstruction method that enforces sparsity in a redundant dictionary representation system tailored to this application. A simple heuristic approach is applied to all methods to force nonnegative parameters. To test these methods, we consider an atmospheric inverse modeling scenario, in which we estimate methane surface fluxes for the US from atmospheric in situ measurements.

We find that the choice of the penalty term has a substantial influence on the estimate and is thus a crucial step when solving inverse problems. Gaussian-like priors such as the 2-norm penalty in Tikhonov regularization produce a smoothing effect. In our scenario, this characteristic means that large localized sources such as emissions from cities cannot be estimated accurately; instead, they appear as regional sources. In contrast, the sparse reconstruction approach can reproduce these large emitters, but it also suppresses too many small emissions to properly estimate the total flux. However, we find a simple dictionary representation system that is able to sparsely approximate the emission field. The resulting sparse dictionary reconstruction method determines the overall flux field as well as established methods do and adds information on the local scale.

The Barnett case study shows the importance of such local information: while
the smoothing methods recognize the additional emissions in the total flux,
they cannot attribute these to the Barnett. The sparse dictionary
reconstruction method and the transform inversion studied in

As concluded in the previous study by

The sparsity constraint works best when the underlying signal is sparse or can be sparsely approximated. The representation of the signal in a dictionary is very flexible and can create a sparse signal for many applications. Our sparse reconstruction method is thus applicable to any inverse problem, but the dictionary would need to be adapted to suit the application. For some applications, sparsifying transforms or training data to learn a dictionary might be available. In others, finding a sparsifying dictionary might be a challenge on its own. We construct the dictionary by identifying some building blocks of the signal. The estimate can be further improved by using spatial information about sources encoded in shape functions.

In summary, the sparse reconstruction approach here is a good alternative to commonly used Gaussian priors when the emission field has many point sources or a heterogeneous spatial structure. The combination of a sparsifying dictionary representation system and sparse reconstruction is a powerful tool for many inverse modeling applications.

The numerical methods and the case study data are available for download. The code is written in Matlab 2014b. See the Supplement for more information.

NH carried out the numerical experiments, and prepared and finalized the article. SM provided the experimental framework and was involved in the finalization of the paper. PM, JN, MP, and TW supervised the project from a mathematical and environmental physics point of view.

The authors declare that they have no conflict of interest.

The work was funded by the Center for Industrial Mathematics of the University of Bremen. The collaboration was supported by a research scholarship from the Deutscher Akademischer Austauschdienst (DAAD). The article processing charges for this open-access publication were covered by the University of Bremen.

Edited by: Michael Long
Reviewed by: two anonymous referees