Atmospheric trace gas inversions often attempt to attribute fluxes to a high-dimensional grid using observations. To make this problem computationally feasible, and to reduce the degree of under-determination, some form of dimension reduction is usually performed. Here, we present an objective method for reducing the spatial dimension of the parameter space in atmospheric trace gas inversions. In addition to solving for a set of unknowns that govern emissions of a trace gas, we set out a framework that considers the number of unknowns to be itself an unknown. We rely on the well-established reversible-jump Markov chain Monte Carlo algorithm to use the data to determine the dimension of the parameter space. This framework provides a single-step process that solves for both the resolution of the inversion grid and the magnitude of fluxes from this grid. Therefore, the uncertainty that surrounds the choice of aggregation is accounted for in the posterior parameter distribution. The posterior distribution of this transdimensional Markov chain provides a naturally smoothed solution, formed from an ensemble of coarser partitions of the spatial domain. We describe the form of the reversible-jump algorithm and how it may be applied to trace gas inversions. We build the system into a hierarchical Bayesian framework in which other unknown factors, such as the magnitude of the model uncertainty, can also be explored. A pseudo-data example is used to show the usefulness of this approach when compared to a subjectively chosen partitioning of a spatial domain. An inversion using real data is also shown to illustrate the scales at which the data allow for methane emissions over north-west Europe to be resolved.

Emissions of atmospheric trace gases can be estimated using observations and
an atmospheric chemical transport model (CTM). A common approach to such
“inverse” problems uses Bayes' theorem, where a prior estimate of
parameters,

The
relationship between the observations and parameters can be determined by a
CTM. For flux estimation problems this forward model is usually given by the
linear relationship:

Traditionally, the number of basis functions has been fixed a priori, and the
flux values associated with each basis function are updated in the inversion.
The basis functions can take various forms, but often each one represents a
2-D geographical area, within which the fluxes are either uniform

Different methods can be used for this dimension reduction such as radial
basis functions or principal components

There are only a few studies that have sought to solve for the partitioning
of basis functions, such that the aggregation of the parameter space may be
performed objectively.

An alternative to proposing an optimum partitioning of basis functions, whether
subjectively or otherwise, is instead to allow the data to decide the form of
the partitioning. Such an approach has been receiving increasing attention in
many fields of the geosciences that invoke Bayes' theorem

There are three critical advantages to this transdimensional approach:

The subjectivity associated with the choice of partitioning is largely removed.

The uncertainty that surrounds the choice of partitioning is propagated through to the posterior parameter estimates.

The partitioning of the inversion domain and inference on the desired parameters are calculated simultaneously.

For two different models (which contain a different number of basis
functions),

The first term on the right-hand side is the evidence, which appeared as the
normalizing constant in Eq. (

The evidence gives a measure of the probability of randomly choosing the set
of parameter values that generate the data,

Theoretically, one could use the evidence as a means to find the partitioning
of the inversion domain that provides the most likely explanation for the
given data. However, in practice this is not so straightforward, as the
evidence can be complex to calculate, particularly for
non-linear, high-dimensional problems. Instead, we can make use of the
reversible-jump Markov chain Monte Carlo (rj-MCMC) method of

In addition to being dependent on the partitioning of basis functions,
Bayesian inversions are also dependent on the form of the PDFs used to
describe the prior and likelihood. The terms that describe these PDFs, such as
the mean, standard deviation and correlation length, are commonly referred to
as hyperparameters. The dependence of the posterior parameters on these
hyperparameters, and a lack of objective determination of their values have
been previously identified as a limitation of Bayesian inverse methods

In this work, we will set out a transdimensional hierarchical Bayesian
inverse framework and its application to estimating emissions of trace gases,
using atmospheric data. We describe the form of the partitioning of the model
space, and how this may be easily varied in the inversion. We further
incorporate a hierarchical Bayesian framework

Partitioning based on a spatial aggregation of neighbouring cells allows one to easily define natural boundaries such as a land–sea interface or country border, but can also have the effect of imposing hard boundaries elsewhere, which is an extremely crude depiction of reality. Furthermore, each grid cell within each aggregated region has an enforced uncertainty correlation to its neighbours, which may not be appropriate. However, whilst other forms of basis functions may avoid these pitfalls, they can lack the simplicity of grid coarsening. For this reason, the basis functions we define in this work are based on a spatial aggregation of neighbouring cells. The same transdimensional framework could nevertheless be applied to determine the appropriate number of principal components, radial basis functions or indeed any other class of model reduction method to use in an inversion.

Instead of defining the spatial domain as a series of regular square or
rectangular basis functions, an alternative is to assign a number of nodes,
or nuclei, to the domain. Every nucleus defines a region, where the edge of
each region is equidistant between the closest two nuclei, and perpendicular
to a line connecting the nuclei pair. As such, any point within each region
is closer to that region's nucleus than any other nucleus.
Figure

A spatial domain partitioned into 10 Voronoi cells. Moving one nucleus to a new position in the domain changes the boundaries of the neighbouring cells.
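The nearest-nucleus rule that defines the Voronoi cells can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `assign_voronoi`, the domain bounds and the use of flat Euclidean distance in degrees (rather than a great-circle distance) are assumptions made here for brevity, not details taken from the inversion code.

```python
import numpy as np

def assign_voronoi(lon, lat, nuclei):
    """Label each grid cell with the index of its nearest nucleus.

    lon, lat : 1-D arrays of grid-cell centre coordinates
    nuclei   : (k, 2) array of (lon, lat) nucleus positions
    Returns an (nlat, nlon) integer array of region labels.
    """
    glon, glat = np.meshgrid(lon, lat)
    cells = np.column_stack([glon.ravel(), glat.ravel()])
    # Squared Euclidean distance from every cell to every nucleus
    # (assumption: flat geometry in degrees, for illustration only).
    d2 = ((cells[:, None, :] - nuclei[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1).reshape(glat.shape)

# Hypothetical domain and 10 randomly placed nuclei.
rng = np.random.default_rng(0)
lon = np.linspace(-10.0, 5.0, 30)
lat = np.linspace(48.0, 60.0, 24)
nuclei = np.column_stack([rng.uniform(-10, 5, 10), rng.uniform(48, 60, 10)])
labels = assign_voronoi(lon, lat, nuclei)
```

Moving a single row of `nuclei` and calling `assign_voronoi` again changes the labels only in the neighbourhood of that nucleus, which is the behaviour described in the caption above.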

On its own, partitioning a domain into Voronoi cells is perhaps a rather
unrefined means of forming a set of basis functions, since it does not take
account of natural partitions, such as a land–sea boundary, or different
vegetation types. However, defining a grid by a set of Voronoi nuclei
provides a convenient way of describing the size and shape of each cell using
just two values: the

The use of Eq. (

Using the property of PDFs that states

Bayes' theorem allows the term

In addition to

Conventional MCMC techniques allow the flexibility of including models which
are non-Gaussian, or with varying hyperparameters, and thus cannot be solved
analytically. In this usual Metropolis–Hastings approach, at each iteration a
proposal is made to move to a new point in the state space, and subsequently
either accepted or rejected based on some probabilistic criterion. The
proposal is accepted provided that

However, conventional MCMC assumes that the model (the dimension of

or more formally:

Equation (

The second additional term is the proposal ratio,

To perform this step, we are in effect applying a proposal distribution, in
this case Gaussian, denoted by

The reversible-jump algorithm allows for the sampling of probability density functions (PDFs) of arbitrary dimension,
and thus allows us to explore both the model parameter values and the model dimensionality simultaneously. At each step on the chain we prescribe five possible proposals:

emissions update – randomly select and perturb one emissions value;

hyperparameter update – randomly select and perturb one hyperparameter value;

move – randomly select and move one Voronoi nucleus location;

birth – add one new Voronoi nucleus to a random location in the domain, thereby increasing the parameter space by one;

death – randomly select and remove one Voronoi nucleus, thereby reducing the parameter space by one.
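The five proposal types listed above can be sketched as a single proposal function. This is a minimal illustration under stated assumptions, not the Fortran implementation: the state layout (a dictionary with hypothetical keys `nuclei`, `x` and `hyper`), the equal probability given to each proposal type, the single step size and the unit-square domain with continuous nucleus positions are all choices made here for brevity. The new emissions value in a birth is taken as a Gaussian perturbation of the value of the cell containing the new point, which is also an assumption of this sketch.

```python
import numpy as np

PROPOSALS = ("emissions", "hyperparameter", "move", "birth", "death")

def propose(state, rng, k_min=5, k_max=500, step=0.1):
    """Return (kind, proposed_state); proposed_state is None if invalid."""
    kind = PROPOSALS[rng.integers(len(PROPOSALS))]
    new = {key: np.copy(val) for key, val in state.items()}
    k = len(state["x"])
    if kind == "emissions":
        # Perturb one randomly chosen emissions value.
        new["x"][rng.integers(k)] += step * rng.standard_normal()
    elif kind == "hyperparameter":
        # Perturb one randomly chosen hyperparameter.
        j = rng.integers(len(state["hyper"]))
        new["hyper"][j] += step * rng.standard_normal()
    elif kind == "move":
        # Shift one nucleus; its emissions value travels with it.
        i = rng.integers(k)
        new["nuclei"][i] += step * rng.standard_normal(2)
        if not np.all((new["nuclei"][i] >= 0) & (new["nuclei"][i] <= 1)):
            return kind, None  # moved outside the domain
    elif kind == "birth":
        if k >= k_max:
            return kind, None
        point = rng.uniform(0.0, 1.0, size=2)
        # The cell containing the new point belongs to the nearest nucleus.
        parent = np.argmin(((state["nuclei"] - point) ** 2).sum(axis=1))
        value = state["x"][parent] + step * rng.standard_normal()
        new["nuclei"] = np.vstack([state["nuclei"], point])
        new["x"] = np.append(state["x"], value)
    else:  # death
        if k <= k_min:
            return kind, None
        i = rng.integers(k)
        new["nuclei"] = np.delete(state["nuclei"], i, axis=0)
        new["x"] = np.delete(state["x"], i)
    return kind, new
```

The function only generates a candidate state. Whether the candidate is kept is decided separately by the acceptance term for that proposal type.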

The first two steps involve a change only in the emissions value prescribed
to a cell, or in a hyperparameter value, exactly as in a conventional
fixed-dimension hierarchical inverse problem. The other three proposal types
involve a change to the partitioning of the grid, either through a dimension
change, or by moving the location of a nucleus. This means that the
sensitivity matrix,

In the transdimensional inversion, there is an unknown number of unknowns, so
the prior PDF must describe both the basis functions,

Partitioning the inversion domain into Voronoi cells enables us to describe
the basis functions using three parameters: the longitude, latitude and
emissions value. If the emissions value is taken to be a scaling of the a
priori distribution of emissions, then the a priori scaling of this prior
emissions field should be one everywhere, and hence this is not dependent on
location. In this work we assign a uniform distribution for the location of
the Voronoi nuclei, meaning that the prior distribution is independent of the
emissions. Given the independence of the variables, the term

Given a uniform distribution, a priori the Voronoi nuclei may be located
anywhere within the spatial inversion domain. However, if we assume that the
Voronoi nuclei can only be located at the centre points of each grid cell on
a finite underlying grid with

For the number of unknowns, we assume little prior knowledge on this
quantity, and assign a uniform distribution that can take any value between a
maximum and minimum. Outside of this range the probability is set as zero.
Whilst the uniform prior is relatively uninformative, the choice of maximum
and minimum bounds may still influence the number of nuclei if the constraint
from the data is weak or if the bounds are too narrow.
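As a small illustration of this bounded uniform prior, the log-prior on the number of unknowns can be written as below. The function name `log_prior_k` and the treatment of the bounds as inclusive integers are assumptions of this sketch.

```python
import numpy as np

def log_prior_k(k, k_min, k_max):
    """Log of a discrete uniform prior on the number of unknowns.

    Assumes k is an integer and both bounds are inclusive.
    """
    if k_min <= k <= k_max:
        return -np.log(k_max - k_min + 1)
    return -np.inf  # zero probability outside the bounds
```

Inside the bounds the prior is the same for every value of k, so it cancels in the prior ratio of a birth or death proposal. A proposal that would take k outside the bounds has zero prior probability and is always rejected.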

For emissions, we have chosen a log-normal PDF, since it is usually a
requirement that anthropogenic emissions are defined only on the positive
axis, and gridded emissions databases are readily available. However, in
cases where the distribution of emissions is less certain or unknown, one
could easily use an alternative PDF, such as a uniform distribution, again
defined only on the positive axis. For a log-normal distribution the term

The full prior PDF of the basis functions is thus

We assume minimal prior knowledge on the emissions hyperparameters,

For the transdimensional case, we must consider the form of the proposal
ratio

As previously discussed, when updating an emissions value of a basis
function, the proposal distribution takes the form of a Gaussian perturbation
to the current state. Therefore, the proposal distribution is symmetrical,
and

The third proposal is to randomly select one Voronoi nucleus and move its location according to some Gaussian distribution, centred on its current position. The emissions value associated with that nucleus remains unchanged. Again, as in the case of the emissions update, this proposal distribution is symmetric, and the proposal ratio equals 1.
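For the fixed-dimension proposals with a symmetric Gaussian perturbation, the proposal ratio of 1 means that acceptance depends only on the ratio of posterior densities. A minimal sketch of such an update is given below. The function name `metropolis_step`, the step size and the toy two-dimensional standard normal target are illustrative assumptions and are not taken from the inversion code.

```python
import numpy as np

def metropolis_step(x, log_post, step, rng):
    """One random-walk update of a single randomly chosen element of x."""
    proposal = x.copy()
    i = rng.integers(x.size)
    proposal[i] += step * rng.standard_normal()
    # Symmetric proposal: q(x'|x) = q(x|x'), so the proposal ratio is 1
    # and only the posterior ratio enters the acceptance probability.
    log_alpha = log_post(proposal) - log_post(x)
    if np.log(rng.uniform()) < log_alpha:
        return proposal, True
    return x, False

# Toy target (assumption): independent standard normal in two dimensions.
def log_post(x):
    return -0.5 * np.sum(x**2)

rng = np.random.default_rng(42)
x = np.zeros(2)
chain, n_accept = [], 0
for _ in range(20000):
    x, accepted = metropolis_step(x, log_post, 1.0, rng)
    n_accept += accepted
    chain.append(x)
chain = np.array(chain)
```

After discarding an initial burn-in, the columns of `chain` should have sample means close to 0 and variances close to 1 for this toy target.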

The birth proposal involves randomly selecting a vacant point in the domain
to add a new Voronoi nucleus. The new nucleus also requires an emissions
value, which is chosen based on a Gaussian perturbation of the emissions
value of the Voronoi cell,

Assuming we have a finite grid with

The death process of removing a Voronoi nucleus is the exact opposite of the
birth step of adding a Voronoi nucleus. Supposing that the

Here, we assume that the likelihood function is based on a least-squares
misfit. The form of this function is given by

The model–measurement covariance matrix

However, careful design of the covariance structure can simplify the problem.
The covariance matrix

The inverse may be similarly defined as

Changes to

This type of symmetric Toeplitz matrix has an explicit inverse
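To illustrate why an explicit inverse is useful, the sketch below uses the standard closed-form result for a correlation matrix with entries rho^|i-j|, whose inverse is tridiagonal. The sketch assumes evenly spaced observations, unit variance and an exponentially decaying correlation with rho = exp(-dt/tau). The function names and the values of `dt` and `tau` are hypothetical, and the exact form used in this work may differ.

```python
import numpy as np

def exp_cov(n, rho):
    """Correlation matrix with entries rho**|i-j| (symmetric Toeplitz)."""
    idx = np.arange(n)
    return rho ** np.abs(idx[:, None] - idx[None, :])

def exp_cov_inverse(n, rho):
    """Closed-form tridiagonal inverse of exp_cov(n, rho), for n >= 2."""
    inv = np.zeros((n, n))
    idx = np.arange(n)
    inv[idx, idx] = 1.0 + rho**2          # interior diagonal entries
    inv[0, 0] = inv[-1, -1] = 1.0         # first and last diagonal entries
    inv[idx[:-1], idx[1:]] = -rho         # super-diagonal
    inv[idx[1:], idx[:-1]] = -rho         # sub-diagonal
    return inv / (1.0 - rho**2)

def exp_cov_logdet(n, rho):
    """Log-determinant of exp_cov(n, rho): (n - 1) * log(1 - rho**2)."""
    return (n - 1) * np.log(1.0 - rho**2)

# Hypothetical spacing (dt) and correlation timescale (tau), same units.
dt, tau = 1.0, 3.0
rho = np.exp(-dt / tau)
R = exp_cov(6, rho)
R_inv = exp_cov_inverse(6, rho)
```

With this structure, a change in the correlation timescale needs no numerical matrix inversion, since both the inverse and the determinant follow directly from rho.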

The determinant of

For each type of proposal to move along the chain from

Since the proposal ratio for an emissions update is 1, the acceptance term is
the conventional, fixed-dimension MCMC acceptance. The acceptance term can be
formed from Eq. (

We consider the case of a change in two types of hyperparameter, those acting
on

As for the emissions update, the proposal ratio for a movement of a Voronoi
cell is 1. In addition, there is no change in the prior distribution, since
the cell that moves takes its emissions value with it, and the dimension of
the model does not change. Thus, both

The acceptance term for a birth takes the form of the full transdimensional
acceptance given in Eq. (

Similar to the birth proposal, the death proposal takes the full
transdimensional form. Assuming a log-normal prior PDF for emissions the
acceptance term is

At each iteration, movement along the chain is governed by the acceptance
ratios given in Eqs. (

The chain must be run for enough iterations for the posterior distribution
to converge. Convergence here refers to
the stability of the distribution across the sampled iterations of the Markov
chain. In the fixed-dimension case, well-established convergence assessments
exist, which examine the convergence of each element of the parameter vector

In order to achieve a stationary posterior distribution for the parameters,
the number of iterations for which the chain must be run is large, of the order of

A key component of the likelihood function is

In the pseudo-data and real data examples discussed in
Sects.

Particles were tracked backwards for 30 days in a large regional domain with
bounds of (

In order to demonstrate the utility of the transdimensional inversion
framework, we applied it to a pseudo-data example, where the true emissions
field was known. An emissions field of anthropogenic methane was taken from
the Emissions Database for Global Atmospheric Research

The pseudo-data inversion was first performed in the traditional
Metropolis–Hastings sense, using a fixed grid with random arrangements of

RMSE in the data space as a function of number of unknowns, given as

For each experiment, the root mean square error (RMSE) of the posterior mean
modelled mole fractions minus the true observations (without added data
noise) was taken as a measure of the fit to the true data. The mean and
standard deviation of these RMSE values across the 500 inversions are shown
in Fig.

An additional experiment also used a fixed set of basis functions, but in
such a configuration that had been designed to be higher resolution close to
the measurement sites and follow national boundaries. This grid was based on
the set up of

Maps of the posterior scaling of the prior for the subjectively optimized fixed grid

Although the overall pattern of reds and blues is discernible in
Fig.

Finally, the inversion was performed using the transdimensional approach,
where the number and configuration of basis functions were allowed to vary; 40 regions were chosen a priori, and the bounds of the uniform prior were
5 and 500 unknowns. Each discrete point on the transdimensional Markov chain
contained a relatively coarse partitioning of the spatial domain, which may
have provided an RMSE little better than a randomly chosen grid of the same
number of regions. However, since the solution is the entire posterior PDF of
the parameters, we can extract the mean value of the posterior distribution
for each of the underlying grid cells, and use this to recreate a set of mole
fractions. This naturally smoothed solution gave a significantly reduced RMSE
in the data space for the mean number of regions, shown by the green circle
in Fig.
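The way a smoothed field arises from an ensemble of coarse partitions can be sketched as follows. Each stored sample is mapped back onto the underlying grid, and the mean and standard deviation are then taken cell by cell. The function name `gridded_posterior` and the representation of each sample as a (labels, values) pair are assumptions of this sketch, and the two-sample toy ensemble is purely illustrative.

```python
import numpy as np

def gridded_posterior(samples):
    """Per-grid-cell posterior mean and standard deviation.

    samples : sequence of (labels, x) pairs, where labels maps each
              underlying grid cell to a region index and x holds the
              value of each region in that sample.
    """
    # Expand every sample onto the underlying grid, then average.
    fields = np.array([x[labels] for labels, x in samples])
    return fields.mean(axis=0), fields.std(axis=0)

# Toy ensemble: two different two-region partitions of four grid cells.
samples = [
    (np.array([0, 0, 1, 1]), np.array([1.0, 3.0])),
    (np.array([0, 1, 1, 1]), np.array([2.0, 4.0])),
]
mean_field, std_field = gridded_posterior(samples)
```

Although each sample contains only two distinct values, `mean_field` takes three distinct values across the four cells, because the region boundary falls in a different place in each sample.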

Figure

This point is highlighted by the estimated uncertainty map, which is extracted
directly from the posterior PDF, shown in
Fig.

Of course, if one specifies the form of the basis functions correctly, then
the model–measurement RMSE can be minimized. This is shown by the magenta
square on Fig.

Posterior distribution of the number of unknowns in the pseudo-data experiment.

The posterior distribution on the number of derived unknowns in the
transdimensional solution is shown in Fig.

The pseudo-data example shows the merits of the transdimensional approach
when there exist hard boundaries between areas of over- or underestimation in
the prior. In reality, such clear-cut scaling fields are unlikely, and so it
is pertinent to observe how the inversion performs when confronted with real
data. In order to achieve this, we performed an inversion using 1 month of
CH

Compared to the pseudo-data experiment above, in addition to solving for the
emissions scaling factors and the number of unknowns, various hyperparameters
were also treated as unknowns to be solved for in the inversion.
Hyperparameters describing the prior log-mean and log-standard deviation,
model–measurement uncertainty and auto-correlation timescale were each
described by a uniform PDF. Prior emissions were taken from the EDGAR
emissions inventory

An emissions field was estimated for March 2014 using data from the four
measurement sites of the DECC network: Mace Head, Ireland; Tacolneston,
England; Ridge Hill, England; and Angus, Scotland, shown in
Fig.

Schematic of the arrangement of the full NAME computational domain, divided into six fixed regions and the sub-domain in which the transdimensional inference was performed, shown by the different shadings.

The total NAME output domain was of dimension

Figure

Posterior distribution of the number of unknowns in the real data inversion.

The mean number of unknowns was found to be 201 (145–248), as shown in
Fig.

In addition to inference on the mean of the posterior distribution, the
posterior PDF of the emissions field gives us a direct estimate of the
uncertainty of each grid cell. Figure

Inference on the various hyperparameters of interest can inform us about the
relative modelling performance at each of the measurement sites. Modelling
uncertainties were found to be smallest at the Angus site, with a mean
uncertainty of 8 (4–15) ppb. This is consistent with the station mainly
sampling clean air, being far enough away from large, variable emission
sources in Scotland. In contrast, higher modelling uncertainties were derived
for the Tacolneston and Ridge Hill stations, of 32 (10–73) ppb and
25 (8–58) ppb respectively, consistent with both sites intercepting polluted
air more frequently. The average correlation timescale, based on the
prescribed exponential decay structure, was found to be 15 (7–37) h across
all four sites. No significant difference was found between the uncertainties
derived for times when local influence was high and those when it was not. By
contrast,

The inversion above took around 90 min to run 600 000 iterations on a
single processor. Two steps dominated the computation time. The first was calculating the inverse of the
model–measurement covariance matrix. In the above example, there were around
750 observations, so that the inverse had to be calculated on a

The other rate-limiting step is the recalculation of the Voronoi cells and
the associated sensitivity of each one, every time a birth, death or move is
proposed. In practice, this need not be recalculated for all Voronoi cells,
only those that change in moving from the current state to the proposed
state. However, this can still be a cumbersome calculation. The use of
Voronoi cells presents a simple, albeit rather crude, approach to the
partitioning of the inversion domain, and it is our hope to extend this
method to other forms of basis functions in the future
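The saving from recalculating only the affected cells can be illustrated with a short sketch. It assumes that the aggregated sensitivity of a region is the sum of the native-resolution sensitivity columns of the grid cells it contains, which is appropriate when each unknown scales a prior field already folded into those columns. The function names, array shapes and toy values are hypothetical.

```python
import numpy as np

def aggregate(H, labels, k):
    """Sum native-resolution sensitivity columns within each region."""
    H_agg = np.zeros((H.shape[0], k))
    for r in range(k):
        H_agg[:, r] = H[:, labels == r].sum(axis=1)
    return H_agg

def update_aggregate(H, H_agg, old_labels, new_labels):
    """Recompute only regions whose membership changed (same k)."""
    moved = old_labels != new_labels
    # Regions that lost or gained at least one grid cell.
    changed = np.union1d(old_labels[moved], new_labels[moved])
    H_new = H_agg.copy()
    for r in changed:
        H_new[:, r] = H[:, new_labels == r].sum(axis=1)
    return H_new, changed

# Toy example: 5 observations, 12 grid cells, 3 regions.
rng = np.random.default_rng(3)
H = rng.uniform(size=(5, 12))
old_labels = np.repeat([0, 1, 2], 4)
new_labels = old_labels.copy()
new_labels[3] = 1  # one cell passes from region 0 to region 1
H_old = aggregate(H, old_labels, 3)
H_new, changed = update_aggregate(H, H_old, old_labels, new_labels)
```

In this toy case only regions 0 and 1 are recomputed and the column for region 2 is reused unchanged. A birth or death changes the number of columns and would need a column to be added or removed as well, which this sketch does not cover.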

Although the examples above were run using a single chain on a single
processor, the opportunities for running multiple chains in parallel should
be readily apparent. Running several independent chains in
parallel on multiple processors could significantly reduce the number of
iterations required for each chain. Indeed, a further development is to
have several chains able to communicate with each other throughout the
inversion, where each chain is tempered by a given parameter. This technique
of parallel tempering has the potential to allow vast improvements in
efficiency when compared to the conventional Metropolis–Hastings algorithm,
especially for multi-modal PDFs

In this work, we intentionally chose to focus only on the 2-D spatial
aggregation of the fluxes and ignored the assumptions made in aggregation of
the temporal dimension due, primarily, to concerns about the computational
demands of extending this particular implementation to 3-D

This inversion framework is inherently suited to cases where one does not
have to continuously recalculate the native resolution sensitivity matrix,

We have demonstrated how reversible-jump Markov chain Monte Carlo can be applied to inverse modelling of trace gas emission fields. In allowing the number of unknowns itself to be an unknown, the method attempts to avoid some of the assumptions that have had to be made in atmospheric inverse modelling. Furthermore, the uncertainty surrounding the choice of number and shape of unknowns propagates through to the posterior distribution. We have shown how, through making a reduced set of assumptions about the shape and number of our basis functions, this transdimensional approach can lead to a better fit to the data, and an improved representation of a true emissions field, when compared to a random or subjective basis function definition. Combined with a hierarchical framework, the method set out here is focused on using the data to as great an extent as possible to guide our solution. Emissions derived using the transdimensional hierarchical framework, from the UK and Ireland during March 2014, were found to be consistent with previous work. The framework provides an alternative approach to using a single partitioning of basis functions when performing dimension reduction.

NAME is a UK Met Office model available for external research use under
licence. Information on obtaining a licence is available by contacting the
Met Office directly. The reversible-jump MCMC Fortran code can be obtained
upon request. Data from the UK DECC network are available for download from
the EBAS database:

We thank Simon O'Doherty, Aoife Grant, Dickon Young and all the site operators of the DECC network for their tireless work and dedication to providing high quality and reliable data. Mark Lunt is funded under a studentship from the UK Natural Environment Research Council (NERC). Matt Rigby is funded by a NERC advanced research fellowship NE/I021365/1. Anita Ganesan is funded under a NERC-independent fellowship NE/L010992/1.

Edited by: I. Pisso
Reviewed by: two anonymous referees