Bayesian spatiotemporal inference of trace gas emissions using an integrated nested Laplacian approximation and Gaussian Markov random fields.

. We present a method to infer spatially and spatio-temporally correlated emissions of greenhouse gases from atmospheric measurements and a chemical transport model. The method allows fast computation of spatial emissions using a hierarchical Bayesian framework as an alternative to Markov chain Monte Carlo algorithms. The spatial emissions follow a Gaussian process with a Matérn correlation structure which can be represented by a Gaussian Markov random ﬁeld through a stochastic partial differential equation approach. The inference is based on an integrated nested Laplacian approximation (INLA) for hierarchical models with Gaussian latent ﬁelds. Combining an autoregressive temporal correlation and the Matérn ﬁeld provides a full spatio-temporal correlation structure. We ﬁrst demonstrate the method on a synthetic data example and follow this using a well-studied test case of inferring UK methane emissions from tall tower measurements of atmospheric mole fraction. Results from these two test cases show that this method can


Introduction
Emissions of greenhouse gases, ozone-depleting substances and air pollutants are increasingly inferred indirectly from atmospheric trace gas concentration observations and chemical transport models. These "top-down" or "inverse" methods complement inventory-or process-model-based "bottom-up" techniques that are used, for example, in national reporting of greenhouse gas emissions to the United Nations Framework Convention on Climate Change (UNFCCC; e.g. Leip et al., 2018).
Top-down methods rely on some form of statistical inference, or inverse theory, to infer emissions at global (e.g. Saunois et al., 2016) to regional (e.g. Brunner et al., 2017) scales. They also require a chemical transport model to provide the relationship between atmospheric mole fraction and emissions. The most common type of inverse method uses Bayesian inference. Bayesian inference relies on the information about the uncertainties in the measurement, transport model and prior probability of the parameters to inform the emissions estimate. Inverse methods often assume that uncertainties in the likelihood and prior probabilities are known exactly and Gaussian (e.g. Stohl et al., 2009;Brioude et al., 2013). These assumptions allow large inverse problems to be solved efficiently. However, when uncertainties are poorly understood, Bayesian methods have been shown to lead to posterior solutions that are highly dependent on these assumptions (e.g. Ganesan et al., 2014).
Recently, hierarchical Bayesian schemes have been developed to infer unknown uncertainties in the inversion framework (e.g. Ganesan et al., 2014;Jeong et al., 2016;Lunt et al., 2016). These hierarchical Bayesian inference schemes use Markov chain Monte Carlo (MCMC) methods that are wellsuited to small-dimensional problems. However, they can suffer from issues with convergence, especially as the dimension of the problem increases (Cowles and Carlin, 1996). Hierarchical inference of spatially and spatio-temporally correlated emissions is computationally difficult and currently relies on methods such as Kalman filters with assumed covariance structures (e.g. Brunner et al., 2012) or an empirical hierarchical framework, where unknown hyperparameters are not integrated out during inference (e.g. Michalak et al., 2005;Berchet et al., 2015). Previous work shows, however, that including spatial correlation improves the fit between modelled and observed data (Zammit-Mangion et al., 2016), while temporal correlation is important to represent the dependence of emissions between time periods. The size and spatial coverage of measurement networks available for inferring gas emissions are growing, particularly through satellite observations (e.g. Bergamaschi et al., 2007;Veefkind et al., 2012;Wecht et al., 2014;Ganesan et al., 2017). Therefore, there is a need to develop methods that can utilise these big data sets whilst maintaining the benefits of uncertainty quantification in hierarchical methods, ideally extending to spatio-temporal inference.
This work presents a computationally efficient hierarchical Bayesian framework for inferring spatio-temporally correlated trace gas emissions in a widely used regional atmospheric chemical transport modelling framework. We use an integrated nested Laplacian approximation (INLA) for the Bayesian inference. The spatial correlation structure with spatial Markov properties results from the Gaussian random field being a solution to a particular stochastic partial differential equation. Kronecker product algebra allows efficient extension to spatio-temporal correlation. Section 2 presents the construction of the hierarchical problem, describes the formation of the correlation structure and introduces INLA. Section 3 contains two case studies as a proof of concept for the method with a discussion of their implementation. The first is an example using four consecutive periods of pseudo-data observations of methane from four UK monitoring stations. This then extends to using real observations from the four measurement stations to infer UK emissions for 2014, split into four 3-month periods. Section 4 discusses the results and computational performance of the method, and Sect. 5 presents the conclusions of the study.

Methods
This section details an efficient approach to forming spatial and spatio-temporal correlation functions and outlines how this applies to the inference of regional trace gas emissions from measurements using fast inference for hierarchical models. We limit the scope of this paper to the wellestablished problem of regional inference of long-lived trace gas emissions using a backward-running Lagrangian particle dispersion model (Stohl et al., 2009;Manning et al., 2011;Henne et al., 2016). In this framework, the model directly calculates the sensitivity of the measurements to emissions from each grid cell in the domain. Extension to other systems (e.g. global models) would require modification to provide sensitivity on a global scale with substantially different temporal and spatial emissions. We begin by introducing the model and the inferred latent parameters (the emission fluxes and boundary conditions), followed by an introduction to Gaussian Markov random fields and how they are useful for efficient calculation of spatial and spatio-temporal correlation structures. All together, this forms a Bayesian hierarchical model, from which emissions can be inferred.

Model framework
The aim is to infer some parameters of interest, here a spatial field of a priori emissions scaled by some factor, x, from some measurement. The a priori emissions at a given location are generally informed by spatially resolved bottom-up inventories (as in Sect. 3.1.3) or extrapolation from some reported emissions value. The value x is a multiplicative scaling of this a priori value for emissions, most generally expressed as a quantity of gas per unit time per unit area. The approach taken in this work uses a Gaussian Markov random field for fast and efficient calculation of spatial correlation for x. This means that the emissions field is required to be a latent Gaussian field, which will be discussed further in Sect. 2.2. Net surface fluxes of many greenhouse gases are positive at the scales resolved by the model. In this work, due to the usage of a latent Gaussian field, which must be defined over both positive and negative values, we choose to look at the deviations of emissions from the prior mean emissions field. This is in an effort to fit the physical model to the imposed statistical model that fast computation requires. Taking the approach that, for many regional inverse problems involving longer-lived trace gases, there is a linear relationship between emissions which are constant in time and observed atmospheric concentration, the relationship between measurements and emissions is where y is a vector of the residual between the measured and a priori predicted measurement; H is a Jacobian (or sensitivity) matrix, which maps the surface emissions to the measurements; u is a vector of independent and identically dis-tributed variables containing the contribution to the measurement of mole fractions at the boundary of the domain minus the prior mean contribution, with an associated sensitivity matrix K; and is some stochastic error. The variable x has a Gaussian prior probability with zero expectation and a covariance described by a Gaussian random field. This will be solved using a hierarchical framework (Sect. 2.4).

Gaussian Markov random fields
The emissions scaling from its a priori value x is spatially continuous over the domain of interest. We assume that it exhibits a spatial correlation structure because emissions at one location are generally not independent from all other locations in the same field. We choose to model the covariance in this field using a Matérn covariance function, which Stein (1999) shows is well-suited to natural systems due to its flexibility, with other correlation structures (e.g. exponential) being special cases of the Matérn family (Guttorp and Gneiting, 2006). A stationary Gaussian random field with a Matérn covariance is the solution to a particular stochastic partial differential equation (Whittle, 1954(Whittle, , 1963, given by where κ is the spatial scale parameter, τ influences the variance, is the Laplacian operator, is the spatial domain and W(s) is a Gaussian stochastic process for locations s, noting that the dependence is only on the Euclidean distance; the process is therefore isotropic. For an example of nonstationary and anisotropic fields see Marques et al. (2019). The smoothness parameter α gives a continuous domain Markov field for integer values and is set to α = 2 (see Whittle, 1954). Smaller values will give more short-scale variability and can be difficult to differentiate from noise. Lindgren et al. (2011) show that if the field is represented using a Gaussian Markov random field (see Rue and Held, 2005), then it is very efficient to directly construct the precision, or inverse covariance, matrix using a basis function representation where ψ k (·) is a piecewise linear function in each triangle of a mesh construction of the spatial domain, where ψ k (·) is 1 at vertex k and 0 at all other vertices. This mesh is created using constrained refined Delaunay triangulation (Shewchuk, 2002), placing nodes at the main points of interests and infilling the rest of the space using some condition of minimum and maximum length of the vertices. In this work we choose to represent the UK and Ireland with an evenly spaced denser mesh with a coarser mesh outside of this region. A mesh could be further refined, for example by creating an even denser mesh close to the measurement site where sensitivity is higher. It is important to extend the mesh beyond the region where the measurements are sensitive to emissions as the mesh is constructed using Neumann boundary conditions, which trigger reflection and therefore overestimation close to the boundaries. This has no effect on the inferred emissions, provided that the mesh is extended far enough around the domain of interest. Figure 1 shows the mesh used in this work, excluding the extended outer mesh region.

Extension to spatio-temporal correlation
A spatio-temporal extension to the forward model (Eq. 1) is possible by including the spatial correlation structure introduced in Sect. 2.2 in a temporal framework (Cameletti et al., 2013). Similar to Eq. (1), the deviation from the prior mean measurement at site l made at time t in its simplest form is where i represents each of the total n nodes and j represents the edge of the domain at each of the four cardinal directions. We make the assumption that measurements made at a given time are independent, giving a vectorised observation vector y t at each time. This can be generalised to include correlated measurements, where the covariance function should result in a sparse matrix to retain computational efficiency. Following this, the matrix H from Eq. (1) becomes the sparse block diagonal matrix which operates on the vectorised spatio-temporal scaling of the emissions field x = [x 1 , x 2 , . . .x m ] to model the obser-vations y = [y 1 , y 2 , . . .y m ]. The time varying structure of H t and x applies also to K t , and u t . We impose the temporal correlation structure between emissions over time to be a firstorder autoregressive model, where and where φ is the temporal correlation and Q −1 S is the spatial correlation structure described by a Matérn field using the stochastic partial differential equation approach for a Gaussian Markov random field. We have chosen a first-order autoregressive model as we believe that emissions are generally similar to those at the previous time step. In the given model, φ is the correlation between the previous time step and the current time step. This means that the emissions at time t have a similarity of φ to emissions at t −1 plus some spatially correlated random effect δ t . The vectorisation of x allows a separable covariance structure for the temporal and spatial covariances, which means that the spatio-temporal precision matrix can be expressed using a Kronecker product as follows (Mardia et al., 1979): Estimating hourly emissions at each time t soon makes inference prohibitive due to burden. Instead we make the assumption that emissions are constant over a 3-month period to reduce the computational size. We continue with the assumption that measurements within a single time period are independent, although this can be generalised if required. In this experiment we make the assumption that emissions are constant over a 3-month period and that the correlation between these 3-month periods is autoregressive of the order of 1.

Hierarchical model
Inferring the emissions and the related uncertainties requires a hierarchical model to infer the quantities of interest from measurements while estimating some unknown parameters which are necessary for inference. The main focus of this work is to estimate the posterior distribution of the emissions field x based on observations y. We follow a typical Bayesian hierarchical framework where θ is vector of hyperparameters describing the variances and covariances in x, u and y, noting that x ⊥ ⊥ y | θ .
We assume where the precision matrix of the model-measurement uncertainty Q y contains the hyperparameter for the standard deviation of the model-measurement standard deviation σ y , σ BC is the standard deviation of the prior for u and I is the identity matrix. Together the hyperparameters make the vector θ = (ρ, σ, φ, σ BC , σ y ), which have independent prior distributions. The hyperparameters for the spatial precision matrix Q(θ ) are transformations of the variables in Eq. (2), where ρ is the range parameter and σ is the marginal standard deviation of the latent field (Lindgren et al., 2011). We use penalised complexity priors to define the prior probabilities for these parameters (Simpson et al., 2017;Fuglstad et al., 2018). Penalised complexity priors allow the formation of priors when there is only a vague understanding of their true values while enforcing more constraint than using a broad uniform or a Jeffreys prior. This uses the information loss of deviating from some baseline estimate of the parameter. Penalised priors do not increase the computational speed for the given case and are chosen instead for their intuitiveness. Penalised complexity priors are specified by the probability of the parameters being less than or greater than some baseline value where ρ 0 and σ 0 are the baseline values and p ρ and p σ are the associated probabilities defined by the user. The hyperparameter φ controls the temporal correlation between latent variables, and its prior probability is defined on a beta distribution scaled between −1 and 1 as where a and b are assumed coefficients. The matrix Q y is the precision matrix for the combined measurement and model errors. The diagonal of Q y contains the square of the hyperparameter σ y , where the prior probability is defined on log 1 The marginal standard deviation of the a priori boundary conditions σ BC is also constructed this way, giving A Bayesian hierarchical model requires a method of inference to estimate the parameters of interest and any parameters that are not of direct interest but required, and uncertain, in order to infer the other parameters in the hierarchy. Many methods of inference exist and have been applied to the problem of estimating emissions of trace gases (see references in Sect. 1). A promising approach using the correlation structure in Sect. 2.2 and 2.3 is INLA . Section 2.5 outlines this approach to inference.

Inference using an integrated nested Laplacian approximation
An integrated nested Laplacian approximation (Rue et al., 2009) provides a fast and efficient framework to infer the latent variable x and hyperparameters θ from measurements y. The calculation of the INLA is possible using the R-INLA package . In this work we use R-INLA version 17.06.20. The speed in this approach comes from solving the marginal posteriors for x i , i.e. each element i of the latent field, through the numerical integration where j is the j th element in θ and −j indicates all by the element j . Equations (11) and (12) make use of a Laplace approximation by approximating p(θ | y) by where p G (x | θ, y) is a Gaussian approximation to the full conditional of x and the right-hand side is evaluated at x * (θ), which is the modal probability of x for a given θ . This approximation is exact if p(θ | y) is Gaussian and gives a good approximation for log concave problems (Tierney and Kadane, 1986). Then, it is possible to approximate p(x i | y, θ ) using another Laplace approximation where p G (x −i | x i , θ , y) is the Gaussian approximation to x −i | x i , θ , y evaluated at the mode x * −i (x i , θ ). See Rue et al. (2009) and Martins et al. (2013) for a more in-depth description.
While this method relies on calculating only the marginal posterior distributions of x i , it is still possible to predict a linear combination of the field to provide regional emissions totals (e.g. country totals). We define a linear predictor of emissions for a given region η * , defined using the basis function representation of the mesh where each node contains information on the spatial area represented by that node and its connecting vertices and zeros for all nodes that are not in the region of interest. Then we can approximate the linear combination of the parameters of interest, giving a combined emissions total η * , using Eqs. (11) and (12) and transforming the predicted latent field by the weightings containing the area information.

Case studies
This section presents two case studies to demonstrate how the method applies to inferring trace gas emissions. The first uses simulated methane observations from four tall-tower measurement sites to infer simulated spatio-temporal emissions from the UK. The second case study expands on the first case study by using real observations from the four tall towers to infer emissions of methane from the UK over four 3-month periods in 2014. While the size of this problem is not particularly large, we demonstrate the method using UK methane emissions as a proof of concept as it is a well-studied test case (Manning et al., 2011;Ganesan et al., 2015;Lunt et al., 2015), which should exhibit a spatio-temporal correlation structure. The method can be extended to larger spatial domains or data set sizes as required.
3.1 Transport model and data

Measurement data
The case study observations are from four measurement sites: three in the UK and one in Ireland, which are part of the UK Deriving Emissions related to Climate Change (DECC) network (Stanley et al., 2018). Figure 2 shows the location of these four measurement stations: Ridge Hill in the west of England, Angus on the east coast of Scotland, Tacolneston on the east coast of England and Mace Head on the west coast of Ireland. Measurements are made quasi-continuously throughout this network but are averaged into hourly values here, consistent with the time step of the meteorology that drives the atmospheric transport model (Sect. 3.1.2). The data set contains ∼ 10 000 measurements to demonstrate the capabilities of the method at handling moderate data volumes. We consider the scalability of this method in the discussion (Sect. 4).

Transport model
An atmospheric transport model calculates the sensitivity of hourly measurements to the emissions or boundary condi- tions, from which the matrices H and K can be formed. This work uses the NAME III (Numerical Atmosphericdispersion Modelling Environment) version 7.2 Lagrangian particle dispersion model (Jones et al., 2006) to simulate the transport of methane in the atmosphere. For each measurement, NAME tracks 20 000 gas particles, released over a 1 h period, backward in time for 30 d from the measurement site. We record the times and locations that the particles drop below 40 m a.g.l. and reach the computational domain boundary, which is at 5 • S, 74 • N, 55 and 192 • E, to calculate the sensitivity of the data to emissions from the surface or to the mole fraction at the domain edge. NAME was driven by the Met Office's Unified Model UK Variable (UKV) threehourly meteorological analysis (Cullen, 1993).

Prior emissions inventory
Inventory data from the Emissions Database for Global Atmospheric Research (EDGAR) v4.3.2. provide the mean prior emissions (Janssens-Maenhout et al., 2017) using the most recent emissions map from 2012. This database contains anthropogenic emissions, which are the dominant methane sources in the UK (Manning et al., 2011), and is deemed appropriate for the application. The EDGAR emissions inventory provides an a priori estimate of UK methane emissions of 2.46 Tg yr −1 . The MOZART-4 global chemistry model (Emmons et al., 2010), driven by global fields of natural and anthropogenic emissions and sink terms (Ganesan et al., 2017), provides the prior mean estimate of methane mole fraction at the boundaries of the inversion region.

Pseudo-data experiment
We test the method by performing an inversion using pseudodata for four consecutive time periods of 1 month. By creating a known-emissions field we are able to validate the method through comparing the inferred emissions to the known emissions, which is not possible in the real world. We form a synthetic-emissions field by allowing the emissions to deviate from the prior mean emissions according to a Matérn field (see Sect. 2.2). We choose to simulate the data using σ = 0.5 as the uncertainty in the EDGAR v4.3.2. inventory is around 50 % for methane emissions and ρ = 3.25, which is similar to the correlation length scale in UK emissions in the EDGAR v4.3.2. inventory estimated using a variogram (Cressie, 2015), although the correlation length scale in the uncertainty is unknown. We use φ = 0.8 as expert experience suggests that UK emissions of methane are generally highly correlated in time.
To create the synthetic-emissions field, the NAME sensitivities for each measurement at each grid cell, detailed in Sect. 3.1.1, are multiplied by the corresponding EDGAR inventory emissions in that grid cell, which are then transformed into the triangulation nodes in Fig. 1. This forms the matrix H. We randomly sample the full spatio-temporal precision matrix for the latent field (Sect. 2.3) using the GMR-FLib library (Rue and Follestad, 2001) to generate the latent field x. In this experiment we treat the boundary conditions as known as in practice these are generally well-constrained (or treated as known) during an inversion. The observations are simulated using the simulated latent field and sensitivities following Sect. 2.1 with additive Gaussian noise with a standard deviation of 15 % of y. The synthetic observations contain a total of 11 520 measurement points, which are used to infer 646 emissions nodes for each time period, i.e. the nodes of the mesh in Fig. 1. Figure 3a-d show the syntheticemissions field in terms of deviation from some prior emissions field. The synthetic total deviation in emissions from the prior mean for the UK for each period is 0.07, 1.29, 1.01 and −0.33 Tg yr −1 .
The inference needs prior probabilities for the hyperparameters, which are known exactly here, but we set them to be deliberately incorrect, but feasible based on true prior knowledge, to check that the inversion method can still recover the correct emissions. For the inversion we assign a prior probability for φ using a = 6.5 and b = 0.1, for σ using σ 0 = 0.1 and p σ = 0.01 and for ρ using ρ 0 = 5 and p ρ = 0.5. We base the constraint on the spatio-temporal emissions on the assumption that methane emissions in the UK are likely to be strongly correlated between time periods, that they are unlikely to vary by more than 10 % of the a priori emissions from the previous time step and that there is little knowledge of the spatial correlation structure. For the model measurement error we assign the prior probability on the log precision as log 1 σ 2 y ∼ N (−5, 1), which represents approxi-  Fig. 5. The inferred emissions for the UK as a whole agree well with the synthetic emissions, with all true synthetic emissions totals falling within the 95 % uncertainty of the inferred total emissions. These synthetic tests cover both small and large differences between the a priori estimate of methane emissions in the UK and their true value. The larger differences are most likely an overestimate of what would be expected in practice, although this provides a good test case. The posterior mean spatial distribution of emissions in Fig. 4 is qualitatively similar to those in Fig. 3, although we avoid reading heavily into comparisons of the mean estimates plotted on spatial maps (see Gelman and Price, 1999).
Hyperparameter estimation is less accurate than for the latent field. The estimation of the noise σ y generally captures the imposed noise well, which had a mean value of 5.5 ppm and an estimated value of 6.6 [6.5, 6.8] ppm. The correlation structures are less well-captured. The temporal correlation φ was estimated with a mean value of 0.7 [0.6, 0.8]; the range ρ has a value of 1.7 [1.3, 2.1] and the marginal standard deviation of the latent field σ has a value of 0.8 [0.7,0.9]. It is promising that the posterior mean estimates of all hyperparameters show an improvement on their prior mean or baseline values, although only the true value for ρ falls within the estimated 95 % uncertainty.

Real-data experiment
This section presents methane emissions estimates for the UK in 2014. The year is split into four time periods: January to March, April to June, July to September and October to December. Based on the synthetic data set in Sect. 3.2, we use the same prior probabilities for the hyperparameters as the inversion. The prior boundary conditions are distributed with µ σ BC = 3.2 and σ σ BC = 0.4, based on expert judgement. The a priori values for these boundary conditions come from the MOZART-4 model as in Sect. 3.1.3. The linear mapping  from the latent field to the measurements is generated using the NAME-derived sensitivities described in Sect. 3.1.2 multiplied by the inventory emissions detailed in Sect. 3.1.3. Figure 6 shows maps of the inferred mean difference in emissions from the a priori inventory for the four time periods using INLA as in Sect. 2.5. This result is the mean posterior scaling for the latent field multiplied by the inventory value (see Sect. 3.1.3). Figure S2 shows maps of the 95 % uncertainty in the difference in emissions. A linear combination of the posterior latent field multiplied by the inventory emissions field gives total emissions for the UK. We estimate total UK methane emissions in 2014, with their associated 95 % uncertainty, of 2 for the periods January to March, April to June, July to September and October to December, respectively. These are plotted along with the inventory estimate in Fig. 7. This emissions trend suggests that, for 2014, there was an increase in methane emissions in the UK during the summer months compared to the winter months. The uncertainties, however, are large, meaning that this increase may not be as stark as suggested by the mean estimates. Combining these emissions into a mean annual emission for 2014 with its associated 2standard-deviation uncertainty assuming that the time periods are correlated with the modal posterior value of φ gives 2.28 ± 0.33 Tg yr −1 . The emissions are similar to mean UK estimates from previous hierarchical inversions using NAME of 2.09 [1.65, 2.67] Tg yr −1 by Ganesan et al. (2015) and 2.28 [2.04, 2.52] Tg yr −1 by Lunt et al. (2016). All of these

Discussion
The real benefit of the presented inversion method is speed, while still maintaining the idea that, in this application, uncertainties exist within the uncertainties that are inherent to hierarchical Bayesian inverse methods. The computation of the marginal posterior distribution of the latent field is readily suited to run in parallel across multiple cores, making this approach scalable to problems with a larger parameter space. This, however, requires that sufficient memory allocation is available. The inference for the experiment in Sect. 3.3 took around 2 h wall-clock time using a Quad-Core MacBook Pro with 8 GB RAM. For comparison, we were unable to get the problem in Sect. 3.3 to converge using 22 million iterations of a Metropolis-adjusted Langevin diffusion MCMC algorithm (Roberts and Rosenthal, 1998), taking on the order of days to complete and whose convergence is difficult to diagnose (Cowles and Carlin, 1996). Reducing the problem to purely spatial inference reduces the run time down to around 10 min wall-clock time for a single 3-month period. Inference using INLA may have smaller computational gains over hierarchical methods other than MCMC, for example empirical hierarchical inference (e.g. Michalak et al., 2005). Such methods, however, would benefit from using the stochastic partial differential equation approach to Gaussian Markov random fields for spatial correlation in Sect. 2.2, vastly reducing the cost of inverting dense covariance matrices.
The latent Gaussian field, crucial to this method, has the problem that it does not restrict emissions to strictly positive values, which is a physical requirement for many gas emissions. In addition, the INLA method relies on the assumption of approximate multivariate normality of the posterior linear predictors. MCMC algorithms do not suffer from this issue as any prior probability density function can be chosen, and no assumption is made about the posterior linear predictors. This has to be traded off, however, with the speed and ease of implementation of the method and the scale of the problem that the user wishes to solve. If strictly positive posterior emissions are an absolute requirement, another possible modification to the approach is to use a Taylor expansion around the nonlinear model Z = He x + , where x is a multiplicative scaling of the prior mode which follows a Matérn field. This Taylor expansion could either be around zero if the prior inventory is a good estimate of the true emissions or around the posterior mode itself, which can be found through iterations of the inverse method at the posterior mode of the previous iteration. An alternative route would be to instead use a non-Gaussian Matérn field, and this will likely prove a promising future avenue for geostatistical inference (Wallin and Bolin, 2015).
A potential extension to this work is global-scale modelling, for example a global study of methane emission from global satellite measurements. Spatio-temporal estimation of global CO 2 fluxes using Gaussian Markov random fields and a global measurement network gives promising results (Dahlén et al., 2019). The nature of the method makes it well-suited to parallel implementation and thus upscaling. Upscaling to global studies may introduce additional difficulties, which have been otherwise ignored in the regional case. For example it may no longer be possible to assume a spatial correlation structure controlled by only two hyperparameters, i.e. the correlation structure may vary in space. The stochastic partial differential equation approach to Gaussian Markov random fields can handle this by allowing the hyperparameters ρ and σ to become vectors or a non-stationary covariance (Lindgren et al., 2011) or by subsetting the space controlled by the different covariance structures (Sha et al., 2019). The problem of non-stationary covariances may also be present for inference of other spatial scales. For example natural features, such as lakes, may cause abrupt changes in correlation structures. Emissions of anthropogenic greenhouse gases may exhibit no spatial correlation structure (e.g. Mühle et al., 2019;Rigby et al., 2019). In this case a Matérn field would be inappropriate, but an autoregressive model may still be applicable for time-varying emissions. The inclusion of a non-stationary covariance makes the problem much more computationally expensive, and a more parsimonious approach may often suffice (Fuglstad et al., 2015).

Conclusions
This work presents a fast and efficient method using an integrated nested Laplacian approximation for hierarchical inference of trace gas emissions. This method is particularly wellsuited to assimilating large data sets. We show that INLA with a stochastic partial differential equation approach for spatial correlation can reproduce synthetic emissions from pseudo-observations and benchmarked emissions using real data.
A real advantage over other hierarchical Bayesian inversion methods is the attractive convergence properties, which can be difficult to obtain using methods such as Markov chain Monte Carlo algorithms. As the method computes the marginal variance for each node, this allows for efficient parallel implementation and significant computational savings compared to other hierarchical methods. Computational speed will become increasingly important as more data from space-borne sensors become available, which will offer more measurements and increased spatial coverage.
Code and data availability. Measurements of methane from the UK DECC network sites Tacolneston, Ridge Hill and Tall Tower Angus are available at http://ebas.nilu.no/ (last access: 13 February 2020, Tørseth et al., 2012). Measurements of methane for the Mace Head station are available at http://agage.mit.edu/data (last access: 13 February 2020, Prinn et al., 2020). The NAME III v7.2 transport model is available from the UK Met Office under licence by contacting enquiries@metoffice.gov.uk. The meteorological data used in this work from the UK Met Office operational NWP (Numerical Weather Prediction) Unified Model (UM) are available from the UK Centre for Environmental Data Analysis at http://data.ceda.ac.uk/badc/ukmo-nwp (last access: 13 February 2020, Met Office, 2013). The MOZART-4 global chemistry model is available at https://www2.acom.ucar.edu/gcm/mozart-4 (last access: 13 February 2020, Emmons et al., 2010). The R-INLA v17.06.20 library is available for download from http://www. r-inla.org/download (last access: 13 February 2020, Lindgren and Rue, 2015), which includes the GMRFLib library. The EDGAR v.4.3.2 methane inventory can be downloaded from https://edgar. jrc.ec.europa.eu/overview.php?v=432_GHG (last access: 13 February 2020, Janssens-Maenhout et al., 2017). The code and data to infer emissions using the simulated data in Sect. 3.2 can be found at https://osf.io/53w96 (last access: 13 February 2020) and https://doi.org/10.17605/OSF.IO/53W96 . Any further data or code is available from the corresponding author on request.
Author contributions. LW and ZS conceived and led the implementation of the study. MR, AG and JR supported and advised on the study. KS, SO and DY made the measurements from the UK DECC