WOMBAT v1.0: A fully Bayesian global flux-inversion framework

WOMBAT (the WOllongong Methodology for Bayesian Assimilation of Trace-gases) is a fully Bayesian hierarchical statistical framework for flux inversion of trace gases from flask, in situ, and remotely sensed data. WOMBAT extends the conventional Bayesian-synthesis framework through the consideration of a correlated error term, the capacity for online bias correction, and the provision of uncertainty quantification on all unknowns that appear in the Bayesian statistical model. We show, in an observing system simulation experiment (OSSE), that these extensions are crucial when the data are indeed bi5 ased and have errors that are spatio-temporally correlated. Using the GEOS-Chem atmospheric transport model, we show that WOMBAT is able to obtain posterior means and variances on non-fossil-fuel CO2 fluxes from Orbiting Carbon Observatory-2 (OCO-2) data that are comparable to those from the Model Intercomparison Project (MIP) reported in Crowell et al. (2019, Atmos. Chem. Phys., vol. 19). We also find that WOMBAT’s predictions of out-of-sample retrievals obtained from the Total Column Carbon Observing Network are, for the most part, more accurate than those made by the MIP participants. 10

When the flux is modelled on a space-time grid and space-time step functions are used as basis functions, a prior distribution 100 on α that correlates the flux a priori in space and time is natural (Michalak et al., 2004;Chevallier et al., 2007;Basu et al., 2018). On the other hand, when using large spatial flux "patterns" that have temporally-limited scope, it is generally reasonable to assume that the α j 's corresponding to basis functions in different spatial regions are uncorrelated, but that those associated with the same spatial region are temporally correlated.
Irrespective of the choice of basis functions, in WOMBAT, one expresses prior judgement on α through the model α ∼ 105 Gau(0, Σ α ), where Gau(µ, Σ) is a Gaussian probability density function of a random vector with mean µ and covariance matrix Σ. The covariance matrix Σ α is parameterised through a parameter vector θ α , which typically contains variances and spatio-temporal length scales in the covariances. Expert elicitation can be used to construct prior distributions on these parameters too; we describe possible prior distributions when discussing the parameter model in the Bayesian hierarchical model in Sect. 2.4. 110 The flux model of Eq. (1) may be improved by introducing a dimension-reduction error, also known as aggregation error, on the right-hand side. This error accounts for the fact that the structured basis functions typically span a small (function) space and that, therefore, they cannot reproduce fluxes perfectly. However, since we are unable to deconvolve dimension-reduction error from other sources of error (e.g., transport-model error) in our mole-fraction data, we model the spatio-temporal variability it introduces collectively with other sources of error (Sect. 2.2).

The mole-fraction process model
Denote the true mole-fraction process at space-height-time location (s, h, t), as Y 2 (s, h, t). We only model the mole fraction within our time interval of interest T , starting at time t 0 , and therefore we express the true mole-fraction field within T as a function of the initial mole-fraction process at t 0 and the exogenous flux-process inputs in T . Specifically, define T t ≡ [t 0 , t], for t 0 ≤ t ≤ t f , as the set of all time points in T up to and including t. The field Y 2 (· , · , ·) is related to the flux process through the relationship, Y 2 (s, h, t) = H(Y 2 (· , · , t 0 ), Y 1 (· , T t ); s, h, t), s ∈ S 2 , h ∈ R + , t ∈ T , where Y 2 (· , · , t 0 ) is the mole-fraction field at time t 0 , Y 1 (· , T t ) is the flux field evolving over the whole time period T t , and H is an operator that solves the underlying chemical transport equations (that are approximately linear for long-lived species such as CO 2 ; see, e.g., Enting, 2002, Chapter 2). In practice, H is not known perfectly, but we usually have at hand a reasonable 125 approximation to it,Ĥ, which is often referred to as the chemical transport model (CTM) or simply as the transport model.
Similarly, we will usually have a reasonable approximation to Y 2 (· , · , t 0 ), which we callŶ 2 (· , ·, , t 0 ). The use ofŶ 2 (· , · , t 0 ) instead of Y 2 (· , · , t 0 ), and ofĤ instead of H, leads to a residual term v 2 (· , · , ·) that will inevitably be spatio-temporally correlated (Enting, 2002, Chapter 9). In particular, Y 2 (s, h, t) =Ĥ(Ŷ 2 (· , · , t 0 ), Y 1 (· , T t ); s, h, t) + v 2 (s, h, t), s ∈ S 2 , h ∈ R + , t ∈ T , 130 where v 2 (· , · , ·) is the residual mole-fraction process arising from the use of an approximate initial mole-fraction field, imperfect meteorology inside the transport model, imperfect transport-model parameters and physics, and potentially sub-grid-scale variation in the mole-fraction field whenĤ is a numerical model evaluated at a coarse resolution. It is difficult to place prior beliefs on the structure of v 2 (· , · , ·), which we model as statistical error, but it is known that using the approximation H introduces errors that could span hundreds of kilometres and several days McNorton et al., 2020). 135 Transport-model implementations tend to differ considerably in their vertical and inter-hemispheric mixing behaviour, and flux-inversion estimates are known to be particularly sensitive to transport-model choice (Gurney et al., 2002;Schuh et al., 2019). Note that v 2 (· , · , ·) is also likely to depend on Y 1 (· , ·), and that we ignore this dependence for model simplicity in what follows.
The assumed linear behaviour of the underlying dynamics for CO 2 is important. First, it allows us to model the effect of the 140 approximate initial mole-fraction field,Ŷ 2 (· , ·, t 0 ), separately from that of the fluxes (e.g., Enting, 2002, Chapter 10), so that Eq.

The mole-fraction data model
Fluxes cannot be observed directly at the spatial and temporal scales of interest. Flux inversion therefore proceeds by "constraining" the flux field using column-averaged retrievals or point-referenced measurements of mole fraction. We use Z 2,i to 155 denote the ith mole-fraction measurement or retrieval, where i ∈ {1, . . . , m} indexes the datum used in the inversion, and m is the number of data used in the inversion.
Point-referenced (PR) measurements of mole fraction are generally made at or near Earth's surface, using instruments on towers or in aircraft. The mole-fraction data model for these measurements is therefore given by 160 where Z 2,i is the observed mole fraction at (s i , h i , t i ), and i is mean-zero Gaussian measurement error with a model for its variance parameter presented below in Sect. 2.4. Measurement errors associated with point-referenced instruments are generally small and (usually) not correlated in space and time. Despite this, such data are not immune to the effects of spatiotemporal correlations induced by the CTM in the process model, and they may even be more susceptible than column-averaged retrievals due to the combined effect of their usual proximity to the surface and the discretisations employed when simulating 165 approximate transport (Rayner and O'Brien, 2001;Basu et al., 2018).
Column-averaged (CA) retrievals, such as XCO 2 (where "X" refers to the column averaged nature of the retrievals) from the OCO-2 satellite or the TCCON sites, are noisier than PR measurements, although TCCON less so. In particular, since the raw spectral information collected for the retrieval is affected by environmental factors such as aerosols (O'Dell et al., 2012), the errors can contain biases as well as exhibit spatio-temporal correlations. These biases can also be instrument-mode dependent 170 (e.g., land glint [LG] vs land nadir [LN] retrievals for OCO-2; see Sect. 4.4.1). The vertical averaging operation also involves an averaging kernel and an a priori bias correction, which are both specific to the retrieval, and which arise from the algorithm used for the retrieval. In general, this relationship can be expressed as where (s i , t i ) is the space-time location of the retrieval,Â i is the assumed (but necessarily approximate) observation operator 175 of the ith retrieval that column-averages the mole fraction field via an averaging kernel; b i is bias; v Z2,i is mean-zero spatiotemporally correlated random error; and i is mean-zero uncorrelated random error. The bias and error terms arise from the use of an approximate observation operator. Surface-based or remotely sensed CA retrievals are sometimes provided as "biascorrected". In this case, the data model for these retrievals is identical to Eq. (7), but with the bias component omitted.

185
Flux inversions can make use of both PR measurements and CA retrievals simultaneously, and hence it is convenient to provide a data model that encapsulates both types of measurements. It is also often the case that measurements from the same instrument type can be divided into groups that can be expected to have similar characteristics, such as group-specific bias and error properties. A given group could contain, for example, PR data from the same in situ instrument, or CA retrievals from a particular remote sensing instrument under a specific retrieval mode (e.g., land nadir). Hence, we consider the following 190 general data model, where different groups have different terms, but the overall structure is the same: where n g is the number of groups; Z 2,g contains the data in group g;Ẑ 0 2,g are the prior expected mole fractions in group g under the approximate transport model, the approximate mole-fraction field at t 0 and, if the groups consist of CA retrievals, under the approximate observation operators;Ψ g are the response functions in group g evaluated at either the PR locations 195 (in the case of PR measurements) or averaged over a column via an approximate observation operator (in the case of CA retrievals); b g ≡ A g β g are group-specific biases, with A g a group-specific design matrix and β g the corresponding weights; ξ g is the group g's vector of correlated errors; and g is the group's vector of uncorrelated errors. When the data in group g are considered to be unbiased (or already bias-corrected), the term A g β g = 0. The variables constituting β g and g , for g = 1, . . . , n g , are mutually independent within and across groups.

200
The correlation between elements of ξ g associated with measurements that are proximal in space and time is stronger than between those that are farther apart. Yet, while spatio-temporal correlation in model-data discrepancies is widely acknowledged (Chevallier, 2007;Ciais et al., 2010;Mukherjee et al., 2011;Miller et al., 2020), the general consensus is that using just the variance of ξ g to add to the variance of the uncorrelated component is sufficient (e.g., Michalak et al., 2005;Basu et al., 2013;Deng et al., 2016). However, as we show in our OSSE in Sect. 5.1, even when a measurement-error variance inflation factor is 205 estimated, predictions of the flux worsen under the assumption of uncorrelated errors if the errors truly are correlated. The main reason not to routinely model spatio-temporal correlations in global flux inversion appears to be computational; we discuss a way to rectify this bottleneck in Sect. 3.

The parameter model
The parameter model (i.e., prior distributions on parameters) is dependent on the specification of the flux process model, the 210 mole-fraction process model, and the mole-fraction data model. Here, we describe the parameter model we use in the OSSE and in the MIP comparison in Sect. 5.
Parameters of the flux process model: In the experiments given below in Sect. 5, our flux basis functions are from bottomup inventories that are divided into r s spatial regions and r t time spans. This construction yields r = r s × r t basis functions, and it naturally suggests a temporal partitioning of α into (α 1 , . . . , α rt ) , where each α k ∈ R rs , for k = 1, . . . , r t . This, in and Dahlén et al. (2020). Specifically, α k+1 = M(κ)α k + w k , for k = 1, . . . , r t , where, in our examples, we constrain the matrix M(κ) to be diagonal with non-zero elements equal to κ ≡ (κ 1 , . . . , κ rs ) ; and we let w k ∼ Gau(0, Σ w ), where the precision matrix Q w ≡ Σ −1 w is diagonal with positive elements τ w ≡ (τ w,j : j = 1, . . . , r s ) . The flux-process parameters are therefore θ α ≡ (κ , τ w ) , which in turn govern the covariance matrix of α, notated as Σ α ≡ var(α). There is an ordering of α for which Σ −1 α is block diagonal with each block a tridiagonal matrix (see Appendix A). Either sequential estimation (e.g., Kalman filtering/smoothing) or batch Bayesian updating can be used to make inference on α. In our case we use the latter, and we take advantage of efficient algorithms that are available for sparse-linear-algebraic computations.
We expect that {α k } are positively correlated in time. Therefore, for the prior distributions for {κ j }, we use the beta distribution, which has support on the interval [0, 1]: Independently, κ j ∼ Beta(a κ,j , b κ,j ), for j = 1, . . . , r s , where {a κ,j } 225 and {b κ,j } are fixed and assumed known. For prior distributions on the precision parameters {τ w,j }, we use gamma distributions with shape parameters {ν w,j } and rate parameters {ω w,j }, which are fixed and assumed known: Independently, τ w,j ∼ Ga(ν w,j , ω w,j ), for j = 1, . . . , r s . Michalak et al. (2005) suggested that variance parameters could be estimated directly in a maximum-likelihood framework.
The use of prior distributions on κ and τ w adds an extra level of flexibility and allows the modeller to express the "uncertainty 230 on the uncertainties" in an inversion framework (e.g., Ganesan et al., 2014). A related advantage is that the prior distributions can be used to provide information on the variance parameters even when the mole-fraction observations are not informative of the parameters. One could even configure these prior distributions to be extremely informative and, effectively, fix the prior model for the flux. The choice of prior distribution can be made on a region-by-region basis, as is the case in our experiments (Sect. 4.5), where land regions are given largely uninformative priors, and ocean regions are given informative ones.

235
CA mole-fraction retrieval bias parameters: In Sect. 5, we consider OCO-2 retrievals and a different set of bias parameters for each instrument mode. In this context, β g is associated with a particular instrument mode and a single element of β g captures the bias arising from, for example, aerosol presence. Experiments (see Sect. 5.3) reveal that these bias parameters are quite easily constrained in an inversion framework. When constructing the model for each β g , we first standardise each row in A g so that the covariates have unit marginal empirical variance. Then we model {β g } as follows: Independently, 240 β g ∼ Gau(0, σ 2 β I), for g = 1, . . . , n g , with σ 2 β = 100 and where I is the identity matrix. This choice for σ 2 β renders the prior distribution uninformative for the data-set sizes we consider in our experiments.
Model-data discrepancy and measurement-error parameters: The retrievals used to perform inversions often come with prescribed variances, v ps g , that account for both retrieval error, correlated or otherwise, and CTM error. For example, the MIP protocol of Crowell et al. (2019) prespecified these variances. We therefore let the total marginal variance of ξ g + g be 245 equal to γ g · v ps g , where the inflation-factor parameter γ g accounts for the possibility of misspecified variances (Worden et al., 2017). To deconvolve ξ g and g , we first construct the correlation matrix R ξg ≡ corr(ξ g ) using a spatio-temporal correlation function R ξg (· , · ; ξg ), where ξg are length scales that need to be inferred from data. We then enforce the total marginal variance constraint by defining the covariance matrices of ξ g and g to be Σ ξg ≡ var(ξ g ) = diag(ρ g γ g ·v ps g ) 1/2 R ξg diag(ρ g γ g · v ps g ) 1/2 and Σ g ≡ var( g ) = diag((1−ρ g )γ g ·v ps g ). The parameter ρ g represents the relative contribution of the correlated-error 250 variance (comprising both CTM error and, if present, correlated measurement error) to the total inflated prescribed variance.

275
3 The model-data discrepancy term The posterior distribution over all the unknown parameters in our model is given by  The log of the first two terms on the right-hand side of Eq. (10) are expressions that are commonly seen in optimisation-based flux-inversion frameworks. In particular, we have that where Z 2,p (β, α) ≡Ẑ 0 2 +Ψα + Aβ; in our case we set α p = 0; and "const." denotes a constant. The primary differences between our framework and the usual optimisation-based flux-inversion frameworks are the presence of the log-determinants, which penalise covariance matrices that have large determinants (large correlations and/or large variances), and the presence of 285 off-diagonal elements in the matrix Σ ξ + Σ . Note that the log-determinants can only be omitted when all covariance matrices are considered known a priori.
The computational complexity of the Gibbs sampler is dominated by that of the log-likelihood function, log p(Z 2 | β, α, θ ξ, ), which is the sum of the group-wise log-likelihood functions, log p(Z 2,g | β, α, θ ξ, ), for g = 1, . . . , n g . For computationally efficient inference, we must ensure that each group-wise log-likelihood function is simple to evaluate. From Eq. (9), 290 the group-wise log-likelihood is given by, for g = 1, . . . , n g , where Σ g ≡ Σ ξg + Σ g , and m g is the number of observations and/or retrievals in group g. From a computational perspective there are two components in Eq. (11) that can present difficulties. The first component is the matrix Ψ g ; this matrix is dense, and its number of elements scales linearly with both the number of data points and the number of 295 basis functions, r. Fortunately, this matrix only needs to be evaluated once using the CTM, and it can typically be generated efficiently on a large parallel computing infrastructure. The second component is the matrix Σ g , which is of size m g × m g and generally dense. Recall from Sect. 2.4 that this covariance matrix is constructed from γ g , ξg , and ρ g , which are sampled within the Gibbs sampler. Therefore, this covariance matrix needs to be re-constructed at each sampler iteration. This is infeasible for the m g ≈ 100, 000 retrievals used in this study. 300 We deal with the denseness of Σ g by using an approximation first proposed by Vecchia (1988). Here, one first orders the elements of ξ g . Then one approximates the joint distribution of ξ g as, where N g,i is the "past" neighbour set of the ith datum in group g, which contains a (very small) subset of the integers between, and including, 1 and (i − 1). It can be shown that this formulation leads to a valid distribution for ξ g that approximates the true 305 joint distribution. The approximate distribution is Gaussian with mean 0 and a sparse precision matrix, Σ −1 ξg , with the degree of sparsity closely connected to the sizes of the sets {N g,i : i = 1, . . . , m g }.
In the version of WOMBAT presented here, we consider a special case of Eq. (12), where the observations are ordered in time and where the correlation function R ξg (·, · ; ξg ) is simply an exponential function of temporal separation. In this case, one only needs to consider one (temporal) length-scale parameter per group, g,1 , for g = 1, . . . , n g . The motivations 310 for this simplification are twofold. First, the remote-sensing instrument we consider in Sect. 5 flies on a satellite that is in a sun-synchronous orbit, so that correlation in time is a proxy for along-track correlations. This model for characterising correlation in the errors was suggested and used by Chevallier (2007). Second, the use of an exponential correlation function allows the approximation in Eq. (12) to become an equality, where N g,i = i − 1. This is a manifestation of the so-called "screening effect", where the exponential correlation function induces a first-order conditional-independence structure. Now, of Eq. (11) therefore follows by expressing Σ −1 g and |Σ g | in terms of these sparse matrices. Specifically, Σ −1 g and |Σ g | are evaluated through the use of the Sherman-Morrison-Woodbury matrix identity and a matrix-determinant lemma (e.g., Henderson and Searle, 1981, and Appendix A).

320
This section gives the setup needed for Sect. 5, where we compare the inversions from WOMBAT v1.0 to those from the OCO-2 MIP (Crowell et al., 2019). In the MIP, inversions followed a predefined protocol; we therefore configured WOMBAT to follow the same protocol. The MIP prescribed the data to be used, including both preprocessed point referenced and remotely sensed data from OCO-2 between 06 September 2014 and 01 April 2017. Participants were tasked to provide flux estimates for the years 2015 and 2016. The protocol also specified a fossil-fuel flux field that had to be assumed fixed and known, in order 325 to facilitate the interpretation of the differences in flux estimates obtained by the different participants. All other modelling choices (e.g., transport model, prior fluxes) were left to individual participants.
-Biospheric flux: This flux is a result of the interaction between the atmosphere and trees, shrubs, grasses, soils, dead wood, leaf litter, and other biota. It is defined by the formula GPP − R A − R H , where GPP stands for gross primary production and represents the uptake of carbon by plants due to photosynthesis; R A is autotrophic respiration, the release 340 of carbon through respiration by plants; and R H is heterotrophic respiration, the release of carbon through the metabolic action of bacteria, fungi, and animals. For Y 0,bio 1 (· , ·) we use one of the two priors used in CarbonTracker 2019 (Jacobson et al., 2020), specifically that based on the Carnegie-Ames-Stanford Approach (CASA) biogeochemical model (Potter et al., 1993;Giglio et al., 2013).
-Biofuel emissions: These emissions result from the burning of wood, charcoal, and agricultural waste for energy, as well 345 as the burning of agricultural fields. For Y 0,bf 1 (· , ·) we use the estimates of Yevich and Logan (2003) that in turn were based on data from 1985.
-Fire emissions: These emissions correspond to those from vegetative fires (wildfires), which may start either naturally or by humans. For Y 0,fire 1 (· , ·) we use the Quick Fire Emissions Dataset, version 2 (QFED2; Darmenov and da Silva, 2015).

350
-Ocean-air exchange: These fluxes are a result of ocean-air differences in partial pressure of CO 2 . For Y 0,ocn 1 (· , ·) we use the estimates of Takahashi et al. (2002), with annual scalings reflecting increasing uptake of CO 2 as described by Takahashi et al. (2009).
A summary of these components is provided in Table 1.  (2011)

355
We divided the globe into the r s = 22 disjoint TransCom3 regions (Gurney et al., 2002), and time into the r t = 31 months between (and including) September 2014 and March 2017, and then we constructed one flux basis function for each region/month pair. This yielded r = r s × r t = 682 basis functions, each with non-zero support in a space-time volume spanning one month in time, and one TransCom3 region in space. Half of the TransCom3 regions are land regions, and half are ocean regions, so that half of our basis functions correspond to land areas, and half to ocean areas. We show the TransCom3 regions in Fig. S1, 360 and their codes and labels in Table S1, both of which can be found in the supplement. Some areas of the globe, depicted in white in Fig. S1, are assumed to have zero flux; all our basis functions are zero in these regions.
For φ j (· , ·) a basis function corresponding to a land area, we have where D φ j is the space-time volume over which the jth basis function is defined to be non-zero. For φ j (· , ·) a basis function 365 corresponding to an ocean area, we have Both Eq. (14) and Eq. (15) exclude Y 0,ff 1 (· , ·). This is done to meet the MIP requirement that fossil-fuel fluxes are treated as fixed and known (which is common practice in flux inversion; e.g., see Basu et al., 2013). The influence of fossil fuels on the mole-fraction field is therefore present only as an invariant component of the prior expectation of the mole-fraction field.

370
Note that we have used the same inventories to construct Y 0 1 (·, ·) and the basis functions φ j (·, ·), j = 1, . . . , r; this was done for convenience, and different inventories could be used if needed.
Since the spatio-temporal patterns of the fluxes within a region-month space-time volume are fixed, our construction is quite restrictive. However, the space-time patterns are dictated by those in the inventories used to construct the basis functions.
Although there is a general lack of agreement between inventories targeting the same processes (e.g., Huntzinger et al., 2017), 375 these spatio-temporal patterns would not be unreasonable and certainly more reasonable than those from generic basis functions commonly used in spatial statistics (e.g., Wikle et al., 2019, Chapter 4). This underlying assumption is often made in fluxinversion systems; for example, Jacobson et al. (2020) scale 3-hourly fluxes using weekly scale factors over 156 regions, while Basu et al. (2013) use monthly scale factors for 3-hourly fluxes over 6 • × 4 • grid cells. Constraining the spatio-temporal pattern is inferentially advantageous because it helps address the ill-posed nature of flux inversion. It is also computationally 380 advantageous because it reduces the number of unknowns for which inference is needed. The disadvantage is that the reliance on a priori structures increases the risk of dimension-reduction error because, while our basis functions allow the posterior fluxes to vary at sub-TransCom3-region scales, variations that don't follow the prescribed pattern are necessarily ignored.
Therefore, if one wishes to make inference at scales that are finer than those resolved by the scaling factors, one should introduce additional basis functions for those regions and time spans. Moreover, for processes where there is disagreement 385 (such as biogeochemical processes), one may consider running separate inversions with basis functions constructed from different inventories and carry out a sensitivity analysis. We note that there is a considerable body of work tackling basisfunction choice in the context of inversion; see, for example, Turner and Jacob (2015).

Transport model and initial condition
For our approximate transport model,Ĥ, we used the GEOS-Chem global 3-D chemical transport model, version 12.3.2 (Bey et al., 2001;Yantosca, 2019), driven by the GEOS-FP meteorological fields from NASA's Goddard Earth Observing System (Rienecker et al., 2008). We use the offline GEOS-Chem CO 2 simulation (Nassar et al., 2010), with the native horizontal resolution of 0.25 • × 0.3125 • and 72 vertical levels aggregated to 2 • × 2.5 • and 47 vertical levels for computational efficiency.
We use a transport time step of 10 minutes, and a flux time step of 20 minutes. All fluxes described in sections 4.1 and 4.2 were 400 implemented in GEOS-Chem using the HEMCO emissions component (Keller et al., 2014). GEOS-Chem can be configured to allow for a 3-D chemical source of CO 2 due to oxidation of other trace gases, but this was disabled for compatibility with the OCO-2 MIP.
The approximate initial condition,Ŷ 2 (· , ·, t 0 ), specifies the mole-fraction field at the beginning of the study period on 01 September 2014. For our initial mole-fraction field, we used a modified version of that generated by Bukosa et al. (2019). This

Data
This study uses a subset of the data sources in the MIP (Crowell et al., 2019). These include retrievals of column-averaged CO 2 by NASA's OCO-2 satellite (Eldering et al., 2017), and retrievals of column-averaged CO 2 from TCCON (Wunch et al., 2011a). As in the MIP, we use OCO-2 data to estimate CO 2 fluxes, and TCCON data to validate the estimates. The OCO-2 satellite was launched in 2014 with the goal of retrieving atmospheric CO 2 mole fractions. The on-board instrument measures radiances in three near-infrared spectral bands, which in turn are used to retrieve the CO 2 mole fraction on 20 vertical levels via a retrieval algorithm based on Bayesian optimal estimation (Rodgers, 2000;O'Dell et al., 2012). The retrieved levels are column-averaged, and then bias-corrected through comparison with TCCON retrievals (Wunch et al., 2011a). The OCO-2 420 team releases regular revisions of the retrieval dataset.
OCO-2 radiance measurements are taken in three distinct pointing modes: nadir mode, where the satellite aims at the point directly beneath it; glint mode, where the satellite points at the reflection of the sun on the surface; and target mode, where the satellite aims at a specific target, typically a ground station that also measures CO 2 mole fractions. Target observations are generally excluded from flux inversions and used only for instrument calibration. Over the ocean, nadir measurements have 425 insufficient signal-to-noise ratio to provide useful retrievals, while over land, both nadir and glint retrievals are made. There are therefore three retrieval modes to consider, land glint (LG), land nadir (LN), and ocean glint (OG). The error properties of retrievals over land differ significantly from those over ocean; in particular, the OG retrievals in the V7r dataset (the dataset used in the MIP) are not considered reliable, and were therefore excluded from the MIP (Crowell et al., 2019). We follow this protocol, and only do inversions using LG and LN data.

430
The MIP protocol dictated the use of a post-processed version of the V7r retrievals; this post-processing was done as follows.
First, an additional bias-correction term related to high-albedo measurements was applied to the XCO 2 retrievals. The biascorrected retrievals were then grouped and averaged into 1 s bins, and then they were further grouped and averaged into 10 s bins. The 10 s spans correspond to ground swathes of approximately 67 km in length. The standard deviation for each 10 s retrieval was computed as a function of the individual retrieval standard deviations, with an additional model-data mismatch 435 term added to account for the expected differences arising from transport-model errors. For the MIP, the 10 s averages were assumed to be independent but, following Sect. 2.4, we treat them as dependent. For more details on how the 10 s averages were computed, including how the standard errors were derived, see Crowell et al. (2019).
The OCO-2 retrieval algorithm produces estimates of XCO 2 . In Eq. (7), we encapsulate the retrieval algorithm in the observation operator,Â i . Appendix B gives more details on this observation operator in the case of OCO-2 retrievals.

TCCON
TCCON is a network of ground-based sites measuring solar radiances in the near-infrared spectral band (Wunch et al., 2011a).
Similar to the way OCO-2 retrievals are obtained, these measurements are converted to retrievals of column-averaged CO 2 (and other gases) using a retrieval algorithm. TCCON retrievals have been adjusted to agree with World Meteorological Organization (WMO) trace-gas measurement scales, and validated using aircraft data (Wunch et al., 2010). As both TCCON and remote 445 sensing instruments retrieve column-averaged mole fractions, the TCCON data are an important validation resource (Wunch et al., 2011b). The MIP used TCCON column-averaged CO 2 retrievals from the GGG2014 release as validation data, including all retrievals available as of July 6, 2017. In the MIP, outliers and retrievals corresponding to soundings with high solar zenith angle were removed. The remaining retrievals were then averaged over 30-minute intervals; further details are given by Crowell et al. (2019). We note that the filtering procedure used in the MIP occasionally led to long periods of time for which data were 450 considered missing. For consistency with the MIP, in this study we used the same retrievals and postprocessing methods; the stations used are listed in Table 2.
Like OCO-2, TCCON retrievals also have an associated observation operatorÂ i . This has a similar form to the operator for OCO-2, which is described in Appendix B. A detailed description of the TCCON observation operator is given by Wunch et al. (2011b).   assigned a relatively informative prior. Informative priors for ocean fluxes were deemed necessary following OSSEs performed 460 by us, which revealed that it is often not possible to reliably constrain ocean fluxes from OCO-2 land data.
Specifically, for j corresponding to a land region, we assigned a prior to κ j by letting a κ,j = b κ,j = 1 (resulting in a uniform prior over [0, 1]), and for τ w,j we let ν w,j = 0.354 and ω w,j = (1 − κ 2 j )/0.0153. This prior on τ w,j implies that 1/τ w,j , the marginal variance of the elements of the scalings in α that correspond to land regions, has 5% and 95% percentiles of 0.01 and 10, respectively, which is reasonably uninformative. For j corresponding to an ocean region, we apply an independent and 465 identically distributed Gaussian prior with mean zero and standard deviation 0.5 to α j . This is achieved by fixing κ j = 0 and τ w,j = 4.
As described in Sect. 4.4.1, the OCO-2 MIP 10 s averages come with prescribed uncertainties that include both measurement error and transport-model error. In our framework, the parameters governing these error processes are γ g , ρ g , and g,1 . For the prior distribution of γ g , we let ν γg = 1.627 and ω γg = 2.171, which lead to 5% and 95% prior percentiles of 0.5 and 10, 470 respectively, while we used a uniform prior distribution for ρ g . For g,1 , we let ν g ,1 = 1 and ω g ,1 = 1 min −1 . When doing bias correction online, we used the prior on β described in Sect. 2.4.

Computation
Computations were performed in two stages. In the first stage, the 682 mole-fraction basis functions were precomputed in the manner described in Sect. 4.2. This is the most computationally demanding step, as each basis function requires the CTM to be 475 run for several days of clock time, on average. Fortunately, since every basis function can be computed independently from all others, computing them is an embarrassingly parallel problem. Furthermore, since the basis functions are shared between the different inversions in this section, they only need to be computed once. Computation of the basis functions took seven days in total using the Gadi supercomputer at the Australian National Computational Infrastructure.
The inversions were performed in the second stage. As mentioned in Sect. 2.5, the posterior distribution for each inversion 480 was estimated using an MCMC sampling scheme, with details given in Appendix A. The sampling schemes in all cases were run for 11,000 iterations, and the first 1,000 iterations were discarded as burn-in. Convergence of the MCMC chain was confirmed by inspection of all the trace plots. In our studies, we found that the vast majority of posterior distributions were different from the prior distributions, and this is not surprising. Although complex, the model used in WOMBAT is low-dimensional, and in this setup the total number of unknowns is 732 (r = 682 of which are flux scaling factors), which is orders of magnitude 485 less than the number of LG and LN data available for the inversion (114,808 and 129,203, respectively). Specifically, these data prove to be highly informative of the unknown parameters in our model. Generally, one need not be overly concerned if a parameter is poorly constrained by the data. In such cases, a Bayesian framework such as WOMBAT returns a posterior distribution over the poorly constrained parameter that tends toward the prior distribution, which in turn encapsulates the a priori belief on the plausible range of values the parameter can take.

490
The MCMC scheme was implemented in the R programming language (R Core Team, 2020), with intensive linear algebraic computations offloaded for performance to a graphics processing unit (GPU) using Tensorflow (Abadi et al., 2016). The total running time of the sampler depended on which model assumptions were used; in particular, whether uncorrelated (15 minutes) or correlated errors (two hours) were modelled. The bottleneck leading to a drastic increase in computing time when modelling correlated errors is due to the termΨ g (Σ ξg + Σ g ) −1Ψ g in Eq. (A7), which needs to be re-evaluated at each MCMC iteration.

495
This operation scales as O(r 3 + nr 2 ); on hardware current of the year 2021, r needs to be less than 10,000 for computations to remain tractable. On the other hand, when the errors are assumed to be uncorrelated or the length scale parameters are assumed known, many matrix computations can be done once (and not at each MCMC iteration); in this case the bottleneck becomes memory, and on current state-of-the-art servers one may accommodate an order of magnitude more basis functions.
All inversions were performed on a machine with an 8-core Intel i9-9900K CPU running at 3.60 GHz and an NVIDIA RTX 500 2080 GPU.

Results
In this section we evaluate WOMBAT, first in an OSSE, where the true fluxes are assumed known and data are simulated from these true fluxes, then on actual satellite data via the MIP protocol of Crowell et al. (2019). Using an OSSE, described in Sect. 5.1, serves two purposes: first, to show that WOMBAT can indeed recover the true fluxes in a controlled environment where the "working model" is the "true model"; and second, to illustrate the importance of modelling measurement-error biases and correlated errors when these are present in the true model from which the data are simulated. Then, in Sect. 5.2, we show that WOMBAT gives similar flux estimates to those obtained by different MIP participants, and that it performs well relative to the MIP participants in reproducing out-of-sample TCCON validation data. In Sect. 5.3 we show that WOMBAT is able to estimate bias coefficients online, if needed.

Observing system simulation experiment (OSSE)
In this section we illustrate the use of WOMBAT in an OSSE, where we randomly draw flux scaling factors α s from a Gaussian distribution with mean 0 and covariance matrix 0.09I, and assume that these are the true flux scaling factors. The (simulated) true flux Y s 1 (· , ·) is given by

515
where α s j is the jth element of α s . The (simulated) true mole-fraction field, Y s 2 (· , · , ·), is then given by Finally, we simulate measurements from the mole-fraction data model in Eq. (9) at the same times and locations as the OCO-2 10 s average retrievals for the LN and LG modes, by passing Y s 2 (· , · , ·) through the corresponding OCO-2 observation operators (see Sect. 4.4.1).

520
When simulating data via Eq. (9), we assume that both b g and ξ g are present. For the bias term b g , we assume that the OCO-2 retrieval biases are a linear combination of covariates that are associated with bias in the retrieval process: -"dp": The prior-retrieval surface pressure differential; -"logDWS": The logarithm of the total retrieved optical depth associated with the aerosol types dust, water cloud, and sea salt; and 525 -"co2_grad_del": The difference between the retrieved CO 2 mole fractions at the surface and retrieval vertical level 13 (corresponding to the height with air pressure equal to 63.2% of the surface pressure, around 520-650 hPa for most retrievals).
The "official" V7r bias-correction parameters (regression coefficients) for the original Level 2 (L2) data release were obtained through offline comparison of the raw L2 product with TCCON retrievals, and they are the same for both LG and LN observa-530 tions. They are equal to 0.3, 0.028, and 0.6, for the three variables, respectively. We construct our (simulated) true biases based on these coefficients.
As discussed in Sect. 2.4, we assume that the prescribed variance of each retrieval needs to be inflated, and the inflated variance is the sum of the variance from both the correlated (ξ g ) and uncorrelated ( g ) error components. In our OSSE, we assume that the inflation factor of the prescribed variances, v ps g , is γ g = 1.25, and that the proportion of this variance allocated 535 to the correlated error process is ρ g = 0.8. We induce the correlations using the exponential covariance function described in Sect. 3 with the single length scale of the correlated component set to g,1 = 1 minute for all g = 1, . . . , n g . We specify this correlation structure to be the same for both LG and LN data.
We ran five different setups in WOMBAT. In four of these setups, bias is assumed or not assumed to be present, and errors are assumed or not assumed to be correlated, and all model hyperparameters are estimated. The fifth setup attempts to mimic 540 a conventional flux inversion system based on scaling factors; here data is assumed to be unbiased, the errors uncorrelated, and the hyperparameters are fixed to their true values. The known true flux, generated as described above, is the same between the cases, and we evaluate the ability of WOMBAT to recover the truth under each of the setups. Table 3 gives the rootmean-squared error (RMSE) and continuous ranked probability score (CRPS, Gneiting and Raftery, 2007) Table 3 shows that the WOMBAT setup that takes into account both of these features performs the best in terms of both RMSE and CRPS, while the setups that assume that neither feature is present perform the worst. For the two partially misspecified setups, the bias-corrected/uncorrelated setup outperforms 550 the not-bias-corrected/correlated setup for LG data, while the opposite is true for LN data. Notably, despite the presence of systematic biases in the simulated data, the WOMBAT setup that assumes no bias, but which takes into account correlated errors, performs overwhelmingly better than the fully misspecified model that assumes no bias and uncorrelated errors, where the hyperparameters are assumed to be unknown. The performance of the setup with hyperparameters fixed to their true values is between those of the fully-and the partially-misspecified setups, illustrating the importance of modelling and estimating 555 biases and correlations when these are indeed present.  Table 3. Root-mean-squared error (RMSE) and continuous ranked probability score (CRPS) when estimating monthly regional fluxes using LG and LN data in the OSSE of Sect. 5.1. The lower the error or the score, the better the performance. In summary, this OSSE shows that WOMBAT can recover the true flux when the assumed model is the true model. But, more importantly, the OSSE also demonstrates the importance of modelling both bias and correlated errors in these flux inversions.
If the bias parameters are omitted, fluxes can be estimated incorrectly, although this may be partially mitigated by modelling 575 correlated errors. If uncorrelated errors are assumed, estimation performance suffers, and flux estimates will likely be reported with too small an uncertainty, even if the prescribed variances are allowed to be inflated when making inference. In a real-data setting, any factors thought to introduce systematic biases should be taken into account, but this OSSE also suggests that the use of correlated errors may provide some insurance against any remaining, unmitigated, spatio-temporal biases.

OCO-2 satellite data 580
In this section we present results from WOMBAT applied to OCO-2 satellite data under the MIP protocol (Crowell et al., 2019). The protocol mandates the use of OCO-2 retrievals with the TCCON-based offline bias correction. While WOMBAT is capable of online bias correction (see Sect. 5.3), in this section we follow the MIP protocol and set the bias parameters in WOMBAT equal to zero.     Crowell et al. (2019) noted that inversions using LG data led to a smaller net annual sink (averaged between 2015 and 2016) in the northern extratropics than those using LN data. WOMBAT also finds this feature, with a 95% credible interval of the LG-minus-LN difference spanning 0.14-0.6 PgC yr −1 . This is substantially smaller than the difference between the MIP ensemble means for these modes, which is 0.7 PgC yr −1 . Fluxes in the southern extratropics, shown in the fourth row of Figs. S5 and S6 in the supplement, are dominated by ocean fluxes for which, as noted 635 above, LG and LN data provide little information.

Flux comparison on a region-time basis with the MIP
One of the most prominent features in the MIP inversion results is a seasonal cycle in the tropics that is larger than that in both the prior mean and the in situ inversions (Crowell et al., 2019). From the second and third rows of Figs. S5 and S6 in the supplement, which depict inferred tropical-zone fluxes, it can be seen that WOMBAT does not reproduce this feature for both LG and LN inversions. In the northern tropics, the WOMBAT posterior means are similar to the prior means, and the 640 credible intervals in the annual fluxes reflect high confidence. However, results from WOMBAT do corroborate those of the MIP ensemble, in that non-fossil-fuel fluxes in the northern tropics were a net source of CO 2 in 2016.

TCCON comparison
To evaluate the estimated fluxes in the OCO-2 MIP, each participant was asked to use the 30-minute-average TCCON retrievals of column-averaged CO 2 (see Sect. 4.4.2) as validation data, and compare them to the column-averaged CO 2 predicted values 645 obtained by applying the process model to the estimated fluxes with the same CTM used for the inversion. Recall that, when performing the inversions, only OCO-2 data were used, and the TCCON data were treated as unobserved and set aside for validation. For WOMBAT, we repeated this validation exercise by examining the prior and posterior distributions of Z 2,g , where each g corresponds to a different TCCON site. For each group g, we set b g = 0, since we assume that the TCCON retrievals provided are free of bias. On the other hand, we assumed that ξ g + g has variance that is group-specific and that 650 these errors are fully correlated. While this assumption is conservative, it is also reasonable, since the CTM does induce errors that are highly correlated in time at a common spatial location, as it averages all variables on a rather coarse grid when simulating atmospheric transport. We estimate the variance of these correlated errors in a group g as the average of the reported variances of each TCCON retrieval within the g-th group.
In Fig. S7 in the supplement, we compare the time series of the TCCON retrievals with the predictions from WOMBAT 655 under the prior-mean flux, the posterior distribution of flux from LG data, and the posterior distribution of flux from LN data.
Several things are of note from this figure: First, the posterior-mean estimates are a better match to the TCCON retrievals than the prior-mean estimates, which is evidence that OCO-2 data do allow for improved flux estimates to be obtained. Second, discrepancies between TCCON and predicted retrievals persist for a long time, lending credence to our assumption that errors are highly temporally correlated. Third, the 95% prediction intervals are appropriate, and largely contain the TCCON retrievals. be seen to be better than those of the MIP participants, even by straightforward visual inspection. In Table 4 we compute mean-squared error, by participant and observation mode, averaged over the 19 TCCON stations used in the MIP. WOMBAT outperforms all other participants when using this metric with LG data, and it is fourth-best when using this metric with LN data. While these results are not conclusive on the validity of the WOMBAT fluxes globally, they are encouraging, especially in light of the fact that our flux process has a relatively low-dimensional representation.

The inferred parameters
One of the key features of WOMBAT is the use of a parameter prior distribution in the hierarchical Bayesian model, which applies to both the parameters governing the flux scaling factors and to the parameters governing the model-data discrepancy and measurement-error processes. Figure 6 shows the estimated posterior means and 95% credible intervals for the autoregressive parameters κ (top) and the innovation precisions τ w (bottom), for the 11 land regions, and for inversions using LG and LN   LG LN Figure 6. Posterior means and 95% credible intervals for κ (top) and τ w (bottom, shown using a log scale), for the 11 land regions: TransCom3 region 01 (T01) to TransCom3 region 11 (T11). Estimates are shown for inversions using LG data (yellow) and LN data (dark green). The parameters governing the model-data discrepancy and measurement-error processes are ρ g , g,1 , and γ g , for g = 1, . . . , n g . Table 5 gives the posterior means, 2.5% quantiles, and 97.5% quantiles, for these parameters. Recall that the LG and LN parameters are derived from different inversions; they are not two groups in the same inversion. Nonetheless, the inferred parameters are similar between the inversions, which is reassuring. The values for γ g are indicative of a 21% variance inflation needed for both instrument modes. The length scales, g , are 1.2 minutes for the LG data and 1.1 minutes for the LN 685 data, which corresponds to around 700-800 km on the ground. Finally, the estimated values of ρ g are around 0.835, indicating that the majority of the combined model-data discrepancy/measurement-error process should indeed be attributed to the correlated component, ξ g , given in Eq. (9).

Online bias correction
The OSSE-based sensitivity study in Sect. 5.1 demonstrated that WOMBAT is able to perform online bias correction with 690 simulated data, where biases are estimated while doing flux inversion. This is different to the typical offline approach to bias correction, where retrieval biases are determined in a separate study (e.g., Wunch et al., 2011b). To comply with the MIP protocol, the online bias-correction functionality of WOMBAT was disabled in the study of Sect. 5.2, and the TCCON-based offline bias-corrected OCO-2 retrievals from the MIP were used. In order to investigate the prospect of online bias correction with real data, we repeat the inversions with online bias correction enabled, using OCO-2 10 s average retrievals both with and 695 without the TCCON-based offline corrections.
In Fig. 7 for both LG and LN, while for "logDWS" a larger correction is favoured. For "dp", WOMBAT favours a smaller correction under the LG inversion.
We repeated the online bias-correction procedure using retrievals retaining the TCCON-based offline bias correction. In this 705 setting, if the retrievals are unbiased, bias coefficients equal to zero should be inferred. The posterior densities of the estimated coefficients under this configuration are shown in the second row of Fig. 7. As expected, the magnitudes of the online-estimated coefficients are close to zero, although the credible intervals do not always include zero. Naïvely, one might expect that the coefficients would be approximately equal to the difference between the TCCON-based offline coefficients and the coefficients estimated by WOMBAT when using uncorrected data. For "dp" and "co2_grad_del", the estimated coefficients indeed have 710 the expected sign, and the expected orders of magnitude. The inferred "logDWS" coefficients are surprising, however, with an opposite sign to what was expected for the LG inversion, and with smaller magnitudes for both the LG and LN inversions. This unexpected result is a reflection of the complex interplay, and nonlinear relationships, between the parameters and processes in a regularised flux-inversion model.
Overall, the online-estimated bias correction is practically, if not statistically, similar to the TCCON-based offline correction.

715
One possible explanation for the difference between the WOMBAT and the TCCON-based estimates is that different data are used, because WOMBAT does not use TCCON data for the correction. Moreover, it is likely that the true bias coefficients are spatio-temporally varying; if this is indeed the case, the estimated biases would be affected by the spatio-temporal locations of the retrievals used to estimate them. Another reason may be that, for simplicity, we have used only a few of the most important variables that are used for offline bias correction; a consideration of all the variables may lead to slightly different results.

720
Despite this, our results show that online bias correction is possible and that further research into it as an attractive by-product of flux inversion is warranted.  LG, offline−corr. (mean, 95% cred. int.) LG, online−corr. (mean, 95% cred. int.) LN, offline−corr. (mean, 95% cred. int.) LN, online−corr. (mean, 95% cred. int.) The WOllongong Methodology for Bayesian Assimilation of Trace-gases (WOMBAT) extends the standard synthesis fluxinversion framework, which does not put prior distributions on all unknowns, to a framework based on a fully Bayesian hierarchical statistical model. It incorporates physically motivated flux basis functions and follows the standard Bayesian synthesis framework by using a CTM to compute the corresponding mole-fraction basis functions offline. The scaling factors for the basis functions are inferred from mole-fraction satellite data. WOMBAT incorporates a correlated-error term, estimates measurement biases and measurement-error scaling factors online, estimates variances and length scales of flux scaling factors, 735 and uses an MCMC scheme that allows uncertainty quantification through posterior distributions on all unknowns in the model.
We have illustrated the importance of modelling correlation and bias within a flux-inversion system, and we have shown that WOMBAT produces global and regional flux estimates from OCO-2 data that are comparable to those from the MIP participants in Crowell et al. (2019). In particular, WOMBAT outperformed the other flux models in reproducing TCCON data when using the OCO-2 LG retrievals to obtain the fluxes, and it was competitive when using fluxes obtained from the OCO-2 LN retrievals.

740
When the fossil-fuel fluxes are included, we estimate a global carbon source of 6.11 ± 0.09 PgC yr −1 using the LG data, and 6.17 ± 0.07 PgC yr −1 using the LN data. These estimates corroborate those of the MIP within uncertainty. This paper presents the general, underlying Bayesian hierarchical framework for WOMBAT v1.0, which will serve as a baseline for our flux inversions based on current and future versions of OCO-2 data. There are several potential extensions being considered; here we discuss three of the most pertinent ones. First, WOMBAT, like most other flux-inversion systems, 745 currently operates using a single CTM. This is problematic from a statistical modelling point of view, as it does not allow one to attribute the correlated error either to the measurement-error process or to the mole-fraction process. If more than one CTM is used, in principle one could statistically attribute at least part of the error due to transport. This will not necessarily solve the problem though, since CTMs tend to share common features that induce similar correlations (e.g., due to unresolved sub-grid variation). A possible way forward is to take results from offline OSSEs to estimate and fix the parameters characterising the 750 CTM error, and then to attribute any residual observed correlation to the retrieval process.
Second, WOMBAT extends a traditional state-space approach to flux inversion, which competes with adjoint-based approaches that allow for a much higher flux dimensionality. In the current version of WOMBAT, one should use the largest number of basis functions possible, given available hardware requirements, and find a compromise between the temporal and spatial resolution of the flux basis functions, such that the chosen resolution is as close as possible, or finer, than that at which 755 the flux estimates need to be produced (in our case, this was the TransCom3-by-month level). At this chosen resolution we expect WOMBAT to perform well and give predictions that are valid within uncertainty: When one has broad spatial and temporal data coverage of the response functions, as in the case of OCO-2 and a TransCom3-by-month flux resolution, Bayesian synthesis can be expected to be reasonably robust to dimension-reduction error. Further, the WOMBAT posterior distributions over the fluxes are non-Gaussian, and can accommodate skewness and long tails; this added flexibility mitigates the risk of 760 under-fitting. Moreover, one may introduce additional scaling factors and corresponding basis functions in small "regions of interest," where the fine-scale fluxes are an inferential target, and this is something we are doing in a follow-up iteration of WOMBAT. However, it would be desirable to have global inversions yield valid inferences at fine spatio-temporal scales globally. Moving forward, WOMBAT will therefore seek to introduce higher dimensionality by using flux basis functions that are at a finer scale than the TransCom3-by-month spatio-temporal basis functions that we have used here, or a fine-scale variation 765 term in the flux process model that can be used to absorb variation in the flux that cannot be explained by the basis functions.
Finally, WOMBAT currently only considers along-track correlations when modelling the correlation-error term, ξ g . However, the general framework we have proposed, based on a sparse-precision-matrix approximation, can be extended to model full space-time correlations. Dedicated investigations will be required to assess the feasibility of the implementation and the impact it will have on flux estimates.

770
Code and data availability. The code associated with this article is available at https://doi.org/10.5281/zenodo.4886771 (Bertolacci et al., 2021). This code repository includes scripts for reproducing the entire analysis in this paper, which can be applied to a variety of inverse modelling problems. Re-running the entire analysis is computationally expensive due to the need to simulate atmospheric transport under various perturbations. To help with this, we provide these outputs as a separate download at https://doi.org/10.5281/zenodo.4887044 (Bertolacci et al., 2021). These allow the inversions to be done, and for results to be generated, without the need to run the atmospheric transport Appendix A: Markov chain Monte Carlo algorithm As mentioned in Section 2.5, WOMBAT makes inference on the fluxes and the other parameters in the model using a Gibbs sampler, wherein samples of subsets of the parameters are iteratively drawn from their full conditional distributions (e.g., Tierney, 1994). Recall that the target distribution is p(α, β, κ, τ w , θ ξ, | Z 2 ), as shown in Eq. (10).

780
The Gibbs sampler in WOMBAT is as follows. Given the ith sample, {α [i] , β [i] , κ [i] , τ [i] w , θ [i] ξ, }, the (i + 1)th sample is generated sequentially in the following manner. Below, in Appendices A1-A4, we give the details for Steps 1-4. In deriving the conditional distributions, we often make use of the Sherman-Morrison-Woodbury matrix identity and a matrix-determinant lemma (e.g., Henderson and Searle, 1981). The former identity states that, for conformable matrices A, U, C, and V, whenever the required inverses exist, while the latter lemma states that |A + UCV| = |C −1 + VA −1 U||C||A|.
As described in Section 3, Σ is diagonal and, under Markov assumptions, Σ ξ has a sparse inverse. These properties allow us to efficiently compute the required mean and covariance matrix through use of the Sherman-Morrison-Woodbury matrix 805 identity. Once these are computed, sampling from Eq. (A1) is straightforward.
To generate samples from Eq. (A2), we therefore successively sample κ j , j = 1, . . . , r s , from its full conditional distribution (Eq. A3). Equation (A3) does not correspond to any standard distribution, so we use slice sampling (Neal, 2003) to sample from it.
The matrix operations in Eq. (A7) may be computed efficiently using the matrix identities given at the start of this section.

835
Since Eq. (A7) does not correspond to any standard distribution, we use slice sampling to sample from it, for g = 1, . . . , n g .

Appendix B: Observation operator for OCO-2 retrievals
The retrieval algorithm used for OCO-2 takes spectral data as input and returns CO 2 mole fractions on 20 vertical levels as output via an optimisation procedure. The CO 2 mole fractions at these 20 vertical levels are subsequently used to compute a column-averaged CO 2 that we associate with a single retrieval. For the ith retrieval, denote the vector of retrieved mole 840 fractions as Z r 2,i . Following the argument given by Rodgers and Connor (2003) and Connor et al. (2008), the retrieved mole fractions are given by where Y 0,r 2,i = (Y 0,r 2,i,1 , . . . , Y 0,r 2,i,20 ) is the vector of prior-mean mole fractions used by the retrieval algorithm, specific to the ith retrieval (these are unrelated to the prior mean of the mole-fraction field used in our inversion, Y 0 2 (· , · , ·)); A i is the "averaging 845 kernel matrix"; Y 2,i ≡ (Y 2 (s i , h i,k , t i )) 20 k=1 is the true mole fraction at the 20 vertical levels for the retrieval at geopotential heights h i,k , k = 1, . . . , 20; and ε r i is a vector of measurement errors (which may also include systematic biases and errors induced by nonlinearity in the inversion process). The column-averaged retrieval is then where c i ≡ (c i,1 , . . . , c i,20 ) are quadrature weights used to estimate the column average, and Y 0,rc 2,i ≡ c i Y 0,r 2,i is the retrieval 850 prior mean column-averaged CO 2 . Define a i ≡ (a i,1 , . . . , a i,20 ) , where a i,k ≡ 1 c i,k (c i A i ) k , k = 1, . . . , 20, and (c i A i ) k denotes the kth element of c i A i . The vector a i is the "averaging kernel vector" of the ith retrieval, which is available in the official releases of the OCO-2 data. It follows that the observation operator in Eq. (7) is defined aŝ

855
Note that we do not explicitly account for the error term c i ε r i in our definition forÂ i . This is because it will be absorbed by the error terms we use in the data model (Eq. 8).
The averaging-kernel-vector elements {a i,k } are typically close in value to 1. They reflect the fact that the retrieval is not equally sensitive to the mole fractions at all the vertical levels. At levels where there is less sensitivity (i.e., values < 1), the retrieval prior-mean mole fraction will be assigned greater weight when producing the column-average CO 2 estimate (Rodgers, retrievals, and Beata Bukosa, Nicholas Deutscher, Anita Ganesan, Peter Rayner, Andrew Schuh, and members of the OCO-2 Flux Team for invaluable feedback on this research. This work was supported by resources provided by the Pawsey Supercomputing Centre with funding from the Australian Government and the Government of Western Australia. The work was also supported by the NCI National Facility systems at the Australian National University through the National Computational Merit Allocation Scheme supported by the Australian Government.