Atmospheric inversions are used to derive constraints on the net sources and sinks of CO

In recent years, there has been an increasing demand from stakeholders for inversions at higher spatial resolution (country scale), in particular in the framework of the Paris Agreement. This step up in resolution is in theory enabled by the growing availability of observations from surface in situ networks (such as ICOS in Europe) and from remote sensing products (OCO-2, GOSAT-2). Increasing the resolution of inversions is also a necessary step to provide useful feedback to the bottom-up modeling community (vegetation models, fossil fuel emission inventories, etc.). However, it calls for new developments in the inverse models: diversification of the inversion approaches, a shift from global to regional inversions, and improvements in computational efficiency.

In this context, we developed LUMIA, the Lund University Modular Inversion Algorithm. LUMIA is a Python library for inverse modeling built around the central idea of modularity: it aims to be a platform that enables users to construct and experiment with new inverse modeling setups while remaining easy to use and maintain. It is in particular designed to be transport-model-agnostic, which should facilitate isolating the transport model errors from those introduced by the inversion setup itself.

We have constructed a first regional inversion setup using the LUMIA framework to conduct regional CO

The accumulation of greenhouse gases in the atmosphere is the main driver of climate change. The largest contribution of anthropogenic activities to global warming is through the release of fossil carbon (mainly as CO

One of the main approaches to estimate land–atmosphere carbon exchanges is through “direct” ecosystem modeling, i.e., using models (numerical or statistical) which simulate, as accurately as possible (given the precision requirements of the simulation), the various carbon exchange processes (respiration and photosynthesis, but also exchanges of carbon between different parts of the plants and soils, etc.) as a function of environmental parameters (meteorology, soil characteristics, hydrology, etc.).

Alternatively, the “inverse” approach infers changes in CO

In practice, the direct and inverse approaches are complementary. Ecosystem models can provide detailed estimates of the spatial and temporal variability of land–atmosphere carbon fluxes, but since they cannot account for the full complexity of natural processes, they rely on parameterizations, which are not always accurate and can aggregate to large-scale biases. This results in large uncertainties in the total fluxes from ecosystem models

An atmospheric inverse model typically couples an atmospheric transport model (which computes the relationships between fluxes and concentrations) with an optimization algorithm, whose task is to determine the most likely set of fluxes within some prior constraints and given the information from an observation ensemble (in a Bayesian approach). In practice, inversion codes are complex and computationally heavy. The complexity arises in large part from the need to combine large quantities of information from sometimes very heterogeneous datasets (various types of observations, flux estimates, meteorological forcings, etc.). The computational cost depends largely on that of the underlying transport model, which usually needs to be run many times (iteratively or as an ensemble).

In recent years, the availability of observations has grown by orders of magnitude with the deployment of high-density surface observation networks (such as the Integrated Carbon Observation System, ICOS, in Europe) and fast developments in satellite retrievals of tropospheric greenhouse gas concentrations (GOSAT, OCO-2, etc.). Meanwhile, the demand for inversions is increasing, in particular from stakeholders such as regional, national and transnational governments who are interested in country-scale inversions as a means of quantifying their carbon emissions in connection with emission reduction targets as defined in the Paris agreement

This context puts strain on the existing inverse models. The larger availability of high-quality data means that fluxes can be constrained at finer scales, but it also means that models of higher definition and precision must be used. The development of regional inversions (of varying scales) allows in theory an efficient usage of high-resolution data while preserving a reasonable computational cost, but it comes with specific challenges such as the need for boundary conditions. The demands from various stakeholders (policy makers, bottom-up modelers, media, etc.) also call for developments in the inversion techniques, with, for instance, a more pronounced focus on the quantification of anthropogenic sources

To enable such progress in the methods and quality of the inversions, it is important to have a robust and flexible tool. The purpose of LUMIA (Lund University Modular Inversion Algorithm) is to be a development platform for top-down experiments. LUMIA was developed from the start as a model-agnostic inversion tool, with a clear isolation of the data stream between the transport model and the optimization algorithm in an interface module. One of the main aims is to eventually allow a better characterization of the uncertainty associated with the transport model. Strong emphasis was put on the usability of the code (a low barrier to entry for newcomers and a high degree of modularity that lets users build their experiments in a very flexible way) and on its sustainability (small, easily replaceable, single-task modules instead of large multi-option ones).

This paper presents the LUMIA inversion framework and a first application of regional (European) CO

The general principle of an atmospheric inversion is to determine the most likely estimate of a set of variables controlling the atmospheric content and distribution of a tracer (typically sources and sinks, but also initial or boundary conditions) given a set of observations of that tracer’s distribution in the atmosphere. The link between the set of parameters to optimize (control vector

The observation operator

In the simplest cases, the system can be solved for

An inversion system is therefore the combination of an observation operator (i.e., transport model, sampling operator), an inversion technique, and a set of assumptions on the prior values of the variables to estimate, their uncertainties and the uncertainties of the observations. Each of these components introduces its own share of uncertainty, which makes the results harder to interpret: which features of the solution are real and which are introduced by, e.g., the transport model or incorrect assumptions on some uncertainties?
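As an illustration of how these components fit together, the sketch below solves a toy linear Gaussian inversion analytically. It is not LUMIA code: the function name `analytic_inversion`, the two-flux, three-observation setup and all numerical values are hypothetical, and the observation operator is assumed to be a plain matrix `H`.

```python
import numpy as np

def analytic_inversion(x_prior, B, H, y_obs, R):
    """Posterior mean and covariance of a linear Gaussian inversion (textbook form)."""
    # Mismatch between the observations and the prior model equivalent
    innovation = y_obs - H @ x_prior
    # Gain matrix K = B H^T (H B H^T + R)^-1
    S = H @ B @ H.T + R
    K = B @ H.T @ np.linalg.inv(S)
    x_post = x_prior + K @ innovation
    # Posterior error covariance
    A = B - K @ H @ B
    return x_post, A

# Hypothetical toy problem: two fluxes, three observations
x_true = np.array([1.0, -2.0])
x_prior = np.array([0.0, 0.0])
B = np.diag([4.0, 4.0])                       # prior error covariance
H = np.array([[1.0, 0.0],
              [0.5, 0.5],
              [0.0, 1.0]])                    # stand-in observation operator
R = np.diag([0.01, 0.01, 0.01])               # observation error covariance
y_obs = H @ x_true                            # error-free pseudo-observations

x_post, A = analytic_inversion(x_prior, B, H, y_obs, R)
```

Changing `B`, `R` or `H` in this toy problem changes the solution, which is the kind of sensitivity that the question at the end of the paragraph above refers to.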

Inversion flow diagram. The green boxes represent code that is part of the

The LUMIA system is designed to provide the modularity needed to quantify the impact of the inversion design choices on the inversion results themselves. The strict isolation of the transport model also enables the transport model and the inversion algorithm to evolve independently. Unfortunately, modularity tends to increase the overall complexity of the code (because generic interfaces must be developed and maintained), which can be counterproductive if it limits the performance and/or usability of the system. We nonetheless believe that the benefits of higher modularity outweigh the risks, and that the potential adverse effects can be mitigated by careful design choices.
The code is distributed as a single Python package with the following structure (see also Fig.

The

The

The

The

The

The package can be installed using the standard “pip” command that installs

The

Our test inversion setup is designed to optimize the monthly net atmosphere–ecosystem carbon flux (NEE, net ecosystem exchange) over Europe at a target horizontal resolution of

The inversions are performed using a variational approach, which is presented in Sect.

Regional inversion domain and location of the observation sites. The area of the dots is proportional to the number of observations available at each site (the actual number of observations is reduced by the filtering described in Sect.

We use a Bayesian variational inversion algorithm, similar to that used in TM5-4DVAR inversions

An initial “prior” run is performed to compute the concentrations (

The local cost function (

A control vector increment (

The control vector increments are computed using an external library implementing the Lanczos algorithm

The non-preconditioned observational cost function gradient
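The sketch below illustrates the general shape of such a variational minimization for a linear observation operator. It does not reproduce the equations of this section: it assumes the generic preconditioned form in which the control vector is written as `x = x_prior + L w` with `B = L L^T`, and a plain conjugate-gradient loop stands in for the external Lanczos-based library. All names (`cost_and_gradient`, `minimize_cg`) and values are hypothetical.

```python
import numpy as np

def cost_and_gradient(w, x_prior, L, H, y_obs, r_var):
    """Cost function and gradient in the preconditioned space x = x_prior + L w."""
    x = x_prior + L @ w
    departures = H @ x - y_obs                     # model minus observations
    J_bg = 0.5 * w @ w                             # background term
    J_obs = 0.5 * departures @ (departures / r_var)
    # Non-preconditioned observational gradient: H^T R^-1 (H x - y)
    grad_obs_x = H.T @ (departures / r_var)
    grad_w = w + L.T @ grad_obs_x                  # preconditioned total gradient
    return J_bg + J_obs, grad_w

def minimize_cg(x_prior, L, H, y_obs, r_var, n_iter=50, tol=1e-10):
    """Conjugate-gradient minimization (J is quadratic because H is linear)."""
    w = np.zeros(L.shape[1])
    _, g = cost_and_gradient(w, x_prior, L, H, y_obs, r_var)
    d = -g
    for _ in range(n_iter):
        if np.sqrt(g @ g) < tol:
            break
        # Hessian-vector product: (I + L^T H^T R^-1 H L) d
        Hd = d + L.T @ (H.T @ ((H @ (L @ d)) / r_var))
        alpha = (g @ g) / (d @ Hd)
        w = w + alpha * d
        g_new = g + alpha * Hd
        beta = (g_new @ g_new) / (g @ g)
        d = -g_new + beta * d
        g = g_new
    return x_prior + L @ w

# Hypothetical toy problem
x_prior = np.array([0.0, 0.0])
B = np.array([[1.0, 0.5], [0.5, 1.0]])
L = np.linalg.cholesky(B)                          # B = L L^T
H = np.array([[1.0, 0.2], [0.3, 1.0], [0.5, 0.5]])
r_var = np.array([0.1, 0.1, 0.2])                  # diagonal of R
y_obs = np.array([1.0, -0.5, 0.3])

x_post = minimize_cg(x_prior, L, H, y_obs, r_var)
```

In a real inversion, each Hessian-vector product would require one forward and one adjoint transport run rather than explicit matrix products.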

The observation operator (

The reverse operation, i.e., the construction of a 3-hourly NEE field based on a given control vector

Prescribed CO

The adjoint operation corresponding to Eq. (

Adjusting an offset to the 3-hourly NEE ensures that the amplitude of the daily cycle of NEE remains realistic. This definition of the control vector is in some respects suboptimal: in particular, the control vector is unnecessarily large, as it contains pixels for which NEE is by definition always zero, e.g., in the ocean. However, the aim here was not to obtain the best-performing inversion, but a setup that is easy to develop, test and replicate and that will serve as a basis of comparison for future evolutions of the setup. It is, however, already possible to run LUMIA inversions with more complex configurations, such as variable spatial and temporal resolutions.
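The offset approach can be sketched as follows. This is an illustration under stated assumptions rather than the LUMIA implementation: the control vector is assumed to hold monthly mean NEE per pixel in the same units as the 3-hourly field, and the difference from the prior monthly mean is assumed to be added uniformly to every 3-hourly step of that month. The function name `control_to_3hourly` and the toy "months" of eight steps are hypothetical.

```python
import numpy as np

def control_to_3hourly(x_monthly, nee_prior_3h, month_index):
    """Spread monthly offsets uniformly over the 3-hourly prior NEE.

    x_monthly:    (n_months, n_pixels) optimized monthly mean NEE
    nee_prior_3h: (n_steps, n_pixels) prior 3-hourly NEE
    month_index:  (n_steps,) integer month of each 3-hourly step
    """
    n_months = x_monthly.shape[0]
    prior_monthly = np.stack([
        nee_prior_3h[month_index == m].mean(axis=0) for m in range(n_months)
    ])
    offset = x_monthly - prior_monthly             # (n_months, n_pixels)
    # Every step of a month receives the same offset
    return nee_prior_3h + offset[month_index]

# Toy example: two "months" of eight 3-hourly steps, two pixels
steps_per_month = 8
month_index = np.repeat([0, 1], steps_per_month)
phase = 2 * np.pi * np.arange(2 * steps_per_month) / 8
daily_cycle = -np.cos(phase)
nee_prior_3h = np.stack([1.0 * daily_cycle, 3.0 * daily_cycle - 0.5], axis=1)
x_monthly = np.array([[0.2, -0.1], [-0.3, -1.0]])

nee_3h = control_to_3hourly(x_monthly, nee_prior_3h, month_index)
```

Because the same offset is applied to all steps of a month, the monthly mean matches the control vector while the shape and amplitude of the prior daily cycle are left untouched.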

For this first implementation of CO

For each observation

We used the global coarse-resolution TM5-4DVAR inversion system to compute the background component of the concentrations. The TM5 setup is described in further detail in Sect.

The adjoint of the operation represented by Eq. (

The forward and adjoint transport applications (i.e., Eqs.

Using pre-computed footprints greatly reduces the computational cost of the inversions, since the forward and adjoint transport model applications simply consist of a series of very simple array operations.
This is, however, at the cost of an increase in I/O and storage requirements (one footprint
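A minimal sketch of these array operations is given below. It assumes that each footprint is a dense array on the same time and space grid as the fluxes and that the background concentration of each observation is already known; the actual storage format and the function names `forward_all` and `adjoint_all` are not taken from the LUMIA code.

```python
import numpy as np

def forward_all(footprints, flux, background):
    """Forward transport: one modeled concentration per observation."""
    # Background plus the footprint-weighted sum of the fluxes
    return np.array([bg + np.sum(fp * flux) for fp, bg in zip(footprints, background)])

def adjoint_all(footprints, departures):
    """Adjoint transport: map per-observation departures back onto the flux grid."""
    adj_flux = np.zeros_like(footprints[0])
    for fp, dy in zip(footprints, departures):
        adj_flux += fp * dy
    return adj_flux

# Hypothetical toy dimensions and random values
rng = np.random.default_rng(0)
n_obs, n_time, n_lat, n_lon = 4, 8, 3, 5
footprints = rng.random((n_obs, n_time, n_lat, n_lon))
flux = rng.normal(size=(n_time, n_lat, n_lon))
background = np.full(n_obs, 400.0)
departures = rng.normal(size=n_obs)

concentrations = forward_all(footprints, flux, background)
adj_flux = adjoint_all(footprints, departures)
```

Since each observation is processed independently, both loops could be distributed across processes; a standard check of such a forward and adjoint pair is that the inner products `<H f, dy>` and `<f, H^T dy>` agree.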

The footprints (

The simulations were driven by the European Centre for Medium-Range Weather Forecasts (ECMWF) ERA-Interim reanalysis extracted at a 3-hourly temporal resolution at a

One set of 3-hourly footprints was computed for each observation up to 7 d backward in time (less if all the particles leave the domain sooner). For plain or low-altitude sites (see Table

The footprints are stored in HDF5 files, following a format described in the Supplement.

The background CO

An alternative coupling strategy has been proposed by

A global coarse-resolution inversion is performed (with TM5 in our case), constrained by a set of prior CO

A forward run of TM5 is then used to calculate the CO

Another forward run of TM5 is used to calculate the foreground concentrations

The background CO

Note that here
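The bookkeeping of this two-step scheme can be sketched as follows, assuming that the background is obtained as the difference between the total and foreground concentrations of the two global forward runs, and that the regional model then adds its own foreground contribution on top of it. The function names and the concentration values (in ppm) are hypothetical.

```python
def extract_background(total_global, foreground_global):
    """Background = total global-model concentration minus its foreground part."""
    if len(total_global) != len(foreground_global):
        raise ValueError("time series must have the same length")
    return [t - f for t, f in zip(total_global, foreground_global)]

def regional_concentration(background, foreground_regional):
    """Total regional concentration = background + regional foreground."""
    return [b + f for b, f in zip(background, foreground_regional)]

# Hypothetical time series at one site (ppm)
total_global = [410.0, 412.5, 408.0]
foreground_global = [2.0, 4.5, -1.0]
foreground_regional = [2.5, 5.0, -1.5]

background = extract_background(total_global, foreground_global)
modeled = regional_concentration(background, foreground_regional)
```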

The initial global inversion (step 1) was performed using the TM5-4DVAR inversion system

The TM5 inversion covers the entire period of the LUMIA inversion, plus 6 extra months at the beginning and 1 month at the end to limit the influence of the initial condition and to ensure that the background concentrations in the last month of the LUMIA inversion are well constrained by the observations (observations provide important constraints on the fluxes from the preceding month).

Since the focus of the TM5 inversion was only to produce a set of CO

The total (

Observations from the GLOBALVIEWplus 4.2 obspack product were used in the inversions

Decomposition of the modeled mixing ratio and of the observation uncertainties at two sites (Cabauw, the Netherlands, and Hegyhatsal, Hungary). The “TM5 total” line is the concentration computed in the coarse-resolution TM5 inversion from which the background (thick black line) is extracted. The LUMIA prior concentration is shown in red, and the green and orange shaded areas respectively show the contribution of the prior biosphere flux and of the other CO

The observation uncertainty matrix (

Our inversion system uses an observation operator that decomposes the background and foreground components of the CO

The instrumental error (

The model representation error cannot be formally quantified, as this would require precisely knowing the CO

As described in Sect.

This comparison is not a formal performance assessment of either TM5 or of the FLEXPART-based transport used in LUMIA, and in particular the bias should be interpreted with care as the sign of the total net foreground flux changes during the year (which mechanically leads to a change of the sign of the bias). Nonetheless, it provides an indication of the order of magnitude of the foreground model transport errors. We use the absolute differences between the two models as a proxy for

LUMIA CO

Background concentrations are expected to be accurately estimated by the global TM5 inversion when the dominant winds are from the west and any signal from a strong point CO

There is no perfect and easy way to detect these events, but one of their consequences would be a less homogeneous background CO

The inversions are performed on a subset of the observations included in the obspack product. Only observations for which the transport model simulation is expected to result in accurate concentrations are kept. In practice, one of the main difficulties of transport modeling is to correctly compute the mixing of air in the lower troposphere below the boundary layer. The lowest model representation error is expected for observations that are either within the boundary layer when it is most developed (in the afternoon) or well above the boundary layer for high-altitude sites (during the night). For each site with continuous observations, we selected only observations sampled during the time range for which the model is expected to perform the best. The time ranges are based on the
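A sketch of such a time-of-day selection is shown below. The windows in `WINDOWS` are placeholders chosen for illustration only, not the ranges used in the paper, and the observation times are assumed to be already expressed in local time.

```python
from datetime import datetime

# Hypothetical local-time windows (start hour inclusive, end hour exclusive)
WINDOWS = {"low": (11, 15), "high": (23, 3)}

def in_window(hour, start, end):
    """True if hour falls in [start, end), allowing windows that wrap past midnight."""
    if start <= end:
        return start <= hour < end
    return hour >= start or hour < end

def select_observations(observations, site_type):
    """Keep only the observations sampled within the window for this site type."""
    start, end = WINDOWS[site_type]
    return [obs for obs in observations if in_window(obs["time"].hour, start, end)]

# Toy observations at a single site
observations = [
    {"time": datetime(2011, 7, 1, hour), "value": 400.0 + hour}
    for hour in (1, 9, 12, 14, 23)
]
afternoon = select_observations(observations, "low")
night = select_observations(observations, "high")
```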

Observation sites used in the inversions. The “sets” column refers to the site selection applied in inversions SA/SP and RA/RP; set P includes only low-altitude sites and set A includes mostly high-altitude sites. Data providers are as follows. 1:

In addition to the net ecosystem exchange (NEE, net atmosphere–land CO

The NEE prior is taken from simulations of the LPJ-GUESS and ORCHIDEE vegetation models: in the OSSEs (Sect.

LPJ-GUESS

ORCHIDEE is a global process-based terrestrial biosphere model (initially described in

Fossil fuel emissions are based on a pre-release of the EDGARv4.3 inventory for the base year 2010

The ocean–atmosphere flux is taken from the Jena CarboScope v1.5 product, which provides temporally and spatially resolved estimates of the global sea–air CO

A biomass burning flux category was also included in the inversion based on fluxes from the Global Fire Emission Database v4

All fluxes are regridded on the same

Prior and prescribed CO

The background error covariance matrix (

The true uncertainty of the prior fluxes (

We performed two sets of inversions, which are listed in Table

In the OSSEs, the LPJ-GUESS NEE dataset was taken as an arbitrary truth, and a dataset of synthetic pseudo-observations was generated at the times and locations of the actual observations listed in Table

The OSSEs use this set of pseudo-observations as observational constraints and the ORCHIDEE NEE dataset as a prior. The reference OSSE, SRef, uses a prior error covariance matrix (

The second set is essentially identical to the OSSEs, except that it uses real observations and the LPJ-GUESS flux dataset as a prior.

List of inversion experiments performed. The restricted observation sets A and P are reported in Table

We first analyze the capacity of the reference OSSE SRef to reconstruct various characteristics of the true LPJ-GUESS NEE fluxes (monthly and annual NEE budget, aggregated at spatial scales ranging from the entire domain down to single pixels). Then we analyze the results of the other OSSEs to test how sensitive the results are to a range of reasonable assumptions in the inversion settings.

Figure

At the domain scale, the prior estimate for the annual NEE is very close to the “truth” (

The inversion improves the estimation of the seasonal cycle at the domain scale, with a seasonal cycle amplitude reduced to a range of

Figure

Although our control vector contains the flux estimates at the native spatial resolution of the transport model, the effective resolution of the inversion is further constrained by the covariances contained in the prior error covariance matrix

Upper row, left axis: monthly prior NEE (dashed blue line), true NEE (solid black line), posterior NEE (blue), absolute prior error (dashed orange line) and posterior error (orange) in the OSSEs. Upper row, right axis: total error increase (i.e., positive component of the error reduction, green). The SRef inversion is shown as solid lines, and the set of sensitivity tests is shown as a shaded area (prior and posterior ensemble). Second, third and fourth rows: same variables but aggregated annually.

Prior

The total NEE flux, absolute error and error increase are shown in Fig.

Inversions SE.3H, SE.3Hcst and SE.x2 were designed to test the impact of the prescribed prior uncertainty vector (e.g., diagonal of

In SE.3H, the prior uncertainty is set proportional to the sum of the uncertainties in the 3-hourly fluxes:

In SE.3Hcst, the prior uncertainty is computed as in SE.3H, but it is then scaled monthly so as to lead to a flat distribution of the uncertainties across the year.

In SE.x2, the prior uncertainty is simply doubled compared to SRef.
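The three variants can be sketched as simple array operations. The sketch assumes that a field of 3-hourly uncertainties is available as input, and it interprets the "flat distribution" of SE.3Hcst as an equal domain-total uncertainty in every month (computed here as a plain sum, ignoring covariances). These interpretations, the function names and the toy values are assumptions.

```python
import numpy as np

def sigma_from_3hourly(sigma_3h, month_index, n_months, scale=1.0):
    """SE.3H-like: monthly uncertainty proportional to the sum of 3-hourly uncertainties."""
    return scale * np.stack([
        sigma_3h[month_index == m].sum(axis=0) for m in range(n_months)
    ])

def flatten_across_year(sigma_monthly):
    """SE.3Hcst-like: rescale each month to the same domain-total uncertainty."""
    monthly_total = sigma_monthly.sum(axis=1, keepdims=True)
    target = monthly_total.mean()
    return sigma_monthly * target / monthly_total

def doubled(sigma_ref):
    """SE.x2-like: double a reference uncertainty."""
    return 2.0 * sigma_ref

# Toy case: 2 months of 3 steps each, 2 pixels; month 1 is more uncertain
sigma_3h = np.array([[1.0, 2.0]] * 3 + [[3.0, 6.0]] * 3)
month_index = np.array([0, 0, 0, 1, 1, 1])

sigma_se3h = sigma_from_3hourly(sigma_3h, month_index, n_months=2)
sigma_se3hcst = flatten_across_year(sigma_se3h)
sigma_x2 = doubled(sigma_se3h)
```

In this toy case the rescaling raises the uncertainty of the first month and lowers that of the second, while the spatial pattern within each month is preserved.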

SE.3Hcst leads to an improved value of the annual budget of NEE at the domain scale, but this is due to a poorer estimation of the summer fluxes (since the uncertainty is lower in summer, the inversion sticks more closely to its prior). In contrast, SE.3H leads to further degradation of the annual budget, without achieving better performance than SRef at the monthly scale. For both inversions, this translates into a slightly larger total posterior error (2.15 and 2.20 Pg C yr

Inversions SC.100 and SC.500 use prior error covariance matrices constructed using shorter (100 km) and longer (500 km) horizontal correlation lengths (
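The effect of the correlation length on the off-diagonal terms of the covariance matrix can be illustrated with the sketch below. It assumes a Gaussian decay of the correlation with distance, which may differ from the exact function used here, and it uses three hypothetical pixels placed 200 km apart on a line with unit uncertainties.

```python
import numpy as np

def spatial_correlation(dist_km, length_km):
    """Correlation as a function of distance (Gaussian decay assumed)."""
    return np.exp(-(dist_km / length_km) ** 2)

def build_covariance(sigma, dist_km, length_km):
    """Covariance matrix from standard deviations and pairwise distances."""
    return np.outer(sigma, sigma) * spatial_correlation(dist_km, length_km)

# Three hypothetical pixels on a line, 200 km apart
positions_km = np.array([0.0, 200.0, 400.0])
dist_km = np.abs(positions_km[:, None] - positions_km[None, :])
sigma = np.ones(3)

B_100 = build_covariance(sigma, dist_km, 100.0)
B_500 = build_covariance(sigma, dist_km, 500.0)

# Uncertainty of the flux summed over the three pixels
total_sigma_100 = np.sqrt(B_100.sum())
total_sigma_500 = np.sqrt(B_500.sum())
```

With identical variances at each pixel, the longer correlation length yields a larger uncertainty on the aggregated flux, because errors at neighboring pixels are assumed to compensate less.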

At the domain scale, the annual budgets are nearly identical in SC.100, SC.500 and SRef. However, the total error reduction is lower in SC.100 and higher in SC.500 compared to SRef (0.78, 1.28 and 1.02 Pg C yr

Compared to SRef, SO.A uses only high-altitude observations (plus LMP and TTA as these were the only sites available in their region) and SO.P uses only low-altitude sites. In terms of the annual budget, SO.P outperforms most of the other inversions, but as for SE.3Hcst, this results from poorer flux corrections in summer rather than from a better overall reduction of the uncertainties. In contrast, SO.A leads to results very comparable to SRef at the domain scale, with a nearly identical seasonal cycle and net annual flux. The net error reduction, however, remains slightly better in SRef (see also Fig. S1 for the seasonal cycles of SO.A and SO.P).

The comparison of the prior and posterior model fit to observations is a classical diagnostic of atmospheric inversions

In the right panel of Fig.

The center panel of Fig.


The OSSEs presented above neglect several complications of real inversions, in particular transport model errors (the observations were generated using the same transport model as the one used in the inversions). While it is not within the scope of this paper to precisely quantify these errors, we nonetheless performed a series of inversions constrained by real observations to assess to what extent the characteristics of the inversion results identified with the OSSEs still hold in a more realistic setting.

The set of inversions used here is identical to the set of OSSEs, except that real observations are used and that the LPJ-GUESS flux is used as a prior (instead of ORCHIDEE in the OSSEs). The inversion settings are reported in Table

The monthly and annual prior and posterior NEEs are shown in Fig.

These monthly flux adjustments do not result in a change in the net annual flux (

In contrast to the OSSEs, the transport model error is not zero, which may explain the slightly higher sensitivity of the results to the extent of the observation network: RO.P and RO.A differ by, on average, 0.02 Pg C month

Maps of the prior and posterior fluxes, as well as the flux adjustments obtained with RRef, are shown in Fig.

The ensemble variability (lower row of Fig.

Total prior NEE (top), posterior NEE (second row) and NEE adjustment (third row) for the RRef inversion and for three 4-month periods (left to right); bottom row: posterior ensemble spread.

Figure

At the site level, the prior biases are more variable than in the OSSEs, from

As seen with the OSSEs, a better fit to the observations is not necessarily an indication of a more accurate optimized solution. The site-by-site analysis of the misfits might point to limitations of the transport operator, but a more in-depth analysis would be required, which is beyond the scope of this paper.

We have set up an atmospheric inversion system based on an implementation of the variational inversion approach (Sect.

The inversion setup was designed to optimize European NEE at a monthly 0.5

The first inversion results suggest that the inversion system is working as expected. In the OSSEs, the inversions enable on average a 40 % reduction of the monthly flux error at the grid-cell scale, and the differences between the optimized fluxes obtained from different sensitivity runs are in line with what could be expected from the different settings used. However, the remaining local errors can be of opposite sign, so local error reductions do not always add up to a net error reduction at larger scales. In particular, while the NEE estimate is generally improved at the monthly scale, the positive corrections in summer are much stronger than the negative corrections in winter, which results in an overall degradation of the annual NEE. Using an even month-to-month distribution of the uncertainties (SE.3Hcst inversion) leads to a more realistic annual estimate, but also to a higher occurrence of local degradations of the solution, which further complicates the interpretation of the results.
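The fact that local error reductions need not add up to a reduction of the aggregated error can be seen in a two-pixel toy example. All values are hypothetical and chosen so that the prior errors compensate exactly in the total.

```python
def pixel_errors(estimate, truth):
    """Absolute error at each pixel."""
    return [abs(e - t) for e, t in zip(estimate, truth)]

def total_error(estimate, truth):
    """Absolute error of the flux summed over all pixels."""
    return abs(sum(estimate) - sum(truth))

truth = [1.0, -1.0]
prior = [2.0, -2.0]        # errors of +1 and -1 cancel in the total
posterior = [1.2, -1.8]    # both pixels improved, but unevenly

prior_pixel = pixel_errors(prior, truth)
post_pixel = pixel_errors(posterior, truth)
prior_total = total_error(prior, truth)
post_total = total_error(posterior, truth)
```

Here the error decreases at both pixels, yet the error on the total increases because the prior errors no longer compensate each other.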

This high sensitivity of the annual NEE to the different choices of prior uncertainty shows that this specific metric is not well constrained in our inversions. With further tuning, it might be possible to find a formulation of the prior uncertainties that allows OSSEs to converge towards the true annual NEE (see, e.g.,

A complementary approach could be to make better use of constraints from observations outside the domain: by definition, they cannot be accounted for directly by the regional inversion, but they were used to constrain the global inversion from which the background concentrations were extracted. By construction, the flux estimate in that global inversion is consistent with observations downwind of Europe, which is not necessarily the case in our regional inversions (there is no constraint on the CO

There is an ongoing debate on the net European CO

The annual NEE is an important metric as it summarizes the net impact of an ecosystem on the carbon cycle, but there are other aspects of the solution that the inversions solve for more robustly and which are potentially equally relevant to focus on. For instance, in the OSSEs, regardless of the specific inversion setup, the posterior provides a much more realistic depiction of the seasonal cycle of NEE and of its spatial variability. The corrections to the seasonal cycle phasing and amplitude are also very consistent across the set of inversions using real observations. This type of information is potentially very relevant when assessing the validity of flux estimates from vegetation models and can help pinpoint specific shortcomings in these models. For instance, the consistent correction of winter NEE towards more positive values could hint at an underestimation of winter respiration in LPJ-GUESS. Note, though, that such a statement should be supported by a form of independent validation (such as comparisons with independent observations and with results from previous studies), which we have not provided since the focus of the study is the validation of the inversion approach and not the CO

Another important aspect is the distribution of fluxes at finer spatial scales. We see that the OSSEs systematically lead to some degradation of the solution in the parts of the domain that are very densely covered by the observation network, which is counterintuitive. It may be partly because the prior was already very close to the truth in this part of the domain, which makes it difficult for the inversion to further optimize the solution, but a complementary explanation is that the system may not have sufficient degrees of freedom to adjust the fluxes to simultaneously improve the fit at all observation sites. In particular, the optimization of monthly fluxes is very restrictive. The implementation of an optimization at a higher temporal resolution will therefore be an important next step. In addition, varying the resolution of the optimization according to the density of the observation network may also help (either by varying the resolution of the optimized fluxes or by varying the covariance lengths in the prior error covariance matrix).

Finally, the application of the same inversion approach to real observations leads to overall smaller flux adjustments than in the OSSEs. This could be a sign that the difference between the LPJ-GUESS prior (used in this second set of inversions) and the true fluxes is smaller than that between the prior and synthetic truth in the OSSEs, but the analysis of the observation misfit reduction also points to potential site-dependent transport model errors. One of the next steps towards improving our inversions will therefore have to be a thorough assessment of the transport model biases. In that respect, the flexibility of LUMIA with regard to the transport model is particularly well suited.

The inversions rely on an offline coupling between the FLEXPART Lagrangian transport model (for regional high-resolution transport) and TM5-4DVAR for providing background concentrations. The setup replicates the two-step scheme of

A succinct comparison between this TM5–FLEXPART transport model and TM5 itself was performed and is used as a proxy for the transport model error. It does not show any global bias between the two models, but a possible seasonal offset towards the month of November (Sect.

The choice of the models and of that specific coupling was driven in part by the prospect of exchanges with other groups using similar setups. At the current stage, replacing the FLEXPART footprints with footprints from another similar Lagrangian transport model (e.g., STILT,

The

From a practical and technical point of view, the setup presents the advantage of speed and scalability: the application of the transport operator is done independently for each observation and can therefore be distributed on as many CPUs as available. It consists, for each observation, of a very simple sequence of operations in both the forward (Eq.

These steps represent a significant overhead, but they need to be performed only once regardless of how many inversions are performed with these footprints and background concentration time series, and they also contribute to the modularity of the setup (it would be easy to replace FLEXPART footprints by footprints computed with a different Lagrangian transport model, e.g., STILT or NAME). Our setup is therefore particularly well suited to conducting large sets of inversions such as those presented in Sects.

We have developed the LUMIA inversion framework and performed a first set of inversions with it. The framework was initially designed for the purpose of performing regional CO

Technically, the inversion framework presented in this paper includes three major components: the

The

For this initial paper, we have performed regional CO

Although the inversion setup lacks the maturity of established systems, it offers promising computational performance, and the results suggest interesting scientific questions regarding the capacity of regional inversion systems to constrain the annual budget of CO

In the longer term, the aim is to use LUMIA as a platform for testing innovative inversion approaches (multiple transport models, use of satellite data, multi-tracer inversions, optimization of vegetation model parameters in CCDASs, etc.). The code corresponding to the inversions in this paper is provided for the research community at

The LUMIA source code used in this paper and updates can be downloaded from the LUMIA website:

The prior and posterior fluxes have been uploaded on Figshare and can be downloaded at

The supplement related to this article is available online at:

GM and MS designed the experiments, and GM developed the code and performed the simulations. GM prepared the paper, and MS provided corrections and suggestions for improvements.

Guillaume Monteil has been funded by the Swedish Research Council project “Development of regional ecosystem-atmosphere models assimilating the ICOS data for a European-scale intercomparison of net CO

We thank Michael Mischurow for providing the LPJ-GUESS net ecosystem exchange data, Philippe Peylin for providing the ORCHIDEE NEE fluxes, and Greet Janssens-Maenhout for providing the fossil fuel product. We thank the FLEXPART and TM5 developers for providing the transport model source codes. Finally, we thank all the observation data providers cited in Table

This research has been supported by the Vetenskapsrådet (Swedish Research Council) (grant no. DNR 349-2014-6576). The computations were enabled by resources provided by the Swedish National Infrastructure for Computing (SNIC) at NSC, partially funded by the Swedish Research Council through grant agreement no. 2018-05973.

This paper was edited by Tomomichi Kato and reviewed by four anonymous referees.