Articles | Volume 15, issue 14
Geosci. Model Dev., 15, 5511–5528, 2022
Geosci. Model Dev., 15, 5511–5528, 2022
Development and technical paper
19 Jul 2022
Development and technical paper | 19 Jul 2022

Improving the joint estimation of CO2 and surface carbon fluxes using a constrained ensemble Kalman filter in COLA (v1.0)

Improving the joint estimation of CO2 and surface carbon fluxes using a constrained ensemble Kalman filter in COLA (v1.0)
Zhiqiang Liu1,2, Ning Zeng3,4,1, Yun Liu5,6, Eugenia Kalnay3, Ghassem Asrar7, Bo Wu1, Qixiang Cai1, Di Liu8, and Pengfei Han9,1 Zhiqiang Liu et al.
  • 1State Key Laboratory of Numerical Modeling for Atmospheric Sciences and Geophysical Fluid Dynamics, Institute of Atmospheric Physics, Chinese Academy of Sciences, Beijing, China
  • 2College of Earth and Planetary Sciences, University of Chinese Academy of Sciences, Beijing, China
  • 3Department of Atmospheric and Oceanic Science, University of Maryland, College Park, Maryland, USA
  • 4Earth System Science Interdisciplinary Center, College Park, Maryland, USA​​​​​​​
  • 5International Laboratory for High-Resolution Earth System Model and Prediction (iHESP), Texas A&M University, College Station, Texas, USA
  • 6Department of Oceanography, Texas A&M University, College Station, TX, USA
  • 7Universities Space Research Association, Columbia, Maryland, USA​​​​​​​
  • 8Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing, China
  • 9Carbon Neutrality Research Center, Institute of Atmospheric Physics, Chinese Academy of Sciences, Beijing, China

Correspondence: Zhiqiang Liu ( and Ning Zeng (


Atmospheric inversion of carbon dioxide (CO2) measurements to better understand carbon sources and sinks has made great progress over the last 2 decades. However, most of the studies, including a four-dimensional variational ensemble Kalman filter and Bayesian synthesis approaches, directly obtain only fluxes, while CO2 concentration is derived with the forward model as part of a post-analysis. Kang et al. (2012) used the local ensemble transform Kalman filter (LETKF), which updates the CO2, surface carbon flux (SCF), and meteorology fields simultaneously. Following this track, a system with a short assimilation window and a long observation window was developed (Liu et al., 2019). However, this data assimilation system faces the challenge of maintaining carbon mass conservation. To overcome this shortcoming, here we apply a constrained ensemble Kalman filter (CEnKF) approach to ensure the conservation of global CO2 mass. After a standard LETKF procedure, an additional assimilation is used to adjust CO2 at each model grid point and to ensure the consistency between the analysis and the first guess of the global CO2 mass. Compared to an observing system simulation experiment without mass conservation, the CEnKF significantly reduces the annual global SCF bias from  0.2 to less than 0.06 Gt and slightly improves the seasonal and annual performance over tropical and southern extratropical regions. We show that this system can accurately track the spatial distribution of annual mean SCF. And the system reduces the seasonal flux root mean square error from a priori to analysis by 48 %–90 %, depending on the continental region. Moreover, the 2015–2016 El Niño impact is well captured with anomalies mainly in the tropics.

1 Introduction

Carbon dioxide (CO2) plays a crucial role in climate systems and projected warming (Friedlingstein et al., 2006). Approximately half of the fossil fuel and cement emissions are absorbed by the land and ocean, leaving the remaining half in the atmosphere (Friedlingstein et al., 2019). Without effective reduction in those emissions and advanced technologies for carbon capture and storage, the warming trend may exceed the tipping point with potential adverse impacts on the health of the environment, people, and the global economy. Recently, many countries, in places such as Asia, Europe, North America, and South America, have announced their pledge to achieve carbon-neutral targets by the middle of this century. To successfully implement these national pledges, accurate quantification of the spatial and temporal dynamics of Earth surface carbon fluxes (SCFs) and closing the global carbon budget are essential. There are the following two principal approaches for SCF estimation: top down and bottom up. The bottom-up estimates are obtained from process-based or empirical carbon cycle models (Kondo et al., 2020; Zeng et al., 2005; Denning et al., 1996). However, there is still a missing or residual carbon sink that is necessary to close the global carbon budget with bottom-up approaches because of our limited understanding of the natural carbon cycle and the lack of observations to validate the models on a global scale. The top-down approach optimizes the SCF by fusing atmospheric CO2 concentration measurements with the modeled CO2 using techniques, such as the Bayesian synthesis approach (e.g., Rödenbeck et al., 2003; Gurney et al., 2004), data assimilation (DA), such as ensemble Kalman filters (EnKFs; e.g., Peters et al., 2005, 2007; Feng et al., 2009; Zupanski et al., 2007; Lokupitiya et al., 2008; Bruhwiler et al., 2005), and variational methods (e.g., Baker et al., 2006b​​​​​​​; Basu et al., 2013; Chevallier et al., 2010a; Liu et al., 2014). In recent decades, global CO2 observation networks from the surface to the air and space have provided large amounts of high-precision atmospheric CO2 concentration data (Crevoisier et al., 2004; Crisp et al., 2017; Tans et al., 1990; Yang et al., 2018; Yokota et al., 2009), which greatly enhance the quality of top-down estimates.

Because CO2 is a long-lived tracer gas, remote observations can play an important role in estimating the local SCF. Thus, to compromise the sparse and unevenly distributed feature of the global CO2 observation network, most top-down systems do not localize the observations and set a very long assimilation window (AW) that ranges from several months to 1 year (Chevallier et al., 2010a; Peters et al., 2007; Rödenbeck et al., 2003; Liu et al., 2014). However, the atmospheric transport model (ATM)-generated atmospheric CO2 will deviate from a Gaussian distribution with a long AW. Both the EnKF and variational methods use the linear hypothesis to constrain the system. To obtain the optimal assimilation, the forecast uncertainties are expected to remain linear. It is very difficult to hold the linear perspective with a long AW. Therefore, only the SCF is considered a valuable product, while the CO2 concentration is derived with the forward model as part of a post-analysis.

Instead of treating CO2 as a byproduct of the inversion, Kang et al. (2011, 2012) developed a top-down carbon data DA system with a short AW (6 h) to simultaneously estimate SCF and CO2 concentrations. The system includes an online atmospheric general circulation model (AGCM) in which meteorological observations (wind, temperature, humidity, and surface pressure) and CO2 concentration observations are assimilated simultaneously to account for the uncertainties in the meteorological field and their impact on the transport of atmospheric CO2. Following this effort, we have developed a local ensemble transform Kalman filter (LETKF)-based carbon DA system (LETKF_C) to generate meaningful CO2 analysis using a combination of a short AW (e.g., 1 d) and a long observation window (OW; e.g., 7 d; Liu et al., 2019), and the observations within the long OW are assimilated to update the CO2 state and SCF parameter at the end of the short AW. Thus, the same observation will be assimilated multiple times. Although the online estimation of the transport uncertainty is useful and attractive, its computational cost is very high. Furthermore, tremendous effort is required for the assimilated meteorological fields to reach the quality of the state-of-the-art reanalysis datasets (e.g., MERRA, NCEP, and ECWMF). Thus, the LETKF_C system replaces the AGCM with an offline ATM driven by the reanalysis data to improve the accuracy of transport and to reduce the expensive computational cost. This approach does not include the estimation of transport uncertainties related to the meteorological field, which will lead to additional errors for SCF estimation in reality. The impact is assumed to be small but remains to be validated in the future. We can include the meteorological field uncertainties by driving the ATM using different reanalysis products for different ensemble members. Such a capacity is under development. In the context of observation system simulation experiments (OSSEs), both systems (Kang et al., 2012, 2011; Liu et al., 2019) successfully reproduced the global SCF seasonal cycle and annual SCF pattern at grid point resolution without direct a priori SCF information.

Based on the LETKF_C system, we developed a new system named Carbon in Ocean–Land–Atmosphere (COLA) with an improved framework. A major improvement for the COLA system is the conservation of carbon mass. Data assimilation (DA) systems use observations to statistically constrain the model state. The DA update process could not follow the model dynamic principle perfectly, hence leading to a loss of mass and energy conservation and dynamic balances (Zeng et al., 2017, 2021a, b; Greybush et al., 2011). The impact of such imbalances could be reduced or eliminated by model dynamic adjustment in a short period, but the impact of additional mass gain or loss could last for a long time. For example, mass conservation is crucial for carbon cycle and hydrological studies (Pan and Wood, 2006). The COLA system follows the same process as the DA process to update atmospheric CO2 directly using observations. Therefore, the carbon mass conservation will not hold within a DA cycle. To overcome this limitation, a constrained ensemble Kalman filter (CEnKF) step was applied to the COLA system. The CEnKF was originally used in the hydrological field for DA as a second constraining optimizer (Pan and Wood, 2006). The basic concept for CEnKF is to constrain the global analysis mass back to the first guess. With the CEnKF, COLA rebuilds carbon mass conservation and enhances the CO2 and SCF estimation.

This paper is organized as follows: Sect. 2 briefly describes the global COLA system and CEnKF. Section 3 describes the OSSE experimental design. Section 4 presents the results and analysis in the context of observing system simulation experiments (OSSE). A summary and discussion are presented in Sect. 5.

2 Methods

2.1 GEOS-Chem model

COLA uses GEOS-Chem as the ATM to simulate the global atmospheric CO2 variation (Nassar et al., 2013). In this study, we use the Modern-Era Retrospective analysis for Research and Applications Version 2 (MERRA-2; Gelaro et al., 2017) meteorology reanalysis to drive version 13.0.2 of GEOS-Chem at a 4× 5 horizontal resolution (native resolution of 0.5× 0.625) with 47 vertical levels ( 30 levels below the stratosphere). The time step interval of GEOS-Chem is set to 30 min for both chemical processes and transport.

Since CO2 is a passive tracer in GEOS-Chem, and our assimilation system does not consider the uncertainties of metrological reanalysis, we treated different CO2 ensemble members as different CO2 tracers in GEOS-Chem. Therefore, we produced the ensemble simulations by running a single GEOS-Chem, instead of GEOS-Chem ensembles, which significantly saved computational resources (we acknowledge Fuqing Zhang for the idea; personal discussion, 2017​​​​​​​).

To simulate the atmospheric CO2 concentration evolution, GEOS-Chem is forced with the SCF, including land–atmosphere fluxes (FTAs), ocean–atmosphere fluxes (FOAs), and fossil fuel emissions (FFEs). The total SCF at each model grid point is the parameter to be estimated in the COLA system.

2.2 A four-dimensional local ensemble transform Kalman filter (4D-LETKF)

Following Liu et al. (2019), we used the four-dimensional local ensemble transform Kalman filter (LETKF) as the DA algorithm. The LETKF algorithm is an ensemble square root Kalman filter developed by Hunt et al. (2005, 2007). This algorithm is widely used for DA, including several operational centers, and it has been applied in joint state and parameter DA problems (Ruiz et al., 2013), such as carbon data assimilation (Kang et al., 2012, 2011). Similar to the other EnKF algorithms, LETKF combines background (model forecast) and observations statistically based on their error covariance to generate an analysis with reduced uncertainties. The background and analysis error uncertainties are represented by the perturbations of background (xb=xkb-xkb) and analysis (xa=xka-xka) ensembles, respectively. xkb and xb are the background and its mean, respectively. xka and xa are the analysis ensemble and its mean, respectively. ykb and yb are the forecast observations and their mean, respectively. ykb=h(xkb) projects the background from the model space to the observation space with the observation operator h. In this study, h is a linear interpolation operator that projects the modeled CO2 concentration to the spatiotemporal locations of yo. The overall LETKF algorithm is summarized as follows:


Here Xbw is the ensemble mean analysis increment applied to each ensemble member, with R denoting the observation error covariance, where P̃a is the analysis error covariance, K is the number of ensemble members, and I is the identity matrix. LETKF simultaneously assimilates all observations within a certain distance at each model grid point, which defines the localization scale. Hunt et al. (2005) introduced a four-dimensional version, and Hunt et al. (2007) provided detailed documentation of the 4-D LETKF that we use in this study.

Previous work has shown that the LETKF can be successfully applied to estimate SCFs and CO2 concentrations simultaneously using atmospheric CO2 observations (Kang et al., 2012, 2011; Liu et al., 2012; Liu et al., 2019). The SCFs (f) are treated as parameters augmenting the state vector c (the prognostic variable of atmospheric CO2), X=[c,f]T. An EnKF usually assumes the estimated parameters to be special variables that are stationary during model integration. Therefore, the first guess of the parameter is the persistence of their analysis from the last analysis cycle (Fig. 1). Although the SCFs evolve with time, parameter estimation can still produce decent estimation if the SCFs are slowly evolving and the AW is short enough (Ruiz et al., 2013). To accelerate the spin-up and reduce the high-frequency noise generated from atmospheric synoptic variabilities, our system uses a unique setting of LETKF with a short AW of 1 d and a long observation window (OW) of 7 d; therefore, we update the atmospheric CO2 and SCF on a daily basis using the observations within the time window of 7 d (Fig. 1). Please see Liu et al. (2019) for details of this LETKF configuration.

Figure 1Flowchart of the COLA system.


2.3 Constrained ensemble Kalman filter (CEnKF)

As previously discussed, the LETKF and most of the ensemble-based Kalman filters do not maintain the physical bounds of the state and conservation of the physical laws of state dynamics (Zeng et al., 2017). Since the LETKF process destroys the mass conservation (Fig. 2), we applied a CEnKF to constrain the global mass of state c after the LETKF process (Fig. 1). The concept was based on Pan and Wood (2006), who applied the CEnKF to balance the water budget for each ensemble member. In our system, we choose to only rebuild the mass balance on the ensemble mean instead of on each ensemble member because the inflation step can destroy the balance within each ensemble member. Moreover, the computational cost can be significantly reduced.

The mass conservation is destroyed by adding or reducing mass during DA updating. We can rebuild the mass conservation by moving the mass back to its original values (before the DA update). Our target is to retain the global mass conservation as follows:

(5) m a - m b = 0 ,

where ma and mb are the expected analysis and the first guess global CO2 mass, respectively. The transformation from the CO2 concentration at each grid to a global CO2 mass can be expressed as follows:

(6) m = h c ,

where h is the linear observation operator that transforms the global 3D CO2 concentration to the global CO2 mass. At each grid, the operator is proportional to the air mass. Now the question becomes how to distribute the expected global total mass adjustment to each model grid point. CEnKF achieves this distribution by applying an EnKF step with the mb as observations and takes the constraint as the observation equation. We add the constraint to the common EnKF formula as follows:

(7) c a + = c a + E a ( h E a ) T ( h E a ( h E a ) T + r ) - 1 h c b - h c a ,

where ca+ is the CEnKF CO2 ensemble mean. ca is the LETKF ensemble mean of CO2. Ea is the ensemble perturbation of CO2 after the LETKF process. CEnKF defines the observations as the truth with r=0 to meet the mass conservation purpose. Therefore, the EnKF equation is written as follows:

(8) c a + = c a + E a ( h E a ) T ( h E a ( h E a ) T ) - 1 h c b - h c a ,

which is the original EnKF algorithm (Evensen, 1994). The perturbed observation step is not needed with r=0. Note that we are not using LETKF here because it cannot handle the condition of r=0 (Eq. 3). Generally, the CEnKF distributes the global mass adjustment to each grid point by taking advantage of the ensemble perturbation Ea given by the LETKF. The grid with a larger ensemble spread will likely have more mass constraints.

Figure 2Schematic illustration of the mass imbalance problem.


2.4 Inflation

Inflation and localization are commonly used techniques to improve the filter performance for EnKF applications. The ensemble is expected to underestimate the forecast uncertainties because of the error sources, such as limited ensemble size and model deficiencies. The reduced ensemble variance can degrade the filter performance and, in severe cases, can lead to filter divergence where the filter will reject the observations. Inflation plays an important role in compensating for the reduced ensemble variance, which can be separated into the following three categories: multiplicative inflation, relaxation inflation, and additive inflation (Anderson, 2007; Mitchell and Houtekamer, 2000; Zhang et al., 2004; Whitaker et al., 2008; Whitaker and Hamill, 2012; Miyoshi, 2011). We update our inflation strategy from Liu et al. (2019) to better fit the mass conservation requirement. The original additive inflation for CO2 in Liu et al. (2019) does not preserve the carbon mass conservation in the atmosphere. Therefore, for CO2, we apply the relaxation to prior spread (RTPS) scheme from Whitaker and Hamill (2012), which combines the relaxation to prior perturbation (RTPP) logic from Zhang et al. (2004) into the multiplicative inflation approach as follows:


where σ is the ensemble spread, and α is the scaling factor. In this study, we set α to 0.7.

We retained the additive inflation for the SCFs as in Liu et al. (2019) with a slight adjustment. We treat the SCFs as the parameter for estimation in our system. However, the SCFs are the boundary forcing with temporal evolution that is missing in our dynamic model. The additive inflation scheme was designed to add the missing uncertainties into the system, which prevents the effective ensemble dimension from collapsing toward the dominant directions of error growth (Whitaker et al., 2008). Since we do not know about the SCF uncertainty globally or at each grid, we use the a priori SCF annual cycle as the benchmark. For FTA, the added perturbation fields are selected randomly from SiB3 (Denning et al., 1996). After each LETKF process, the ensemble spread at each point is inflated back to the predefined uncertainty by adding random fields selected from prior SCF within 1 year centered at the assimilation time (Kang et al., 2012; Liu et al., 2019). Instead of randomly perturbing the ensembles based on a distance decaying model (Wu et al., 2013), the additive inflation takes advantage of the a priori randomness as follows:

(11) f k a = f k a + τ f k p - f p ,

where the subscript k denotes the kth ensemble member, and the superscript p denotes the sampled a priori SCF. τ is the factor that rescales the sample spread to the predefined magnitude. We retain the same localization scheme and ensemble size of 20 as in Liu et al. (2019).

3 Design of the observing system simulation experiment (OSSE)

3.1 Prescribed fluxes and initial conditions

The experiments span from 1 October 2014 to 1 January 2018. In this paper, we only focused on the FTA. The FFE and FOA are treated as background fluxes that are the same in the assimilation run and nature run (Table 1). The FFE is based on the monthly Open-source Data Inventory of Anthropogenic CO2 emissions (ODIAC; Oda and Maksyutov, 2011). It is disaggregated from monthly to hourly, based on the TIMES method (Nassar et al., 2013). We use a monthly pCO2 interpolated FOA product (Gruber et al., 2019). We use the daily FTA simulated by the VEGAS model (Zeng et al., 2005) as the true FTA in the nature run. In contrast, we used the daily FTA modeled by SiB3 in 2008 as a priori for all of the years in the control and assimilation runs (Denning et al., 1996). Moreover, the annual mean of SiB3 is subtracted. Thus, there is no interannual variation or mean source–sink information coming from the a priori FTA. As mentioned in Sect. 2.4, the a priori SCF is used to inflate the SCF ensembles.

The nature run and control run are initialized on 1 January 2014, with a globally uniform 3-D concentration of 397.51 ppm (parts per million) based on the NOAA-ESRL global monthly mean averaged concentration over marine surface sites (Tans et al., 1989). To create the initial ensemble CO2 and FTA conditions for assimilation runs on 1 October 2014, we randomly select 20 nonrepeating CO2 and FTA pairs from the control run between 15 September and 15 October 2014. The ensemble mean initial SCF and CO2 conditions are significantly larger than the truth over most of the northern extratropic regions (Fig. A1). Moreover, since the initial CO2 state shows a clear bias pattern, constraining the mass at the initial time can degrade the flux estimation. Thus, we spin up the assimilation runs from 1 October 2014 to 1 January 2015 to obtain a jointly stable CO2 state and SCF parameter without applying the CEnKF.

3.2 Pseudo-observations

The pseudo-observations are sampled from the true CO2 field generated by the nature run at the specific time and location of the real surface and satellite observations, and then random errors are added based on the error scale of the real observations. The CO2 GLOBALVIEWplus v6.0 ObsPack is the main source of surface data (Schuldt et al., 2020). Since there are few stations over Siberia, we included several tower observations obtained by the National Institute for Environmental Studies (Sasakawa et al., 2010). For satellite data, we used Orbiting Carbon Observatory-2 (OCO-2) data (Crisp et al., 2017). Since we are focusing on the CEnKF impact, we considered only the experiments that are based on both surface and OCO-2 observations, and the influence of the two different observation networks is not considered. We plan to address the potential effects of such differences in future studies.

Figure 3The location of the surface pseudo-observations. The dots are the locations of the GLOBALVIEW-CO2 observations, and the pentagram is the location of the AMES tower observations. The colors indicate the representative errors assigned to each station.

The observation error is an essential part of the assimilation. Generally, the error is the sum of the instrument error (RI) and representative error (RR). For the surface observations, to estimate RR at each site, we followed Chevallier et al. (2010a), who used the standard deviation of the detrended and deseasonalized data as a proxy. Overall, the error ranged from less than 0.1 ppm near the South Pole stations to over 10 ppm at some northern midlatitude tower stations (Fig. 3).

The original OCO-2 sampling pixel is relatively small ( 3 km) compared with the model grid size. Moreover, there are approximately 400 soundings along every latitude. Thus, appropriate data thinning and filtering are necessary. In addition, the retrieval error needs to be estimated. We used post-processed OCO-2 level 2 data based on a new exponentially decaying error correlation model with a length scale computed from airborne lidar measurements (Baker et al., 2022). Since ocean glint observations have system bias compared with land observations (Crowell et al., 2019), only the land nadir and land glint data are assimilated (Fig. 4).

Figure 4The daily pseudo OCO-2 land nadir and land glint observation numbers along the 4 latitude band.


4 OSSE results

In this section, we present the seasonal cycle (SC) and interannual variation (IAV) in the FTA estimated by the COLA system. Then, we systematically investigate the impact of CEnKF on the estimation of FTA and CO2 on the annual scale by comparison with an experiment without CEnKF (Table 1).

Table 1Summary of the nature run, control run, and assimilation run experimental setup. We conducted three different assimilation experiments using LETKF (L), LETKF together with CEnKF applied to ensemble mean (LC), and LETKF together with CEnKF applied to ensemble members (LCE). Note that the interannual variation and annual mean source and sink information in the SiB3 have been subtracted.​​​​​​​

Download Print Version | Download XLSX

4.1 Seasonal cycle and interannual variation

As in Liu et al. (2019), only the global-scale analysis is presented, and the regional analysis is not discussed. Thus, before discussing the CEnKF impacts on flux and CO2 estimation, we would like to show the overall performance of the COLA system with improved algorithms from the global to regional seasonal cycle (SC) using EXP-LC as an example. Here, EXP-L is not directly shown because the difference between EXP-L and EXP-LC is not visible at the seasonal scale. The main reason is that CEnKF is applied to CO2 but not the flux, and the flux is constrained indirectly using the covariance between CO2 and flux. Another reason is that the magnitude of the FTA SC amplitude is much larger than the annual mean. One would expect a clearer impact of CEnKF if the SC amplitude is small.

Globally, the larger a priori SC amplitude is corrected, and the SC phase is also fixed (Fig. 5a). The global or regional analysis root mean square error (RMSE) for FTA is calculated as follows:

(12) RMSE reg a = E T ( ( FTA reg a ( T ) - FTA reg t ( T ) ) 2 ) ,

where reg and T indicate the region and time, respectively. FTArega(T) and FTAregt(T) indicate the regional total analysis and true FTA at a given time T, respectively. ET is the temporal average. The RMSE of the a priori FTA, RMSEregp, can be calculated using a similar formula. Furthermore, we define the root mean square error reduction (RMSER) from a priori to analysis as follows:

(13) RMSER reg a = RMSE reg p - RMSE reg a RMSE reg p .

The RMSER of the global daily FTA is 28 % (Fig. 5b). While zooming into the continental regions monthly, the RMSE over all these regions significantly decreases (Figs. 6 and 7). This reduction ranges from 43 % to 90 % (Fig. A2). Over the northern extratropical regions, where there are dense observations, the reduction exceeds 71 %. The most significant error reduction occurs over the Eurasia boreal region. Over the tropical and southern extratropical regions, the RMSER is smaller (Fig. A2). Since there are fewer observations, obtaining an accurate estimation over those regions is more challenging. However, the SC amplitude and phase are corrected. Over northern Africa, the analysis FTA is close to the a priori FTA during the growing season. Over southern tropical South America, the SC phase shows a 1-month lag, while the SC amplitude is fixed. Such a temporal lag is not well understood but is likely due to the sparse observations over tropical South America.

Figure 5(a) The global daily FTA of truth (black), a priori (gray), and analysis of EXP-LC (red). The vertical line on 1 January 2015 indicates the start of assimilation. Before 1 January 2015, the system spin-up lasted for 3 months. The pale gray and pale red shadings are the ensemble spread of the a priori and analysis, respectively. (b) The difference compared with the truth. The RMSE at the bottom right corner is the root mean square error of the analysis (red) and the a priori (gray) calculated based on Eq. (12).


Figure 6The FTA seasonal cycle (SC) and interannual variation (IAV) in truth (black), a priori (gray), and analysis of EXP-LC (red) over the Northern Hemisphere regions and Australia. The solid lines marked with open circles are the SC. The dashed lines are the IAV calculated from the original SC using a 12-month moving average method. The RMSE in the top-right corner is the SC root mean square error of the analysis (red) and the a priori (gray) calculated based on Eq. (12). The correlation (CORR) in the bottom-right corner is the IAV correlation between the analysis and the truth (the red dashed line and the black dashed line). Note that there is no IAV in the a priori. Thus, there is no IAV correlation between the a priori and the truth.


Figure 7Same as in Fig. 6 but for the tropical regions.


Focusing on the grid scale, the bias of EXP-LC compared with the a priori is significantly reduced during all the seasons (Fig. 8). The largest difference in the a priori compared with the truth occurred over the Northern Hemisphere forest region, where the SC amplitude is large. A significant bias can also be observed from the regional total time series (Fig. 5). Over the tropical region, the a priori distribution is also significantly biased, especially for tropical South America and northern Africa. In contrast, the bias of EXP-LC is much smaller and evenly distributed. In addition, the bias is comparatively larger during summer than in the other seasons.

Furthermore, we analyze the IAV in the FTA, which is calculated using the 12-month moving average method. Since the OSSE period covers the 2015–2016 El Niño event, the tropical FTA of truth shows a large IAV. In contrast, it is smaller over the Northern Hemisphere. The EXP-LC showed that the IAV is well reproduced with anomalies mainly in the tropics (Figs. 6 and 7). However, the IAV may leak between adjacent large continental regions. For example, the EXP-LC shows an upward trend compared with the truth over the Eurasia boreal region and a downward trend over Europe from January 2017 to June 2017. Since there is no IAV originating from the a priori FTA, we hypothesize that the IAV estimation could be improved using a better a priori FTA with IAV.

Figure 8The top three columns are the FTA climatological seasonal cycle of the truth, a priori, and EXP-LC from December to February (DJF), March to May (MAM), June to August (JJA), and September to November (SON). The bottom two columns are the difference between the a priori and truth (P–T) and between the EXP-LC and truth (E–T).

4.2 The impact of CEnKF on flux estimation

The improvement in CEnKF manifested while averaging to the global annual scale. To illustrate its impact, we conduct a contrast experiment without CEnKF (EXP-L). For EXP-L, the accumulation of the annual global imbalances is 0.154, 0.173, and 0.024 GtC for 2015, 2016, and 2017, respectively (Fig. 9). Such an imbalance is not negligible compared with the annual mean FTA of approximately −1.2 GtC. Moreover, the bias compared with the truth is −0.191, −0.267, and −0.024 GtC for 2015, 2016, and 2017, respectively. Compared to EXP-L, EXP-LC significantly reduces the annual global SCF bias from  0.2 to less than 0.06 Gt (Fig. 9). The significantly reduced bias indicates that the CEnKF could efficiently improve the global flux estimation.

Figure 9The global annual total FTA, imbalance, and bias of EXP-LC (red), EXP-L (orange), and EXP-LCE (green) compared with the truth (black) in 2015, 2016, and 2017. The imbalance is the mass loss for each year. The bias is the analysis of EXP-L and EXP-LC compared with the truth for each year. Note that there is no imbalance problem for EXP-LC and EXP-LCE. The error bar of the annual total is the uncertainty.


Regionally, EXP-LC does not significantly outperform EXP-L (Fig. 10). For both EXP-LC and EXP-L, the source or sink is very consistent with the truth. However, EXP-LC shows slightly better estimation over tropical and southern extratropical regions, except for the South American temperate region. For EXP-L, the FTA is reversed from a source to a small sink in northern tropical Asia. This slightly better performance over the tropical and southern extratropical regions is also supported by the seasonal RMSER analysis in Sect. 4.1.

Figure 10The total regional FTA of EXP-LC, EXP-L, EXP-LCE, and the truth from January 2015 to December 2017. The error bar of EXP-LC, EXP-L, and EXP-LCE is the uncertainty.


For both EXP-LC and EXP-L, the FTA pattern is well reproduced at the grid scale (Fig. 11b, c). The widespread carbon sink over the northern extratropics and carbon source over the tropics and southern extratropics are reproduced. Furthermore, the carbon source over Southeast Asia and the carbon sink over southern South America are captured. However, over North America, EXP-LC shows a clearer west–east dipole pattern than EXP-L. Over northern tropical Africa, EXP-LC successfully captures the carbon source at the side and the carbon sink at the center. The improved fine-scale FTA estimation is not significant but indicates that the CEnKF does not degrade the pattern estimation of the annual mean FTA. For both experiments, the carbon sink over central Russia is shifted northward (Fig. 11d, e).

Figure 11The spatial distribution of FTA for the truth (a), EXP-LC (b), EXP-L (c), and EXP-LCE (d) averaged from January 2015 to December 2017. The annual mean of the prior FTA is not shown because it is zero at each grid. The bias of EXP-LC compared with the truth (e), EXP-L compared with the truth (f), and EXP-LCE compared with the truth (g).

Since we simplified the CEnKF to constrain the ensemble mean only, the potential effects need to be discussed. We conducted an experiment with the ensemble member constrained (EXP-LCE). We compared the regional RMSERs of the three experiments (Fig. A2). We find that all the experiments show comparable RMSERs over the northern extratropical regions, and the differences appear over the tropical and southern extratropical regions. EXP-LC shows slightly better performance compared with EXP-L over all the tropical and southern extratropical regions, which indicates that the additional mass constraint may have a positive effect on the performance over poorly observed regions. Comparing EXP-LC and EXP-LCE, EXP-LC shows a larger RMSER over Australia, northern tropical South America, and southern Africa, and EXP-LCE shows a larger RMSER over South America temperate and northern tropical Asia. Notably, EXP-LCE shows a worse performance than EXP-L over Australia and northern tropical Asia. Thus, the simplified CEnKF scheme does not degrade the overall performance at the seasonal and regional scales.

4.3 The impact of CEnKF on CO2 estimation

Since the CEnKF is applied to the state CO2, we further analyze the impact of CEnKF on the state CO2. From the DA increment perspective (Fig. 12), the CO2 tracers are redistributed horizontally (Fig. 12a, d) and vertically after the LETKF process. Then, the CEnKF process conducts another redistribution that counterbalances the superfluous LETKF increment (Fig. 12b, e). Finally, the global mass increment becomes to zero. Horizontally, the increment of both LETKF and CEnKF is larger over the land region. However, the magnitude of the CEnKF increment is much smaller than that of LETKF, which indirectly suggests that the CEnKF assists in improving the flux estimation without overriding the LETKF increment. The comparison between EXP-L and EXP-LC further suggests that the CEnKF does not degrade the long-term CO2 forecast (Fig. A3).

The time series of the global imbalance shows that it is less than 0.03 GtC at every assimilation time (Fig. 13a). The imbalance is smaller from September to May than in the other months, and there is no significant positive or negative bias. From June to August, the imbalance is usually positive and more significant than that in the other months. At the start of the spin-up period, the imbalance is out of the image range. Because of the significantly biased initial CO2 and FTA conditions (Fig. A1), the CO2 state is not consistent with the SCF, which leads to a large imbalance. The spatial patterns of the LETKF increment and CEnKF increment are opposite in most regions on 15 December 2015. There is a weak negative temporal mean correlation between the two increments. The correlation may be weakly positive or moderately negative at some assimilation times (Fig. 13b). We further find that the magnitude of the increment correlation has a moderate relationship with the absolute global LETKF mass imbalance (Fig. 13c). Generally, a larger mass loss/gain may lead to a higher correlated LETKF and CEnKF increment pattern.

Figure 12The ensemble mean LETKF and CEnKF increments of the surface CO2 on 15 June 2015 (a–c) and 15 December 2015 (d–f) for EXP-LC.

Figure 13(a) The global mass imbalance caused by LETKF. The red line is the ensemble mean of the global mass imbalance. The gray shading indicates the imbalance ensemble variance. (b) The sky blue line is the surface spatial correlation between the CEnKF increment and the LETKF increment at each assimilation time. The blue line is calculated from the sky blue line using the 30 d moving average method. Note that, during the spin-up period, the CEnKF is not applied. Thus, there are no correlations. (c) The red dots indicate the relationship between the increment correlation (sky blue line in panel b) and the absolute LETKF imbalance (absolute value of red line in panel a) at each assimilation time. The gray line is the linear least squares regression fits to the scattered dots. The correlation is shown in the upper right corner.


5 Summary and discussion

In this study, we described the development of the COLA system using the CEnKF which was implemented in a carbon cycle study for the first time. We present the performance of the COLA system at the multi-spatiotemporal scale and show the positive effects of the CEnKF in the context of OSSE. By assimilating the pseudo-surface and OCO-2 observations, the LETKF could effectively estimate the spatial pattern of the annual mean FTA. The biased seasonal cycle amplitude and phase from the a priori are corrected over most of the continental regions. The estimation is relatively better over the northern extratropics, where there are denser observations compared with other regions. However, without mass conservation, the annual global FTA is significantly biased. After the CEnKF process, the CO2 mass is constrained without disrupting the LETKF CO2 increment. More importantly, the constrained CO2 state significantly helps improve the estimation of global annual FTA and slightly improves the seasonal and annual FTA estimation over the tropics and southern extratropics. In this study, we simplified the original CEnKF to constrain the ensemble mean only, which does not degrade the performance compared with the original CEnKF, while significantly reducing the computational cost.

Over the tropics, there are fewer surface stations, and the satellite retrievals are usually contaminated by the clouds and aerosols. Thus, most inversion systems use a very long OW (3 months to 1 year) to track the tropical fluxes from the remote observations on a weekly or monthly basis. However, we show that COLA can accurately infer the tropical fluxes from only 7 d of observations. We summarize four potential reasons as follows: (1) using a very short AW of 1 d, the problem of lacking a dynamic SCF model is alleviated as the ensembles can evolve as linearly as possible and remain Gaussian. The persistent forecast model is reasonable using an AW that is as short as possible. (2) Instead of abandoning the error transport property of EnKF and using the a priori SCF as the first guess in each AW, the SCF ensembles could be transported to the next AW, indicating that LETKF could sequentially learn from the previous AWs and give a more precise first guess for the current AW without iteration. (3) The COLA system perturbs the ensembles using the additive inflation method based on the a priori SCF, which introduces an appropriate spatial correlation based on the a priori randomness, which also reduces the dependence of large ensemble size. In contrast, most ensemble-based CO2 inversion systems perturb the ensembles based on the distance decaying model by assigning a correlation length. (4) Most inversion systems do not update the CO2 state, and the update to CO2 at each assimilation time could reduce the error from the previous AWs and make the flux signal of the current AW clearer and more sensitive. In summary, the additional observations in the OW, the rapid update with ensemble transport from the previous AW, the additively introduced a priori randomness, and the update to the CO2 state reduce the dependency of a very long OW in COLA.

In terms of computational cost, COLA is very efficient mainly because of the small ensemble size and short OW. For example, the computational time required in our OSSE is approximately 1.5 min per assimilation cycle using 20 cores of Intel Xeon E5-2650 (Table A1). Thus, the 3 years of OSSE only used less than 1.5 d of computational time. As denser observations will be available in the future, increasing the horizontal resolution of ATM becomes urgently needed. However, this will be limited by the increased computational cost. The method proposed in this study and in Liu et al. (2019) has the potential to break through this limitation.

The transport model error is always a major issue in CO2 inversion studies. Several model intercomparison projects have found that the transport model uncertainty is at least of the same order of magnitude as the flux uncertainty (Baker et al., 2006a; Basu et al., 2018; Crowell et al., 2019; Schuh et al., 2019; Chevallier et al., 2010b). Therefore, quantitative transport uncertainty estimation is needed to obtain a robust estimate of SCF and provide information to policymakers. The EnKF can efficiently estimate the transport uncertainty online by perturbing the meteorological state (Kang et al., 2011; Liu et al., 2011; Chen et al., 2019), which requires close collaboration between the weather forecast community and CO2 inversion community. Moreover, the estimation of transport uncertainty needs to update the CO2 state and meteorology state together, which will inevitably cause the mass imbalance problem. The CEnKF method proposed here overcomes this limitation and offers a computationally efficient way of constraining global mass.

Appendix A

Table A1The computational cost for one assimilation cycle (7 d observation window). Each component is running in parallel, using 20 cores of Intel Xeon E5-2650. Note that the cost of the CEnKF with ensemble member constrained exceeds the cost of GEOS-Chem while increasing the horizontal resolution to 2 × 2.5.

Download Print Version | Download XLSX

Table A2List of the major abbreviations and their corresponding full names.

Download Print Version | Download XLSX

Figure A1The initial FTA and surface CO2 condition of the truth (a, c) and the ensemble mean first guess (b, c) on 1 October 2014.

Figure A2The RMSER in EXP-LC, EXP-L, and EXP-LCE.


Figure A3The top four columns are the CO2 climatological seasonal cycle of the nature run, control run, EXP-LC, and EXP-L from December to February (DJF), March to May (MAM), June to August (JJA), and September to November (SON). The bottom three columns are the difference between the control run and nature run (C–T), between EXP-LC and nature run (LC-T), and between the EXP-L and nature run (L–T).

Code and data availability

The code for CEnKF can be accessed from (Liu and Zeng, 2021). The related codes for GEOS-Chem and LETKF can be accessed from (last access: 30 June 2022; GEOS-Chem, 2022​​​​​​​) and (last access: 25 June 2022, Miyoshi, 2022), respectively.

Author contributions

ZL conceived the CEnKF scheme. ZL, NZ, YL, and EK developed the system. QC supplied the VEGAS model output. ZL designed and ran the experiments. ZL, NZ, and YL wrote the paper. GA, BW, DL and PH provided comments on and suggestions for the paper. All authors contributed to the preparation of this paper.​​​​​​​

Competing interests

The contact author has declared that none of the authors has any competing interests.


Publisher’s note: Copernicus Publications remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


The authors thank Zhimin Zhang, for his contribution to the development of the computer environment.

Financial support

This research has been supported by the Dream Project of Ministry of Science and Technology of the People's Republic of China (grant no. 2017YFB0504000).

Review statement

This paper was edited by Yuefei Zeng and reviewed by two anonymous referees.


Anderson, J. L.: An adaptive covariance inflation error correction algorithm for ensemble filters, Tellus Dyn. Meteorol. Oceanogr., 59, 210–224,, 2007. 

Baker, D. F., Law, R. M., Gurney, K. R., Rayner, P., Peylin, P., Denning, A. S., Bousquet, P., Bruhwiler, L., Chen, Y.-H., Ciais, P., Fung, I. Y., Heimann, M., John, J., Maki, T., Maksyutov, S., Masarie, K., Prather, M., Pak, B., Taguchi, S., and Zhu, Z.: TransCom 3 inversion intercomparison: Impact of transport model errors on the interannual variability of regional CO2 fluxes, 1988–2003, Global Biogeochem. Cy., 20, GB1002,, 2006a. 

Baker, D. F., Doney, S. C., and Schimel, D. S.: Variational data assimilation for atmospheric CO2, Tellus B, 58, 359–365,, 2006b. 

Baker, D. F., Bell, E., Davis, K. J., Campbell, J. F., Lin, B., and Dobler, J.: A new exponentially decaying error correlation model for assimilating OCO-2 column-average CO2 data using a length scale computed from airborne lidar measurements, Geosci. Model Dev., 15, 649–668,, 2022. 

Basu, S., Guerlet, S., Butz, A., Houweling, S., Hasekamp, O., Aben, I., Krummel, P., Steele, P., Langenfelds, R., Torn, M., Biraud, S., Stephens, B., Andrews, A., and Worthy, D.: Global CO2 fluxes estimated from GOSAT retrievals of total column CO2, Atmos. Chem. Phys., 13, 8695–8717,, 2013. 

Basu, S., Baker, D. F., Chevallier, F., Patra, P. K., Liu, J., and Miller, J. B.: The impact of transport model differences on CO2 surface flux estimates from OCO-2 retrievals of column average CO2, Atmos. Chem. Phys., 18, 7189–7215,, 2018. 

Bruhwiler, L. M. P., Michalak, A. M., Peters, W., Baker, D. F., and Tans, P.: An improved Kalman Smoother for atmospheric inversions, Atmos. Chem. Phys., 5, 2691–2702,, 2005. 

Chen, H. W., Zhang, F., Lauvaux, T., Davis, K. J., Feng, S., Butler, M. P., and Alley, R. B.: Characterization of Regional-Scale CO2 Transport Uncertainties in an Ensemble with Flow-Dependent Transport Errors, Geophys. Res. Lett., 46, 4049–4058,, 2019. 

Chevallier, F., Ciais, P., Conway, T. J., Aalto, T., Anderson, B. E., Bousquet, P., Brunke, E. G., Ciattaglia, L., Esaki, Y., Fröhlich, M., Gomez, A., Gomez-Pelaez, A. J., Haszpra, L., Krummel, P. B., Langenfelds, R. L., Leuenberger, M., Machida, T., Maignan, F., Matsueda, H., Morguí, J. A., Mukai, H., Nakazawa, T., Peylin, P., Ramonet, M., Rivier, L., Sawa, Y., Schmidt, M., Steele, L. P., Vay, S. A., Vermeulen, A. T., Wofsy, S., and Worthy, D.: CO2 surface fluxes at grid point scale estimated from a global 21 year reanalysis of atmospheric measurements, J. Geophys. Res., 115, D21307,, 2010a. 

Chevallier, F., Feng, L., Bösch, H., Palmer, P. I., and Rayner, P. J.: On the impact of transport model errors for the estimation of CO2 surface fluxes from GOSAT observations, Geophys. Res. Lett., 37, L21803,, 2010b. 

Crevoisier, C., Heilliette, S., Chédin, A., Serrar, S., Armante, R., and Scott, N. A.: Midtropospheric CO2 concentration retrieval from AIRS observations in the tropics, Geophys. Res. Lett., 31, L17106,, 2004. 

Crisp, D., Pollock, H. R., Rosenberg, R., Chapsky, L., Lee, R. A. M., Oyafuso, F. A., Frankenberg, C., O'Dell, C. W., Bruegge, C. J., Doran, G. B., Eldering, A., Fisher, B. M., Fu, D., Gunson, M. R., Mandrake, L., Osterman, G. B., Schwandner, F. M., Sun, K., Taylor, T. E., Wennberg, P. O., and Wunch, D.: The on-orbit performance of the Orbiting Carbon Observatory-2 (OCO-2) instrument and its radiometrically calibrated products, Atmos. Meas. Tech., 10, 59–81,, 2017. 

Crowell, S., Baker, D., Schuh, A., Basu, S., Jacobson, A. R., Chevallier, F., Liu, J., Deng, F., Feng, L., McKain, K., Chatterjee, A., Miller, J. B., Stephens, B. B., Eldering, A., Crisp, D., Schimel, D., Nassar, R., O'Dell, C. W., Oda, T., Sweeney, C., Palmer, P. I., and Jones, D. B. A.: The 2015–2016 carbon cycle as seen from OCO-2 and the global in situ network, Atmos. Chem. Phys., 19, 9797–9831,, 2019. 

Denning, A. S., Randall, D. A., Collatz, G. J., and Sellers, P. J.: Simulations of terrestrial carbon metabolism and atmospheric CO2 in a general circulation model. Part 2: Simulated CO2 concentrations, Tellus B, 48, 543–567,, 1996. 

Evensen, G.: Sequential data assimilation with a nonlinear quasi-geostrophic model using Monte Carlo methods to forecast error statistics, J. Geophys. Res., 99, 10143,, 1994. 

Feng, L., Palmer, P. I., Bösch, H., and Dance, S.: Estimating surface CO2 fluxes from space-borne CO2 dry air mole fraction observations using an ensemble Kalman Filter, Atmos. Chem. Phys., 9, 2619–2633,, 2009. 

Friedlingstein, P., Cox, P., Betts, R., Bopp, L., von Bloh, W., Brovkin, V., Cadule, P., Doney, S., Eby, M., Fung, I., Bala, G., John, J., Jones, C., Joos, F., Kato, T., Kawamiya, M., Knorr, W., Lindsay, K., Matthews, H. D., Raddatz, T., Rayner, P., Reick, C., Roeckner, E., Schnitzler, K.-G., Schnur, R., Strassmann, K., Weaver, A. J., Yoshikawa, C., and Zeng, N.: Climate–Carbon Cycle Feedback Analysis: Results from the C4MIP Model Intercomparison, J. Climate, 19, 3337–3353,, 2006. 

Friedlingstein, P., Jones, M. W., O'Sullivan, M., Andrew, R. M., Hauck, J., Peters, G. P., Peters, W., Pongratz, J., Sitch, S., Le Quéré, C., Bakker, D. C. E., Canadell, J. G., Ciais, P., Jackson, R. B., Anthoni, P., Barbero, L., Bastos, A., Bastrikov, V., Becker, M., Bopp, L., Buitenhuis, E., Chandra, N., Chevallier, F., Chini, L. P., Currie, K. I., Feely, R. A., Gehlen, M., Gilfillan, D., Gkritzalis, T., Goll, D. S., Gruber, N., Gutekunst, S., Harris, I., Haverd, V., Houghton, R. A., Hurtt, G., Ilyina, T., Jain, A. K., Joetzjer, E., Kaplan, J. O., Kato, E., Klein Goldewijk, K., Korsbakken, J. I., Landschützer, P., Lauvset, S. K., Lefèvre, N., Lenton, A., Lienert, S., Lombardozzi, D., Marland, G., McGuire, P. C., Melton, J. R., Metzl, N., Munro, D. R., Nabel, J. E. M. S., Nakaoka, S.-I., Neill, C., Omar, A. M., Ono, T., Peregon, A., Pierrot, D., Poulter, B., Rehder, G., Resplandy, L., Robertson, E., Rödenbeck, C., Séférian, R., Schwinger, J., Smith, N., Tans, P. P., Tian, H., Tilbrook, B., Tubiello, F. N., van der Werf, G. R., Wiltshire, A. J., and Zaehle, S.: Global Carbon Budget 2019, Earth Syst. Sci. Data, 11, 1783–1838,, 2019. 

Gelaro, R., McCarty, W., Suárez, M. J., Todling, R., Molod, A., Takacs, L., Randles, C. A., Darmenov, A., Bosilovich, M. G., Reichle, R., Wargan, K., Coy, L., Cullather, R., Draper, C., Akella, S., Buchard, V., Conaty, A., da Silva, A. M., Gu, W., Kim, G.-K., Koster, R., Lucchesi, R., Merkova, D., Nielsen, J. E., Partyka, G., Pawson, S., Putman, W., Rienecker, M., Schubert, S. D., Sienkiewicz, M., and Zhao, B.: The Modern-Era Retrospective Analysis for Research and Applications, Version 2 (MERRA-2), J. Climate, 30, 5419–5454,, 2017. 

GEOS-Chem: GEOS-Chem Wiki, GEOS-Chem [code],, last access: 30 June 2022. 

Greybush, S. J., Kalnay, E., Miyoshi, T., Ide, K., and Hunt, B. R.: Balance and Ensemble Kalman Filter Localization Techniques, Mon. Weather Rev., 139, 511–522,, 2011. 

Gruber, N., Clement, D., Carter, B. R., Feely, R. A., van Heuven, S., Hoppema, M., Ishii, M., Key, R. M., Kozyr, A., Lauvset, S. K., Lo Monaco, C., Mathis, J. T., Murata, A., Olsen, A., Perez, F. F., Sabine, C. L., Tanhua, T., and Wanninkhof, R.: The oceanic sink for anthropogenic CO2 from 1994 to 2007, Science, 363, 1193–1199,, 2019. 

Gurney, K. R., Law, R. M., Denning, A. S., Rayner, P. J., Pak, B. C., Baker, D., Bousquet, P., Bruhwiler, L., Chen, Y.-H., Ciais, P., Fung, I. Y., Heimann, M., John, J., Maki, T., Maksyutov, S., Peylin, P., Prather, M., and Taguchi, S.: Transcom 3 inversion intercomparison: Model mean results for the estimation of seasonal carbon sources and sinks​​​​​​​, Global Biogeochem. Cy., 18, GB1010​​​​​​​,, 2004. 

Hunt, B. R., Kostelich, E. J., and Szunyogh, I.: Efficient Data Assimilation for Spatiotemporal Chaos: a Local Ensemble Transform Kalman Filter, arXiv [preprint],, 29 December 2005. 

Hunt, B. R., Kostelich, E. J., and Szunyogh, I.: Efficient data assimilation for spatiotemporal chaos: A local ensemble transform Kalman filter, Phys. Nonlinear Phenom., 230, 112–126,, 2007. 

Kang, J.-S., Kalnay, E., Liu, J., Fung, I., Miyoshi, T., and Ide, K.: “Variable localization” in an ensemble Kalman filter: Application to the carbon cycle data assimilation, J. Geophys. Res., 116, D09110,, 2011. 

Kang, J.-S., Kalnay, E., Miyoshi, T., Liu, J., and Fung, I.: Estimation of surface carbon fluxes with an advanced data assimilation methodology, J. Geophys. Res. Atmospheres, 117, D24101,, 2012. 

Kondo, M., Patra, P. K., Sitch, S., Friedlingstein, P., Poulter, B., Chevallier, F., Ciais, P., Canadell, J. G., Bastos, A., Lauerwald, R., Calle, L., Ichii, K., Anthoni, P., Arneth, A., Haverd, V., Jain, A. K., Kato, E., Kautz, M., Law, R. M., Lienert, S., Lombardozzi, D., Maki, T., Nakamura, T., Peylin, P., Rödenbeck, C., Zhuravlev, R., Saeki, T., Tian, H., Zhu, D., and Ziehn, T.: State of the science in reconciling top-down and bottom-up approaches for terrestrial CO 2 budget, Glob. Change Biol., 26, 1068–1084,, 2020. 

Liu, J., Fung, I., Kalnay, E., and Kang, J.-S.: CO2 transport uncertainties from the uncertainties in meteorological fields, Geophys. Res. Lett., 38, L12808,, 2011. 

Liu, J., Fung, I., Kalnay, E., Kang, J.-S., Olsen, E. T., and Chen, L.: Simultaneous assimilation of AIRS Xco2 and meteorological observations in a carbon climate model with an ensemble Kalman filter​​​​​​​, J. Geophys. Res.-Atmos., 117, D05309,, 2012. 

Liu, J., Bowman, K. W., Lee, M., Henze, D. K., Bousserez, N., Brix, H., James Collatz, G., Menemenlis, D., Ott, L., Pawson, S., Jones, D., and Nassar, R.: Carbon monitoring system flux estimation and attribution: impact of ACOS-GOSAT XCO2 sampling on the inference of terrestrial biospheric sources and sinks, Tellus B, 66, 22486,, 2014. 

Liu, Y., Kalnay, E., Zeng, N., Asrar, G., Chen, Z., and Jia, B.: Estimating surface carbon fluxes based on a local ensemble transform Kalman filter with a short assimilation window and a long observation window: an observing system simulation experiment test in GEOS-Chem 10.1, Geosci. Model Dev., 12, 2899–2914,, 2019. 

Liu, Z. and Zeng, N.: The Constrained Ensemble Kalman Filter used in the COLA data assimilation system (v 1.0), Zenodo [code],, 2021. 

Lokupitiya, R. S., Zupanski, D., Denning, A. S., Kawa, S. R., Gurney, K. R., and Zupanski, M.: Estimation of global CO2 fluxes at regional scale using the maximum likelihood ensemble filter, J. Geophys. Res., 113, D20110,, 2008. 

Mitchell, H. L. and Houtekamer, P. L.: An Adaptive Ensemble Kalman Filter, Mon. Weather Rev., 128, 416–433​​​​​​​,<0416:AAEKF>2.0.CO;2, 2000. 

Miyoshi, T.: The Gaussian Approach to Adaptive Covariance Inflation and Its Implementation with the Local Ensemble Transform Kalman Filter, Mon. Weather Rev., 139, 1519–1535,, 2011 (data available at:, last access: 25 June 2022). 

Nassar, R., Napier-Linton, L., Gurney, K. R., Andres, R. J., Oda, T., Vogel, F. R., and Deng, F.: Improving the temporal and spatial distribution of CO2 emissions from global fossil fuel emission data sets, J. Geophys. Res.-Atmos., 118, 917–933,, 2013. 

Oda, T. and Maksyutov, S.: A very high-resolution (1 km × 1 km) global fossil fuel CO2 emission inventory derived using a point source database and satellite observations of nighttime lights, Atmos. Chem. Phys., 11, 543–556,, 2011. 

Pan, M. and Wood, E. F.: Data Assimilation for Estimating the Terrestrial Water Budget Using a Constrained Ensemble Kalman Filter, J. Hydrometeorol., 7, 534–547,, 2006. 

Peters, W., Miller, J. B., Whitaker, J., Denning, A. S., Hirsch, A., Krol, M. C., Zupanski, D., Bruhwiler, L., and Tans, P. P.: An ensemble data assimilation system to estimate CO2 surface fluxes from atmospheric trace gas observations, J. Geophys. Res., 110, D24304,, 2005. 

Peters, W., Jacobson, A. R., Sweeney, C., Andrews, A. E., Conway, T. J., Masarie, K., Miller, J. B., Bruhwiler, L. M. P., Petron, G., Hirsch, A. I., Worthy, D. E. J., van der Werf, G. R., Randerson, J. T., Wennberg, P. O., Krol, M. C., and Tans, P. P.: An atmospheric perspective on North American carbon dioxide exchange: CarbonTracker, Proc. Natl. Acad. Sci. USA, 104, 18925–18930,, 2007. 

Rödenbeck, C., Houweling, S., Gloor, M., and Heimann, M.: CO2 flux history 1982–2001 inferred from atmospheric data using a global inversion of atmospheric transport, Atmos. Chem. Phys., 3, 1919–1964,, 2003. 

Ruiz, J. J., Pulido, M., and Miyoshi, T.: Estimating Model Parameters with Ensemble-Based Data Assimilation: A Review, J. Meteorol. Soc. Jpn. Ser II, 91, 79–99,, 2013. 

Sasakawa, M., Shimoyama, K., Machida, T., Tsuda, N., Suto, H., Arshinov, M., Davydov, D., Fofonov, A., Krasnov, O., Saeki, T., Koyama, Y., and Maksyutov, S.: Continuous measurements of methane from a tower network over Siberia, Tellus B, 62, 403–416,, 2010. 

Schuh, A. E., Jacobson, A. R., Basu, S., Weir, B., Baker, D., Bowman, K., Chevallier, F., Crowell, S., Davis, K. J., Deng, F., Denning, S., Feng, L., Jones, D., Liu, J., and Palmer, P. I.: Quantifying the Impact of Atmospheric Transport Uncertainty on CO2 Surface Flux Estimates, Global Biogeochem. Cy., 33, 484–500,, 2019. 

Schuldt, K. N., Mund, J., Luijkx, I. T., Jacobson, A. R., Cox, A., Vermeulen, A., Manning, A., Beyersdorf, A., Manning, A., Karion, A., Hensen, A., Arlyn Andrews, Frumau, A., Colomb, A., Scheeren, B., Law, B., Baier, B., Munger, B., Paplawsky, B., Viner, B., Stephens, B., Daube, B., Labuschagne, C., Myhre, C. L., Hanson, C., Miller, C. E., Plass-Duelmer, C., Sloop, C. D., Sweeney, C., Kubistin, D., Goto, D., Jaffe, D., Say, D., Dinther, D. V., Bowling, D., Dickon Young, Weyrauch, D., Worthy, D., Dlugokencky, E., Gloor, E., Cuevas, E., Reyes-Sanchez, E., Hintsa, E., Kort, E., Morgan, E., Apadula, F., Francois Gheusi, Meinhardt, F., Moore, F., Vitkova, G., Chen, G., Bentz, G., Manca, G., Brailsford, G., Forster, G., Riris, H., Meijer, H., Matsueda, H., Huilin Chen, Levin, I., Lehner, I., Mammarella, I., Bartyzel, J., Abshire, J. B., Elkins, J. W., Levula, J., Jaroslaw Necki, Pichon, J. M., Peischl, J., Müller-Williams, J., Turnbull, J., Miller, J. B., Lee, J., Lin, J., Josep-Anton Morgui, DiGangi, J. P., Hatakka, J., Coletta, J. D., Holst, J., Kominkova, K., McKain, K., Saito, K., Aikin, K., Davis, K., Thoning, K., Tørseth, K., Haszpra, L., Mitchell, L., Gatti, L. V., Emmenegger, L., Lukasz Chmura, Merchant, L., Sha, M. K., Delmotte, M., Fischer, M. L., Schumacher, M., Torn, M., Leuenberger, M., Steinbacher, M., et al.: Multi-laboratory compilation of atmospheric carbon dioxide data for the period 1957–2019; obspack_co2_1_GLOBALVIEWplus_v6.0_2020-09-11, NOAA [data set]​​​​​​​,, 2020. 

Tans, P. P., Conway, T. J., and Nakazawa, T.: Latitudinal distribution of the sources and sinks of atmospheric carbon dioxide derived from surface observations and an atmospheric transport model, J. Geophys. Res., 94, 5151–5172​​​​​​​,, 1989. 

Tans, P. P., Fung, I. Y., and Taikahashi, T.: Observational Constraints on the Global Atmospheric CO2 Budget, Science, 9, 1431–1438,, 1990. 

Whitaker, J. S. and Hamill, T. M.: Evaluating Methods to Account for System Errors in Ensemble Data Assimilation, Mon. Weather Rev., 140, 3078–3089,, 2012. 

Whitaker, J. S., Hamill, T. M., Wei, X., Song, Y., and Toth, Z.: Ensemble Data Assimilation with the NCEP Global Forecast System, Mon. Weather Rev., 136, 463–482,, 2008. 

Wu, L., Bocquet, M., Chevallier, F., Lauvaux, T., and Davis, K.: Hyperparameter estimation for uncertainty quantification in mesoscale carbon dioxide inversions, Tellus B, 65, 20894,, 2013. 

Yang, D., Liu, Y., Cai, Z., Chen, X., Yao, L., and Lu, D.: First Global Carbon Dioxide Maps Produced from TanSat Measurements, Adv. Atmospheric Sci., 35, 621–623,, 2018. 

Yokota, T., Yoshida, Y., Eguchi, N., Ota, Y., Tanaka, T., Watanabe, H., and Maksyutov, S.: Global Concentrations of CO2 and CH4 Retrieved from GOSAT: First Preliminary Results, SOLA, 5, 160–163,, 2009. 

Zeng, N., Mariotti, A., and Wetzel, P.: Terrestrial mechanisms of interannual CO2 variability, Global Biogeochem. Cy., 19, GB1016,, 2005. 

Zeng, Y., Janjiæ, T., Ruckstuhl, Y., and Verlaan, M.: Ensemble-type Kalman filter algorithm conserving mass, total energy and enstrophy: SQPEns Conserving Mass, Total Energy and Enstrophy, Q. J. Roy. Meteor. Soc., 143, 2902–2914,, 2017. 

Zeng, Y., de Lozar, A., Janjic, T., and Seifert, A.: Applying a new integrated mass-flux adjustment filter in rapid update cycling of convective-scale data assimilation for the COSMO model (v5.07), Geosci. Model Dev., 14, 1295–1307,, 2021a. 

Zeng, Y., Janjiæ, T., de Lozar, A., Welzbacher, C. A., Blahak, U., and Seifert, A.: Assimilating radar radial wind and reflectivity data in an idealized setup of the COSMO-KENDA system, Atmospheric Res., 249, 105282,, 2021b.  

Zhang, F., Snyder, C., and Sun, J.: Impacts of Initial Estimate and Observation Availability on Convective-Scale Data Assimilation with an Ensemble Kalman Filter, Mon. Weather Rev., 132, 16​, 1238–1253,<1238:IOIEAO>2.0.CO;2, 2004. 

Zupanski, D., Denning, A. S., Uliasz, M., Zupanski, M., Schuh, A. E., Rayner, P. J., Peters, W., and Corbin, K. D.: Carbon flux bias estimation employing Maximum Likelihood Ensemble Filter (MLEF), J. Geophys. Res., 112, D17107,, 2007. 

Short summary
We described the application of a constrained ensemble Kalman filter (CEnKF) in a joint CO2 and surface carbon fluxes estimation study. By assimilating the pseudo-surface and OCO-2 observations, the annual global flux estimation is significantly biased without mass conservation. With the additional CEnKF process, the CO2 mass is strictly constrained, and the estimation of annual fluxes is significantly improved.