A Global Carbon Assimilation System using a modified EnKF assimilation method

A Global Carbon Assimilation System using a modified EnKF assimilation method S. Zhang, X. Zheng, Z. Chen, B. Dan, J. M. Chen, X. Yi, L. Wang, and G. Wu College of Global Change and Earth System Science, Beijing Normal University, Beijing, China Department of Geography, University of Toronto, Toronto, Canada International Institute for Earth System Science, Nanjing University, Nanjing, China Department of Statistics, University of Manitoba, Winnipeg, Canada Received: 18 August 2014 – Accepted: 18 September 2014 – Published: 1 October 2014 Correspondence to: X. Zheng (x.zheng@bnu.edu.cn) Published by Copernicus Publications on behalf of the European Geosciences Union. 6519


Introduction
The carbon dioxide concentration in the atmosphere plays an essential role in the study of global change for its potential to warm up the atmosphere and the surface.A better estimation of carbon fluxes over global ecosystems would help better understand each nation's contribution to the global warming and improve the global warming science.
In the past decade, many e orts have been made to estimate the surface CO 2 fluxes using both atmosphere-based top-down and land-based bottom-up methods.CarbonTracker (Peters et al., 2005(Peters et al., , 2007) may be one of the most advanced among these e orts.It uses an ensemble square root filter to assimilate atmospheric CO 2 mole fractions into an ecosystem model coupled with an atmospheric transport model.The model state vectors in CarbonTracker are carbon fluxes within 5 weeks.However, Miyazaki et al. (2011) and Kang et al. (2011) argued that the state vectors should also include atmospheric CO 2 concentration, because the observed CO 2 consists of both existing atmosphere CO 2 and recently released carbon fluxes, so including CO 2 concentration in the state vectors should improve the estimation of carbon fluxes.However, their e orts mainly focus on studying the performance of the assimilation methodology and observation settings by using idealized models only, not on assimilating real observations.

Discussion
It is well known that correct estimation of the forecast error statistics is crucial for the accuracy of any data assimilation algorithm.In all existing EnKF assimilations for estimating carbon fluxes, the ensemble forecast errors are estimated by perturbed forecasts minus their ensemble mean.However, the definition of the perturbed forecast errors is the perturbed forecast states minus the true state.Motivated by the fact that the analysis state is a better estimate of the true state than the forecast state, Wu et al. (2013) proposed a new estimator for the perturbed forecast errors using the perturbed forecast states minus the analysis state.Moreover, they used a simulation study to demonstrate that the new estimator can lead to better assimilations for models with large errors.Since the model errors of ecosystem models are generally large, the new estimation of the perturbed forecast errors is potentially useful to improve EnKF assimilation for estimating carbon fluxes.
Besides forecast errors, the observation errors need also be accurately estimated.In the majority schemes for estimating carbon fluxes, including CarbonTracker, the observation error variances are not estimated but empirically assigned.The quality of the estimation of observation error variances critically depends on whether the forecast error covariance matrix is appropriately estimated or not (Desroziers et al., 2005).However, appropriate estimation of the forecast error covariance matrix is a challenge in real applications.
In this paper we propose several modifications to the conventional EnKF for assimilating atmospheric CO 2 observations into ecosystem models.Firstly, the model state is set as a combination of the surface carbon fluxes and atmospheric CO 2 concentration as suggested by Miyazaki et al. (2011) and Kang et al. (2011).Secondly, the analysis state is used to adaptively estimate forecast errors as suggested by Wu et al. (2013) and Zheng et al. (2013).Thirdly, both forecast and observation errors are  Liang et al. (2012).This modified EnKF is used to assimilate real CO 2 concentration data into the Boreal Ecosystem Productivity Simulator (BEPS, Chen et al., 1999;Liu et al., 1999;Mo et al., 2008) for estimating real world terrestrial carbon fluxes with 3 hourly and 1 1 resolution from 2002 to 2008.
This paper consists of 7 sections.The models and data used in this study are introduced in Sect.2, while the methodology is in Sect.3. Section 4 presents the observing system simulation experiment results.A real data application of the proposed methodology is presented in Sect. 5. Conclusions and discussions are given in Sects.6 and 7 respectively.
2 Models and data

Surface carbon flux models
The surface carbon fluxes mainly arise from fossil fuel combustion, vegetation fire, oceanic uptake and biosphere.In this study, only the surface carbon fluxes from biosphere are simulated using BEPS, the rests are taken from datasets of CarbonTracker.BEPS is a process-based ecosystem model mainly developed to simulate forest ecosystem carbon budget (Chen et al., 1999;Ju et al., 2006;Liu et al., 1999).For many reasons, such as the complexity of ecosystem processes, spatial-temporal variabilities, representative errors, parameters in process-based models often do not represent their true values when these models are used to calculate carbon budget over large areas and for long time periods (Mo et al., 2008).Errors in these parameters lead to biases of model results.In this study, we try to reduce biases of BEPS simulated carbon fluxes by incorporating atmospheric CO 2 concentration measurements with data assimilation methods.The prior carbon fluxes simulated by BEPS are at a spatial resolution of 1 1 and for every 3 h.On each model grid, BEPS calculates carbon fluxes of 6 di erent ecosystem types and outputs the sum of them through weighting the areal fractions of the ecosystem types.Figure 1 shows the ecosystem types with the largest weight on each grid.

Atmospheric transport model
The global chemical transport Model for OZone And Related chemical Tracers (MOZART, Emmons et al., 2010) is used as the atmospheric transport model.In this study, MOZART is run at a horizontal resolution of approximately 2.8 2.8 with 28 vertical levels.The forcing meteorology is from NCAR reanalysis of the National Centers for Environmental Prediction (NCEP) forecasts (Kalnay et al., 1996;Kistler et al., 2001).Since CO 2 is chemically inert in atmosphere, we turn o all the chemistry processes and leave only transport of CO 2 by atmospheric motions.Given the atmospheric CO 2 concentration in the previous week and the surface carbon fluxes in the current week, MOZART is used to forecast gridded atmospheric CO 2 concentration within the current week.

Observation
The The observation error variance dataset is also released on CarbonTracker's website.They were subjectively chosen and manually tuned to fit into their models and observations (Peters et al., 2005(Peters et al., , 2007)).The observation sites were divided into six categories, each with their own assigned observation errors (see the document of CarbonTracker for details).These observation error variances are used as prior values in our system and are adaptively adjusted with the proposed assimilation scheme.

Methodology
Within tth week, let c t be a set of gridded atmospheric CO 2 concentrations every 3 h, f t be the set of prior carbon fluxes every 3 h, and t be a set of factors defined as constants on areas and within a week for adjusting f t .Then, the model state is defined as

EnKF with error inflations
Using the notations of Ide et al. (1997), the first EnKF algorithm used in this study consists of the following three main steps: 1. Forecast step The perturbed forecast states are estimated as where i represents an ensemble member, t,i are vectors sampled from a distribution with mean 0 and a given covariance matrix (taken from prior covariance structure in CarbonTracker, see the document of CarbonTracker and Peters et al., 2005Peters et al., , 2007)), and G is the atmospheric transport operator which maps c t 1 and the t -adjusted f t onto gridded CO 2 concentration.Then the forecast state is estimated as where m is the ensemble size.

Error step
The ensemble forecast errors and the observation error covariance matrix are estimated as t X f t and µ t R t respectively, where and R t is the observation error variance matrix for CarbonTracker.t and µ t are the inflation factors (of the forecast error and the observation error respectively) which are estimated by minimizing the following objective function (Liang et al., 2012;Zheng, 2009): where y o t is the vector of atmospheric CO 2 concentration measurements, H t is a linear observation operator, which interpolates gridded CO 2 concentrations at observation times and locations.

Analysis step
The perturbed analysis states are estimated as where t,i is a normal random variable with mean zero and covariance matrix µ t R t (Burgers et al., 1998).The analysis state x a t is estimated as Finally, set t = t + 1 and return to step (1) for the assimilation at next time step.
The assimilated surface carbon fluxes are from all sources because the observed CO 2 concentrations are arising from all sources.Then the surface carbon fluxes from biosphere are estimated by the assimilated total carbon fluxes minus carbon fluxes from other sources supplied by the forcing data.(Wu et al., 2013).Therefore after the analysis step (3) in Sect.3.1, it is suggested to return to the error step (2), and substitute x f t in Eq. ( 4) by x a t .This procedure is repeated until the corresponding objective function (Eq.5) converges (Wu et al., 2013;Zheng et al., 2013).In this study, if the di erence between the minima of 2L t ( , µ) at nth and n + 1th iterations is less one 1, then the iteration is stopped.A flowchart of the proposed assimilation scheme in this study is shown in Fig. 2.

Validation statistic
In ideal experiments where the "truth" is known, we can calculate the root-mean-square error (RMSE) of the model states.RMSE of carbon fluxes is defined as where T stands for the total period of the assimilation, f t (k) is an average of fluxes in the kth TransCom region (Gurney et al., 2004) and tth 3 h-period, while f t t is the corresponding "truth".The spread is defined as, where f t,j (k) is the j th member of ensemble fluxes in the kth TransCom region.If all the members of ensemble analysis states have the same distribution as the "truth", then estimated RMSE and spread should be close.
The RMSE of CO 2 concentrations at tth time is defined as where y t (l ) is the CO 2 concentration value at l th location and y t t (l ) is the corresponding "truth".No ecosystem model is perfect.Therefore it is desirable to introduce ecosystem model error.In order to mimic this model error, we set the "true" carbon fluxes to be 80 % of the BEPS simulated values plus 20 % of the CarbonTracker assimilated values.
The "true" gridded CO 2 concentration is calculated starting from a CO 2 concentration field taken from CarbonTracker document, and is forced by the "true" carbon fluxes.
The synthetic observations are formed by adding noise to the interpolated "true" CO 2 concentrations at the observation locations and times.All the observation errors are assumed to be statistically independent and normally distributed with standard deviation 0.2 ppmv.The experiments are carried out for 2002 with BEPS and MOZART.The ensemble size is 150 unless otherwise noted.

Inflation on observation error variance
In this section, we test the method of inflation on observation error covariance matrices described in Sect.3.1.Three experiments with inflated forecast error are compared.In Experiment 1, the observation error variance is incorrectly specified (with variance of 0.5 ppmv), and the inflation procedure is not carried out (refer to as Wrong R).In Experiment 2, the observation error variance is also specified as 0.5 ppmv, but the inflation procedure is carried out (refer to as Wrong R + Inf).In Experiment 3, the observation error variance is correctly specified as 0.2 ppmv (refer to as True R).The regional RMSEs of the assimilated carbon fluxes and the monthly RMSEs of atmospheric concentrations at the observation sites in these three experiments are shown in Figs. 3  and 4, respectively.
Figure 3 shows that the RMSEs of the estimated carbon fluxes (Eq.8) in all experiments are reduced on all 11 regions compared with the RMSEs of the prior carbon fluxes.Similarly, Fig. 4 shows that the RMSEs of the estimated in situ atmospheric CO 2 concentrations (Eq.10) in all experiments are also reduced in all months compared with the RMSEs of the prior CO 2 concentrations.These facts indicate that the data assimilation schemes studied in this paper are useful.Moreover, the RMSEs from Experiment 3 (True R) is smaller than that from Experiment 1 (Wrong R).This suggests that correct specification of the observation error variances is important in improving the assimilation results.
The mean value of the estimated standard observation errors in Experiment 2 Although it is larger than the true value 0.2 ppmv, but is much smaller than the incorrectly specified value 0.5 ppmv.Nevertheless, the RMSEs from Experiment 2 (Wrong R + Inf) and Experiment 3 (True R) are comparable (see Figs. 3 and 4).

Constructing forecast error statistics using the analysis states
In this section, we test the methodology of using the analysis state to construct forecast error statistics described in Sect.3.2.The corresponding Experiment 4 is referred to as "Wrong R + Inf + Anl".
Figure 3 shows that the RMSEs of the carbon fluxes assimilated in Experiment 4 (Wrong R + Inf + Anl) are the smallest for all regions and in all experiments.Similarly, Fig. 4 shows that the RMSEs of the in situ atmospheric CO 2 concentrations assimilated in Experiment 4 are the smallest for all months except September.These results indicate that the methodology of using the analysis state to construct forecast error statistics described in Sect.3.2 can improve the assimilation results.The mean value of the estimated standard observation errors in Experiment 4 (Wrong R + Inf + Anl) is 0.268 ppmv, which is virtually identical to that in Experiment 2 (Wrong R + Inf).

Validation
For investigating the sensitivity of the inflation factors toward the choice of ensemble size, the time series of inflation factors are calculated with the ensemble size 200, and are plotted against the estimated inflation factors with the the ensemble size 150 (not shown here).In fact they are very close, suggesting that the ensemble size 150 used in both CarbonTracker and this study is reasonable.
The analysis spread and RMSE are also calculated for each region using Eq. ( 9) and they are very close (not shown here).This indicates that the member of analysis states may have the same distribution as the true states.
By including CO 2 concentration in the state vectors, the initial value of gridded CO 2 concentration at every week is the analysis states of CO 2 concentration at previous week.However, when excluding CO 2 concentration from the state vectors, the initial value of gridded CO 2 concentration can also be estimated by using atmospheric transport model forced by the assimilated surface carbon fluxes at previous week.Sensitivity experiments were carried out to compare these two approaches.The RMSEs of analysis in the experiment without including CO 2 concentration in state vectors are close to those in Wrong R + Inf + Anl, which suggests that including CO 2 concentration in state vectors may not significantly improve the assimilation results.However, in Wrong R + Inf + Anl the initial states of gridded atmospheric CO 2 concentration need not be re-estimated, thus the computational cost of atmospheric transport model is about half of that in experiment without including CO 2 concentration, which is one advantage of including CO 2 concentration.

Application and results
In this section we use the data assimilation methods described in Sect.(Gurney et al., 2004) and 19 Olson ecosystem types, as in Car-bonTracker.Prior observation error covariance matrix is adopted from CarbonTracker.We refer to this data assimilation scheme as Global Carbon Assimilation System using Ensemble Kalman filter (GCAS-EK).

Adjustment to total carbon budget of BEPS
We first carry out a control run starting from 1 January 2002 with no adjustment of prior fluxes.The simulated CO 2 concentrations are interpolated at measurement times and locations, and compared with real observations in the year 2005.The result is shown in Fig. 5a.It shows that the simulated concentrations have a bias of 2.945 ppmv and a RMSE of 4.525 ppmv, which implies an underestimation of carbon sinks by BEPS.With the ecosystem fluxes estimated by GCAS-EK, we carry out another control run and comparisons.The bias and RMSE are reduced to 0.967 and 3.675 ppmv, respectively (Fig. 5b).
We have to point out that the underestimation of carbon sinks by BEPS is conditioned on the estimated carbon fluxes released by fossil fuel and fire, even if the ocean fluxes used in our assimilation system are accurate.As described in Sect.2, the observed variability of CO 2 concentration is due to the variability of carbon fluxes from all sources, including fossil fuel combustion, vegetation fire, oceanic uptake and biosphere.If carbon sources not from biosphere are underestimated, the carbon sinks from biosphere simulated by BEPS would also be underestimated.Nevertheless, our adjustment to carbon sinks simulated by BEPS appears reasonable.

Multiyear average of the global carbon flux distribution
The distribution of the average global carbon budget from 2002 to 2008 is shown in Fig. 6.The two spatial patterns of carbon fluxes related to BEPS (Fig. 6a and b) are similar.However, they are quite di erent from that of CarbonTracker products (Fig. 6c).
In the North American region, CarbonTracker exhibits a nearly west-east strip of the carbon sink (Fig. 6c), while the carbon sinks assimilated or simulated by BEPS are mainly distributed in the east of 95 W (Fig. 6a and b).In the central Africa near the southern edge of Sahara desert, CarbonTracker simulates a strong carbon sink (Fig. 6c), but BEPS simulates a weak sink (Fig. 6a), while the assimilated result shows a weak source (Fig. 6b).In Indonesia, CarbonTracker simulates a moderate carbon source (Fig. 6c), while carbon sinks are simulated and assimilated by BEPS (Fig. 6a and b).In Australian Northern Territory, CarbonTracker simulates a carbon sink (Fig. 6c), while the other two produce carbon sources (Fig. 6a and b).In North American Temperate and Eurasia Boreal, the assimilated carbon sink is clearly larger than that simulated.
Carbon budgets are calculated based on the BEPS ecosystem types and the 11 TransCom regions (Fig. 7).Similar to the global distribution maps (Fig. 6), the assimilated BEPS carbon budgets (Fig. 7) have almost the same property in sources or sinks with that simulated by BEPS.However for the C4 and the shrub in Australia, BEPS simulates carbon sources while CarbonTracker shows carbon sinks.Moreover in North America, there is a large carbon sink increase of the assimilated over the BEPS simulated.Further diagnostic (not shown here) reveals that, between October and April, the carbon sinks estimated by CarbonTracker are much larger than that estimated by GCAS-EK.But between May and September, the carbon sinks estimated by Carbon-Tracker and GCAS-EK are very close.

Interannual and seasonal variations
The interannual variations of the global total carbon budgets are shown in Fig. 8.It shows that CarbonTracker predicts the largest multiyear average carbon sink ( 3.89 Pg C year 1 ), compared with the smallest one simulated by BEPS ( 2.23 Pg C year 1 ).The assimilated mean carbon sink ( 3.87 Pg C year 1 ) is virtually identical to that estimated by CarbonTracker.The carbon sinks simulated by BEPS and predicted by CarbonTracker obviously have more interannual oscillation than that assimilated by GCAS-EK.
The monthly changes of the multiyear-averaged carbon budgets before and after the assimilation of BEPS results are compared with that by CarbonTracker in Fig. 9.
Clearly, the seasonal variability of the carbon budgets by CarbonTracker is the largest.The assimilated fluxes based on BEPS have larger sinks in the summer and smaller sources in the winter than those before the assimilation.

Comparison with CarbonTracker
Including CO 2 concentration in the state vector implies that an atmospheric transport model is part of the forecast operator, not part of the observation operator (such as in CarbonTracker).In this framework, the forecast operator comprises an atmospheric transport model and forecast of adjusted factors (Eqs. 1 and 2).The observation operator is the linear operator which interpolates gridded atmospheric CO 2 concentration onto the observation points.Moreover for remotely sensed CO 2 cylinder concentration data, the observation operator can be chosen as a weighted average of gridded atmosphere CO 2 concentrations.
Since atmospheric CO 2 concentration is not model variable in CarbonTracker, the observation operator in this study is di erent from that in CarbonTracker.In CarbonTracker, the observation operator is atmospheric transport model coupled with a linear spatial interpolation operator which maps surface CO 2 fluxes to atmospheric CO 2 concentration observation network.Then its observation error comprises the following three components: (1) measurement error covariance, (2) representation error covariance; and (3) model transport error covariance.In this study, the observation operator is only the linear spatial interpolation operator.So the observation error in our experiment only comprises components (1) and (2).
The mean value of the estimated µ t for inflating the prior observation error variances is 0.74.This indicates that the estimated observation error variances are smaller than that used in CarbonTracker.This may be due to that model transport errors are not included in our observation errors, but are included in the observation errors for Car-bonTracker.

Forecast of adjusted factors
From the extensive experiments conducted in this study, we find that the spatial pattern of assimilated fluxes is highly correlated with the spatial pattern of prior fluxes.This is due to the fact that the unconditional expectation of the analysis E a t,i is 1, which could be attributed to the setting of forecast of adjusted factors (Eq.1).Then the probability of shifting between carbon sources and sinks is small.It means that GCAS-EK generally trusts the spatial pattern simulated by the ecosystem model.
To avoid this problem, more flexible adjustment of f t with more adjusted factors may be considered.However, the increased number of the adjusted factors results in increased degrees of freedom of adjustment model.To fit such kind of model more abundance of observations may be required.For surface flask observations with a total number of about 100 every week over the entire globe, the number of adjusted factors has to be carefully controlled.Therefore, under the current density of ground-based observation network, improving the accuracy of the ecosystem model producing the prior fluxes may be more feasible strategy to improve the surface flux estimation.

Length of the assimilation time window
Di erent lengths of the assimilation time window are used in various systems (5 weeks in CarbonTracker, 3 and 7 days in Miyazaki et al., 2011, and6 h in Kang et al., 2012).We choose one week as the length in our methodology for the following two reasons.Firstly, since most surface stations only have weekly observations, we need at least one week data to cover the globe.Secondly, beyond one week the model errors of MOZART and BEPS may be significant, but they are very di cult to quantify.

Conclusions
We propose a methodology to assimilate atmospheric CO 2 concentration into surface carbon fluxes simulated by an ecosystem model.In our framework CO 2 concentration is included in the state vector, both forecast and observation errors are inflated and forecast error statistics are estimated in an adaptive procedure using the analysis states.Generally speaking, these adaptive estimations improve the accuracy of assimilated error statistics in EnKF, which leads to further improvement in the accuracy of analysis states.Importantly, pre-assigned values of the observation error variance are improved if these adaptive procedures are applied.
Four simulation experiments were carried out to show the e ectiveness of the proposed methodology.In the first two experiments, we assumed the observation error variances were incorrectly specified and compared assimilation results with and without using inflations on the observation error statistics.The third experiment, in which the observation error variances were supposed to be known, served as a benchmark of how the observation error variances were estimated using our methodology.The fourth experiment showed the e ectiveness of using analysis states to further improve the estimation of forecast error.The results from all experiments met our expectation and increased our confidence in applying the improved EnKF to assimilate real observations.The application of the methodology to real data shows that the assimilated carbon fluxes by GCAS-EK are comparable to those reported by CarbonTracker.However, there are significant regional di erences between carbon flux distributions assimilated by GCAS-EK and CarbonTracker, which may be attributed to the di erences between the ecosystem models and the assimilation methodologies.
In our future study, we will investigate the sensitivity of assimilation results to the accuracy of ecological models and transport models.Also, more observation datasets, such as remote sensing data, will be introduced into the GCAS-EK.
Paper | Discussion Paper | Discussion Paper | Discussion Paper |

6521
Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | inflated as suggested by

6523
Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper |

6525
Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper |

6527
Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | 4 Simulation study4.1 Experimental designIn this section, the e ectiveness of data assimilation methods introduced in Sect. 3 is examined with simulation experiments.

6529
Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper |

6531
Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper |

6533
Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper |

6535
Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper |

Figure 5 .Figure 6 .Figure 7 .Figure 8 .
Figure 5. Comparisons between real observations and simulated concentrations by control runs: (a) control run forcing by prior carbon fluxes; (b) control run forcing by assimilated carbon fluxes by GCAS-EK.Both simulations start from 1 January 2002 and all simulated concentrations at observation locations and times in 2005 are compared here.
The initial atmospheric CO 2 concentration field is also from CarbonTracker products.The prior land surface carbon fluxes are simulated by the ecosystem model BEPS.In this study, only land surface carbon fluxes need to be adjusted.The partition of the adjustment factors (i.e.t in Sect.3) is based on 11 TransCom regions 3 to estimate the land surface carbon fluxes from 2002 to 2008.The fossil fuel, forest fire and ocean surface carbon flux forcing fields of the atmospheric transport model MOZART are all taken from CarbonTracker website.