Interactive comment on “Comparison of Different Sequential Assimilation Algorithms for Satellite-derived Leaf Area Index Using the Data Assimilation Research Testbed (Lanai)”

The submitted paper uses four assimilation methods (KF, EnKF, EAKF and PF) and CLM4CN to assimilate LAI, and chooses a best assimilation method by comparing with MODIS LAI. MODIS satellite remote sensing data can obtain LAI products with long time series. However, due to the impacts of cloud cover, aerosols, snow cover, and sensor failure, MODIS LAI products are characterized by high noise, low accuracy, and large fluctuations in the time series. Therefore, MODIS LAI data with better quality should be selected as observations based on quality control (QC) information. The research objective is reasonable and the review portion and figures need to be


Introduction
Land surface processes play an important role in the earth system because all the physical, biochemical, and ecological processes occurring in the soil, vegetation, and hydrosphere influence the mass and energy exchanges during land-atmosphere interactions (Bonan, 1995; Pitman, 2003; Pitman et al., 2009, 2012). The leaf area index (LAI) is a key biophysical parameter of vegetation in land surface models (LSMs) and influences their simulation performance. Therefore, high-quality, spatially and temporally continuous LAI inputs are extremely important (Bonan et al., 1992; Li et al., 2015).
X.-L. Ling et al.: Comparing algorithms for LAI assimilation
Real-time monitoring of LAI on a large scale remains a challenge worldwide. The sparse distribution of conventional observations limits their spatial representativeness and makes it difficult to build a global observational LAI dataset. Remote sensing can provide global data with high spatial and temporal resolutions, but the inversion accuracy varies among plant functional types (PFTs) and vegetation fractions. Furthermore, although advanced land surface models (LSMs, e.g., the Community Land Model version 4, CLM4) can predict LAI variation, the model performance is greatly affected by the model structure, meteorological forcing, and initial and boundary conditions of the input (Dai et al., 2003; Luo et al., 2003; Levis et al., 2004). Data assimilation (DA), by optimally combining both dynamical and physical mechanisms with real-time observations, can effectively reduce the estimation uncertainties caused by spatially and temporally sparse observations and poor observed data accuracy (Kalnay, 2003).
As a link between observations and dynamic model states, mathematical algorithms play an important role in calculating the increments and adjusting the state vector during assimilation (Kalnay et al., 2007). The two basic families of data assimilation algorithms are variational DA, based on optimal control theory, and sequential algorithms, based on the Kalman filter (Dimet and Talagrand, 1986; Gordon et al., 1993; Bannister, 2017; Vetra-Carvalho et al., 2018). Because the Kalman filter algorithm is based on the linear model error assumption, many new sequential algorithms have been proposed. For example, the extended Kalman filter (EKF) was developed to meet the need for a nonlinear observation operator, but it requires the tangent linear operator to be developed (Kalnay, 2003). Based on the Monte Carlo method and focused on the nonlinear operator, the ensemble Kalman filter (EnKF) was developed (Evensen, 1994) and was first used in atmospheric science (Houtekamer and Mitchell, 1998). Since then, the EnKF has been widely applied for the assimilation of ocean, land surface, and atmospheric data (Houtekamer et al., 2005; Evensen, 2007). In recent years, Monte Carlo methods have been proposed to allow the assimilation of information from sources that have non-Gaussian errors.
Many previous studies have compared variational and sequential algorithms to determine the optimal assimilation method (Han and Li, 2008). Wu et al. (2011) systematically compared the EnKF, 3DVAR, and 4DVAR algorithms and found that the EnKF algorithm performed better than the 3DVAR method and comparably to the 4DVAR method. For this reason, the application of the EnKF algorithm has expanded quickly, and many other forms of the EnKF method have been developed, such as the dual EnKF (Li et al., 2014), the ensemble square root filter (EnSRF; Whitaker and Hamill, 2002), and the ensemble adjustment Kalman filter (EAKF; Anderson, 2001). At the same time, combinations of variational algorithms and sequential algorithms have also been developed. For example, the maximum likelihood ensemble filter (MLEF; Zupanski, 2005), the combination of 3DVAR and particle filter (PF) algorithms (Leng and Song, 2013), and the hybrid variational-ensemble data assimilation methods, i.e., the 4DEnKF (Hunt et al., 2004; Fertig et al., 2007; Zhang et al., 2009) and the DrEnKF (Wan et al., 2009), have been developed at NCEP and applied to improve model predictions (Whitaker et al., 2008).
A complete Land Data Assimilation System (LDAS) is mainly composed of forcing datasets, initial and boundary datasets, parameterization sets, dynamical models as physical constraints, assimilation algorithms, observational data, and target output. In recent decades, studies of land data assimilation have become very active, although this topic was proposed later than the assimilation of atmospheric observations (Lahoz and De Lannoy, 2014). Land data assimilation can use both in situ observations and remotely sensed data, such as satellite observations of soil moisture, snow water equivalent (SWE), and land surface temperature, to constrain the physical parameterization and initialization of the land surface state (Liu et al., 2008; Reichle et al., 2014; Zhang et al., 2014; Zhao et al., 2016; Zhao and Yang, 2018). The widely acknowledged LDASs include the North American LDAS (NLDAS, Mitchell et al., 2004; NLDAS-2, Luo et al., 2003; Xia et al., 2012), the Global LDAS (GLDAS, Rodell et al., 2004), the European LDAS (ELDAS, Jacobs et al., 2008), the West China LDAS (WCLDAS, Huang and Li, 2004), and the Canadian LDAS (CaLDAS, Carrera et al., 2015).
Recent studies focusing on assimilation in terrestrial systems have tended to add multiple phenological observations to constrain and predict biome variables and further improve model performance (Knyazikhin et al., 1998; Xiao et al., 2009; Viskari et al., 2015). Joint assimilation of surface incident solar radiation, soil moisture, and vegetation dynamics (LAI) into land surface models or crop models is of great importance since it can improve the model results for national food policy and security assessments (Sabater et al., 2008; Ines et al., 2013; Sawada et al., 2015; Jin et al., 2018; Mokhtari et al., 2018). Furthermore, the ability to simulate river discharge, land evapotranspiration, and gross primary production has been improved in Europe (Barbu et al., 2011; Albergel et al., 2017). To date, such studies have been conducted using a single sequential algorithm at a single site or on regional scales (Montzka et al., 2012; Sawada, 2018).
The Data Assimilation Research Testbed (DART) is an open-source community facility and includes several different types of Kalman filter algorithms (Anderson et al., 2009). It has been coupled to many high-order models and observations for ocean, atmosphere, land surface, and chemical constituents. For example, DART has been coupled with CLM4 or CLM4.5 to improve snow and soil moisture estimations as well as land carbon processes (Zhang et al., 2014; Kwon et al., 2016; Zhao et al., 2016; Fox et al., 2018; Zhao and Yang, 2018). Utilizing the coupled DART-CLM4, the Global Land Surface Satellite LAI (GLASS LAI) data are assimilated into the Community Land Model with carbon and nitrogen components (CLM4CN) in the present study to explore the optimal assimilation algorithm for model performance. The experimental design and different assimilation algorithms are described in Sect. 2. Section 3 describes the optimal algorithm for LAI assimilation, and the proportion of observations is discussed in Sect. 4. Conclusions and discussions are given in Sect. 5.

Data and methodology
A complete LDAS is mainly composed of meteorological forcing, initial and boundary datasets, parameterization sets, dynamical LSMs, assimilation algorithms, observational data, and target output. LSMs play an important role in the LDAS because they can add physical constraints to the control variables during assimilation. In addition, the simulation ability of LSMs can directly affect the output because they provide the associated uncertainty for assimilation.

CLM4CN
Developed by the National Center for Atmospheric Research (NCAR), the Community Land Model (CLM) simulates energy, momentum, and water exchanges between the land surface and the overlying atmosphere at each computational grid cell. The CLM is designed mainly for coupling with an atmospheric numerical model and provides the surface albedo (direct and scattered light within the visible and infrared bands), upward longwave radiation, sensible heat flux, latent heat flux, water vapor flux, and east-to-west and south-to-north surface stresses needed by the atmospheric model. These quantities are controlled by many ecological and hydrological processes. The model can also simulate leaf phenology and physiological processes, as well as water circulation through plant pores. Ecological differences between vegetation types and thermal and hydrological differences between soil types are also considered. Each grid cell can be covered by several different land use types. Each cell contains several land units, each land unit contains a different number of soil and snow columns, and each column may contain several plant functional types (PFTs). The CLM employs 10 soil layers to resolve soil moisture and temperature dynamics and uses PFTs to represent subgrid vegetation heterogeneity (Oleson et al., 2010).
There are two ways to update LAI in CLM4. In CLM4 with satellite phenology (CLM4SP), the LAI is treated as a diagnostic variable that is linearly interpolated from a 30-year averaged satellite dataset, so there is no year-to-year LAI variation (Lawrence and Chase, 2007). In CLM4CN, the prognostic LAI is calculated from the leaf carbon pool and an assumed vertical gradient of specific leaf area (SLA) (Thornton and Zimmermann, 2007). Carbon and nitrogen are obtained by plant storage pools in one growing season and then retained and distributed in the subsequent year. All carbon and nitrogen state variables in vegetation, litter, and soil organic matter (SOM) are prognostic based on the prescribed vegetation phenology. The CLM4CN offline mode with prescribed meteorological forcing is used in this study.
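The link between the leaf carbon pool and the prognostic LAI can be illustrated with a short sketch. It assumes that SLA increases linearly with overlying LAI from a canopy-top value, so that leaf carbon is the integral of 1/SLA over canopy depth; inverting that integral gives the expression in the code. The function name, parameter names, and numerical values are hypothetical and are not taken from the CLM4CN parameter files.

```python
import math


def lai_from_leaf_carbon(leaf_c, sla_top, dsla_dlai):
    """Return LAI (m2 leaf per m2 ground) for a leaf carbon pool (gC per m2 ground).

    Assumes SLA(x) = sla_top + dsla_dlai * x, where x is the overlying LAI.
    Leaf carbon is then C = ln(1 + dsla_dlai * LAI / sla_top) / dsla_dlai,
    which inverts to the expression returned below.
    sla_top is in m2 leaf per gC; dsla_dlai is in m2 leaf per gC per unit LAI.
    """
    if leaf_c <= 0.0:
        return 0.0
    if dsla_dlai == 0.0:
        # No vertical gradient: LAI is simply SLA times leaf carbon.
        return sla_top * leaf_c
    return sla_top * (math.exp(dsla_dlai * leaf_c) - 1.0) / dsla_dlai


# Illustrative (made-up) values: 100 gC m-2 of leaf carbon.
lai = lai_from_leaf_carbon(100.0, sla_top=0.012, dsla_dlai=0.0015)
print(round(lai, 3))
```

Because LAI is a monotonic function of leaf carbon in this formulation, an assimilation increment applied to LAI has a direct counterpart in the leaf carbon pool, which is one reason both variables are updated together.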

DART (the Lanai version)
DART is developed and maintained by the Data Assimilation Research Section (DAReS) at NCAR. The purpose of DART is to provide a flexible tool for data assimilation (DA), and it has been coupled with many high-order models. As a software environment, DART makes it easy to explore a variety of data assimilation methods and observations with different numerical models. The DART system includes several different types of sequential algorithms, which are selected at runtime by a namelist setting. The Lanai version of DART, which supports many existing models including the CESM climate component, the MPAS (Model for Prediction Across Scales) models, and the NOAH land model, is used in this study. Released in December 2013, the Lanai version of DART can process many new observation types and sources and includes new diagnostic routines as well as new utilities. Detailed settings for DART can be found at https://www.image.ucar.edu/DAReS/DART/ (last access: 1 July 2019).
Currently, the coupled DART-CLM4 model has produced many reanalysis data for snow and soil moisture. It has been found that snow DA can improve temperature predictions, especially over the Tibetan Plateau, which has important implications for future land DA and seasonal climate prediction studies (Lin et al., 2016). Furthermore, the coupled DART-CLM framework can be employed to assimilate other variables, such as LAI, from various satellite sources and ground observations (i.e., truly multimission, multiplatform, multisensor, multisource, and multiscale). Ultimately, this would allow earth system models to be constrained by all types of observations to improve model performance for seasonal and decadal prediction skills.

Sequential assimilation algorithms
According to Anderson (2001), Eq. (1) expresses how new sets of observations modify the prior joint state conditional probability distribution obtained from predictions based on previous observation sets.
In Eq. (1), $Y_{t,k}$ is defined as the superset of all observation subsets, $y^{o}_{t,k}$ is the $k$th subset of observations at time $t$, and $z_{t,k}$ is the joint state-observation vector for a given $t$ and $k$. In ensemble applications, there is generally no need to compute the denominator of Eq. (1). Four algorithms for approximating the product in the numerator of Eq. (1) are presented below, and detailed information can be found in Anderson (2001).
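For orientation, a Bayesian update of this kind is commonly written in the form below. This is a sketch built from Bayes' rule and the symbols defined above, and it may differ in detail from the typeset Eq. (1). The integration variable $x$ and the reading of $Y_{t,k-1}$ as the observation sets available before the $k$th subset is used are assumptions introduced here.

```latex
p\left(z_{t,k} \mid Y_{t,k}\right) =
  \frac{p\left(y^{o}_{t,k} \mid z_{t,k}\right)\, p\left(z_{t,k} \mid Y_{t,k-1}\right)}
       {\int p\left(y^{o}_{t,k} \mid x\right)\, p\left(x \mid Y_{t,k-1}\right)\, \mathrm{d}x}
```

In this form the numerator is the product of the observation likelihood and the prior, while the denominator is a normalization constant that does not depend on $z_{t,k}$, which is consistent with the remark that ensemble methods do not need to compute it.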

Kernel filter (KF)
The kernel filter (KF) mechanism, first proposed by Lindgren et al. (1993) and further developed by Anderson and Anderson (1999), has been incorporated into DART and can be extended to the joint state space. A detailed calculation process can be found in Anderson (2001). The KF is potentially general, because the values and expected values of the mean and covariance and higher-order moments of the resulting ensemble are functions of high-order moments of the prior distribution. However, when the algorithm is applied to large models, computational efficiency becomes an issue.

Ensemble Kalman filter (EnKF)
The Kalman filter (Kalman, 1960) algorithm has not been widely used because of computing limitations and the linear model error assumption. The EnKF was proposed based on a Monte Carlo approximation, for which the background error covariance is approximated using an ensemble of forecasts (Evensen, 1994). The EnKF algorithm can be utilized for nonlinear systems and can also reduce the computing requirement of DA (Burgers et al., 1998; Evensen, 2003, 2007).
The EnKF procedure is divided into two stages: prediction and analysis. (1) In the prediction stage, the ensemble forecast field is generated from the ensemble initial condition, and the error covariance matrix of the ensemble forecast is calculated. (2) In the analysis stage, the simulation of each member of the ensemble is updated using the covariance matrix of the observation vector error and state vector error. The traditional EnKF, an ensemble of Kalman filters with each member using a different sample estimate of the prior mean and observations, is used in this study (Houtekamer and Mitchell, 1998).
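The analysis stage can be sketched for the simplest possible case: a single scalar state (for example LAI at one grid point) that is observed directly. The sketch below uses the perturbed-observation form of the EnKF, in which each member sees the observation plus a random draw from the observation error distribution. It is not DART code; the function name and the numbers are illustrative assumptions.

```python
import random


def enkf_update(prior, obs, obs_var, rng):
    """Perturbed-observation EnKF analysis for a directly observed scalar state.

    prior   : list of prior ensemble members
    obs     : observed value
    obs_var : observation error variance
    rng     : random.Random instance used to perturb the observation
    """
    n = len(prior)
    mean = sum(prior) / n
    prior_var = sum((x - mean) ** 2 for x in prior) / (n - 1)  # sample variance
    gain = prior_var / (prior_var + obs_var)                   # Kalman gain, H = 1
    obs_sd = obs_var ** 0.5
    # Each member is pulled toward its own perturbed copy of the observation.
    return [x + gain * (obs + rng.gauss(0.0, obs_sd) - x) for x in prior]


rng = random.Random(42)
prior = [rng.gauss(6.0, 0.5) for _ in range(40)]   # model overestimates LAI
posterior = enkf_update(prior, obs=3.0, obs_var=0.25, rng=rng)
print(sum(prior) / 40, sum(posterior) / 40)
```

The gain weights the observation by the ratio of prior variance to total variance, so a tight prior ensemble moves little even when the observation is far away.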

Ensemble adjustment Kalman filter (EAKF)
Although their forms of expression are different, the EnSRF (Whitaker and Hamill, 2002) and the EAKF (Anderson, 2001) are the same algorithm.
The difference between the EAKF and the traditional EnKF lies in the adjustment of the gain matrix, which avoids the filter divergence problem by increasing the analysis error covariance (Anderson, 2003, 2007; Wang et al., 2007). In the EAKF algorithm, ensemble observation members are calculated by the observation operator, and the increment of each observation member is calculated as $\Delta Y_{i}$.
The increment $\Delta X_{i,j}$ for each ensemble sample of each state variable can then be calculated in terms of $\Delta Y_{i}$ as follows:
$$\Delta X_{i,j} = \frac{\sigma^{p}_{j,o}}{\sigma^{p}_{o}}\, \Delta Y_{i},$$
where $i$ indicates the ensemble member, $j$ is the state vector member, $\sigma^{p}_{j,o}$ is the prior covariance of the state vector and observation, and $\sigma^{p}_{o}$ is the prior variance of observation.

Particle filter (PF)
The particle filter (PF) is also a sequential Monte Carlo method, which is based on the Bayesian sequential importance sampling (SIS) method. The PF algorithm finds a set of random samples in the state space to approximate the probability density function and then replaces the integral operation with the sample mean to obtain the minimum variance estimate of the state (Moradkhani et al., 2005). The procedure of the PF algorithm can also be divided into two stages: forecast and analysis.
If there are enough observations, the posterior density at $k$ can be approximated as
$$p\left(X^{a}_{k} \mid Y_{1:k}\right) \approx \sum_{i=1}^{N} w_{i,k}\, \delta\left(X^{a}_{k} - X^{a}_{i,k}\right),$$
where $\delta(\cdot)$ is the Dirac function, $p\left(X^{a}_{k} \mid Y_{1:k}\right)$ is the posterior probability distribution, $X^{a}_{i,k}$ is the particle element, $w_{i,k}$ is the weight of each particle, and $N$ is the number of particles. Unlike the EnKF algorithm, the PF method takes into account the weights of different particles and can be better applied to nonlinear systems. However, in DA applications only a limited number of particles carry large weights, and too many computing resources are spent on particles with weights of approximately 0. This situation is called particle degradation (Doucet et al., 2000). Effective methods to solve this issue include resampling or selecting more reasonable importance functions.
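Particle weighting, the degradation diagnostic, and resampling can be sketched in a few lines. The sketch assumes a Gaussian observation likelihood and uses systematic resampling, which is one common choice among several; it is not necessarily the scheme used in DART. All function names are hypothetical.

```python
import math
import random


def particle_weights(particles, obs, obs_var):
    """Normalized importance weights w_i from a Gaussian likelihood."""
    logw = [-0.5 * (obs - x) ** 2 / obs_var for x in particles]
    top = max(logw)                      # subtract the maximum for stability
    w = [math.exp(v - top) for v in logw]
    total = sum(w)
    return [v / total for v in w]


def effective_sample_size(weights):
    """1 / sum(w_i^2): equals N for equal weights, 1 for full degradation."""
    return 1.0 / sum(w * w for w in weights)


def systematic_resample(particles, weights, rng):
    """Draw N particles with probability proportional to their weights."""
    n = len(particles)
    start = rng.random() / n             # one random offset for all strata
    out, i, cum = [], 0, weights[0]
    for k in range(n):
        u = start + k / n
        while u >= cum and i < n - 1:
            i += 1
            cum += weights[i]
        out.append(particles[i])
    return out


particles = [1.0, 2.0, 3.0, 4.0, 5.0]
w = particle_weights(particles, obs=3.0, obs_var=1.0)
print(effective_sample_size(w))
print(systematic_resample(particles, w, random.Random(0)))
```

A small effective sample size relative to N signals that most particles contribute almost nothing to the posterior estimate, which is the situation that resampling is meant to correct.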

Ensemble meteorological forcing and initial conditions
The ensemble initial conditions and background error (Hu et al., 2014) are produced from ensemble analysis products generated by running DART and the Community Atmosphere Model (CAM4) (Raeder et al., 2012). DART-CAM4 produced 80 atmospheric forcing datasets with 6 h time intervals for the period of 1998-2010. These ensemble meteorological data have been widely employed in DA for ocean, snow, soil moisture, and many other related studies (Danabasoglu et al., 2012). Considering computational cost and filter performance, 40 members among the ensemble forcing datasets are chosen to drive the CLM4CN.
To achieve a steady-state solution for all state variables, the CLM4CN was run for 4000 years using Qian's forcing (Qian et al., 2006) at the resolution of 1.9° latitude by 2.5° longitude (Shi et al., 2013). The CLM4CN was then forced by the ensemble mean of the selected 40 members of the DART-CAM datasets for 1000 years. In the last step, the ensemble simulation from 1998 to 2001 was treated as a spin-up process, and 40 ensemble initial conditions were obtained. Because the experiments are global and computationally expensive, only a 1-year assimilation and ensemble simulation were conducted. Our goal is to first identify the best experiment and then conduct long-term simulation or assimilation in the future.

LAI datasets
The Global Land Surface Satellite (GLASS) LAI dataset is used in this study as observations for assimilation (Zhao et al., 2013). Since the ensemble simulation or assimilation is run at the resolution of 0.9° latitude by 1.25° longitude, the original spatial resolution of 0.05° of the GLASS LAI is upscaled to the same resolution.
An independent LAI dataset, version 2 of the Copernicus Global Land Service (CGLS) LAI product (GEOV2 LAI), was used to validate the assimilation result. The GEOV2 LAI is derived from the vegetation instruments on board the Satellite Pour l'Observation de la Terre (SPOT-VGT) and the PROBA-V satellites (Verger et al., 2014). The resolution of GEOV2 LAI is 1 km, which is also upscaled to the grid level to evaluate the analysis of LAI and the assimilation effect.
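The text does not state how the fine-resolution LAI products are upscaled. One simple possibility, shown here purely as an assumption, is an unweighted block mean that skips missing pixels; for the GLASS product this would mean averaging 18 by 25 pixels of 0.05° into each 0.9° by 1.25° model cell. The function name and the use of `None` for missing data are illustrative choices.

```python
def block_mean(fine, rows_per_cell, cols_per_cell, fill=None):
    """Average a fine 2-D grid (list of rows) into coarse cells.

    Missing pixels are marked with None and ignored; a coarse cell with no
    valid pixel receives `fill`. Rows or columns that do not fill a complete
    block at the grid edge are dropped.
    """
    n_rows = len(fine) // rows_per_cell
    n_cols = len(fine[0]) // cols_per_cell
    coarse = []
    for r in range(n_rows):
        row = []
        for c in range(n_cols):
            vals = [
                fine[i][j]
                for i in range(r * rows_per_cell, (r + 1) * rows_per_cell)
                for j in range(c * cols_per_cell, (c + 1) * cols_per_cell)
                if fine[i][j] is not None
            ]
            row.append(sum(vals) / len(vals) if vals else fill)
        coarse.append(row)
    return coarse


fine_lai = [
    [1.0, 2.0, 3.0, None],
    [3.0, 4.0, 5.0, None],
]
print(block_mean(fine_lai, 2, 2))
```

An area-weighted mean (weights proportional to the cosine of latitude) would be more accurate near the poles; the unweighted form is kept here only for brevity.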

Experimental design
To determine the optimal assimilation algorithm, five experiments corresponding to the KF, EnKF, EAKF, and PF methods are designed and shown in Table 1, in which the "Algorithms" experiments reject some observations under certain conditions using the KF, EnKF, EAKF, and PF algorithms. The expected value of the difference between the prior mean and the observation is $\sqrt{\sigma_{\mathrm{prior}}^{2} + \sigma_{\mathrm{obs}}^{2}}$, in which $\sigma_{\mathrm{prior}}$ and $\sigma_{\mathrm{obs}}$ are the standard deviations of the prior probability density function (PDF) and the observation PDF, respectively. DART rejects the observation if the difference between the prior mean and the observation is larger than 3 times the expected value. The "Algorithms without observation rejection" experiments accept all the observed LAI. During assimilation, CLM stops and writes restart and history files every 8 d. If GLASS LAI observations are available, they are assimilated into the CLM4CN. DART extracts the state vector; the increments are calculated by filtering at each time step; and the LAI, leaf carbon (Leaf C), and leaf nitrogen (Leaf N) are updated. The adjusted DART state vector is written back to the CLM restart files as a new initial condition for the next time step. All the simulations and assimilations are conducted at the spatial resolution of 0.9° latitude by 1.25° longitude. The ensemble assimilation is conducted pointwise, meaning that spatial covariances are not considered.
There are two latitudinal belts of high LAI values in July, located in the tropics and at 50-65° N. These two regions are mainly dominated by evergreen broadleaf forests and boreal forests, respectively. There are three high-LAI regions located in the tropics: the Amazon, central Africa, and some islands in Southeast Asia. Because of the presence of deserts, plateaus, and bare ground, the LAI is low in northern Africa, western North America, western Australia, southern Africa, and southern South America, where shrubs and/or grass are dominant. Globally, the CLM4CN can simulate the LAI distribution characteristics, except that it systematically overestimates LAI, especially at low latitudes and in boreal forest regions, with the largest bias of 5 m² m⁻². The global LAI is lower in November than in July. The LAI values in the high latitudes of the Northern Hemisphere are higher in July than in November because November is not the growing season for most of the vegetation in the Northern Hemisphere.
The differences between GEOV2 LAI and the LAI assimilated with the (a) EAKF, (b) EnKF, (c) KF, and (d) PF methods are displayed in Fig. 2. Globally, the differences between assimilation with the four methods and GEOV2 LAI are larger in lower-latitude regions, indicating that assimilation also overestimates the LAI value in these regions. Compared with the bias of the simulation in Fig. 1, the bias of the assimilation relative to the observations is reduced to 2 m² m⁻² in the low-latitude regions, which are dominated by BET tropical and mixed forest types. The LAI values from the assimilation experiment are always 1 m² m⁻² higher in the middle- and high-latitude regions, especially in western North America, northwestern China, and western Australia, where open shrublands and grasslands are dominant. Assimilation always underestimates the LAI values in eastern North America, northeastern China, and the 50-65° N latitude regions of Eurasia, which are dominated by NET boreal forests and mixed forest types. The assimilation with the EAKF and EnKF algorithms displays a lower bias than the KF and PF algorithms compared to GEOV2 LAI, especially in the northern and eastern Amazon, central Africa, southern Eurasia, and Southeast Asia. Notably, the correction of overestimated LAI is significantly better than that of underestimated LAI, which is mainly attributed to the high dispersion of LAI in those regions. In other words, high dispersion is beneficial to assimilation.
The results also indicate that the EAKF and EnKF assimilation algorithms are better than the KF and PF algorithms in November (figures not shown). In detail, the EAKF algorithm is better than the EnKF method in November, especially in the Amazon, central Africa, and southern Eurasia. The biases of assimilated LAI relative to the observed LAI are higher in November in the 20-65° N region, which may be because vegetation during this period in the Northern Hemisphere is not lush. In western Australia and central Eurasia, the improvement of the underestimation in November is not as significant as that in July, which indicates that the system has a limited capability to simulate the vegetation process, especially for open shrubland and grassland. From the perspective of the average and RMSE, the PF algorithm performs worse than the EAKF and EnKF algorithms because its acceptance of observations gradually decreases over the assimilation steps (discussed below). Note that the average and RMSE only make sense for the ensemble Kalman filters. For the PF algorithm, the particle with the largest weight (a posteriori maximum for the PDF) should be discussed separately.
The RMSEs of ensemble members are shown in Fig. 3 to indicate where the assimilation is the most efficient. The RMSEs of ensemble members for the EAKF and EnKF algorithms are larger than those for the KF and PF algorithms, indicating that the EAKF and EnKF are more effective. In July 2002, the RMSE of the ensemble estimates is the largest in lower-latitude regions, with particularly high values in central South America, central Africa, and Southeast Asia. The regions with comparatively large ensemble spreads are located in western North America and western Europe. The areas with large ensemble spreads are also transitional regions with different vegetation types, indicating a low capability of the models to simulate complex vegetation types.
The global mean LAI and the LAI in five latitudinal bands were chosen for analysis in this study. The five bands are boreal (45-65° N), northern temperate (23-45° N), northern equatorial (0-23° N), southern equatorial (0-23° S), and southern temperate (23-90° S). Figure 4 presents the root-mean-square deviations (RMSDs) of the ensemble means of simulation and assimilation versus GEOV2 LAI for (a) global, (b) boreal, (c) northern temperate, (d) northern equatorial, (e) southern equatorial, and (f) southern temperate regions. Generally, although they all feature similar variation patterns, the RMSDs of all the assimilation datasets relative to the GEOV2 LAI are less than those of the simulation, indicating that all four assimilation algorithms can improve the LAI estimation. For boreal regions, there are two maxima for the RMSD, in May and September, which are also the periods of abrupt variation in LAI. During the growing season, the RMSDs of LAI reach relatively low values, especially for the regions in the middle and high latitudes of the Northern Hemisphere and high latitudes of the Southern Hemisphere. In the low-latitude region covered by evergreen or deciduous broadleaf forests, the RMSD does not present an obvious annual change. The EnKF algorithm performed best in the boreal region with the smallest RMSD, while it did not perform as well in the northern temperate and northern equatorial regions. The EAKF algorithm presented the lowest RMSD in the southern equatorial and southern temperate regions, as well as globally. The assimilation is far less efficient in the boreal region than in other areas, which is partly attributed to the consistently low initial RMSD during nongrowing seasons and the limited capability of the models for simulating processes associated with the boreal forest type.
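A band-wise RMSD of this kind can be computed as in the following sketch. The band limits follow the definitions above; the treatment of band edges, the lack of area weighting, and all names are assumptions made for illustration.

```python
import math

# Band limits in degrees latitude (south edge, north edge); south is negative.
BANDS = {
    "boreal": (45.0, 65.0),
    "northern temperate": (23.0, 45.0),
    "northern equatorial": (0.0, 23.0),
    "southern equatorial": (-23.0, 0.0),
    "southern temperate": (-90.0, -23.0),
}


def rmsd(model, reference):
    """Root-mean-square deviation over pairs where both values are present."""
    pairs = [(m, r) for m, r in zip(model, reference)
             if m is not None and r is not None]
    if not pairs:
        return None
    return math.sqrt(sum((m - r) ** 2 for m, r in pairs) / len(pairs))


def band_rmsd(lats, model, reference):
    """RMSD per latitude band for grid points listed as flat sequences."""
    out = {}
    for name, (south, north) in BANDS.items():
        idx = [i for i, lat in enumerate(lats) if south <= lat < north]
        out[name] = rmsd([model[i] for i in idx], [reference[i] for i in idx])
    return out


lats = [50.0, 30.0, 10.0, -10.0, -40.0]
simulated = [2.0, 3.0, 6.0, 5.0, 1.0]
geov2 = [1.0, 3.0, 4.0, 5.0, 2.0]
print(band_rmsd(lats, simulated, geov2))
```

Running the same function on the simulation and on each assimilation experiment gives directly comparable numbers per band.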
Figure 5 shows the globally or regionally averaged RMSDs of simulation and assimilation relative to GEOV2 LAI. The RMSDs of assimilation are lower than those of simulation, implying that assimilating remotely sensed LAI data into the CLM4CN is an effective method for improving the model performance. The difference between simulation and all four algorithms in the northern and southern equatorial regions is larger than in other regions, indicating that the assimilation is more efficient there. The averaged RMSD for LAI from the EAKF experiment is lower than that of the other three algorithms, except for the boreal regions, indicating its better performance in assimilation.
The background and analysis departures are calculated as (1) innovations, which are the differences between the assimilated LAI and model background; and (2) residuals, which are the differences between the assimilated LAI and analysis (Barbu et al., 2011). The LDAS is considered to be working well when the residuals are reduced compared to the innovations (Albergel et al., 2017). Figure 6 shows the histograms of innovations and residuals of LAI globally and for all subregions during July 2002. Generally, the distributions of innovations and residuals are similar for the KF and PF algorithms, which means that these two algorithms are not very efficient for LAI assimilation. The distribution of residuals is more centered on 0 than that of the innovations for the EAKF and EnKF algorithms, especially for the EAKF algorithm. The innovations dominantly exhibit a large negative bias, indicating that the model always highly overestimates LAI. The analysis reduces this overestimation, as shown by the residuals, especially for the EAKF algorithm. The analysis departures for the EAKF algorithm are more centered on 0 than those for the EnKF algorithm, especially in global, northern temperate, and southern temperate regions.
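The two departure statistics can be written down directly from their definitions. In the sketch below the "assimilated LAI" is taken to be the observed value at each point, the sign convention is observation minus model, and the names and numbers are illustrative assumptions.

```python
def departures(obs, background, analysis):
    """Return (innovations, residuals) as observation minus model value."""
    innovations = [o - b for o, b in zip(obs, background)]
    residuals = [o - a for o, a in zip(obs, analysis)]
    return innovations, residuals


def mean_abs(values):
    """Mean absolute value, used here as a simple measure of departure size."""
    return sum(abs(v) for v in values) / len(values)


obs = [3.0, 4.0]          # observed LAI
background = [6.0, 6.0]   # model forecast before assimilation
analysis = [4.0, 4.5]     # model state after assimilation
innov, resid = departures(obs, background, analysis)
print(innov, resid)
print(mean_abs(resid) < mean_abs(innov))
```

With this sign convention, negative innovations correspond to a model background that is higher than the observations, which matches the overestimation described above.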

Effective observational proportion
The assimilation results depend not only on the algorithm but also on the observations. This not only requires a sufficiently strong degree of discretization for ensemble simulations but also requires the observational variables to be sufficiently trustworthy. In this section, the proportion of LAI observations that can be accepted for the four algorithms is discussed. During assimilation, DART counts the observations that are not assimilated because the difference between the prior mean and the observation is larger than 3 times the expected value. The proportion of accepted LAI observations is defined as the number of accepted observations divided by the total number of observations. To explain the relationship between the assimilation algorithms and observation rejection, Fig. 7 displays the proportion of accepted LAI observations for the four algorithms in the zonal regions. In general, the EnKF and EAKF methods accepted many more LAI observations than the PF and KF methods. In the low-latitude regions, the proportion of accepted LAI observations is approximately 75 %, which is lower than in the high-latitude regions. This may be because the broadleaf forest in tropical regions can grow unrestrictedly in the model, producing LAI values that are much higher than the observations. At the very beginning of assimilation, DART rejects the largest proportion of LAI observations in the southern equatorial, northern equatorial, and northern temperate zones due to large biases between the simulation and the observations. Over time, the rejection proportion gradually decreases for the northern equatorial, southern equatorial, and southern temperate regions. As ensemble-analyzed LAI values tend to be relatively fixed, the rejection proportion increases over regions with small LAI amplitudes, such as the northern temperate and boreal regions. From May to September in the boreal region and from April to September in the northern temperate region, the proportion of accepted LAI observations is much smaller than in the other regions. These two periods with abrupt variation in LAI are also when the model simulation presents an obvious discrete characteristic. This experiment illustrates the utility of the spin-up process for ensemble initial conditions. Furthermore, the KF and PF algorithms gradually reduce the acceptance of observations as assimilation progresses, which may partially explain their worse performance than the EnKF and EAKF algorithms (see Fig. 5).
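The rejection rule and the accepted proportion can be expressed in a few lines. The sketch takes the expected difference between the prior mean and the observation to be the square root of the sum of the prior and observation variances, and applies the factor of 3 stated in the text. Function names and example values are hypothetical; this is not the DART source code.

```python
import math


def is_rejected(prior_mean, prior_sd, obs, obs_sd, threshold=3.0):
    """True if |prior mean - obs| exceeds threshold times the expected difference."""
    expected = math.sqrt(prior_sd ** 2 + obs_sd ** 2)
    return abs(prior_mean - obs) > threshold * expected


def accepted_proportion(cases, threshold=3.0):
    """Fraction of (prior_mean, prior_sd, obs, obs_sd) cases that pass the check."""
    accepted = sum(
        not is_rejected(pm, ps, ob, os_, threshold) for pm, ps, ob, os_ in cases
    )
    return accepted / len(cases)


cases = [
    (6.0, 0.3, 2.0, 0.4),   # large model overestimate: rejected
    (6.0, 0.3, 5.0, 0.4),   # within 3 expected differences: accepted
    (3.0, 0.3, 3.2, 0.4),   # accepted
    (1.0, 0.3, 1.1, 0.4),   # accepted
]
print(accepted_proportion(cases))
```

The rule also shows why acceptance can fall over time: as the ensemble spread (`prior_sd`) shrinks, the expected difference shrinks with it, so the same mismatch between prior mean and observation is more likely to exceed the threshold.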
The differences between the globally assimilated LAI from the EAKF method (with rejection) and GEOV2 LAI in (a) July and (b) November are shown in Fig. 8 to illustrate the role of observation proportion. It can be concluded that when accepting all the observations, the assimilation results seem to be better than when some observations are rejected during assimilation. Large negative biases occur in the Amazon, central Africa, southern Eurasia, and the boreal region, where the LAI is overestimated in the model. Large positive biases occur in southeastern China, western North America, western Australia, and central South America in July, partly due to the influence of topography. In November, positive biases are observed across the middle- and high-latitude regions of the Northern Hemisphere, indicating an overestimation of LAI in nongrowing seasons.
During assimilation, the assimilated observations (GLASS LAI) are always treated as true values. The question thus becomes: how do the true values influence the assimilation results? Figure 9 shows the RMSDs between the simulation, the experiments with and without rejection (EAKF_reject and EAKF_noreject), and GEOV2 LAI over the (a) global, (b) boreal, (c) northern temperate, (d) northern equatorial, (e) southern equatorial, and (f) southern temperate regions. In the EAKF_reject experimental design, an observation is rejected by DART if the difference between the prior mean and the observed LAI is larger than 3 times the expected value, while in the EAKF_noreject experiment all observed LAIs are assimilated. Generally, RMSDs for both simulation and assimilation present obvious annual variations. The RMSD of assimilation is far less than that of the simulation, although their characteristic variation patterns are similar. This demonstrates the effectiveness of assimilation for improving model simulation. The RMSD relative to the observations was highest for the simulation, followed by the EAKF_reject experiment, and was lowest for the EAKF_noreject experiment. That is, the RMSD is smaller when all the observations are accepted during assimilation than when some are rejected. Compared with the EAKF_reject experiment and the other algorithms in Fig. 5, the globally and regionally averaged RMSDs from the EAKF_noreject experiment are much smaller, indicating the most efficient performance.
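The RMSD comparison above can be illustrated with a minimal Python sketch. The helper `rmsd` is hypothetical, the numbers are made up for illustration, and the average is unweighted; whether the regional averages in the paper use area weighting is not stated in this section, so that choice is an assumption.

```python
import numpy as np


def rmsd(estimate, reference):
    """Root-mean-square deviation over grid cells valid in both fields."""
    estimate = np.asarray(estimate, dtype=float)
    reference = np.asarray(reference, dtype=float)
    # Skip cells where either field is missing (NaN).
    valid = ~np.isnan(estimate) & ~np.isnan(reference)
    diff = estimate[valid] - reference[valid]
    return float(np.sqrt(np.mean(diff ** 2)))


# Illustrative LAI values (m2 m-2) on four grid cells; the last is missing
# in the reference product and is therefore ignored.
reference = np.array([1.0, 2.0, 3.0, np.nan])
simulation = np.array([3.0, 4.0, 5.0, 1.0])
assimilation = np.array([2.0, 3.0, 4.0, 1.0])

rmsd_sim = rmsd(simulation, reference)
rmsd_da = rmsd(assimilation, reference)
print(rmsd_sim, rmsd_da)
```

A lower value for the assimilated field than for the free simulation corresponds to the ordering reported for Fig. 9.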

Conclusions and discussion
The Community Land Model version 4 with prognostic carbon and nitrogen components (CLM4CN) is coupled with the Data Assimilation Research Testbed (DART) to determine the optimal assimilation algorithm for leaf area index (LAI). The kernel filter (KF), ensemble Kalman filter (EnKF), ensemble adjustment Kalman filter (EAKF), and particle filter (PF) are discussed in this paper.
The results show that assimilating remotely sensed LAI into the CLM4CN is an effective method for improving model performance. Globally speaking, the EAKF and EnKF assimilation algorithms are better than the KF and PF assimilation algorithms. The LAI obtained by the EAKF algorithm is more continuous than that obtained by the EnKF algorithm and more consistent with observations in central South America and central Africa, whereas the deviation in the EnKF method can range from −4 to 4 m² m⁻². Furthermore, the assimilation shows better performance in the vegetation growing season. The lowest root-mean-square deviation is associated with the EAKF algorithm, suggesting that the EAKF algorithm is the best and has a robust performance.
The proportion of observations accepted by the land data assimilation system is another topic of this research. The proportion of accepted LAI observations is 10 %–20 % in the low latitudes, which is lower than in the high latitudes because of large biases between the assimilation and the observations. However, low observation acceptance does not necessarily mean bad assimilation results, indicating that assimilation performance relies not only on the observations but also on the background error and ensemble model performance. When all the observations are accepted, the RMSD of the results is smaller than when some observations are rejected.
The ensemble assimilation is conducted pointwise without considering spatial covariances, which will be consid-

Figure 1. Spatial distributions of global LAI values in 2002 for (a) GEOV2 LAI in July, (b) ensemble mean of simulations in July, (c) GEOV2 LAI in November, and (d) ensemble mean of simulations in November.

Figure 2. Differences between global LAI from assimilation experiments with the methods of (a) EAKF, (b) EnKF, (c) KF, and (d) PF and GEOV2 LAI in July 2002.

Figure 3. Same as Fig. 2 but for RMSE of ensemble members.

Figure 5. Globally or regionally averaged RMSDs for the simulation and assimilation results and GEOV2 LAI.

Figure 7. The proportion of accepted LAI observations for the four algorithms in the zonal regions.

Figure 8. Differences between globally assimilated and GEOV2 LAIs for the methods of EAKF in (a) July and (b) November.