The Land Variational Ensemble Data Assimilation Framework (LAVENDAR) implements the method of four-dimensional ensemble variational (4D-En-Var) data assimilation (DA) for land surface models. 4D-En-Var avoids the often costly calculation of a model adjoint required by traditional variational techniques (such as 4D-Var) for optimizing parameters or state variables over a time window of observations. In this paper we present the first application of LAVENDAR, implementing the framework with the Joint UK Land Environment Simulator (JULES) land surface model. We show that the system can recover seven parameters controlling crop behaviour in a set of twin experiments. We run the same experiments at the Mead continuous maize FLUXNET site in Nebraska, USA, to show the technique working with real data. We find that the system accurately captures observations of leaf area index, canopy height and gross primary productivity after assimilation and improves posterior estimates of the amount of harvestable material from the maize crop by 74 %. LAVENDAR requires no modification to the model it is used with and can therefore keep pace with new model releases more easily than other DA methods.

Land surface models are important tools for representing the interaction between the Earth's surface and the atmosphere for weather and climate applications. They play a key role in the translation of our knowledge of climate change into impacts on human life.
Most land surface models will converge to a steady state; their state vector tends toward an equilibrium defined by forcing variables (i.e., the meteorology experienced by the model) and the model parameters. This is quite unlike fluid dynamics models used for the atmosphere and oceans, which exhibit chaotic behaviour; a small change in their initial state can lead to large deviations in the state vector evolution with time.
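The contrast between the two kinds of behaviour can be illustrated with a deliberately simple, hypothetical toy sketch; neither model below is JULES, and the parameter values (`npp`, `k`, `r`) are arbitrary assumptions. A single carbon pool with constant input relaxes to the same equilibrium, `npp / k`, from any starting store, so its final state is set by forcing and parameters rather than by initial conditions. The logistic map at `r = 3.9`, a classic chaotic system, instead amplifies a perturbation of 1e-9 in its initial state until the two trajectories are unrelated.

```python
"""Toy contrast: a converging one-pool model versus a chaotic map.

Hypothetical illustration only; neither model is part of JULES or LAVENDAR.
"""


def one_pool_carbon(c0, npp=2.0, k=0.1, n_steps=500, dt=1.0):
    """Integrate dC/dt = npp - k*C with forward Euler; the equilibrium is npp/k."""
    c = c0
    for _ in range(n_steps):
        c = c + dt * (npp - k * c)
    return c


def logistic_trajectory(x0, r=3.9, n_steps=100):
    """Iterate the logistic map x -> r*x*(1 - x), chaotic for r = 3.9; return all states."""
    states = [x0]
    for _ in range(n_steps):
        states.append(r * states[-1] * (1.0 - states[-1]))
    return states


if __name__ == "__main__":
    # Very different initial stores both end close to npp / k = 20.
    print(one_pool_carbon(0.0), one_pool_carbon(100.0))
    # Initial states 1e-9 apart become widely separated.
    a, b = logistic_trajectory(0.2), logistic_trajectory(0.2 + 1e-9)
    print(max(abs(p - q) for p, q in zip(a, b)))
```

Changing `npp` or `k` moves the pool's equilibrium, which is why, for models of this kind, estimating parameters can matter more than estimating the initial state.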
Consequently, for some land surface applications, parameter estimation can have greater utility than state estimation.

Data assimilation (DA) combines models and data such that the resulting estimates are an optimal combination of both, taking into account all available information about their respective uncertainties. DA techniques are typically derived from a Bayesian standpoint and have largely been developed to serve the needs of atmospheric and ocean modelling, especially where near-real-time forecasts are required. The focus of such activities is typically on estimating the optimal model state, because the fundamental laws underlying fluid dynamics are well understood and many of the model parameters are known physical constants. This is not true for land surface models, whose parameters are much less well understood. Indeed, to account for model structural inadequacies, these parameters can be allowed to change over time within a developing ecosystem or when an ecosystem is subject to a disturbance event.
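For reference, under the assumption of Gaussian prior and observation errors, the Bayesian standpoint leads to the standard strong-constraint 4D-Var cost function below, whose minimizer is the maximum a posteriori estimate. The notation here is our own and may not match that used later in the paper: $\mathbf{x}$ is the state and/or parameter vector, $\mathbf{x}_b$ its prior estimate, $\mathbf{B}$ and $\mathbf{R}_i$ the prior and observation error covariance matrices, $\mathbf{y}_i$ the observations at time $i$ in a window of $N+1$ observation times, $m_{0\to i}$ the model integrated from the start of the window to time $i$, and $h_i$ the observation operator.

```latex
J(\mathbf{x}) = \frac{1}{2}(\mathbf{x}-\mathbf{x}_b)^{\mathrm{T}}\mathbf{B}^{-1}(\mathbf{x}-\mathbf{x}_b)
  + \frac{1}{2}\sum_{i=0}^{N}\left(\mathbf{y}_i - h_i(m_{0\to i}(\mathbf{x}))\right)^{\mathrm{T}}\mathbf{R}_i^{-1}\left(\mathbf{y}_i - h_i(m_{0\to i}(\mathbf{x}))\right)

\nabla J(\mathbf{x}) = \mathbf{B}^{-1}(\mathbf{x}-\mathbf{x}_b)
  - \sum_{i=0}^{N}\mathbf{M}_{0\to i}^{\mathrm{T}}\mathbf{H}_i^{\mathrm{T}}\mathbf{R}_i^{-1}\left(\mathbf{y}_i - h_i(m_{0\to i}(\mathbf{x}))\right)
```

Here $\mathbf{M}_{0\to i}$ and $\mathbf{H}_i$ are the linearizations (tangent linear models) of $m_{0\to i}$ and $h_i$; their transposes in the gradient are the adjoint models that 4D-En-Var seeks to avoid deriving and maintaining.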

DA applications for land surface models are becoming increasingly common, using a wide variety of techniques and estimating both state and parameters. Many studies have employed Markov chain Monte Carlo (MCMC) methods (e.g.,

In this paper we present the first application of the Land Variational Ensemble Data Assimilation Framework (LAVENDAR) for implementing the hybrid technique of four-dimensional ensemble variational (4D-En-Var) data assimilation (DA) with land surface models. We show that LAVENDAR can be applied to the Joint UK Land Environment Simulator (JULES) land surface model

Data assimilation has previously been implemented with the JULES land surface model with

Our results show that 4D-En-Var is a promising technique for land surface applications that is easy to implement for any land surface model and provides a reasonable trade-off between the computational efficiency of a full 4D-Var system and the complexity and effort of maintaining a model adjoint. Perhaps most significantly, no modification to the model code itself is required. In Sect.

The Joint UK Land Environment Simulator (JULES) is a community-developed process-based land surface model and forms the land surface component in the next-generation UK Earth System Model (UKESM). A description of the energy and water fluxes is given in

We have used observations from the Mead FLUXNET US-Ne1 site

This section follows the derivation given in

In 4D-Var we require a prior estimate to the state and/or parameters of the system at time

For certain applications the prior error covariance matrix

In this section we outline a 4D-En-Var scheme using the notation defined in Sect.

We can see that the tangent linear model and adjoint are still present in Eqs. (

Test of the gradient of the 4D-En-Var cost function.
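A gradient test of the kind referred to in this caption can be sketched as follows. This is a minimal illustration, not the LAVENDAR implementation: it assumes a hypothetical quadratic cost written in the ensemble control variable `w`, with a fixed observation-space perturbation matrix `y_pert`, an innovation vector `innov` and a diagonal observation error covariance; all function names are our own. The test compares the finite change in the cost along a random unit direction `h` with the change predicted by the analytic gradient; for a correct gradient the ratio tends to 1 as `alpha` decreases, until floating-point round-off dominates.

```python
import numpy as np


def make_ensemble_cost(y_pert, innov, obs_err_var):
    """Hypothetical ensemble-space cost J(w) and its analytic gradient.

    J(w) = 0.5 w^T w + 0.5 (Y' w - d)^T R^-1 (Y' w - d), with diagonal R.
    """
    r_inv = 1.0 / obs_err_var

    def cost(w):
        misfit = y_pert @ w - innov
        return 0.5 * w @ w + 0.5 * misfit @ (r_inv * misfit)

    def grad(w):
        misfit = y_pert @ w - innov
        return w + y_pert.T @ (r_inv * misfit)

    return cost, grad


def gradient_test(cost, grad, w0, n_powers=10, seed=0):
    """Return f(alpha) = [J(w0 + alpha h) - J(w0)] / (alpha h^T grad J(w0)).

    alpha takes the values 1, 0.1, ..., 10^-(n_powers - 1). For a correct
    gradient f(alpha) tends to 1 as alpha shrinks, until round-off dominates.
    """
    rng = np.random.default_rng(seed)
    h = rng.standard_normal(w0.size)
    h /= np.linalg.norm(h)  # random unit search direction
    j0 = cost(w0)
    slope = h @ grad(w0)  # directional derivative predicted by the gradient
    alphas = 10.0 ** -np.arange(n_powers)
    return np.array([(cost(w0 + a * h) - j0) / (a * slope) for a in alphas])
```

Because the cost here is quadratic, `f(alpha) - 1` shrinks in proportion to `alpha`; a gradient that is wrong by a constant factor shows up as a ratio converging to the wrong value (for example 0.5 if the gradient is doubled).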

In order to implement 4D-En-Var, we construct an ensemble of parameter vectors and then run the process model for each unique parameter vector over a predetermined time window. We then extract the ensemble of model-predicted observations from the ensemble of model runs and compare these with the observations to be assimilated over the given time window. In our code

Description of parameters optimized in experiments and model truth value. PAR represents photosynthetically active radiation.

To use another model in this framework, new wrappers would have to be written to mimic the functionality of

It is important to ensure that the 4D-En-Var system is correct. We show that our system passes tests of the gradient of the cost function

A so-called “twin” experiment in data assimilation is one where a model is used to generate synthetic observations to be assimilated. This is a commonly used approach to test whether particular combinations of observations can, in principle, be used to retrieve desired target variables using a given DA method. In effect, the model into which the observations are assimilated is “perfect”, because it represents the underlying physics that gave rise to them in the first place.
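The synthetic-observation step of a twin experiment can be sketched as follows. This is a hypothetical illustration: `toy_lai_model` is a simple logistic leaf-area curve standing in for a land surface model, its three parameters are not JULES-crop parameters, and the Gaussian noise level `obs_std` is an assumed observation error. The key property is that the true parameters are known, so whatever the DA system retrieves can be scored directly against them.

```python
import numpy as np


def toy_lai_model(params, days):
    """Hypothetical logistic leaf-area-index curve standing in for a land surface model.

    params = (lai_max, growth_rate, day_mid); these are NOT JULES-crop parameters.
    """
    lai_max, growth_rate, day_mid = params
    return lai_max / (1.0 + np.exp(-growth_rate * (days - day_mid)))


def make_twin_observations(model, true_params, days, obs_std, seed=0):
    """Run the model with the 'true' parameters and add Gaussian observation noise.

    Returns (truth, obs). Because the same model is later used for assimilation,
    the retrieved parameters can be compared directly with true_params.
    """
    rng = np.random.default_rng(seed)
    truth = model(true_params, days)
    obs = truth + rng.normal(0.0, obs_std, size=truth.shape)
    return truth, obs


if __name__ == "__main__":
    days = np.arange(100.0, 250.0)  # hypothetical observation days
    truth, obs = make_twin_observations(toy_lai_model, (5.0, 0.08, 180.0), days, obs_std=0.3)
    print(np.round(obs[:5], 2))
```

The noise added here should be consistent with the observation error covariance later given to the DA system; otherwise a failed retrieval cannot be attributed to the method alone.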
We conducted a parameter estimation twin experiment with the aim of recovering values for seven key JULES-crop parameters: the quantum efficiency of photosynthesis, nitrogen use efficiency (a scale factor relating Vcmax to leaf nitrogen concentration), a scale factor for dark respiration, two allometric coefficients for the calculation of senescence and two coefficients for determining specific leaf area (see Table

The model truth was taken from the values given in

For the experiments using real data from the Mead US-Ne1 FLUXNET site, the same seven parameters were optimized (shown in Table

4D-En-Var twin results for leaf area index using 50 ensemble members. Blue shading: prior ensemble spread (

4D-En-Var twin results for gross primary productivity using 50 ensemble members. Blue shading: prior ensemble spread (

4D-En-Var twin results for canopy height using 50 ensemble members. Blue shading: prior ensemble spread (

Figures

4D-En-Var twin assimilated observation RMSE for the four target variables when an ensemble of size 50 is used in experiments.
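Root-mean-square errors of this kind could be computed along the lines of the sketch below. It assumes, as an illustration, that the RMSE is taken between an ensemble-mean prediction and the observations, skipping time steps without an observation (marked as NaN); the caption does not restate these details. `percent_reduction` is one common way of expressing an improvement of the posterior over the prior, and we do not claim it is the definition behind figures quoted elsewhere in the paper. All data values in the usage block are hypothetical.

```python
import numpy as np


def rmse(predicted, observed):
    """Root-mean-square error, skipping time steps where the observation is NaN."""
    predicted = np.asarray(predicted, dtype=float)
    observed = np.asarray(observed, dtype=float)
    valid = ~np.isnan(observed)
    return float(np.sqrt(np.mean((predicted[valid] - observed[valid]) ** 2)))


def percent_reduction(prior_error, posterior_error):
    """Relative improvement of the posterior over the prior, in percent."""
    return 100.0 * (prior_error - posterior_error) / prior_error


if __name__ == "__main__":
    obs = np.array([1.2, np.nan, 3.1, 4.0])      # hypothetical LAI observations
    prior_mean = np.array([0.5, 1.0, 2.0, 2.5])  # hypothetical prior ensemble mean
    post_mean = np.array([1.1, 2.0, 3.0, 4.2])   # hypothetical posterior ensemble mean
    e_prior, e_post = rmse(prior_mean, obs), rmse(post_mean, obs)
    print(e_prior, e_post, percent_reduction(e_prior, e_post))
```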

Prior and posterior distributions for the seven parameters are shown in Fig.

4D-En-Var twin results for harvestable material using 50 ensemble members. Blue shading: prior ensemble spread (

4D-En-Var twin distributions for the seven optimized parameters for both the prior ensemble (light grey) and posterior ensemble (dark grey). The value of the model truth is shown as a dashed vertical black line.

4D-En-Var twin results and percentage error for each of the seven optimized parameters when an ensemble of size 50 is used in experiments.
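The percentage errors in a table like this one can be sketched as follows, assuming the error is the absolute difference between the posterior ensemble mean and the model truth, expressed as a percentage of the true value; the exact definition used for the table is not restated in this section, and the function names are our own.

```python
import numpy as np


def percentage_error(estimate, truth):
    """Absolute error of each parameter as a percentage of its true value."""
    estimate = np.asarray(estimate, dtype=float)
    truth = np.asarray(truth, dtype=float)
    return 100.0 * np.abs(estimate - truth) / np.abs(truth)


def ensemble_mean_percentage_error(ensemble, truth):
    """Percentage error of the ensemble mean; ensemble has shape (n_members, n_params)."""
    return percentage_error(np.mean(ensemble, axis=0), truth)
```

Normalizing by the true value makes parameters with very different magnitudes, such as a quantum efficiency and an allometric coefficient, directly comparable.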

Figures

Prior and posterior estimates for unassimilated independent observations are shown in Figs.

Prior and posterior ensemble parameter distributions are shown in Fig.

4D-En-Var results for leaf area index using 50 ensemble members. Blue shading: prior ensemble spread (

4D-En-Var results for gross primary productivity using 50 ensemble members. Blue shading: prior ensemble spread (

4D-En-Var results for canopy height using 50 ensemble members. Blue shading: prior ensemble spread (

4D-En-Var results for harvestable material using 50 ensemble members. Blue shading: prior ensemble spread (

4D-En-Var results for leaf carbon using 50 ensemble members. Blue shading: prior ensemble spread (

4D-En-Var results for stem carbon using 50 ensemble members. Blue shading: prior ensemble spread (

4D-En-Var distributions for the seven optimized parameters for both the prior ensemble (light grey) and posterior ensemble (dark grey).

4D-En-Var Mead assimilated observation RMSE for the three target variables when an ensemble of size 50 is used in experiments.

4D-En-Var Mead unassimilated observation RMSE when an ensemble of size 50 is used in experiments.

In Sect.

In the results for all predicted variables we find that the posterior ensemble converges around the model truth. This can also be seen for the parameters in Fig.

We have demonstrated the ability of the technique to improve JULES model predictions using real data in Sect.

The experiments with Mead field observations do not show the same level of reduction in ensemble spread as in the twin experiments (see Fig.

Avoiding the computation of an adjoint makes 4D-En-Var much easier to implement and agnostic to the choice of land surface model. By retaining a variational approach and optimizing parameters over a time window against all available observations, we also avoid retrieving the non-physical time-varying parameters associated with more common sequential ensemble methods. However, as with other ensemble techniques, results depend on having a well-conditioned prior ensemble.
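To illustrate why no adjoint is required, the following is a minimal numerical sketch of an ensemble-variational parameter update. It is not the LAVENDAR code: the function and argument names are hypothetical, it assumes a diagonal observation error covariance, no localization and a single outer iteration, and it uses the ensemble mean of the predicted observations in place of a separate model run at the prior mean. The model is assumed to have been run once per ensemble member beforehand; the observation-space perturbations then stand in for the tangent linear model, and the gradient needs only their transpose.

```python
import numpy as np


def en4dvar_analysis(param_ensemble, obs_ensemble, obs, obs_err_var):
    """Minimal 4D-En-Var-style update of parameters from a pre-run ensemble.

    param_ensemble : (n_params, n_members) prior parameter vectors, one per column.
    obs_ensemble   : (n_obs, n_members) model-predicted observations for each member,
                     gathered over the whole assimilation window.
    obs            : (n_obs,) observations over the window.
    obs_err_var    : (n_obs,) observation error variances (diagonal R assumed).
    Returns the analysis (posterior) parameter vector.
    """
    n_members = param_ensemble.shape[1]
    scale = 1.0 / np.sqrt(n_members - 1)

    # Prior mean and ensemble perturbation matrix X' such that B ~= X' X'^T.
    x_b = param_ensemble.mean(axis=1)
    x_pert = scale * (param_ensemble - x_b[:, None])

    # Observation-space perturbations Y' approximate the tangent linear model
    # applied to X', so neither a tangent linear nor an adjoint model is coded.
    hx_b = obs_ensemble.mean(axis=1)
    y_pert = scale * (obs_ensemble - hx_b[:, None])

    innov = obs - hx_b
    r_inv = 1.0 / obs_err_var

    # J(w) = 0.5 w^T w + 0.5 (Y' w - d)^T R^-1 (Y' w - d) is quadratic in w;
    # setting its gradient to zero gives (I + Y'^T R^-1 Y') w = Y'^T R^-1 d.
    lhs = np.eye(n_members) + y_pert.T @ (r_inv[:, None] * y_pert)
    rhs = y_pert.T @ (r_inv * innov)
    w = np.linalg.solve(lhs, rhs)

    # Map the optimal control vector back to parameter space.
    return x_b + x_pert @ w
```

Because the cost is quadratic in `w` under these assumptions, one linear solve gives its minimizer; a gradient-based minimizer would reach the same point. With a linear observation operator `H`, `Y' = H X'` exactly and the update reduces to the familiar best linear unbiased estimate `x_b + B H^T (H B H^T + R)^-1 (y - H x_b)` with `B = X' X'^T`, which makes a convenient check of an implementation.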
Methods of ensemble localization

In 4D-En-Var we approximate the tangent linear model using an ensemble perturbation matrix. Without explicit knowledge of the tangent linear and adjoint models, 4D-En-Var could be less able to deal with non-linearities in the process model in cases where the ensemble is small or ill-conditioned. For the examples presented in this paper, 4D-En-Var deals well with the non-linearity of the JULES land surface model. However, it is possible that for high-dimensional spaces, a technique of stochastic ensemble iteration

In this paper we have focused on using LAVENDAR for parameter estimation. However, the technique we present can just as easily be used to adjust the model state at the start of an assimilation window in much the same way as is done in weather forecasting

A particularly appealing aspect of LAVENDAR as presented in this paper is that there is no interaction between the DA technique and the model itself: once the initial ensemble has been generated, the model does not need to be run again to perform any aspect of the DA. Because the main computational overhead is running the model, this makes the DA

Variational DA with land surface models holds considerable potential, especially for parameter estimation, but as land surface models become more complex and subject to more frequent version releases, the calculation and maintenance of a model adjoint will become increasingly expensive. One way to avoid the computation of a model adjoint is to move to ensemble data assimilation methods. In this paper we have documented LAVENDAR for the implementation of 4D-En-Var data assimilation with land surface models. We have shown the application of LAVENDAR to the JULES land surface model, but as it requires no modification to the model itself it can easily be applied to any land surface model. Using LAVENDAR with JULES, we retrieved a set of true model parameters given known prior and observation error statistics in a set of twin experiments and improved model predictions of real-world observations from the Mead continuous maize US-Ne1 FLUXNET site. The use of 4D-En-Var with land models is promising for both parameter and state estimation. The additional computational overhead compared to 4D-Var is an appealing compromise given the simplicity and generality of its implementation.

The code and documentation used for the experiments in this paper are available from

EP, TQ and KW designed the experiments. AL provided advice on the data assimilation technique and tests. EP developed the data assimilation code and performed the simulations. TA and DS provided observations from the Mead US-Ne1 site. EP prepared the article with contributions from all co-authors.

The authors declare that they have no conflict of interest.

This work was funded by the UK Natural Environment Research Council's National Centre for Earth Observation ODA programme (NE/R000115/1). The US-Ne1, US-Ne2 and US-Ne3 AmeriFlux sites are supported by the Lawrence Berkeley National Lab AmeriFlux Data Management Program and by the Carbon Sequestration Program, University of Nebraska-Lincoln Agricultural Research Division. Funding for AmeriFlux core site data was provided by the U.S. Department of Energy’s Office of Science. Partial support from the Nebraska Agricultural Experiment Station with funding from the Hatch Act (accession number 1002649) through the USDA National Institute of Food and Agriculture is also acknowledged.

This research has been supported by the Natural Environment Research Council (grant no. NE/R000115/1).

This paper was edited by Adrian Sandu and reviewed by two anonymous referees.