We compare two optimized chemical data assimilation systems, one based on the
ensemble Kalman filter (EnKF) and the other based on four-dimensional
variational (4D-Var) data assimilation, using a comprehensive stratospheric
chemistry transport model (CTM). This work is an extension of the Belgian
Assimilation System for Chemical ObsErvations (BASCOE), initially designed to
work with 4D-Var data assimilation. A strict comparison of both methods in
the case of chemical tracer transport was done in a previous study and
indicated that both methods provide essentially similar results. In the
present work, we assimilate observations of ozone, HCl, HNO

The ensemble Kalman filter (EnKF) and the four-dimensional variational
algorithm (4D-Var) are widely used data assimilation methods that utilize the
model to propagate observational information in time and space into an
estimate of the state. Each method is built around different assumptions and
has its own merits. However, to some extent, the relative merits are
application dependent. In the context of meteorological data assimilation,
the relative advantages of these two methods have been discussed by

A short literature review discussing the chemical data assimilation (CDA)
problems related to EnKF and 4D-Var, their intercomparison and their application
to atmospheric chemistry modelling is already given in

As in S14, here we use the BASCOE (Belgian Assimilation System for Chemical
ObsErvations) environment. BASCOE was designed to assimilate satellite
observations of chemical composition into a stratospheric CTM originally
using the 4D-Var assimilation method

How, then, do the EnKF and the 4D-Var methods compare when photochemical reactions are taken into account? Do the results depend on the assimilated chemical species? Using actual satellite data sets and operational configurations, what are their respective performances in terms of precision, accuracy and computational efficiency? What is the role of the practical implementation of each method when the full description of the stratospheric chemistry is taken into account in the CTM? These are the main questions addressed in this paper.

Applying the multivariate EnKF method to an assimilation system with full chemistry raises, in principle, two important issues: the estimation of a large number of input error statistics and the localization between chemical species.

The first issue is the large number of input error statistics that are needed
(e.g. the observation error variances for each species at each vertical
level). Clearly, an online estimation of error statistics is desirable to
accomplish this task. In an idealized framework,

The second issue related to the implementation of a multi-species EnKF is the
localization between species. It is well known in EnKF applications that a
tapering of the sampling error correlations is needed when the true error
correlation is not close to

In this paper we perform an assimilation with EnKF and 4D-Var of several stratospheric species that are not necessarily directly chemically linked, under real-life constraints. The lifetimes of the assimilated species are quite diverse and vary with altitude. We use a state-of-the-art CTM that is under constant development but still has some deficiencies. We use limb sounding observations that provide vertically resolved measurements, so vertically resolved error statistics are needed. As shown in S14, the EnKF is more sensitive to the observation error statistics than 4D-Var. Nevertheless, to ensure consistency between the two assimilation systems, the observation error statistics of 4D-Var are subject to the same Desroziers estimation procedure. Localization between species, which is needed in the EnKF, is not applied in 4D-Var because the cross-covariance between species is taken into account automatically through the 4D-Var adjoint model.
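The Desroziers procedure relies on consistency relations between departure statistics: for an optimal analysis, the expected product of observation-minus-analysis and observation-minus-background departures, E[d_a d_b^T], equals the observation error covariance R, and E[d_ab d_b^T], with d_ab the analysis-minus-background departure, equals H B H^T. The Python/NumPy sketch below is our own illustration, not BASCOE code: it checks these relations on a synthetic problem in which each level is observed directly (H = I), B and R are diagonal and the optimal gain is used. Function names and error values are hypothetical.

```python
import numpy as np


def desroziers_variances(d_b, d_a, d_ab):
    """Per-level Desroziers estimates from samples (rows = samples, columns = levels).

    d_b  : y - H(x_b), observation minus background
    d_a  : y - H(x_a), observation minus analysis
    d_ab : H(x_a) - H(x_b), analysis minus background
    """
    sigma_o2 = np.mean(d_a * d_b, axis=0)   # diagonal of E[d_a d_b^T], estimates R
    sigma_b2 = np.mean(d_ab * d_b, axis=0)  # diagonal of E[d_ab d_b^T], estimates H B H^T
    return sigma_o2, sigma_b2


def synthetic_check(n_levels=5, n_samples=200_000, seed=0):
    """Draw synthetic departures with known variances and re-estimate them."""
    rng = np.random.default_rng(seed)
    sigma_b2_true = np.linspace(1.0, 2.0, n_levels)   # hypothetical background variances
    sigma_o2_true = np.linspace(0.5, 3.0, n_levels)   # hypothetical observation variances
    # Truth is 0, so x_b = eps_b and y = eps_o (H = I, no inter-level correlation).
    eps_b = rng.normal(size=(n_samples, n_levels)) * np.sqrt(sigma_b2_true)
    eps_o = rng.normal(size=(n_samples, n_levels)) * np.sqrt(sigma_o2_true)
    d_b = eps_o - eps_b
    gain = sigma_b2_true / (sigma_b2_true + sigma_o2_true)  # optimal gain per level
    d_ab = gain * d_b          # analysis increment x_a - x_b
    d_a = d_b - d_ab           # y - x_a
    est_o2, est_b2 = desroziers_variances(d_b, d_a, d_ab)
    return sigma_o2_true, est_o2, sigma_b2_true, est_b2


true_o2, est_o2, true_b2, est_b2 = synthetic_check()
print("R:      true", true_o2.round(3), "estimated", est_o2.round(3))
print("H B H^T: true", true_b2.round(3), "estimated", est_b2.round(3))
```

When the prescribed variances are wrong, the same diagnostics deviate from the prescribed values, which is what an online tuning procedure can exploit.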

The paper is organized as follows. The next section describes the main
components of the BASCOE Data Assimilation System (version 5.8): the common
CTM, the 4D-Var system and the EnKF system. It also describes the
implementation of Desroziers' method and the tuning of the error covariances
in each system. The assimilated observations and independent data used to
validate the results are given in Sect.

In this study, all numerical experiments are performed with the Belgian
Assimilation System for Chemical ObsErvations (BASCOE) and its underlying
CTM. The BASCOE CTM computes the temporal
evolution of 58 stratospheric chemical species accounting for the advection,
photochemical reactions and a parametrization of polar stratospheric cloud
(PSC) microphysics. We used a CTM configuration nearly identical to the one
described by

All species are advected by the flux-form semi-Lagrangian scheme

The photochemical scheme of BASCOE accounts for 208 stratospheric chemical
reactions: 146 gas-phase, 53 photolysis and 9 heterogeneous. Photolysis rates
are provided by the Jet Propulsion Laboratory (JPL) recommendations

Schematic representation of the practical implementation of the 4D-Var (top) and EnKF (bottom) assimilation methods in BASCOE. Black dots represent the model state and observational information is depicted in blue. The black arrows represent model integrations by one time step; vertical red arrows represent the model state optimization (4D-Var) or the Kalman filter update (EnKF). Green dots represent the analyses at 0 h, which are used as initial conditions for the diagnostic 24 h forecasts (green arrows). For clarity, the number of 4D-Var iterations has been limited to two and the number of EnKF members has been limited to three.

In order to describe the practical implementations of the 4D-Var and EnKF
algorithms in BASCOE, we must first explain how their assimilation windows
are set up differently in time. This is shown schematically in
Fig.

The EnKF initializes its ensemble of model states from one given state using
a procedure described in Sect.

The BASCOE 4D-Var of this study was already used by S14 and is described in detail by

The background error covariance matrix

The observation errors are assumed to be uncorrelated both horizontally and
vertically. The observation error covariance matrix

Finally, the BASCOE 4D-Var implementation includes the background quality
control procedure
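The background quality control procedure is only named here. As a hedged illustration, a generic background check of the kind commonly used in data assimilation rejects an observation when its OmF departure is large compared with the expected departure standard deviation sqrt(sigma_b^2 + sigma_o^2). The exact BASCOE criterion is not given in this section; the threshold `alpha` and the function name below are our assumptions.

```python
import numpy as np


def background_check(omf, sigma_b2, sigma_o2, alpha=3.0):
    """Boolean mask of observations accepted by a generic background check.

    omf      : observation-minus-forecast departures y - H(x_b)
    sigma_b2 : background error variances in observation space
    sigma_o2 : observation error variances
    alpha    : rejection threshold in units of the expected departure
               standard deviation (3.0 is an illustrative value)
    """
    omf = np.asarray(omf, dtype=float)
    expected_std = np.sqrt(np.asarray(sigma_b2, dtype=float)
                           + np.asarray(sigma_o2, dtype=float))
    return np.abs(omf) <= alpha * expected_std


accepted = background_check([0.5, -4.0, 2.9], sigma_b2=0.5, sigma_o2=0.5)
print(accepted.tolist())  # [True, False, True]
```

With such a test, a model bias that is large compared with the assumed error standard deviations leads to the rejection of most observations, consistent with the behaviour reported later in this paper above 10 hPa.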

The BASCOE EnKF of this study is similar to the system used in S14. An
ensemble of initial states

The operator

The observation error covariance matrix

As in our previous study with a chemical tracer model, BASCOE EnKF uses the
Schur (element-wise) product of the ensemble covariance matrix with a compact
support correlation function. This function is the fifth-order piecewise
rational function of
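A minimal sketch of this localization is given below, using the widely used fifth-order piecewise rational function of Gaspari and Cohn with half-width c, which equals 1 at zero distance and vanishes beyond 2c. For simplicity the state elements are given 1-D coordinates; the function names are hypothetical, and how c maps onto the correlation length used in BASCOE is not specified here.

```python
import numpy as np


def gaspari_cohn(dist, c):
    """Fifth-order piecewise rational compact-support correlation (Gaspari and Cohn form).

    dist : distance(s), any shape
    c    : half-width; the function is exactly 0 for |dist| >= 2c
    """
    d = np.asarray(dist, dtype=float)
    r = np.atleast_1d(np.abs(d) / c)
    rho = np.zeros_like(r)
    inner = r <= 1.0
    outer = (r > 1.0) & (r < 2.0)
    ri, ro = r[inner], r[outer]
    rho[inner] = -0.25 * ri**5 + 0.5 * ri**4 + 0.625 * ri**3 - (5.0 / 3.0) * ri**2 + 1.0
    rho[outer] = (ro**5 / 12.0 - 0.5 * ro**4 + 0.625 * ro**3 + (5.0 / 3.0) * ro**2
                  - 5.0 * ro + 4.0 - 2.0 / (3.0 * ro))
    return rho.reshape(d.shape)


def localized_covariance(ensemble, coords, c):
    """Schur (element-wise) product of the ensemble covariance with the taper.

    ensemble : (n_members, n_state) ensemble of one species
    coords   : (n_state,) 1-D positions of the state elements (simplification)
    """
    anomalies = ensemble - ensemble.mean(axis=0)
    p_sample = anomalies.T @ anomalies / (ensemble.shape[0] - 1)
    coords = np.asarray(coords, dtype=float)
    dist = coords[:, None] - coords[None, :]
    return gaspari_cohn(dist, c) * p_sample


rng = np.random.default_rng(0)
ens = rng.normal(size=(20, 10))          # 20 members, 10 grid points (synthetic)
P_loc = localized_covariance(ens, np.arange(10.0), c=2.0)
print(np.count_nonzero(P_loc), "nonzero entries out of", P_loc.size)
```

The taper removes the long-range sample covariances, which are dominated by sampling noise for small ensembles, while leaving the variances on the diagonal unchanged.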

The EnKF analyses of this study are performed in parallel for each observed species, each in its own observation space.
Consequently, the analysis increments of each species do not account for cross-correlations
between different chemical species, unlike in the 4D-Var system.
However, it is technically possible to keep the observations of multiple species
in a single observation space, thus introducing cross-correlations between species.
An example of such EnKF data assimilation is discussed in Sect.
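The consequence of species-by-species analyses can be checked on a toy linear problem. In the Python sketch below, sizes, names and numbers are hypothetical, and exact covariances are used rather than ensemble estimates. Two species are each observed at two levels: separate per-species analyses coincide with a joint analysis whose cross-species covariance blocks are zero, whereas a joint analysis that keeps those blocks lets the observations of each species also update the other species.

```python
import numpy as np


def kalman_update(xb, P, H, R, y):
    """Kalman analysis x_a = x_b + P H^T (H P H^T + R)^{-1} (y - H x_b)."""
    S = H @ P @ H.T + R
    K = np.linalg.solve(S, H @ P).T      # equals P H^T S^{-1} since P and S are symmetric
    return xb + K @ (y - H @ xb)


def block_diag2(a, b):
    """Block-diagonal matrix [[a, 0], [0, b]]."""
    out = np.zeros((a.shape[0] + b.shape[0], a.shape[1] + b.shape[1]))
    out[:a.shape[0], :a.shape[1]] = a
    out[a.shape[0]:, a.shape[1]:] = b
    return out


def compare_analyses(seed=0):
    rng = np.random.default_rng(seed)
    na, nb = 4, 3                                    # hypothetical sizes of species A and B
    m = rng.normal(size=(na + nb, na + nb))
    P_full = m @ m.T + (na + nb) * np.eye(na + nb)   # SPD covariance with cross-species blocks
    P_a, P_b = P_full[:na, :na], P_full[na:, na:]
    H_a, H_b = np.eye(na)[[0, 2]], np.eye(nb)[[1, 2]]  # each species observed at 2 levels
    R_a, R_b = 0.5 * np.eye(2), 0.3 * np.eye(2)
    xb_a, xb_b = rng.normal(size=na), rng.normal(size=nb)
    y_a, y_b = rng.normal(size=2), rng.normal(size=2)

    # (1) one analysis per species, each in its own observation space
    xa_sep = np.concatenate([kalman_update(xb_a, P_a, H_a, R_a, y_a),
                             kalman_update(xb_b, P_b, H_b, R_b, y_b)])
    # (2) joint analysis with the cross-species covariance blocks set to zero
    xb = np.concatenate([xb_a, xb_b])
    H, R = block_diag2(H_a, H_b), block_diag2(R_a, R_b)
    y = np.concatenate([y_a, y_b])
    xa_blockdiag = kalman_update(xb, block_diag2(P_a, P_b), H, R, y)
    # (3) joint analysis keeping the cross-species covariance
    xa_full = kalman_update(xb, P_full, H, R, y)
    return xa_sep, xa_blockdiag, xa_full


xa_sep, xa_blockdiag, xa_full = compare_analyses()
print(np.allclose(xa_sep, xa_blockdiag), np.allclose(xa_sep, xa_full))
```

The first comparison holds to rounding error for any input; the second generally does not, because the cross-species blocks transfer information between species.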

We use the

The BASCOE data assimilation is initialized using

Initial (INI) observation error covariance matrix of experiment A
(dashed red), starting with

The Desroziers estimates appear to be asymptotically stable after only 1
day. That means that changing the initial parameter value has little to no
effect on the resulting time series of estimated parameter values.
Figure
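The insensitivity to the initial value can be illustrated with an idealized fixed-point iteration of a Desroziers-type estimate for a single scaling factor s of prescribed observation error variances R0, namely s_new = Tr(E[d_a d_b^T]) / Tr(R0). The sketch below is our illustration, not BASCOE code: it assumes that the background error variances are exact, uses expected rather than sampled statistics, and all numbers are hypothetical. For positive background variances the iteration converges monotonically to the true factor from any positive starting value.

```python
import numpy as np


def desroziers_scaling_iteration(hbht, r0, s_true, s_init, n_iter=40):
    """Iterate a Desroziers estimate of a scalar observation-error scaling factor.

    hbht   : background error variances in observation space (assumed exact)
    r0     : prescribed observation error variances before scaling
    s_true : true factor, i.e. the true R is s_true * diag(r0)
    s_init : initial guess of the factor
    Expected departure statistics are used, so there is no sampling noise.
    """
    hbht, r0 = np.asarray(hbht, dtype=float), np.asarray(r0, dtype=float)
    d_bb = hbht + s_true * r0                 # diagonal of E[d_b d_b^T]
    s, history = float(s_init), [float(s_init)]
    for _ in range(n_iter):
        r_s = s * r0                          # currently prescribed R
        e_da_db = r_s / (hbht + r_s) * d_bb   # diagonal of E[d_a d_b^T]
        s = float(e_da_db.sum() / r0.sum())   # Tr(E[d_a d_b^T]) / Tr(R0)
        history.append(s)
    return history


for s0 in (0.1, 1.0, 10.0):
    h = desroziers_scaling_iteration([1.0, 0.5, 2.0], [1.0, 2.0, 0.5], s_true=1.5, s_init=s0)
    print(f"start {s0:5.1f} -> after 3 steps {h[3]:.3f}, after 40 steps {h[-1]:.6f}")
```

In a real system the statistics come from finite samples of departures, so the estimated factor fluctuates around its limit rather than converging exactly.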

Each assimilation system (i.e. EnKF and 4D-Var) has its own optimized error
variances but shares a common error correlation for the prescribed

The observation error variance scaling factor

Estimated error scale factors within 4D-Var (left) and EnKF (right) for the period April–November 2008.

The performance of each data assimilation system of BASCOE can be monitored
by the

Finally, to keep the CPU costs of both data assimilation systems comparable
and to carry out the experiments in a reasonable time, 4D-Var is run with 10
iterations (including 10 adjoint model integrations) and the
EnKF uses 20 ensemble members. As in S14, the computation of the EnKF Kalman
gain is performed using Cholesky decomposition in which the full observation
vector is considered at a given time step for a given species. No
simplification is used to compute the inversion of the innovation matrix

The data set assimilated in this study is the version 4.2 of the retrievals
from the Microwave Limb Sounder (MLS) on board the EOS (Earth Observing
System) Aura satellite

Some results of the data assimilation will be validated against independent
observations. This will be the case for N

We will also use the Michelson Interferometer for Passive Atmospheric
Sounding (MIPAS) retrievals by the IMK/IAA (Institut für Meteorologie und
Klimaforschung, Karlsruhe/Instituto de Astrofísica de Andalucía, Granada) to
validate the unconstrained distributions of CH

List of species retrieved in Aura MLS v4.2 and assimilated for this paper.

This section reports the numerical experiments performed in this study: the
control run, i.e. an unconstrained simulation by the BASCOE CTM including
photochemistry; the “EnKF” and “EnKF tracer” experiments, the former
including photochemistry and the latter neglecting it (i.e. assimilation
in chemical tracer mode as done in S14); and the two corresponding “4D-Var”
and “4D-Var tracer” experiments. All experiments start on 1 April 2008 from
the same initial condition, i.e. a 4D-Var analysis of Aura-MLS retrievals

The results of our model and data assimilation experiments will be assessed
using observation-minus-forecast (OmF) statistics, i.e. the relative bias and
standard deviation, computed in observation space. In the case of N
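As a hedged illustration of how such statistics can be computed, the Python sketch below bins observations by pressure level and computes a relative OmF bias and standard deviation in percent. The normalization by the mean observed value in each bin is our assumption (the exact definition is not given in this section), and the function names are hypothetical; in the paper the statistics are further separated by latitude band and period.

```python
import numpy as np


def omf_stats_percent(obs, fcst):
    """Relative OmF bias and standard deviation (%) for one bin of observations.

    Assumption for illustration: both statistics are normalized by the mean
    observed value of the bin.
    """
    obs = np.asarray(obs, dtype=float)
    fcst = np.asarray(fcst, dtype=float)
    omf = obs - fcst                                 # observation minus forecast
    scale = obs.mean()
    bias = 100.0 * omf.mean() / scale
    std = 100.0 * omf.std(ddof=1) / scale
    return bias, std


def binned_omf_stats(obs, fcst, levels):
    """Apply omf_stats_percent separately to each pressure level (hPa)."""
    obs, fcst, levels = (np.asarray(a, dtype=float) for a in (obs, fcst, levels))
    return {float(p): omf_stats_percent(obs[levels == p], fcst[levels == p])
            for p in np.unique(levels)}


stats = binned_omf_stats(obs=[2.0, 4.0, 6.0, 1.0, 1.2],
                         fcst=[1.5, 3.5, 5.5, 1.1, 1.0],
                         levels=[10.0, 10.0, 10.0, 50.0, 50.0])
print(stats)
```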

O

Figure

All data assimilation experiments succeed in eliminating these biases nearly completely in the lower and middle stratosphere. The resulting biases are smaller than 2 % except for the 4D-Var experiment, which overestimates ozone depletion in the Antarctic ozone hole region (around 50 hPa) by up to 5 %. Compared with the CTM results, the 4D-Var and EnKF experiments also significantly reduce the OmF standard deviation at the lowest levels. The smallest OmF standard deviations are delivered by the 4D-Var experiment, with results about 1 % smaller than those delivered by the EnKF in the pressure range 30–2 hPa.

The experiments 4D-Var tracer and EnKF tracer allow us to assess the impact
of stratospheric chemistry. Neglecting this process results in larger biases
and OmF standard deviations above the South Pole in the region 10–2 hPa,
where both tracer data assimilation systems overestimate ozone by

The photochemical lifetime of ozone decreases rapidly in the upper
stratosphere and reaches values as short as a few minutes

HCl OmF bias (top) and standard deviation (bottom) computed for the full chemistry CTM (green), EnKF (red) and 4D-Var (blue). OmF statistics are computed in percent with respect to the assimilated EOS Aura-MLS data for the period May–June 2008 and for three latitude bands (from left to right: South Pole, tropics–middle latitudes and North Pole).

During most of the CTM simulation, the HCl distribution is in
agreement with the Aura-MLS observations. Moreover, the EnKF and 4D-Var experiments
deliver nearly identical results in which the small CTM biases are completely
corrected (not shown). The only exception is the South Pole latitude band
during the period May–June 2008, which is shown in Fig.

Staying in the lower stratosphere (100–10 hPa), the outcome of the
experiments differs from that above the South Pole. Northward of
60

In the middle stratosphere, the chemical lifetime of HCl decreases from about
1 week at 10 hPa to about 1 day at 1 hPa
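The practical meaning of such lifetimes for the 24 h diagnostic forecasts can be illustrated with a deliberately crude assumption: that an analysis increment relaxes exponentially with the local photochemical lifetime tau, so that the fraction surviving a 24 h forecast is exp(-24 h / tau). The HCl lifetimes below are those quoted in this section, and the water vapour value is a lower bound taken from the statement that its lifetime exceeds 1 month at the stratopause; the 5 min ozone value is our illustrative choice for "a few minutes". This linear-relaxation picture is a simplification, not the BASCOE chemistry.

```python
import math


def retained_fraction(lifetime_hours, horizon_hours=24.0):
    """Fraction of an analysis increment remaining after `horizon_hours`,
    assuming linear relaxation with e-folding (photochemical) time `lifetime_hours`."""
    return math.exp(-horizon_hours / lifetime_hours)


lifetimes_hours = {
    "HCl near 10 hPa (about 1 week)": 7 * 24.0,
    "HCl near 1 hPa (about 1 day)": 24.0,
    "H2O at the stratopause (over 1 month; 30 days used)": 30 * 24.0,
    "O3 in the upper stratosphere (a few minutes; 5 min assumed)": 5.0 / 60.0,
}
for label, tau in lifetimes_hours.items():
    print(f"{label}: {retained_fraction(tau):.3g}")
```

Under this assumption roughly 0.87, 0.37 and 0.97 of the increment survive in the first three cases, and essentially nothing survives for upper-stratospheric ozone, where the forecast is controlled by the chemistry rather than by the initial analysis.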

HNO

Figure

Both data assimilation experiments correct the OmF model bias at all
latitudes and at all pressure levels between 100 and 10 hPa. Above that
level, the quickly increasing model OmF bias is not corrected by either
assimilation algorithm. The explanation for this different behaviour in the
upper stratosphere is twofold. First, the observation error grows quickly
with altitude, reducing the weight of observations in the assimilation
experiments. Second, a large discrepancy between the model and the observed
data leads to rejection of most measurements above 10 hPa by the background
quality control procedure (see Sect.

The 4D-Var OmF bias is generally less than 3 % in the pressure range 100–7 hPa, except for an 8 % OmF bias at 70 hPa in the tropics. The EnKF delivers even smaller OmF biases over the whole pressure range and at all latitudes. Both data assimilation experiments result in almost identical OmF standard deviations, except in the Antarctic polar vortex region, where the EnKF errors are slightly larger below 20 hPa.

Water vapour is a long-lived tracer in the whole stratosphere, with a
photochemical lifetime still longer than 1 month at the stratopause

H

Both data assimilations mostly correct the OmF bias and standard deviation
errors with respect to the CTM. Their OmF biases do not exceed 2 % except
for the 4D-Var OmF bias, which reaches 3 % at 1 hPa, i.e. the level
where the ozone deficit described in Sect.

The relative error statistics shown for other species are difficult to
interpret in the case of N

In the lower stratosphere the two satellite data sets and the CTM experiment
are in good agreement. Above 10 hPa the mixing ratios retrieved from Aura
MLS are much larger than those from ACE-FTS, and above 5 hPa they become
pressure independent, which is not realistic. As expected, the CTM experiment
agrees much better with the ACE-FTS N

Mean N

Verification of non-observed species from CTM (green), 24 h
forecasts from EnKF (red) and 4D-Var (blue) assimilation against MIPAS IMK
data (black dots): mean CH

Finally, the forecasts of two non-observed species issued from both data
assimilation systems will be validated: CH

All the EnKF experiments done so far used a brute-force species localization; in other words, the sample covariance between species is set to 0. This type of localization should not be confused with the distance-based localization within the same species, which we keep. We now examine what happens when the sample cross-covariance is kept intact.
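The brute-force species localization can be written as an element-wise product of the ensemble covariance with a 0/1 species mask. Its motivation is sampling noise: with N = 20 members, the ensemble size used in this study, the sample correlation between two truly independent Gaussian quantities has a root-mean-square value of 1/sqrt(N - 1), about 0.23. The Python sketch below uses hypothetical function names and synthetic Gaussian data, not BASCOE fields, to illustrate both points.

```python
import numpy as np


def species_mask(species_of_element):
    """1 where two state elements belong to the same species, 0 otherwise.

    Multiplying an ensemble covariance element-wise by this mask sets all
    cross-species covariances to zero ("brute-force" species localization).
    """
    s = np.asarray(species_of_element)
    return (s[:, None] == s[None, :]).astype(float)


def spurious_correlations(n_members=20, n_trials=20_000, seed=0):
    """Sample correlations between two independent quantities (true correlation 0)
    estimated from ensembles of size n_members, repeated n_trials times."""
    rng = np.random.default_rng(seed)
    a = rng.normal(size=(n_trials, n_members))   # e.g. species A at one grid point
    b = rng.normal(size=(n_trials, n_members))   # species B, independent of A
    a -= a.mean(axis=1, keepdims=True)
    b -= b.mean(axis=1, keepdims=True)
    return (a * b).sum(axis=1) / np.sqrt((a**2).sum(axis=1) * (b**2).sum(axis=1))


r = spurious_correlations()
print("RMS spurious correlation with 20 members:", round(float(np.sqrt(np.mean(r**2))), 3))
print(species_mask(["O3", "O3", "HCl"]))
```

Keeping the cross-species sample covariances therefore only helps where the true chemical correlation is clearly larger than this noise level.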

To this end, we conducted an experiment in which we assimilate O

OmF bias (top) and standard deviation (bottom) between Aura-MLS data
and O

Figure

We have conducted a comparison of EnKF and 4D-Var data assimilation systems
using a comprehensive stratospheric chemical transport model. We considered
4D-Var and EnKF configurations that are normally used for chemical data
assimilation applications. Both data assimilation systems have online
estimation of error variances based on the Desroziers method and share the
same correlation model for all prescribed error correlations (i.e. the
background error covariance for 4D-Var, initial error and model error for
EnKF) so that each data assimilation system is nearly optimal and can also be
compared to each other. A previous comparison study by

In the context of atmospheric chemistry, EnKF and 4D-Var differ in a number of ways. While 4D-Var, built on the assumption of a perfect model, seeks a strong-constraint solution that fits the observations over a 24 h window, the EnKF provides estimates at each model time step and allows for model error (mainly in the form of a background error covariance). Furthermore, while 4D-Var infers information from the error correlations between observed and non-observed species, the EnKF introduces sampling noise in the covariances between weakly chemically related species; so far, in practice, these cross-species error covariances are set to 0. The question is thus: to what extent is chemical modelling an important component of the analysis? The implementation of multi-species sequential chemical data assimilation is challenged by the need to properly tune and automate the estimation of a large number of input error parameters.

The comparison done in this paper shows that, in general, there is no significant improvement in the OmF statistics when the cross-correlation between species is kept (4D-Var) compared with the EnKF system, in which the cross-species error correlation has been filtered out. Differences do occur, however, when there is an important chemical modelling error or when there are large biases between model and observed values.

For example, the BASCOE CTM has an important model O

Large observation biases have a very different impact. For
example, the EOS Aura-MLS N

We have also examined the need to have cross-species localization in an EnKF.
Our study shows that the simultaneous assimilation of O

An important aspect of this study is the implementation of an online estimation of error variance parameters. The observation error variances (and, for 4D-Var, also the background error variances) are estimated at each observation vertical level using the Desroziers method. The estimated variance parameters are in fact very robust over time, showing little variability from one day to the next. Finally, all the experiments were done with comparable wall-clock times for the EnKF and 4D-Var settings.

The study also has some limitations. An acknowledged difficulty often encountered in chemical data assimilation is the situation in which both the model and the observations suffer from significant biases. This is the case, for example, for CO and ClO in the BASCOE CTM when using the Aura-MLS data sets. Solving this problem is a challenging task that we have not addressed here and that would require a dedicated study. Another limiting factor is the correlation length used in this study. We have not attempted to estimate it but have instead used what appears to be a reasonable value from past 4D-Var experiments. The estimated error variances, and thus the weight given to the observations, also depend on the correctness of the error correlations, and this issue could also be investigated further. A future development of the BASCOE chemical data assimilation system would be a hybrid 4D-EnKF approach using the model ensemble to construct a 4-D background error covariance matrix.

Other possibilities may be considered to properly compare two essentially
different data assimilation systems. For example, the 4DEnKF

Readers interested in the BASCOE code can contact the developers through

S. Skachko and R. Ménard designed the experiments and S. Skachko carried them out. Q. Errera developed the codes of 4D-Var and the Desroziers method. S. Skachko and Y. Christophe developed the EnKF code. S. Chabrillat and Y. Christophe worked on the CTM code. S. Skachko prepared the manuscript with contributions from all co-authors.

This research was financially supported at BIRA-IASB by the Belspo/ESA/PRODEX programme. The authors thank the ISSI (International Space Science Institute) for funding two work meetings in Bern to prepare the present article within the international “Study group on the added-value of chemical data assimilation in the stratosphere and upper-troposphere”. The authors thank three anonymous referees for their useful comments, which have substantially improved the article, and Samuel Remy for editing the paper.

Edited by: S. Remy
Reviewed by: three anonymous referees