Several data assimilation methods have been used in the field of atmospheric
chemistry and air quality in many studies

Hence, it will become increasingly difficult to disentangle the respective merits of data assimilation schemes, of models, and of their numerical implementation in a successful high-dimensional data assimilation study. That is why we believe that the increasing variety of problems encountered in atmospheric chemistry data assimilation highlights the need for simple low-order models that are nonetheless complex enough to capture the relevant dynamics, physics, and chemistry that could impact the performance of data assimilation schemes. Low-order models, also called toy models, are models of reduced dimension meant to capture the prominent characteristics of the dynamics of larger models, but at a much lower computational cost. They are not meant to be realistic, but their study provides insights into the larger models and their dynamics. Their low numerical cost also makes it possible to compute reliable statistical scores in various regimes and hence to validate methods with greater confidence. Moreover, they can be distributed and used to benchmark data assimilation methods, since their baseline performance can easily be reproduced.

The Lorenz-95 (L95) model is a very popular low-order meteorology model
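For reference, the standard L95 equations on a periodic domain of N variables read dx_i/dt = (x_{i+1} - x_{i-2}) x_{i-1} - x_i + F; the usual configuration in the literature is N = 40 and F = 8. A minimal NumPy sketch of this tendency (the function name `l95_tendency` is ours) handles the periodic indices with `np.roll`:

```python
import numpy as np

def l95_tendency(x, forcing=8.0):
    """Lorenz-95 tendency dx_i/dt = (x_{i+1} - x_{i-2}) x_{i-1} - x_i + F.

    Periodic boundaries via np.roll:
    np.roll(x, -1)[i] == x[i+1], np.roll(x, 2)[i] == x[i-2], np.roll(x, 1)[i] == x[i-1].
    """
    x = np.asarray(x, dtype=float)
    return (np.roll(x, -1) - np.roll(x, 2)) * np.roll(x, 1) - x + forcing
```

The uniform state x_i = F is a fixed point of this tendency, which gives a quick sanity check.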

The equations of the model are those of Eq. (

This model will be called L95-T
in the following. Because the meteorological and tracer state vectors are
simulated together, it is an

Hence, L95-T represents an instructive model for more ambitious CCMMs
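To make the coupling concrete, the sketch below writes a tracer tendency in which the L95 variables act as winds. It is an illustration under stated assumptions, not the exact L95-T discretisation: we assume a first-order upwind scheme in flux form, interface winds taken as the average of neighbouring L95 variables, a uniform emission rate and linear scavenging. All names are hypothetical.

```python
import numpy as np

def tracer_tendency(c, u, emission, scavenging, dx=1.0):
    """Hypothetical tracer tendency: -d(u c)/dx + E - lambda c on a periodic grid.

    c: tracer concentrations (n,); u: L95 variables used as winds (n,).
    """
    c = np.asarray(c, dtype=float)
    u = np.asarray(u, dtype=float)
    u_face = 0.5 * (u + np.roll(u, -1))                 # assumed wind at interface i+1/2
    c_up = np.where(u_face >= 0.0, c, np.roll(c, -1))   # upwind concentration
    flux = u_face * c_up                                # flux through interface i+1/2
    divergence = (flux - np.roll(flux, 1)) / dx         # F_{i+1/2} - F_{i-1/2}
    return -divergence + emission - scavenging * c
```

Because the flux form telescopes on a periodic domain, transport alone conserves the total tracer mass; only emission and scavenging change it.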

However, in order to develop a qualitatively representative low-order CCMM, nonlinear chemistry must be added. The
primary goal of this article is to extend the L95-T model with a simple photochemical kinetic mechanism. To that end, we
will use the generic reaction set

Hence, the secondary goal of this work is to illustrate the usefulness of
low-order CCMMs to better understand the application of specific data
assimilation techniques to CCMMs and CTMs. The data assimilation method we
shall use is the iterative ensemble Kalman smoother (IEnKS). It was
introduced and developed by

It was shown with the L95 and L95-T models that the IEnKS outperforms the ensemble Kalman filter (EnKF) and 4D-Var for filtering applications (i.e. present-time estimation and forecasting), especially in strongly nonlinear conditions. It was also shown to outperform 4D-Var and the standard ensemble Kalman smoother for smoothing (i.e. reanalysis). As for any EnVar method, the price of applying these methods to high-dimensional models is the need for ad hoc techniques to regularise the error statistics estimated from the ensemble, which are prone to sampling errors. As a consequence, localisation and possibly inflation are required when implementing the IEnKS in high-dimensional systems.

If

In Sect.

In this section, we replace the tracer part of L95-T with a reduced-order
photochemical kinetic mechanism to form the low-order coupled chemistry
meteorology model L95-GRS. We will first describe the resulting model, then
we will evaluate its ability to reproduce major physical and chemical
characteristics of the processes considered. All the parameters and equations
described in the following, with additional details, are gathered in
Appendix

The photochemistry module is based on the GRS of
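The GRS is commonly written as seven reactions between reactive organic compounds (ROC), a radical pool (RP), NO, NO2, O3 and two nitrogen sinks (stable gaseous and non-gaseous nitrogen products, SGN and SNGN). The sketch below assumes that standard form with mass-action kinetics; it is not taken from the L95-GRS code, the rate constants `k` are hypothetical inputs (with photolysis rates folded into `k[0]` and `k[2]`), and the function name is ours.

```python
import numpy as np

SPECIES = ("ROC", "RP", "NO", "NO2", "O3", "SGN", "SNGN")

def grs_tendency(c, k):
    """Chemical tendencies of the seven assumed GRS species (order given by SPECIES).

    c: concentrations, shape (7,) or (7, n_grid); k: seven rate constants.
    """
    roc, rp, no, no2, o3, sgn, sngn = np.asarray(c, dtype=float)
    r1 = k[0] * roc        # ROC + hv -> RP + ROC
    r2 = k[1] * rp * no    # RP + NO  -> NO2
    r3 = k[2] * no2        # NO2 + hv -> NO + O3
    r4 = k[3] * no * o3    # NO + O3  -> NO2
    r5 = k[4] * rp * rp    # RP + RP  -> RP
    r6 = k[5] * rp * no2   # RP + NO2 -> SGN
    r7 = k[6] * rp * no2   # RP + NO2 -> SNGN
    return np.array([
        np.zeros_like(roc),          # ROC is regenerated by reaction 1
        r1 - r2 - r5 - r6 - r7,      # RP + RP -> RP removes one RP net
        -r2 + r3 - r4,               # NO
        r2 - r3 + r4 - r6 - r7,      # NO2
        r3 - r4,                     # O3
        r6,                          # SGN
        r7,                          # SNGN
    ])
```

With this form, total nitrogen (NO + NO2 + SGN + SNGN) is conserved by the chemistry, which is a useful check on any implementation.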

Since

To further reduce the GRS scheme and improve the efficiency of its numerical implementation, we use the quasi-steady-state approximation
(QSSA) for the radical pool species
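Under the QSSA, the radical pool is assumed to adjust instantaneously, so its tendency is set to zero and RP is obtained algebraically instead of being integrated. Assuming the standard seven-reaction GRS (RP produced by ROC photolysis at rate k1[ROC], and lost through RP + NO, RP + RP -> RP, and the two RP + NO2 sinks), setting d[RP]/dt = 0 gives the quadratic k5[RP]^2 + (k2[NO] + (k6 + k7)[NO2])[RP] - k1[ROC] = 0. The sketch returns its non-negative root in a cancellation-free form; whether L95-GRS solves it exactly this way is an assumption, and the function name is ours.

```python
import math

def rp_quasi_steady_state(roc, no, no2, k):
    """Radical pool concentration from d[RP]/dt = 0 (QSSA), assuming the seven-reaction GRS.

    Solves k5 RP^2 + (k2 NO + (k6 + k7) NO2) RP - k1 ROC = 0 for its non-negative root.
    k is zero-based (k[0] = k1, ..., k[6] = k7); some loss (k5 or the linear terms) must be > 0.
    """
    a = k[4]                               # RP + RP -> RP (net loss of one RP)
    b = k[1] * no + (k[5] + k[6]) * no2    # linear loss rate of RP
    p = k[0] * roc                         # production by ROC photolysis
    if p <= 0.0:
        return 0.0
    # 2p / (b + sqrt(b^2 + 4ap)) equals the positive root but avoids cancellation
    return 2.0 * p / (b + math.sqrt(b * b + 4.0 * a * p))
```

When k5 = 0 the expression reduces to the familiar linear QSSA, [RP] = production / loss rate.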

GRS is coupled to the L95 model. As for the L95-T model, the L95 variables
are seen as wind speeds that advect the GRS chemical species. The objective
is to create a simplified model able to reproduce the
temporal variability of ozone chemistry at a regional to transcontinental
scale. There is a total of

The transport equations for species

When L95-GRS is seen as a global low-order model, the photolysis rate constant
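One way to picture a photolysis rate that follows the sun around a periodic domain is to make it depend on the local solar time at each grid point. The sketch below uses a hypothetical clipped-cosine profile, assuming the N grid points span 24 h of longitude; it is an illustration, not the formula used in L95-GRS.

```python
import numpy as np

def photolysis_rate(t_hours, n_points, j_max):
    """Hypothetical diurnal photolysis rate at each of n_points grid points."""
    # local solar time at grid point i, assuming the domain spans 24 h of longitude
    local_time = (t_hours + 24.0 * np.arange(n_points) / n_points) % 24.0
    # maximum at local noon (12 h), clipped to zero at night
    return j_max * np.clip(np.cos(2.0 * np.pi * (local_time - 12.0) / 24.0), 0.0, None)
```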

Time evolution of the L95-GRS variables at
one grid point.
The L95
variables, flagged “Wind”, are shown with the original Lorenz unit, while the concentration unit is

As

The L95-T model is integrated in time using a fourth-order Runge–Kutta (RK4)
scheme with a time step of

Similarly, the L95 subsystem of the L95-GRS model and the transport part are integrated with the RK4 scheme. A first-order (Lie) splitting is applied between this integration and the chemistry integration: the L95 and species transport part is integrated first, followed by the GRS chemistry.
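The classical RK4 step and the first-order splitting described above can be sketched as follows. The callables `transport_tendency` (the L95 plus transport right-hand side) and `chemistry_step` (for instance a stiff solver such as a Rosenbrock step) are hypothetical placeholders.

```python
import numpy as np

def rk4_step(f, y, dt):
    """Classical fourth-order Runge-Kutta step for dy/dt = f(y)."""
    k1 = f(y)
    k2 = f(y + 0.5 * dt * k1)
    k3 = f(y + 0.5 * dt * k2)
    k4 = f(y + dt * k3)
    return y + dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0

def split_step(state, dt, transport_tendency, chemistry_step):
    """First-order (Lie) splitting: L95 + transport with RK4, then chemistry."""
    state = rk4_step(transport_tendency, state, dt)
    return chemistry_step(state, dt)
```

Lie splitting is only first-order accurate in dt regardless of the order of the sub-solvers; a symmetric (Strang) splitting would be the usual second-order alternative.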

The chemical reactions of L95-GRS have a wide range of rates, which leads to
numerical stiffness. Hence, the explicit RK4 scheme is ill-suited to
integrating the chemistry, even though its order of accuracy is higher. An implicit or
semi-implicit scheme is required. That is why the GRS chemical scheme is
integrated with a second-order Rosenbrock method, following

Furthermore, an adaptive time-stepping scheme has been implemented that adjusts the time step to the instantaneous stiffness of the reaction rates. However, it has often proved unnecessary in free model runs (i.e. without data assimilation) when combined with the QSSA used for the radical pool.
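The criterion used to adapt the step is not restated here. As a purely hypothetical heuristic, one can shrink the step when the fastest chemical loss rate, estimated from the largest diagonal entry of the Jacobian, increases:

```python
import numpy as np

def stiffness_adapted_dt(jacobian, dt_max, dt_min, safety=0.5):
    """Hypothetical heuristic: dt ~ safety / (fastest loss rate), clipped to [dt_min, dt_max]."""
    # the largest |diagonal| entry approximates the fastest loss rate (in 1/time)
    fastest_rate = np.max(np.abs(np.diag(jacobian)))
    if fastest_rate == 0.0:
        return dt_max
    return float(np.clip(safety / fastest_rate, dt_min, dt_max))
```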

The typical integration time step for the chemistry is

Maximal ozone concentration (in

The outcome of a free run (after spin-up) at a grid point is shown in
Fig.

This model, even if not chaotic, is highly nonlinear, exhibiting distinct
chemical regimes. This can be seen in Fig.

In the

Time evolution of the L95-GRS variables over the whole
domain in the case of a continent/ocean division.
The L95 variables, flagged “Wind”, are shown with the original Lorenz unit, while the concentration unit is

To emphasise the impact of the transport of the chemical species by the wind
in the model, an experiment was performed, where the domain was split into
a

So far, the wind kinetics (amplitude and variability) has been determined by the original L95 model characteristics. In
the reference experiment, the waves of the wind extend over several days. The concentrations are driven by this wind
kinetics but vary within those waves according to the photochemical daily cycles. However, other types of behaviour are
possible with L95-GRS by choosing a different timescale for the L95 model. If time within the L95 model is rescaled
by
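The effect of such a rescaling follows from the chain rule. Writing the L95 subsystem as dx/dt = f(x) and calling alpha the rescaling factor (our notation), and assuming only the L95 subsystem is rescaled:

```latex
\tilde{\mathbf{x}}(t) = \mathbf{x}(\alpha t)
\quad\Longrightarrow\quad
\frac{\mathrm{d}\tilde{\mathbf{x}}}{\mathrm{d}t} = \alpha\, \mathbf{f}(\tilde{\mathbf{x}}),
```

so the wind waves evolve faster (alpha > 1) or slower (alpha < 1) relative to the photochemical daily cycles, which are left unchanged.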

Time evolution of the L95-GRS variables at
one grid point
with a time rescaling
of

In this section, we experiment on the use of data assimilation techniques for
forecasting and reanalysis with the tracer model (L95-T) beyond the
preliminary results of

A typical offline model is a CTM where the
meteorological fields have been generated externally and are given as an
input to the model. These fields usually stem from operational meteorological
prediction centres or from any independently run meteorological model. On the
other hand, online models consistently process meteorology, chemistry, and
transport of species all together, but at a higher numerical cost. The choice
of an offline or online approach is a crucial issue as far as modelling is
concerned

The L95-T model is a simple tool well suited to experimenting with this
issue. In the following, we apply the quasi-static IEnKS to L95-T using
either an offline or an online approach. A distinction is made between

The full online data assimilation system for the L95-T model. Even though the L95 subsystem of the model does not depend on the tracer subsystem, it should be kept in mind that information propagates both ways in advanced data assimilation methods, as long as the error covariance matrices are defined over both subsystems.

The offline data assimilation system for the L95-T model. The L95 subsystem
is run separately. The IEnKS is applied to L95 with a DAW length

We conduct synthetic data assimilation experiments, applying the IEnKS to the
L95-T model. A simulation of L95-T that represents the truth is generated,
with

The ensemble size of the IEnKS has been chosen to be

We consider several practical variants of the offline data assimilation
system for the tracer model. In the first offline system, called

Because the uncertain winds are a source of model error for the offline system, we also implement a multiplicative inflation on top of the IEnKS-N. It is applied to the prior by rescaling the anomalies, and the inflation factor is chosen to yield the lowest RMSE.
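Multiplicative inflation rescales each member's deviation from the ensemble mean by a factor lambda >= 1 while leaving the mean untouched. Tuning it for the best RMSE amounts to rerunning the experiment over a grid of factors and keeping the one with the lowest time-averaged RMSE. A minimal sketch of both ingredients, with hypothetical function names and a (members x state) array layout; note that some authors apply the factor to the covariance instead, i.e. lambda^2 on the anomalies' outer product.

```python
import numpy as np

def inflate(ensemble, factor):
    """Multiplicative inflation of the anomalies; ensemble has shape (n_members, n_state)."""
    ensemble = np.asarray(ensemble, dtype=float)
    mean = ensemble.mean(axis=0)
    return mean + factor * (ensemble - mean)

def rmse(estimate, truth):
    """Root mean square error over the state vector."""
    return float(np.sqrt(np.mean((np.asarray(estimate) - np.asarray(truth)) ** 2)))
```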

In a last variant of the offline model, called

The full online data assimilation system is also run for comparison (experiment

Average RMSEs of the L95-T data assimilation
system using the IEnKS, as a function of the DAW length (in units of

The performance of these systems as a function of the DAW length is reported
in Fig.

First of all, the online system has a very significant edge over the offline
systems because of the two-way information flows, both for the concentration
variables and for the wind variables. This shows that concentration
observations can significantly improve meteorological forecasts, in agreement
with the results of

The extrinsic model error due to the uncertain winds must be accounted for in the tracer subsystems. Otherwise, the
ensemble of the tracer subsystem collapses (the ensemble method diverges). In the absence of any correction for model
error, we observe that the estimation is close to a free run, with an average filtering RMSE of about

Average filtering analysis RMSEs of the wind variables
(left) and concentration variables
(right) of the L95-T, as a function of the scavenging ratio for the ensemble transform EnKF (IEnKS with

Yet, as expected, accounting for model error offers better performance. Let
us first consider cases 1a, 1b, and 1c that use the best estimate of the mean
wind and apply multiplicative inflation to account for model error.
Configuration Offline 1a, i.e. when

With configuration Offline 1b, where the mean wind estimate comes from an
EnKF (

With configuration Offline 1c, the tracer data assimilation system is based
on an EnKF, while the wind estimation gets better as

In light of these results, we conclude that the improvement
observed in configuration 1a primarily comes from the reduced uncertainty in the wind
fields. Note that as

With configuration Offline 2, model error is addressed by not only the
multiplicative inflation but also the ensemble of winds in the forecast
steps. Each wind member is assigned to one tracer member. This is similar to
stochastic parametrisation where one changes the model input parameters for
each member of the CTM
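The pairing of wind and tracer members in the forecast step can be sketched as below. The helper and the one-step integrator `tracer_step(c, u)` are hypothetical; the wind ensemble would typically be the analysis ensemble of the separate L95 data assimilation.

```python
import numpy as np

def offline_ensemble_forecast(tracer_ensemble, wind_ensemble, tracer_step):
    """Forecast tracer member j with wind member j (hypothetical helper).

    tracer_ensemble: (m, n_tracer); wind_ensemble: (m, n_wind);
    tracer_step(c, u) advances one tracer state given one wind field.
    """
    tracer_ensemble = np.asarray(tracer_ensemble, dtype=float)
    wind_ensemble = np.asarray(wind_ensemble, dtype=float)
    if tracer_ensemble.shape[0] != wind_ensemble.shape[0]:
        raise ValueError("one wind member is needed per tracer member")
    return np.array([tracer_step(c, u) for c, u in zip(tracer_ensemble, wind_ensemble)])
```

The spread of the wind members then propagates into the tracer ensemble, which is how this configuration samples the model error due to uncertain winds.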

One lesson is that a variational analysis over a long DAW is useless for the

The scavenging ratio

A synthetic experiment where the scavenging ratio

The average filtering RMSEs of the concentration variables and of the wind
variables are plotted in Fig.

The performance of the EnKF and of the IEnKS is now studied with the L95-T model when the observations of the tracer
concentrations are sparser. The set-up of this synthetic experiment remains unchanged except for the density of the
observations. The wind variables are observed on all grid points while only some of the observations of the tracer
concentrations are assimilated. The observations of the concentrations are chosen to be evenly spread. The number of
observations is a divisor of
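Evenly spread observations correspond to a selection operator that keeps every (N/p)-th grid point, which is why the number of observations p must divide the grid size N. The sketch below builds such an index set and draws synthetic observations from the truth with Gaussian noise, as in twin experiments; the names and the noise level `obs_std` are hypothetical.

```python
import numpy as np

def evenly_spread_indices(n_grid, n_obs):
    """Indices of n_obs evenly spread observed grid points (n_obs must divide n_grid)."""
    if n_grid % n_obs != 0:
        raise ValueError("n_obs must divide n_grid")
    return np.arange(0, n_grid, n_grid // n_obs)

def synthetic_observations(truth, indices, obs_std, rng):
    """y = H x_true + noise, with H selecting the observed grid points."""
    return truth[indices] + obs_std * rng.standard_normal(len(indices))
```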

Average filtering analysis RMSEs of the wind
variables (left) and concentration
variables (right) of the L95-T, as a function of the number of concentration observations for the IEnKS with
several DAW lengths. The case

The IEnKS is now applied to the L95-GRS model introduced in Sect.

Twin experiments are conducted where each chemical species is observed. The
observations are drawn from the truth every

At first, the number and distribution of observations of the concentration
variables have been varied following the same set-up as in
Sect.

Average filtering analysis RMSEs of the L95-GRS
variables, as a function of the number of concentration observations for the
IEnKS with three DAW lengths. The case

To be more realistic, further experiments will assimilate sparse
concentration observations. We choose to keep eight observations in the
domain per species, that is, one every

Average filtering and smoothing analysis RMSEs of the
L95-GRS variables, as a function of the DAW length (in units of

In atmospheric chemistry, there is a strong dependency of the model on the
values of the various forcings, such as the boundary conditions

To estimate a set of model parameters

The estimation of the main parameters of the L95-T model (forcing of the L95
and emission rate of the tracer) with various data assimilation methods,
including the IEnKS, has been experimented upon by

For the emission rates: the ensemble is initialised around the truth by
adding an unbiased Gaussian noise of standard deviation
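Joint state and parameter estimation is usually done by augmenting the state vector with the parameters. The sketch below appends perturbed parameters to every state member, initialised around the true values with unbiased Gaussian noise; `param_std` is a hypothetical argument standing for the standard deviation, which is not restated here. During the forecast, the parameter part is typically kept constant (persistence model).

```python
import numpy as np

def augmented_ensemble(state_ensemble, true_params, param_std, rng):
    """Append perturbed parameters to every state member (hypothetical helper).

    state_ensemble: (m, n); true_params: (q,); returns an (m, n + q) ensemble whose
    last q columns are true_params plus unbiased Gaussian noise of std param_std.
    """
    state_ensemble = np.asarray(state_ensemble, dtype=float)
    true_params = np.asarray(true_params, dtype=float)
    m = state_ensemble.shape[0]
    params = true_params + param_std * rng.standard_normal((m, true_params.size))
    return np.hstack([state_ensemble, params])
```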

For

Rather than the single data assimilation (SDA) version of the IEnKS presented
in Appendix

Time evolution (

Let us first mention that the RMSEs of the state variables are barely changed
by the joint state and parameter estimation, as well as by the use of
a different variant of the IEnKS in this experiment. Hence, the results in
Fig.

It could be possible to estimate chemical reaction rates, for instance, the

Estimating covariances from a limited-size ensemble of state vectors produces spurious long-distance correlations
between variables. This degrades the estimation of the error statistics and can lead to divergence in ensemble data
assimilation methods. To address this issue, localisation is used in high-dimensional systems implementing ensemble
methods. There are two main localisation methods known as
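Covariance localisation multiplies the sample covariance elementwise (Schur product) by a correlation matrix that decays with distance. A common choice is the compactly supported fifth-order Gaspari–Cohn function; the sketch below builds such a matrix on a periodic 1D grid. Using Gaspari–Cohn, and treating `radius` as its half-width (correlations vanish beyond twice that distance), are assumptions rather than the paper's stated choice.

```python
import numpy as np

def gaspari_cohn(r):
    """Fifth-order Gaspari-Cohn taper as a function of r = distance / half-width."""
    r = np.abs(np.atleast_1d(np.asarray(r, dtype=float)))
    out = np.zeros_like(r)
    inner = r <= 1.0
    outer = (r > 1.0) & (r < 2.0)
    ri = r[inner]
    out[inner] = -0.25 * ri**5 + 0.5 * ri**4 + 0.625 * ri**3 - (5.0 / 3.0) * ri**2 + 1.0
    ro = r[outer]
    out[outer] = (ro**5 / 12.0 - 0.5 * ro**4 + 0.625 * ro**3 + (5.0 / 3.0) * ro**2
                  - 5.0 * ro + 4.0 - 2.0 / (3.0 * ro))
    return out

def localisation_matrix(n_grid, radius):
    """Gaspari-Cohn localisation matrix on a periodic 1D grid."""
    i = np.arange(n_grid)
    d = np.abs(i[:, None] - i[None, :])
    d = np.minimum(d, n_grid - d)   # periodic distance
    return gaspari_cohn(d / radius)
```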

We tested covariance localisation on the L95-GRS model using the DEnKF data assimilation method from
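The DEnKF updates the ensemble mean with the full Kalman gain but the anomalies with half of it, which approximates a square-root filter without perturbing the observations. The sketch below combines that update with a localised forecast covariance rho ∘ P^f. It assumes a selection observation operator (observed grid indices), R = obs_var I, and forms the full n x n covariance, which is only affordable for low-order models; the names are ours.

```python
import numpy as np

def denkf_analysis(ensemble, obs, obs_indices, obs_var, rho):
    """DEnKF analysis with covariance localisation (Schur product).

    ensemble: (m, n); obs: (p,); obs_indices: (p,) observed grid points;
    obs_var: observation error variance; rho: (n, n) localisation matrix.
    """
    ensemble = np.asarray(ensemble, dtype=float)
    m = ensemble.shape[0]
    mean = ensemble.mean(axis=0)
    anomalies = ensemble - mean                          # (m, n)
    cov = rho * (anomalies.T @ anomalies) / (m - 1)      # localised P^f
    ph_t = cov[:, obs_indices]                           # P H^T, (n, p)
    s = ph_t[obs_indices, :] + obs_var * np.eye(len(obs_indices))  # H P H^T + R
    gain = np.linalg.solve(s, ph_t.T).T                  # K = P H^T S^{-1}
    mean_a = mean + gain @ (obs - mean[obs_indices])
    # DEnKF: anomalies are updated with half the Kalman gain
    anomalies_a = anomalies - 0.5 * (anomalies[:, obs_indices] @ gain.T)
    return mean_a + anomalies_a
```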

Average filtering analysis RMSEs of the
L95-GRS variables, as a function of the ensemble size for the DEnKF without
localisation or with optimally tuned localisation radius. The L95
variables, flagged “Wind”, are shown with the original Lorenz unit, while the concentration unit is

The aim of this article is to introduce low-order models on which to test advanced data assimilation methods in order to gain insights into some of the many difficulties encountered in data assimilation applied to meteorology and atmospheric chemistry. Amongst them, the questions of inflation, localisation for ensemble methods, model error, online and offline modelling, and nonlinearities have been addressed.

Building on the L95-T model, where the transport of a tracer is coupled to the L95 model, we introduced a new model, L95-GRS, where the tracer part is replaced with a simplified ozone chemistry. The L95-GRS model shows important peculiarities typical of tropospheric ozone chemistry. It has been adjusted to simulate pollutant concentrations of realistic magnitude. Ozone precursors can experience long-range transport by the meteorology and lead to ozone episodes far from the pollutant sources. It is possible to tune the wind magnitude in order to modify the time and space scales of the model. Moreover, it has stiff equations that require the use of the same numerical tools as high-dimensional CTMs. Last but not least, it shows a nonlinear response to the emission rates of the ozone precursors. It thus includes several of the difficulties of high-dimensional chemistry models without the high numerical cost. As such, it can be used to experiment upon and validate new data assimilation methods in the context of atmospheric chemistry modelling and coupled chemistry meteorology modelling.

To illustrate the use of advanced data assimilation methods on these models, and specifically ensemble variational methods, we first performed new experiments on the L95-T with the iterative ensemble Kalman smoother (including the ensemble Kalman filter). We showed that this model is suitable to test online and offline strategies for data assimilation, as well as to emulate model error stemming from a meteorological field, or an ensemble forecast of meteorological fields.

More specifically, we experimented on the offline version of the L95-T model, where the meteorology and the tracer subsystems are integrated and assimilated separately. This decoupling introduces model error into the tracer subsystem. In this context, having an ensemble of analyses from a data assimilation on the meteorology as an input to the tracer subsystem gives us a representative sample of this model error. By doing so, we have avoided the use of inflation and obtained optimal performance. We noticed as well that, for data assimilation purposes, the coupling of the two subsystems is only relevant when they have similar evolution timescales. In the case where the tracer subsystem evolves too quickly or too slowly compared to the meteorology, the coupling of these two parts fails to improve the results of the data assimilation compared to an offline case.

The use of data assimilation methods was also illustrated with the L95-GRS model. The iterative ensemble Kalman smoother performs well despite the nonlinearities of the model, even when the observation network is sparse. In particular, the model can help test parameter estimation techniques with the multiple parameters usually met in CCMMs and CTMs. The use of localisation was also successfully tested with L95-GRS. Through this wide range of experiments, we concluded that the L95-GRS model is suitable for testing advanced data assimilation schemes.

A broad class of models could be developed by exchanging the L95
meteorological part with another low-order model. The L95 model has
anti-correlations in space and time that are not observed in more realistic
models. It could be replaced by its continuous extension, the Lorenz 2005-II
model

The L95-GRS model depends on several key species-dependent chemical and physical parameters that introduce many time- and space scales in the data assimilation system and that impact its observability and controllability. These parameters are likely to be representative of those of realistic CCMMs and CTMs. We have only investigated the impact of a few of those parameters, fixing the others. But a more general parameter-wise exploration of data assimilation systems built on L95-GRS is desirable.

Finally, following this study, we are planning to test the IEnKS on the
Polair3D CTM of the research and operational Polyphemus modelling platform

The code for the models L95-T and L95-GRS can be downloaded from the
following website:

The following pseudo-algorithm specifies a variant of the IEnKS, called the
single data assimilation (SDA) IEnKS

A cycle of the lag-L

Solve
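The core of an IEnKS cycle is a Gauss–Newton minimisation of the cost function in ensemble space, J(w) = (m-1)/2 |w|^2 + 1/2 |y - H(M(x0 + A w))|^2 in the R^{-1} norm. The sketch below is a simplified illustration, not the full lag-L SDA algorithm: it assumes a single observation vector at the end of the window, the "bundle" finite-difference variant with a hypothetical step `eps`, no inflation or localisation, and returns the analysis at the start of the window (which would then be propagated forward for the next cycle). Names are ours.

```python
import numpy as np

def ienks_analysis(mean0, anomalies0, model, obs_op, y, r_inv, n_iter=10, eps=1e-4, tol=1e-8):
    """Gauss-Newton analysis in ensemble space for one simplified IEnKS cycle.

    mean0: (n,) prior mean at the start of the window; anomalies0: (n, m) prior
    anomalies (columns sum to zero); model maps a start-of-window state to the
    observation time; obs_op is the observation operator; r_inv is R^{-1}, (p, p).
    """
    n, m = anomalies0.shape
    w = np.zeros(m)
    for _ in range(n_iter):
        x0 = mean0 + anomalies0 @ w
        # bundle variant: a small ensemble around x0 gives finite-difference sensitivities
        members = x0[:, None] + eps * anomalies0
        hx = np.column_stack([obs_op(model(members[:, k])) for k in range(m)])
        y_mean = hx.mean(axis=1)
        yy = (hx - y_mean[:, None]) / eps              # approximates d(H o M)/dw
        grad = (m - 1) * w - yy.T @ r_inv @ (y - y_mean)
        hess = (m - 1) * np.eye(m) + yy.T @ r_inv @ yy
        dw = np.linalg.solve(hess, grad)
        w -= dw
        if np.linalg.norm(dw) < tol:
            break
    # posterior anomalies sqrt(m-1) A H^{-1/2}, with H the last Gauss-Newton Hessian
    evals, evecs = np.linalg.eigh(hess)
    t = evecs @ np.diag(evals ** -0.5) @ evecs.T
    return mean0 + anomalies0 @ w, np.sqrt(m - 1) * anomalies0 @ t
```

For a linear model and observation operator, a single Gauss–Newton iteration reproduces the Kalman analysis in the span of the ensemble, which makes a convenient check.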

Equations for the Lorenz variables

Hourly values of

The quasi-steady-state approximation (QSSA) consists in replacing Eq. (

The authors are grateful to Christian Seigneur and Yelva Roustan for sharing their expertise on the reduced chemistry scheme. They are also thankful to Stéphane Vannitsem and two other anonymous reviewers as well as the Editor, Adrian Sandu, for their useful comments and suggestions. This study has been partially supported by the INSU/LEFE project DAVE.

Edited by: A. Sandu