Introduction
In recent years, data assimilation (DA) of atmospheric constituents has
become a key tool for providing more accurate forecasts and reanalyses of the
atmospheric composition. The increasing availability of chemical observations
from both satellites and ground-based instruments allowed to reduce the
uncertainty of atmospheric chemistry models in a large number of
applications. Utilization of DA can be found in the modeling of volcanic ash
, in operational air quality forecasts at continental scale
or in the reanalysis of the global atmospheric
composition at decennial scale . Data
assimilation can also be used to infer surface fluxes of long-lived chemical
compounds . A review of the utilization of
data assimilation for the atmospheric composition can be found in
and .
The main goal of DA is to reduce the uncertainties of a model through a
timely combination of model results and observations. This is generally done
by means of correcting the so-called “control variables” of the given
model. The choice of the control variables should reflect the largest source
of uncertainty of the considered model. In atmospheric chemistry, control
variables are typically associated with the model initial state
or chemical emissions .
However, inaccurate identification of the model uncertainty can lead to a
wrong adjustment of the model through DA, even though the spread between
model predictions and assimilated observations is reduced .
The choice of the control variable, and the approximate knowledge of its
uncertainty (also named background error covariance), is therefore critical
for the design of an appropriate assimilation algorithm and to ensure correct
results of DA.
Principal sources of uncertainty of atmospheric chemistry models include the
model initial condition, model ancillary data or parameters, model physical
parameterization, chemical mechanism, etc. .
Since different chemical species can be sensitive to different physical and
chemical processes, the main sources of uncertainties can also differ from
species to species. For example, long-lived species like CO2 or CO are
mostly sensitive to uncertainties in surface fluxes, which can be corrected
using variational algorithms in combination with long assimilation windows of
several weeks . This is possible since the
chemical reactivity and the sensitivity to the initial condition are
negligible for long integration of the model. Uncertainty in the transport
processes is also generally neglected . Data
assimilation of short-lived gases like tropospheric O3 or NO2, which
are encountered in air quality applications, is instead trickier. O3 and
NO2 are involved in rapid chemical reactions and sensitive to several
model parameters ranging from reaction rates, emissions of primary species,
clouds and radiation, boundary layer mixing, etc. It is much more difficult in
this case to identify a single and predominant source of uncertainty in the
model predictions.
In most air quality operational models a pragmatic choice is currently made
by setting the control variable to the initial state of the measured species
and using short forecast/assimilation cycles (e.g., 1 hour). Since
ground-based measurements are generally available at hourly frequency, most
used sequential DA algorithms, such as optimal interpolation (OI), 3D-Var or
ensemble Kalman filters (EnKFs) , all provide a strong
constraint on model trajectories . This strategy gives robust
results for operational analyses because chemical fields are corrected every
hour. Since the control variable correspond to the assimilated observations,
this also permits to estimate the background error covariance from previous
validation of the model against observations and
keeps from the difficult diagnosis of the true model uncertainties.
However, the model dynamics are neglected in OI or 3D-Var DA schemes.
Attempts of using ensembles of model analyses to specify the background error
covariance within 3D-Var dynamically did not show clear improvements over the
static case . Specification of cross correlations
between interacting chemical species in the 3D-Var background error
covariance matrix is also particularly difficult, because chemical
interactions depend on the local concentrations and on meteorological
conditions. As a consequence, multivariate chemical DA with 3D-Var schemes
has not been yet documented in the literature. In EnKF systems, the forecast
model is used to propagate and estimate the background error covariance, but
ad-hoc adjustments are necessary to avoid the collapse of the ensemble
variance and obtain realistic covariance matrices for 1 h forecasts
. As a result, costly algorithms such
as EnKF or 3D-Var hardly give better results than more simple OI for chemical
reanalyses . More importantly, very little improvement is
obtained, regardless of the employed DA algorithm, for the next-day model
forecast . Forecasts of reactive gases such as O3 or NO2,
but also other pollutants such as aerosols mixtures (PM10 and
PM2.5), depend weakly on the initial condition and are more sensitive to
model settings such as surface emissions or physical parameterizations.
Current operational systems can achieve accurate reanalyses of observed
chemical species through DA but parameter estimation and, more generally,
model errors must be taken into account in DA to improve chemical forecasts
of reactive gases and particles.
Some studies evaluated more advanced DA algorithms to jointly correct surface
emissions of precursor species and initial condition of observed species. For
example, employed a 4D-Var scheme in combination with
assimilation windows of 24 h to assimilate O3, NOx and SO2
measurements. A similar study has been done also in the context of a
toy model experiment by , where only emissions of precursor
species are adjusted to improve O3 forecasts. Results seem promising but
still rely on the assumption that the model is almost perfect, i.e.,
that there are no additional sources of uncertainties in the model forecast
other than the controlled variables (the initial state and the selected
emissions). This can lead to the overcorrection of control variables when
other non-negligible model errors exist, for example, due to the
meteorological forcing, photochemistry coefficients, dry or wet deposition.
Concerning EnKF implementations, some authors also tested joint optimization
of the chemical state and precursor emissions
. EnKF naturally includes
model uncertainties in its formulation, which can be added through stochastic
perturbation of model parameters during the ensemble forecast
. However, EnKF corrects the model trajectories
sequentially. In a typical air quality context, the emissions of O3
precursor species (e.g., NOx and VOCs) in the early morning or night can
affect the concentration of observed species (e.g., O3) in the early
afternoon, when the photochemistry takes place. When using EnKF, the
information made available by afternoon measurements cannot be used to
correct the model at previous hours.
Based on above-mentioned facts, we determine that the following capabilities
are needed to further improve DA in air quality applications:
permit simultaneous assimilation and optimization of multiple chemical species (multivariate DA), with possible chemical interactions and very different lifetimes;
include model error and account for disparate sources of model uncertainty;
allow long assimilation windows to make best use of the information content of frequent air quality observations.
This can be accomplished by using the so-called “weak constraint”
4D-Var , which is an extension of the 4D-Var algorithm
that accounts for the model error. On top of the operators already needed to
perform the strong constraint 4D-Var (e.g., the tangent linear and adjoint
codes of the forecast model), the formulation of the weak constraint 4D-Var
requires the definition of the model error covariances Q, which
can be difficult to estimate in real applications .
Recent studies have shown that the linear operators needed in the weak
constraint 4D-Var formulation can be approximated through an ensemble of
model forecasts. For example, the 4D-Var-EnKF uses an
ensemble approach to mimic the tangent linear and adjoint model for the
minimization of the weak constraint 4D-Var cost function. The iterative
ensemble Kalman smoother (IEnKS, ) is a nonlinear
4DEnVar formulated under perfect model assumptions, which can also be used
to estimate erroneous model parameters through an augmented state formalism
. A major asset of IEnKS for chemistry applications is
that it can also account for strong nonlinearities of the forecast model.
The 4DEnVar method uses an ensemble of nonlinear
model trajectories to estimate both the error covariances for the initial
condition and the model error, as well as to approximate the tangent linear
and adjoint model. These approaches are generally referred in the literature
as ensemble variational EnVar , as opposed to hybrid
methods, which make use of ensembles only to specify error covariances
matrices in variational algorithms . EnVar methods
have a major advantage for atmospheric chemistry applications: they avoid the
construction of tangent linear and adjoint codes of the forecast model, which
are still lacking for most of the operational CTMs, or are becoming very
difficult to be maintained due to the rapid evolution of models and computer
architectures.
The main advantage of the 4DEnVar method is that it permits to account for a
generic model error through the addition of stochastic perturbations during
the model integration step (like in EnKF). Moreover, it focuses exclusively
on the estimation of the model state, which is the only variable that is
directly constrained by observations. This avoids the difficult specification
of Q still needed in the 4D-Var-EnKS. As all ensemble-based
methods, 4DEnVar also naturally supports multivariate chemical DA, with the
cross-covariance terms between chemical species being automatically obtained
from the ensemble of the nonlinear model forecasts.
Variants of the 4DEnVar have been already tested in real numerical weather
prediction (NWP) applications . The method has proven to be
affordable for large-scale operational NWP models, even though the skills of
the operational hybrid 4D-Var are not yet matched. To the knowledge of the
authors, EnVar-type methods have not yet been implemented in air quality or
atmospheric chemistry models and only one study has already examined the
potential of EnVar methods for chemical DA . Note that,
operational NWPs are already based on well-matured 4D-Var DA systems, whereas
very few atmospheric chemistry models are based on such systems. Therefore, there is more room
for improvement from the EnVar type of algorithms in air quality models than in
NWP. Hence, the main objectives of this study are
to present a new atmospheric chemistry toy model built for assessing and comparing performances of several DA algorithms;
examine the potential and limits of the 4DEnVar algorithm for air quality analyses, compared to the generally used 3D-Var;
present a new procedure based on 4DEnVar to improve chemical forecasts on the next day.
The purpose is to examine state-of-the-art DA algorithms in the
reactive gases/air quality context and, therefore, to guide future
developments for the operational DA systems. Four gaseous species with very
different lifetimes and chemical mechanisms, currently well observed either
from satellites or from ground-based instruments, are considered for this
study (CO, O3, NO2 and CO2). Using a simplified model in this
context permits faster implementation of complex DA algorithms, cheaper
numerical experiments and more straightforward interpretation of the DA
results . The latter is particularly true compared to DA
experiments done using real observations, with generally unknown error
statistics. Compared to already-mentioned simplified models
, which are, respectively, 0-D and 1-D, the
newly proposed model is 3-D and uses the same tropospheric chemistry scheme
of operational air quality models. This allows us to reproduce more features of
real models, for example, the complex interactions of reactive chemistry and
large-scale advection or the effect of boundary conditions. This also permits
to better examine typical issues of DA within large systems, like the
emergence of sampling errors due to the finite size of the ensemble and the
consequences of localization techniques. Additionally, the use of 3-D fields
and operators eases the estimation of numerical costs and possible
bottlenecks of the DA algorithm in terms of operational implementation. We
remind readers that the objective of this study is to demonstrate the
applicability of a DA algorithm that could outperform currently implemented
methods in operational centers, but with an acceptable compromise between
computational costs and precision. Finally, the toy system has been
implemented using the library for data assimilation OOPS (Yannick Trémolet,
personal communication, 2015)
to ease the exchange of assimilation algorithms or toy models between
scientists.
The paper is outlined as follows. The developed atmospheric chemistry model
will be presented in Sect. . A summary of the data
assimilation algorithms employed in this study is given in
Sect. . The first section of the numerical results
(Sect. ) presents a number of DA experiments done under
the hypothesis of the perfect model. A detailed comparison between 4DEnVar and
3D-Var is presented, as well as a sensitivity study on the principal
parameters of the 4DEnVar algorithm, i.e., the ensemble size and the
localization choices. In Sect. , the effects
of a model error are investigated using the 4DEnVar algorithm. A statistical
comparison of the 4DEnVar and 3D-Var performances on multiple cycles of
analyses and forecasts is presented in Sect. . Finally,
conclusions are given in Sect. .
Model description
A new atmospheric chemistry low-order model has been developed for this study
and named QG-Chem. The objective was to reproduce typical features of
chemical fields from large-scale chemical transport models (CTMs), but
maintaining the computational cost low enough to allow a comfortable usage on
a personal computer. The meteorological forcing is computed using a two-layer
quasi-geostrophic (QG) model, representative of midlatitude mesoscale
dynamics . The QG wind field is used to advect the
chemical species, which makes QG-Chem a coupled meteorological–chemistry
model. This choice permits to examine the behavior of DA in the presence of
complex gradients of wind fields and vorticity. Since all DA algorithms make
strong assumptions on the model dynamics (Sect. ), it is
important to test them in the presence of advection patterns that can be
found in real applications. Nevertheless, the focus of this study remains on
atmospheric chemistry. Therefore, a detailed tropospheric chemical mechanism
has been considered. Aspects of DA concerning the coupling between
meteorology and chemistry are also left for future work, with the present
using QG-Chem in a CTM-like mode. The details of the meteorological and
chemical models are given next.
Quasi-geostrophic meteorology
The two-layer QG model is a geophysical fluid model composed of two
atmospheric layers of fixed depth and potential temperature. It is a simple
model of the atmosphere at midlatitudes, whose main forcings are represented
by the Coriolis force and the orography or surface heating. The governing
equation is the conservation
DqiDt=0
of the potential vorticity q=(q1,q2) expressed in nondimensional variables :
q1=∇2ψ1-F1(ψ1-ψ2)+βyq2=∇2ψ2-F2(ψ2-ψ1)+βy+Rs,
where the subscripts 1 and 2 stand for the top and bottom layer,
respectively. ∇2 is the two-dimensional Laplacian, Rs
represents orography or heating, β is the (non-dimensionalized)
northward variation of the Coriolis parameter at a fixed latitude, F1 and
F2 couple the layers together being a function of Coriolis force, layer
depths, gravity and typical length scale. The stream function
ψ=(ψ1,ψ2), whose horizontal derivatives give the horizontal wind
field (ui,vi), can be considered as the model state vector.
The code of the QG model that is distributed with the OOPS DA library have
been used for this study. The
depth of the two layers, the resolution of the horizontal grid and the
integration time step Δt are the main model parameters that can be
set at runtime. The dimensional scaling and model orography are fixed, as
well as the extension of the domain, which is 12 000 km in the zonal
direction and 6300 km in the meridional direction. The domain is cyclic in
the east–west direction, i.e., the model fields are periodic in this
direction. The stream function is set to climatological values at meridional
walls (Dirichlet boundary conditions). For all the experiments presented in
this study, a coarse resolution of approximately 750 km (16×8 grid
points, respectively, for the east–west and north–south directions) has been
used. We remind readers that the focus of this study is to test chemical DA
algorithms in a toy model framework. Therefore, there is no stringent
requirement on the realism of the meteorological fields and no need to
reproduce a real atmospheric situation. The only desired property is to
obtain wind fields that exhibit typical patterns of the complex atmospheric
circulation. A summary of the QG model parameters used in this study is
detailed in Table .
Tropospheric chemistry
The state vector of the QG model has been extended to include chemical
species. The regional atmospheric chemical mechanism (RACM)
, which describes 96 chemical species with about 300
reactions, has been implemented. This chemical scheme has been developed for
air quality modeling and is currently used by a number of operational models
in Europe . Photochemistry and its diurnal cycle are
included via look-up tables, assuming global clear-sky conditions. Surface
fluxes of chemical species (emissions and dry deposition velocities) are
assigned at runtime and are kept constant during the temporal integration of
the model. Chemical species are advected by the QG wind field using the
semi-Lagrangian scheme used for solving the QG governing equation
(Eq. ).
After the advection of the species, their concentrations are updated by
addition of the chemical tendencies. They are computed by solving the stiff
ODE system that describes the adopted chemical mechanism. The ODE system is
of the nonlinear form:
∂C/∂t=f(C)=P(C)-L(C)×C
where C represents the local species concentrations, P(C) and L(C) the
production and loss terms. The stiffness of the systems comes from the wide
range of values that can take the loss terms leading to a large range of
chemical lifetimes.
The above system is integrated using the adaptative semi-implicit scheme
(ASIS). ASIS is a one-step semi-implicit scheme with prognostic time steps. To
solve the system of linear equations associated with the semi-implicit scheme,
ASIS uses the generalized minimal residual method which
appears to be very competitive in terms of computation time with good
convergence properties. The ASIS solver is mass conservative and adapts its
sub-time step to the adopted tolerance errors as described by
. In our application, the ASIS solver uses a 7 s minimum
sub-time step, an absolute error tolerance of 104 molecules cm-3 and
a relative tolerance error of 0.01. Therefore, a common integration time step
(dt) for both the dynamical and chemical solvers is used, which is set to
dt=10 min to ensure reliable chemistry solutions. The dry deposition is
computed for each species before the application of the ASIS solver, i.e.,
using the main model time step dt. Concentrations are updated according to
Ci(t+dt)=Ci(t)-λiCi(t)dt,
where i denotes the chemical species and λi is the corresponding
deposition timescale in s-1, proportional to the deposition velocity.
The chemical field is set equal to climatological values on the N–S
boundaries, which correspond to Dirichlet-type conditions. The values of
surface pressure and temperature used by the chemical mechanism are fixed
globally and do not depend on the QG fields. These modeling choices let this
study focus on the following main processes of air quality models: emissions,
chemistry and transport. Therefore, only the bottom layer of the QG-Chem
model will be analyzed throughout this study. A summary of the configuration
of QG-Chem is given in Table .
QG-Chem model parameters and nondimensional scaling factors. The parameters marked by * are fixed globally and only relevant for the chemical mechanism.
Characteristic
Description
Geographical domain
12 000 km (E–W) ×
6300 km (N–S)
Zonal resolution
750 km (16 grid points)
Merid. resolution
790 km (8 grid points)
Top layer depth
6 km
Bottom layer depth
4 km
Typical horizontal scale
1000 km
Typical velocity
10 m s-1
Coriolis parameter F
10-4
Merid. gradient of F (β)
1.5×10-11
Orography
Gaussian hill (2 km alt.)
Chemical mechanism
RACM
Surface pressure*
1000 hPa
Temperature*
24.9 ∘C
Boundary layer thickness*
1.2 km
Description of the case study
A model run of 20 days is performed prior to DA experiments, starting from
the initial condition given in Table , which
corresponds to a homogeneous, relatively clean atmosphere and zonal
circulation. Emissions (Table ) are taken from the study
of , FLUX case, representative of the urban environment
of Paris. Spatial heterogeneity of model concentrations is obtained by
scaling the reference emissions in Table by 0.01, 1 and
0.25, respectively, on the western, central and eastern parts of the domain
(Fig. ). Deposition velocities from the French air quality
model MOCAGE , averaged over the Paris region during the
month of July 2010, are used and set constant over the QG-Chem domain. Values
are reported in Table . The chemical concentrations at
the meridional boundaries are set to the same values as in
Table . Hence, the presence of N–S boundaries can
eventually counterbalance the growth of long-lived species by advection of
clean air masses from outside the domain. This configuration relates to
regional air pollution modeling mainly concerning the type and amplitude of
chemical emissions, spatial heterogeneity of sources and presence of
boundaries.
Initial conditions used to initialize the truth simulation. The
initial fields are constant over the QG-Chem domain. Same values are also
assigned to the meridional boundaries of QG-Chem during all simulations.
Values are equal to zero for chemical species that are not listed below.
Chemical concentrations are expressed in volume mixing ratio units (vmr).
Variable
Value
Meteorology
(m s-1)
(u1,v1)
(40, 0)
(u2,v2)
(10, 0)
Chemistry
(vmr)
O2
0.2095
O3
30×10-9
CO2
310×10-6
OH
1.5×10-12
HO2
1×10-13
H2O2
1×10-12
N2O
310×10-9
NO
0.2×10-9
NO2
0.1×10-9
HNO3
0.5×10-9
HNO4
0.1×10-9
CH4
1.6×10-6
CO
150×10-9
Cl
1×10-12
HOCl
1×10-13
HCl
1×10-12
Br
1×10-13
BrO
1×10-13
HBr
1×10-13
Results for four key species are considered through the study: nitrogen
dioxide (NO2), ozone (O3), carbon monoxide (CO) and carbon dioxide
(CO2). The first three are of concern for air quality, since they have an
elevated toxicity and their concentration is strongly related to
anthropogenic emissions. The chemical reactivity and typical tropospheric
lifetime of NO2, O3 and CO is, however, very different. NO2
typically arises from the oxidation of nitric oxide (NO) in combustion
processes. It is highly reactive, lasting in the atmosphere from a few hours in
summer to several days in winter, and it is an ozone precursor. O3 is a
secondary gas, mainly formed by the reaction of nitrogen oxides and
hydrocarbons under sunlight. In the troposphere, it has a typical atmospheric
lifetime of 2–3 weeks. CO is produced by partial oxidation of carbon
compounds, which can occur in industrial or natural combustion processes. It
also participates in O3 chemistry and has an atmospheric lifetime of about
1–2 months. Finally, CO2 is of major concern for its effect on climate
and has the longest lifetime among all the considered species
(> 30 years). However, CO2 fluxes are not activated in the experiments,
which makes this gas behave like a passive tracer in our study.
Surface chemical emissions used for all QG-Chem experiments. Values
are equal to zero for chemical species that are not listed below.
Geographical scaling is further applied to these values as shown in
Fig. .
Species
Extended name
Value
(109 molec cm-2 s-1)
NO
Nitric oxide
121.29
CO
Carbon monoxide
2500
CH4
Methane
802
ETH
Ethane
6.25
HC3
Alkanes, alcohols, esters and alkynes
37.67
HC5
Alkanes, alcohols, esters and alkynes
44.43
HC8
Alkanes, alcohols, esters and alkynes
19.14
ETE
Ethene
22.33
OLT
Terminal alkenes
39.67
OLI
Internal alkenes
6.37
TOL
Toluene
9.02
HCHO
Formaldehyde
5.77
ALD
Acetaldehyde and higher aldehydes
14.45
KET
Ketones
5.7
XYL
Xylene
14.55
CSL
Cresol and other aromatics
3.68
Averaged model fields (24 h) on day 20 are shown in
Fig. . We note the presence of zonal gradients of chemical
concentrations for species that are strongly related to surface emissions
(e.g., NO2, O3 and CO), which are higher in the central part of the
domain (Fig. ). The developed cyclonic circulation also
allows the accumulation of longer-living pollutants (O3, CO) in
correspondence with the central low-pressure system, whereas patterns of short-living
gases (NO2) maintain a stronger similarity with the geographical
distribution of the sources. The maps also display the influence of
meridional boundary conditions, which produce local minima in O3 and CO
fields in correspondence with the advection of clean air masses from outside
the domain. The average model trajectory during 24 h shows significant
differences among all considered species. Since all species are advected and
surface fluxes are constant in time, these features arise from the complex
chemical interactions and photochemistry. Note, for example, the daylight
increase of O3 as a consequence of NO2 production from NO emissions
during nighttime and daytime photolysis. In contrast, CO shows an almost
linear increase in time, due to a constant surface emission and longer
lifetime. Most of the numerical experiments that are shown later in this
study start on the above-discussed day.
QG-Chem horizontal domain: scaling factor for the chemical surface
emissions (in colors), location of the synthetic observations used in the
assimilation experiments (black circles) and locations for which time series
of DA experiments are displayed (crossed boxes A and B). The numerical grid
is displayed using white lines.
Meteorological and chemical fields of QG-Chem on day 20.
Time-averaged fields for a 24 h period on the left, time series of
domain-averaged values for the same 24 h period on the right. The wind field
and the concentration of the four chemical species of interest (CO, NO2
and O3, with CO2 not shown since it is constant and equal to 310 ppmv) are shown from
top to bottom.
Data assimilation algorithm
We considered two data assimilation algorithms in this study: 3D-Var and
4DEnVar. The first is the simplest type in the family of variational DA
algorithms and currently the most used in operational chemical assimilation
systems . It is taken as a reference against which the
benefits of more complex (and costly) algorithms can be assessed. 4DEnVar is
an hybrid algorithm that combines benefits of variational and ensemble
methods. It is already used in a number of NWP models
and was tested in the framework of
meteorological toy models . A summary
description of the two algorithms is given below, as well as some specific
aspects relative to the atmospheric chemistry implementation presented in
this study. In the third section, a method based on the postprocessing of
4DEnVar output is proposed to correct model biases.
The Object Oriented Prediction System (OOPS), a generic software framework to
develop data assimilation systems (Y. Trémolet, personal
communication, 2015), was
used to implement and run all the DA experiments described in this study.
3D-Var
The 3D-Var analysis can be computed after a model forecast time step by means of minimizing the quadratic cost function J :
J(δx)=12δxTB-1δx+12(δyo-Hδx)TR-1(δyo-Hδx)
for the increment δx used to correct the previous forecast
xb (also named background), i.e., x=xb+δx, where x is the control variable
(e.g., the 3-D chemical state). Here, B and R are the
background and observation error covariance matrices, H the
linearized observation operator that transforms an increment of the control
variable into an increment in the observation space,
δyo the difference between the observations vector
yo and the previous forecast
δyo=yo-H(xb),
using the nonlinear observation operator. The minimization of J can be
achieved with standard techniques for the solution of large linear systems:
the B-preconditioned conjugate gradient algorithm has been used in
this study . The result of the minimization is the analysis
increment δxa (3-D). To advance in time, the analysis
xb+δxa is used as the new initial
condition for the following forecast step and so forth. The relative
simplicity, efficiency and robustness of the 3D-Var algorithm make it very
suitable for operational models. Its practical implementation requires mainly
the development of a covariance model for B .
Multivariate chemical assimilation can be performed with 3D-Var by extending
the 3-D control variable x to contain multiple 3-D model variables
(chemical species).
In this study the control variable x is set to represent the complete
model state, i.e., the stream function ψ plus the 96 chemical species.
The covariance matrix B is modeled through the sequential
application of 1-D square root correlation operators and a diagonal matrix,
representing the background error standard deviation:
Bδx=BT/2B1/2δxB1/2=Σ1/2Cz1/2Cx1/2Cy1/2Cv1/2,
where Cz, Cx, Cy and Cv
are, respectively, the vertical, zonal, meridional and multivariate correlation
operators and Σ is the variance (diagonal matrix).
Cx and Cy are isotropic homogeneous correlation
operators providing Gaussian spatial structures. The other correlation
operators are represented by symmetric positive-definite matrices. The
following parameters are used to set B: one horizontal-length
scale that defines the decorrelation scale for the zonal and meridional
coordinates, one value for the vertical correlation and one value for the
chemical correlation between each couple of model variables. The variance is
specified using one global value for each variable. The resulting
B is uniform and homogeneous on the horizontal plane. More complex
B models could be introduced to account, for example, for spatial
variability and heterogeneity of the background error covariance. However,
this is out of the scope of the present study, which is intended to reproduce
typical operational chemical DA settings, where B is usually
specified using a single variance and correlation length for each chemical
species.
Surface dry deposition velocity used for all QG-Chem experiments.
Values are equal to a fixed and low deposition velocity of
10-5 m s-1 for chemical species that are not listed below.
Species
Extended name
Value
(10-3 m s-1)
HNO4
Perinitric acid
1.167
MACR
Methacrolein and other unsaturated monoaldehydes
0.421
HO2
Hydroperoxy radical
8.665
TOL
Toluene
0.035
H2O2
Hydrogen peroxide
16.32
DCB
Unsaturated dicarbonyls
2.735
XYL
Xylene
0.035
GLY
Glyoxal
6.686
HNO3
Nitric acid
15.31
MO2
Methyl peroxy radical
0.931
N2O5
Dinitrogen pentoxide
17.69
OP1
Methyl hydrogen peroxide
2.152
HONO
Nitrous acid
1.430
NO2
Nitrogen dioxide
1.435
ALD
Acetaldehyde and higher aldehydes
0.638
OP2
Higher organic peroxides
1.348
KET
Ketones
0.622
HCHO
Formaldehyde
1.500
PAA
Peroxyacetic acid and higher analogs
1.317
ONIT
Organic nitrate
0.106
TPAN
Unsaturated PANs
1.047
PAN
Peroxyacetyl nitrate and higher saturated PANs
1.099
O3
Ozone
3.557
NO3
Nitrogen trioxide
1.316
MGLY
Methylglyoxal and other alpha-carbonyl aldehydes
2.804
SO2
Sulfur dioxide
4.463
HKET
Hydroxy ketone
2.954
UDD
Unsaturated dihydroxy dicarbonyl
12.31
CSL
Cresol and other aromatics
1.323
Only surface observations are considered in this study. Therefore, the
observation operator H is represented by the bi-linear
interpolation of model values to the observation location. A diagonal
observation error covariance R is used, as it is the case in most
real DA systems.
One drawback of 3D-Var is that DA results rely strongly on the background
error covariance B, which should depend on the previous
assimilation cycles and forecast errors (flow dependence). In practical
applications, B is usually set constant, estimated from
verification of previous forecasts (climatological) and/or tuned to provide
the best fit of the analyses against independent observations. In the case of
multivariate chemical assimilation, the estimation and validity of
climatological error covariances between chemical species has not yet been
demonstrated. As a consequence, multivariate corrections are normally
neglected (Cv=I). This simplification is also used in
this study. Finally, 3D-Var provides a correction to the control variable
each time the system is observed (every hour in air quality applications).
This does not allow to exploit the dynamical information contained in
observation time series and prevent all estimation of model error terms.
4DEnVar
The 4DEnVar algorithm can solve the main drawbacks of 3D-Var by introducing
the temporal dimension in the quadratic cost function (Eq. ),
similarly to what the classic 4D-Var algorithm does .
However, compared to the latter, it avoids the introduction of the tangent
linear and adjoint codes of the forecast model. Following the notation in
, the 4DEnVar cost function can be written as
J(δx¯)=12δx¯TB¯e-1δx¯+12(δy¯o-H¯δx¯)TR¯-1(δy¯o-H¯δx¯),
where all underlined terms are now time dependent (4-D). The control variable
x¯ becomes the temporal trajectory of the model state
(ψ plus the 96 chemical species in this study). The cost function is
computed for an assimilation window that can span several hours or days. The
assimilation window is discretized with an arbitrary number of sub-windows,
which defines the temporal dimension of the 4-D vectors and matrices in
Eq. (). The minimization of J returns a 4-D vector
δx¯a, which provides the analysis
trajectory for the entire assimilation window.
The forecast error covariance B¯e is
estimated from an ensemble of perturbed model trajectories:
B¯e=1L-1(x¯1′,…,x¯L′)(x¯1′,…,x¯L′)T,
where x¯′ are the four-dimensional perturbation and L
denotes the size of the ensemble. It results that
B¯e describes spatial (3-D), multivariate and
temporal covariances at once. An analogy with the 4D-Var weak constraint cost
function shows that B¯e represents a
numerical approximation of the linearized forecast model and model error
covariance . As for the 3D-Var case, the minimization
of J is achieved using a B-preconditioned conjugate gradient
algorithm (algorithm no. 3 in ).
H¯ and R¯ are block diagonal
matrices whose blocks are the operators H and R defined
in Sect. for each sub-window.
A nice property of the 4DEnVar algorithm is that the effect of the model
error covariance, generally denoted by Q , can
be introduced easily by adding physical perturbations during the computation
of the ensemble of forecasts. This lets the model dynamics develop the
complex covariances that derive from physical perturbations, without the need
to specify and implement a covariance model for Q. Whenever the
sole initial condition of the ensemble is perturbed at the beginning of the
assimilation window, the 4DEnVar approximates the 4D-Var algorithm in the
strong constraint formulation. The flexibility of the model error
specification in 4DEnVar is a strong asset for atmospheric chemistry DA,
where the sources of uncertainty can be highly variable and a function of the
chemical species.
The main drawback of 4DEnVar in practical applications to large-scale models
is related to the finite size of the ensemble. As a consequence, the 4-D error
covariance B¯e is not fully ranked and
covariance terms may contain statistical noise. The effect of a noisy
covariance is the appearance of spurious analysis increments far away from
assimilated observations, or between physically uncorrelated model variables
(in the multivariate case). Since with 4DEnVar temporal covariances are also
estimated through the ensemble, this effect concerns also the time dimension.
This is a typical issue with all ensemble-based methods and large systems,
and demands the introduction of a localization operator, which
attenuates non-local increments. The localization is applied to
Be by
B¯=B¯e∘C¯,
where ∘ denotes the Schur product (entry-wise product) and
C¯ is a 4-D correlation operator that damps non-local
covariances. The numerical implementation of Eq. () is
made under the following approximations: the same 3-D (and multivariate)
correlation operator C is used for all 4DEnVar sub-windows. It
follows that C¯ is a block matrix with all elements set
equal to C. Hence, in order to specify C¯,
we could use the covariance operator described in Eq. () by
setting the variance terms to one. The choice of parameters of the
correlation operators is discussed in the results section. This
simplification, also called static localization, significantly reduces the
numerical cost of the algorithm at the price of
degraded precision when increasing the length of the DA windows
. The development of localization procedures that are more
consistent with the dynamics of the forecast model is an ongoing research
topic and possible applications to QG-Chem
will be considered in a future study.
Diagnosis of model error and forecast correction with 4DEnVar
One of the objectives of this study is to retrieve model error information
from the 4DEnVar solution, which can be potentially used to improve the
chemical forecasts for the next day. The 4DEnVar analysis increment
δx¯a accounts for the correction of both
the initial condition and the model forecast (model error). However, the
control variable δx¯a only contains the
model state. It follows that the correction of the initial condition is
simply given by δxa(t=0), i.e., the first element
of the 4-D vector δx¯a. The correction of
the model error contributes to the values of δxa(t>0) but a diagnostic procedure has to be applied to
retrieve it.
We can compute a forecast trajectory x¯f using
the nonlinear model starting from the updated initial condition
xb(t=0)+δxa(t=0). Subtracting
it from the 4DEnVar analysis at t>0, we obtain
Δx¯(t>0)=x¯a(t>0)-x¯f(t>0).
This difference contains information about the contribution of the model
error in the 4-D state, as explained below. Let us first define the model
error η¯ and analysis error
e¯a as
η(t)=x*(t)-M(t-1)→(t)(x*(t-1))ea(t)=xa(t)-x*(t),
where x*(t) is the truth at time t, M(t-1)→(t) is the integration of the model from time (t-1) to time
t, with the temporal discretization matching the length of 4DEnVar
sub-windows (Sect. ). From Eq. (), we
have
η(t+1)=x*(t+1)-M(t)→(t+1)(x*(t))=x*(t+1)-M(t)→(t+1)(η(t)+M(t-1)→(t)(x*(t-1)))=x*(t+1)-M(t)→(t+1)(η(t)+M(t-1)→(t)(η(t-1)+M(t-2)→(t-1)(x*(t-2))))=…≃x*(t+1)-M(0)→(t+1)(x*(0))-∑j=1t∏l=j+1t+1Mlη(j),
where we have used the Taylor expansion and assumed that the forecast model
can be linearized inside each sub-window: M(t-1)→(t)∼Mt. This equality can be rewritten as
x*(t+1)-M(0)→(t+1)(x*(0))≃η(t+1)+∑j=1t∏l=j+1t+1Mlη(j).
Using the definition (Eq. ) and the linear assumption on
the model, Eq. () yields
Δx(t)=xa(t)-M(0)→(t)(xa(0))=xa(t)-x*(t)+x*(t)-M(0)→(t)(xa(0)-x*(0)+x*(0))=ea(t)+x*(t)-M(0)→(t)(ea(0)+x*(0))≃ea(t)+x*(t)-M(0)→(t)(x*(0))-∏l=0tMlea(0).
Substituting the equality (Eq. ) into the latter, we get
Δx(t)≃ea(t)-∏l=0tMlea(0)+η(t)+∑j=1t-1∏l=j+1tMlη(j).
Therefore, if the error (ea(t)-∏l=0tMlea(0)) is small, the vector Δx(t) represents the contribution of the model error
η¯ on the 4-D state, accumulated through the
preceding 4DEnVar sub-windows. Hereafter, we will name Δx(t)
effective model error, to be distinguished from the model error,
which is generally associated with η¯.
In a first instance, Δx¯ can be used to diagnose the
underlying presence of model error in the forecast. Furthermore, if the model
error is stationary along multiple DA windows (bias), Δx¯ could be used to correct the forecast for the next
window:
x¯̃fi+1=x¯fi+1-Δx¯i,
where the superscript denotes the assimilation window index and the
x¯̃f is the corrected forecast.
Validation of operational air quality models shows that biases contribute to
a large part of the model uncertainties
, which justifies the
implementation of the proposed bias correction procedure. Note that this
procedure is compatible with model biases that show an hourly variability but
are stationary on successive days, as it is found in most of air quality
models . Note also that the computation of the
effective model error is blind to the type of the underlying model error
(e.g., chemical emissions or physical parameterizations). The two main
requirements needed to make a useful estimation of Δx¯ are (i) analysis errors smaller than model errors and
(ii) an approximate knowledge of the sources of model error, necessary to
generate informative ensembles.
Alternatively from the proposed procedure, model error terms
η(t) could be diagnosed for each sub-window and
then applied to the next-day forecast on an hourly basis. However, this
correction method is more intrusive, because it must be applied during the
nonlinear model forecast and was not considered for this study.
Results and discussion
Numerical experiments are described and discussed in this section. The
objective is to assess the performances of the 4DEnVar algorithm for the
assimilation of the four key species: NO2, O3, CO and CO2. In all
experiments, a model simulation with unperturbed parameters (same as the one
in Sect. ) represents the truth. The experiments are
performed during the meteorological–chemical situation described in
Sect. . Synthetic observations are generated from the
truth by applying H (Sect. ) and by adding a normally
distributed error (Table ). Four observation locations
are considered for the experiments (Fig. ), where chemical
species are observed hourly. The relatively low density of the observation
network allows us to assess DA skills at unobserved locations. Model forecasts
are produced by perturbing only the initial condition (perfect model), the
surface emissions (model error) or both. DA is performed with either 3D-Var
or 4DEnVar, and the obtained analyses are compared to the truth. The
meteorology is never observed nor perturbed, which corresponds to use QG-Chem
in a CTM-like mode.
Operational air quality centers collect hourly observations and perform DA
typically once per day . Therefore, a 24 h assimilation
window is adopted when using the 4DEnVar algorithm, with 1 h sub-windows
matching the observations' frequency. For the same reason, 24 sequential
cycles of 1 h are adopted with 3D-Var. The main processes affecting
air quality forecasts (daily emissions, evolution of the mixing layer,
photochemistry) have a period of approximately 24 h. Therefore, a 24 h
window permits to account for errors in main model processes. The utilization
of longer windows is theoretically possible with 4DEnVar, assuming that the
linearization of the model perturbations remains valid. However, the
numerical cost of the minimization increases with the windows' length when
keeping fixed the duration of the sub-windows. The cost–benefit ratio of
longer windows will deserve further investigations in real applications.
Section presents results based on a perfect model
hypothesis and compares 3D-Var and 4DEnVar performances for chemical
reanalyses. Section introduces model errors and is
focused on the estimation of the effective model error using 4DEnVar.
Section summarizes the results on a larger number of DA
windows: 4DEnVar results using the model bias correction procedure are
compared to 3D-Var results for both reanalyses and forecasts.
Perfect model experiments
Experiments considering a perfect model are presented in this section for one
assimilation cycle of 24 h. Emphasis is placed on the reanalysis
capabilities of DA. A 24 h long forecast is produced by perturbing only the
initial condition. Initial perturbations are computed applying
B1/2 (Eq. ) to a Gaussian uncorrelated
noise field N(0,I) at t=0. Since chemical concentrations can
span different orders of magnitudes in the atmosphere, the standard deviation
values set in B1/2 depend on the chemical species
(Table ). The background error standard deviations have
been chosen to be about 10–20 % of the average field values
(Fig. ). The same horizontal correlation length has been
set for all species (750 km). Multivariate correlations in
B1/2 are switched off so that perturbations of chemical species
are not correlated at t=0. Chemical correlations can, however, arise at
t>0 due to chemical couplings.
Background (σB) and observation
(σO) error standard deviation used in DA experiments
expressed in volume mixing ratio units and, within brackets, in percentage of
the average field value in Fig. .
Species
σB
σO
NO2
1.2×10-10 (16 %)
0.9×10-10 (11 %)
O3
12.2×10-9 (21 %)
3.2×10-9 (5 %)
CO
64.8×10-9 (14 %)
8.1×10-9 (2 %)
CO2
40.5×10-6 (13 %)
20.2×10-6 (6 %)
DA is applied to correct the 24 h long perturbed forecast. The same
covariance matrix B that was used to produce the initial
perturbation is applied at each cycle of the 3D-Var analysis (1 h). Hence,
the background error covariance used in DA is perfectly known at t=0, which
represents the best possible case in DA. At t>0 the true B
depends on the observations assimilated at previous steps and on the model
dynamics, which makes the use of a fixed B within 3D-Var a raw
approximation. However, this is the typical setting of operational
air quality models.
With 4DEnVar, an ensemble of 16 forecasts is generated perturbing the initial
condition with the same B as above. Therefore, the ensemble size
is small compared to the dimensions of the system (16×8×97=12 416
variables). When using small size ensembles, the localization of the sample
covariance is necessary. In this study, horizontal localization is applied to
the 4-D ensemble covariance using Eq. (), by setting a
horizontal length scale equal to the double of values used for B
(1500 km). Multivariate localization is performed using a correlation
coefficient of 0.5, which was chosen empirically. A sensitivity analysis
concerning the ensemble size and the localization choices is presented later
in this section. Since we use QG-Chem in a CTM-like configuration and the two
layers are chemically not coupled, the vertical terms of the covariance or
localization matrix are always set to zero in this study, without any impact
on the presented results.
Univariate assimilation
This section compares results of 3D-Var and 4DEnVar
DA in univariate settings, i.e., one independent assimilation experiment is
performed for each of the four chemical species. The controlled species
corresponds to the species that is measured and that is perturbed at the
initial time. With 3D-Var, this is obtained setting all terms of B
to zero except for the 3-D covariance of the selected species. With 4DEnVar
the same is obtained by setting all multivariate localization coefficients to zero.
Figure compares the temporal trajectories of the
analysis of each species obtained from 3D-Var and 4DEnVar. Each figure
provides the temporal trajectories at the two grid points shown in
Fig. : one located in the polluted region and in
correspondence with assimilated observations (grid point A) and one in the
cleaner region and slightly displaced from the measurements location (grid
point B).
In addition to the comparison of the two DA algorithms, these experiments
also permit to assess the impact of the initial perturbation on chemical
forecasts. First of all, we remark that O3 forecasts are very close to the
truth values after 24 h, which is a consequence of the fact that O3 is
strongly controlled by precursor emissions and photochemistry. The memory of
the initial condition is rapidly lost for O3, as it was also demonstrated
within regional air quality models . This is not
the case for CO2 and CO, which have a longer lifetime. Therefore, the
initial perturbation is advected by the wind field and the spread between the
forecast and the truth lasts longer in time. NO2, which lasts a few hours
in a summertime atmosphere, is practically not sensitive to the perturbation
of the initial condition (Fig. ). Note also that
the chemical concentrations are always lower at the clean location (grid
point B) than at the polluted one (grid point A), except for CO2 that is
neither emitted nor chemically produced. Again, this is a consequence of the
geographical variability of the emission factors (Fig. ).
DA results with a perfect model hypothesis. Temporal trajectories
for the four assimilated species are shown from top to bottom (O3, CO, CO2 and NO2). Forecast, 3D-Var/4DEnVar analyses and truth are shown in each plot
for the grid points A (left plots) and B (right plots) depicted in
Fig. .
DA results with a perfect model hypothesis. From top to bottom: the
RMSE gain (Eq. ) is displayed, respectively, for O3, CO,
CO2 and NO2 assimilation experiments. Blue color means that DA lowered
the RMSE and red color means that DA increased the RMSE. Plots on the left are
obtained using 3D-Var, on the right using 4DEnVar.
Summary statistics of RMSE gain for perfect model DA experiments in
Fig. . Signs have been inverted compared to
figures to show positive values when DA reduces the RMSE.
3D-Var
4DEnVar
Species
Max. gain (%)
Min. gain (%)
Avg. gain (%)
Max. gain (%)
Min. gain (%)
Avg. gain (%)
O3
17.11
-7.67
0.92
27.03
-6.68
2.55
CO
17.15
-3.33
1.22
18.82
-4.11
1.62
CO2
13.08
-10.60
1.76
13.62
-7.51
1.93
NO2
57.39
-6.15
0.95
48.07
-10.52
1.21
We remark that the 4DEnVar provides in general better analyses than 3D-Var
for all species. First, it can be observed from
Fig. that the analysis time series obtained with
the 4DEnVar are smoother than those resulting from 3D-Var because daily
trajectories are optimized at once with 4DEnVar. The sequential aspect of
3D-Var, instead, makes the analysis more sensitive to the random observation
errors. This introduces the observed jumps in the analyses.
Figure provides the root mean square error (RMSE)
gain (in %) for every grid point (i,j) of the model domain,
RMSE(i,j)gain=1X¯(i,j)RMSE(i,j)anl-RMSE(i,j)fct.
Here, X¯(i,j) is the average concentration of the truth values
(Fig. ), RMSEfct and RMSEanl are the
absolute values of RMSE (in chemical concentration units) for the forecast
and analysis, respectively. The RMSE at the location (i,j) is defined as
RMSE(i,j)=1Nx¯(i,j)*-x¯(i,j)Tx¯(i,j)*-x¯(i,j),
where N=24 is the total number of sub-windows,
x¯(i,j)* and x¯(i,j) denotes
the temporal trajectories of the truth and the analysis (or forecast) at the
grid point (i,j), respectively. In Fig. the blue
color means that RMSE values after DA have reduced (DA improved the forecast)
and the red color means that RMSE values after DA have increased (DA degraded the
forecast). Therefore, we can see from these figures that with 4DEnVar,
improvements of the RMSE at unobserved locations are more pronounced than
those of 3D-Var (more widespread blue regions and less red regions in RMSE
gain). This is likely due to a better description of the background error
covariance, which is flow dependent and closer to the true forecast error
covariance within 4DEnVar. For example, the background error covariance used
to assimilate NO2 with 3D-Var is highly overestimated for t>0. This
happens because, when the model is perfect, the NO2 field rapidly
converges to the truth after a few hours (Fig. ).
In this specific case, the RMSE is degraded even at observed locations
(Fig. , bottom left plot). This is an instructive
example of the effects of incorrectly specified B within 3D-Var.
Finally, note that for other species also 3D-Var is capable of decreasing
RMSE at unobserved locations. This effect is a result of the advection of the
analysis increments, and, as expected, is more pronounced with long-lived
species (CO, CO2) than with more reactive or emitted gases (O3,
NO2).
Impact of ensemble size (Ne) on the RMSE gain of the 4DEnVar
analyses. Results are shown for the four chemical species: O3, CO, CO2
and NO2. For each plot, values of maximum, minimum and average RMSE gain
(Table ) are shown in different color, compared to
reference results obtained with 3D-Var. Positive values mean better RMSE gain
with 4DEnVar than with the reference 3D-Var. Average RMSE gain has been
multiplied by 10 to better highlight the differences among the experiments.
Same as Fig. but for the horizontal length
scale of the localization operator (“noloc” when no localization is
applied).
Same as Fig. but for the multivariate
correlation coefficient of the localization operator (with 0 corresponding to
univariate DA and 1 to a full multivariate unlocalized case).
Table reports the minimum, maximum and average of the
field values in Fig. , with the sign inversed to
display positive values for positive gain of DA, and negative otherwise. It can
be seen that with 4DEnVar, the maximum degradation of RMSE (i.e., the minimum
gain in absolute values) is always smaller by a factor 2 to 5 than the
maximum gain, and the average RMSE gain is always positive (i.e., DA improves
the forecast). The appearance of local but relatively small RMSE degradation
can be tolerated in atmospheric chemistry, because the chemical system has a
dissipative behavior and errors in the model state cannot grow during the
forecast step. Since the error covariance of the initial condition is
perfectly known in the experiment setup, the degradation of the 4DEnVar
analysis is a consequence of the algorithm hypotheses (e.g., the
linearization of the forecast model) or of the numerical implementation
(e.g., the finite size of the ensemble and the localization approximations).
Multivariate assimilation
A second set of DA experiments has been performed,
but perturbing and assimilating the four chemical species at the same time.
Therefore, one 24 h long forecast and the corresponding analysis has been
computed for all species, instead of four independent analyses as before. In
the case of 3D-Var, elements of B related to the four assimilated
species are set using the same parameters as in Sect. .
Elements related to unobserved species are kept at zero as well as for
cross-variable correlations. This leads to a multi-species assimilation,
which is not yet multivariate. Effects on unobserved variables or between
species are still permitted by chemical couplings in the forecast model.
The gain on RMSE obtained with 3D-Var for the four species (not shown) is
very similar to those obtained in Sect. with
independent experiments.
With 4DEnVar, the corresponding multi-species assimilation has been tested by
setting the cross-variable correlation coefficients to zero in the
localization operator. In addition, a multivariate case has also been
examined by setting the correlation coefficients to 0.5. In both cases
results (not shown) were found to be very similar again to those in
Fig. .
These results indicate that, when the initial condition is solely taken as a
source of uncertainty, the chemical coupling between species does not
influence DA much. This is confirmed by the ensemble standard deviation of
the 4DEnVar experiments in Sect. . For each of the four
assimilation experiments the average ensemble standard deviation has been
computed for all species (perturbed and not perturbed at t=0). The ensemble
standard deviation of unperturbed species stays below 1 % of the local
concentration (not shown), compared to typical values of about 10–15 % for
the species that are perturbed initially. Therefore, the perturbation of the
sole initial state does not affect significantly the chemical balance. This
also justifies neglecting the cross correlations between chemical species in
operational systems that assimilate hourly observations sequentially. This
result was expected for weakly reacting species like CO2 or CO but was not
evident for reacting gases such as O3 and NO2. A possible reason is
that the amount of NO2 produced hourly by the oxidation of emitted NO is
much larger than the applied initial perturbation. Therefore, the O3
photochemical production, which happens later during the day, is not much
influenced by the perturbation of NO2 at midnight. It would be interesting
to verify if similar results also hold when larger perturbations are applied
during daytime. A wider exploration of different chemical regimes is left
for a future study.
4DEnVar was capable of providing similarly good results as in
Sect. when enabling the cross-variable covariances
(and the respective localization). This means that the noise of chemical
cross covariances due to the small ensemble size (16 members) did not degrade
the results. This leaves hope for an effective multivariate chemical
assimilation when the role of chemical couplings becomes larger
(Sect. ).
Ensemble size and localization
This section examines the impact of the principal parameters of the 4DEnVar
algorithm, i.e., the ensemble size (16 members in previous experiments) and
the localization length scale, on the analysis RMSE. These two aspects are
closely linked with the dimension of the system and the eigenspectrum of the
error covariance matrices . For instance, for a strongly
decaying spectrum relatively few ensemble members are enough to provide a
good approximation of the error covariance matrix. If this is not the case, a
larger ensemble size allows in principle less severe localization. However,
no theoretical formulation exists already for the 4DEnVar that links these
parameters, e.g., the localization scale with the ensemble size. Moreover,
the number of parameters of the localization operator (e.g., the horizontal
length scale) might depend on the chemical species. A full exploration of the
parameter space becomes rapidly unpractical even in a simplified model
framework. Finally, the approximations made in the implementation of the
localization operator (Sect. ) can potentially have a larger
effect than the choice of the parameters themselves. Therefore, an empirical
approach has been used in this paper, and we postpone a detailed theoretical
analysis to future work.
In the empirical approach, first the ensemble size is examined using fixed
but reasonable values for the localization operator
(Sect. ). The ensemble size can be one of the main
limiting factors in operational forecast centers and the smallest ensemble
providing better results than 3D-Var for all species was taken. Second, the
impact of horizontal and chemical localization are examined keeping the
selected ensemble size fixed.
Results for the case study are shown in Fig. (varying
ensemble size), Fig. (varying horizontal
localization) and Fig. (varying cross-variable
localization). The minimum, maximum and mean RMSE
gain (Table ) of the analysis are considered to compare
4DEnVar and 3D-Var experiments. Hence, the plots display the differences
between the RMSE gain of 4DEnVar minus the 3D-Var one, with the sign
opportunely adjusted to display positive values when 4DEnVar beats 3D-Var,
negative values otherwise. In general, the desired case is represented by all
bars (mean, maximum and minimum gain) being positive. Nevertheless, a
positive average gain with the sporadic occurrence of negative
maximum/minimum gains can be tolerated, if the values for the minimum gain
bar do not become too negative. Since the minimum RMSE gain is already
negative (Table ), negative values for the corresponding
bars in Figs. ,
and mean that the degradation of the 4DEnVar analysis
is larger than the 3D-Var analysis, which is not desired.
Results are very similar using 64 or 128 members, for all species,
suggesting that some convergence of the assimilation scores is achieved with
more than 64 members (Fig. ). As expected, the accuracy
starts to decrease using less than 32 members, with most of the gain of
4DEnVar over 3D-Var being lost using only 8 members. In a few cases, the RMSE
gain occasionally decreases when increasing the size of the ensemble. This is
due to statistical fluctuations when the ensemble sizes are small. To avoid
misinterpretation of statistical noise, the comparison of 3D-Var and 4DEnVar
is repeated in Sect. for a larger number of DA windows.
We remark finally, that results with 16 members satisfy the requirements
expressed above: better average RMSE gain than 3D-Var for all species and a
limited number of cases of RMSE degradation. Hence, an ensemble size of 16
members is retained. We remind readers that the objective of this study is to
demonstrate the applicability of a DA algorithm that could outperform
currently implemented methods in operational centers, but with an acceptable
compromise between computational costs and precision. Therefore, even if an
ensemble size of 32 or 64 would have represented a more accurate option for
this study, we found it more valuable to assess the potential of 4DEnVar when
computational resources might be limited.
The choice of the horizontal localization scale is more delicate, because it
is intimately linked to the model dynamics and depends on the ensemble size and
on the assumptions made to construct and apply C¯
(Sect. ). Figure shows that
horizontal localization is necessary to obtain meaningful results with
4DEnVar. Second, increasing the localization scale to values as high as
3000 km, compared to the 750 km of the initial perturbation scale, has the
effect of improving the maximum RMSE gain but also degrading significantly
the minimum gain. This is not desired, as explained above. The best results,
considering employing the same homogeneous and global localization scale for
all chemical species, can be found for the value of 1500 km.
The configuration of a multivariate localization, or chemical localization in
our study, presents the same issues as the spatial one. A similar approach as
in the above paragraph has been taken. The numerical experiments described in
Sect. , which considered multivariate cases, are used
to compute error differences in Fig. . Again, we remark
that without any localization the statistical noise of the ensemble
covariance significantly degrades the results compared to 3D-Var. A
localization coefficient of 0.5 reduces efficiently the effect of noisy
correlations in the 4-D ensemble covariance, leaving the possibility of
describing multivariate effects. We remind readers that multivariate effects
were found to be very small in perfect model experiments presented until now,
but can be much larger when a model error is introduced
(Sect. ).
We conclude that a simple localization scheme, based on global and
empirically tuned parameters, provides already encouraging results for the
application of 4DEnVar to large-scale chemistry models. The development of
more sophisticated localization operators
and more rigorous methods to estimate their parameters ,
represents a current subject of research, and will be the topic of a future
study.
Model error experiments
In the previous section, we have shown how 4DEnVar is able to match or even
outperform 3D-Var results when the model is perfect. However, the main
interest of 4DEnVar for atmospheric chemistry arises when the model is not
perfect, i.e., a case that is more easily addressed using ensemble-based
methods. In this section, 4DEnVar experiments are conducted in the presence of
a model error term. A typical source of uncertainty in CTMs is represented by
anthropogenic or biogenic emissions . In the
case of reactive species like NOx, erroneous emissions can impact
strongly the formation of secondary species such as O3
. Errors in surface emissions already produce
complex and rich dynamics and will be used as a test bed to
investigate the effects of model error in DA. Other sources of uncertainties,
e.g., chemistry parameters or meteorology, will be addressed in a future
study, considering that the same methodology used here can be applied.
O3 effective model error estimation in the univariate case (only
O3 assimilated). Temporal trajectories during day 2 for the forecast, the
analysis, the truth, the true effective model error and the effective model
error estimated using Eq. (), for the pixel A located
close to an observation. From top to bottom the following experiments are
shown with the uncertainties being introduced: (i) only in the O3 initial
condition (as in Sect. ), (ii) only in the forecast
model (through surface emissions perturbation) and (iii) both in the O3
initial condition and in the forecast model.
Temporal averages (24 h) of the estimated effective model error for
the three experiments as in Fig. . From left to
right and from top to bottom, experiments (i), (ii) and (iii) are shown. On
the bottom right plot the true effective model error in the experiments (ii)
and (iii) is shown (no model error was activated for the first experiment).
Similar single-cycle DA experiments are conducted as in
Sect. . The main difference with the previous section
is that experiments are done here during the spin-up period of the model
(days 2–4). This allows us to examine the model error estimation during the
pollution build-up period, when the chemical system is in a transient phase
and daily cycles of reactive gases are not yet stationary. This represents a
more challenging and realistic situation for testing the bias correction
procedure (Eq. ), which is fully consistent only in
stationary conditions. The true NO emissions (Table ) are
perturbed by a multiplicative factor, which is sampled from a log-normal
distribution with mean and sigma equal to 0.5 and 0.8, respectively. Forecast
emissions are increased by a multiplicative factor of 2.35, whereas the
log-normal distribution has been used to generate emission perturbations for
the ensemble of forecasts. The emissions perturbation is constant in time but
not in space, due to the geographical variability of emission factors
(Fig. ).
The main objective of this section is to illustrate the application of the
bias estimation procedure (Sect. ) on chemical
fields. Chemical interactions alone can already give rise to complex temporal
dynamics, which can produce unattended behavior within typical hypotheses of
most DA schemes . Therefore, a simplified model setup is used
in this section by means of deactivating the advection of chemical species.
This reduces QG-Chem to a collection of 0-D chemistry models and allows us to
focus on the effects of model errors in chemistry. Aside from this, the same
exact configuration as before is used for 4DEnVar (initial condition error,
ensemble size, localization). DA results in a more general case, when both
chemistry and advection are activated, will follow in
Sect. . Univariate O3 DA is presented in
Sect. . Results of a multivariate DA experiment are
presented in Sect. , to examine the combined effects
of model error and chemical couplings. Finally, the impact of the model bias
correction procedure is evaluated on 48 h forecasts of several species in
Sect. .
Univariate assimilation
Three DA experiments are shown in Fig. :
(i) activating only the initial perturbation, (ii) activating only the model
error and (iii) with both perturbations activated. Only O3 observations have
been assimilated and the chemical localization coefficients are set to zero
to compute univariate analyses.
With the NO emissions increased by a factor of 2.35, the forecast produces
higher concentrations of O3 (Fig. , middle
plot). We remark also that the effects of model error in O3 appear later
in the day, due to the photochemistry, whereas the perturbation of the
initial condition has a larger effect in the first part of the day. In this
specific case, it is also interesting to note that, when both perturbation are
applied, a compensation of the two errors cancels out the differences between
the forecast and the truth later in the day. This is an example of
compensating errors in atmospheric chemistry, which might be hard to detect
when comparing model results to observations.
The 4DEnVar analyses agree well with the truth in all three cases, similarly to
what was obtained in Sect. . This satisfies the main
requirement for a meaningful computation of the effective model error
(Sect. ). The estimation of the effective model error
with Eq. () is also displayed in
Fig. . We remind readers that the effective model
error is expressed in the same physical units of the model state, i.e.,
chemical concentration units (ppbv). The true effective model error
is computed by subtracting the truth from a forecast initialized from the
truth, and it is also added to the figures.
The effective model error is zero in the first experiment, which is correctly
diagnosed by the procedure. In the second experiment, the estimated effective
model error is approximately zero until 10:00 UTC and grows to positive values
of about 8 ppbv later in the afternoon, which is coherent with the
underlying perturbations and chemical mechanism.
Multivariate effective model error estimation. Temporal trajectories
for day 2, as in Fig. but for NO2, NO and CO
species (not assimilated), obtained assimilating only O3. Uncertainties
were both in the O3 initial condition and in the forecast model (NO
emissions), corresponding to the third experiment in
Fig. . Only difference with the previous experiment
in is that a non-zero multivariate localization coefficient is used here.
The temporal average of the effective model error
(Fig. ) shows that the larger errors are
diagnosed in the center of the domain, coherently with the characteristics of
the emissions errors (global scaling of NO emissions). The geographical
patterns and the differences from the true effective error (bottom right plot
in Fig. ) are a result of the observations
location and localization scale. With the proposed procedure, the effective
model error estimation relies on the localization scale that has been used
for the 4DEnVar analysis. Therefore, the effective model error approaches
zero moving far from assimilated observations, no matter what the spatial
patterns of the true model error are. In the third experiment, the temporal
behavior of the effective model error is well captured at the observed
location. However, significant differences with the second experiment are
visible in the spatial distribution of the error. Ideally, the estimation of
the effective model error should provide exactly the same results in the
second and third experiment. Differences arise because DA is not perfect due
to small ensemble size, linearization hypotheses, observation number and
observation errors. Also, when adding degrees of freedom to the sources of
uncertainty and keeping constant the observation network, DA becomes more
challenging. For example, if observations of O3 were
available only in the late afternoon, no discrimination between the two
sources of error could have been achieved in the third experiment. However,
thanks to the hourly frequency of O3 observations, the temporal and
large-scale features of the model error have been retrieved also in the third
case.
Multivariate assimilation
The effects of O3 assimilation on other chemical species are shown in
Fig. . These were obtained by repeating the third
experiment of Sect. but setting the chemical
localization coefficient to 0.5 instead of zero. Compared to the results
discussed in Sect. , multivariate corrections are now
very significant. For example, analyzed NO and NO2 concentrations are
almost halved and the initial forecast errors greatly reduced. On the other
hand, CO, which was not perturbed initially nor is it strongly coupled to
NO / NO2 / O3 concentrations, is not modified at all by the DA. This shows
that multivariate aspects of chemical DA can be well captured by the 4DEnVar
algorithm, even with a small ensemble.
Temporal averages (24 h) of the estimated effective model error (in
ppbv) for the multivariate DA experiment in
Fig. . Estimated effective model error for NO2
on top and NO on bottom. On the right, the true effective model error is
displayed for both species.
The effective model error can be computed for all the variables of the state
vector, and is plotted in Figs. and
. The error fields of NO2 and NO reproduce
well the temporal and spatial features of the perturbation on NO emissions
that was implemented in the forecast. Note that since NO is rapidly converted
into NO2 during the night, the initial linear increase of the effective
model error is only observable on NO2. On the other hand, during daytime,
a strong presence of the effective model error is found for both NO and
NO2. However, the NO2 and NO errors trajectories are not linear during
daytime, due to the complex photochemistry.
Strong nonlinearities of the chemical system cannot be taken into account by
the current implementation of the 4DEnVar algorithm. When the NO / O3
relationship was strongly nonlinear, inaccurate analyses and, therefore,
estimations of the effective model error have been found (not shown). This
difficulty could be overcome by introducing external loops within 4DEnVar,
similarly to how it was already done with the IEnKS algorithm
. This represents the objective of a future study.
Alternatively, the combined assimilation of O3 and NO2 or a more severe
chemical localization can reduce the occurrence of analysis errors in
nonlinear regimes.
Impact on chemical forecasts
The analysis computed in the previous section has been used to initialize
chemical forecasts for the following 48 h. This was done to evaluate the
potential of the forecast bias correction procedure
(Eq. ). The corrected forecasts (CF) of O3, NO2
and NO are compared to the initial 3-day forecast (OF), the 48 h
forecast initialized from the latest available analysis without any
correction (AF) and the truth (Fig. ).
Impact of model bias correction on 48 h forecasts. Temporal
trajectory of free forecast (3 days), analysis (24 h), truth and 48 h long
forecasts initialized from the latest available analysis, for the pixel A in
Fig. . DA of O3 is performed only during the first
24 h (as in Fig. ). Curves are displayed in dark
blue when no bias correction is applied to the final forecast and are
salmon colored when the effective model error estimation in
Fig. is used to correct the final forecast
(Eq. ).
DA RMSEs for seven cycles of analyses/next-day forecasts where all
considered species are perturbed and assimilated, and with model error
enabled. ϵmin, ϵmax and
ϵavg (in %) are the minimum, maximum and average values of
the relative RMSE (Eq. divided by the truth average)
on the QG-Chem domain.
Species
Reanalysis
Forecast
3D-Var
4DEnVar
3D-Var
4DEnVar
ϵmin
ϵmax
ϵavg
ϵmin
ϵmax
ϵavg
ϵmin
ϵmax
ϵavg
ϵmin
ϵmax
ϵavg
O3
2.8
25.7
14.3
2.0
26.4
13.3
5.1
23.9
14.4
4.9
21.7
11.3
CO
1.3
42.8
17.1
1.1
27.7
13.1
6.7
90.8
37.3
7.2
50.1
22.1
NO2
7.7
88.9
41.2
4.1
69.4
23.3
23.6
117.8
74.0
11.3
90.3
37.2
CO2
3.1
15.3
8.6
1.4
17.5
8.6
1.7
13.7
5.7
1.9
13.6
6.9
First, the AF converges very rapidly (in about 12 h) to the OF for all
species, confirming the limited effects of the state correction on the
chemical forecasts for the next days . On the other hand, the
CF is very close to the truth during the first 24 h, indicating that the
hypothesis of a stationary effective model error in successive days (but
hourly variable) seems appropriate in this case. During the third day, a
positive forecast correction is still achieved. However, chemical
concentrations have evolved too much to be efficiently corrected using the
bias estimated on the first day. Estimating and correcting the forecast
tendencies, instead of the state, could provide a better result for the third
day.
Accounting for model errors, either bias or tendencies, for the next-day
forecast requires the model uncertainties to be stationary. A stationary
error has been used in this study since surface emissions were taken constant
in time. Most of regional air quality models seem largely affected by
stationary errors and assuming the persistence of the
bias during 24 h looks reasonable. However, the main sources of
uncertainties of real CTMs need first to be identified to allow a meaningful
effective model error estimation. Therefore, the implementation of 4DEnVar to
a real CTM is necessary to further demonstrate its potential for air quality
forecasts.
Statistical comparison between 4DEnVar and 3D-Var
Single-window DA experiments have been examined so far, to better analyze
new aspects of 4DEnVar DA for atmospheric chemistry. In this section, DA
experiments are conducted for multiple consecutive days, to provide a
statistically robust comparison between 3D-Var and 4DEnVar. A general case
including initial condition and model errors for all four species is
considered, and both reanalyses and 24 h forecasts for the next day are
evaluated. The same values as before (multivariate experiments in
Sect. ) are used for the initial perturbation and for
the algorithm settings. Compared to Sect. , all emissions
in Table are now perturbed, using a log-normally
distributed scaling factor.
Synthetic observations are generated from a 7-day truth simulation.
The four species are assimilated for a period of 7 days, starting on day
20. However, at the end of each cycle of 24 h (forecast, analysis and next-day
forecast), the initial condition is reinitialized with the truth
concentrations. Initial condition perturbations are recomputed, advancing the
seed of the pseudo-random generator. Therefore, the seven daily cycles are
statistically independent. This is done to increase the experiment
statistics, without having to deal with the propagation of the analysis
covariance through consecutive cycles. This aspect will be investigated in a
future study.
The relative RMSE between the experiments and the truth is computed at each
grid point for the 7-day period as in Eq. ().
Results are summarized in Table : minimum, maximum
and average values of the relative RMSEs computed over the QG-Chem domain are
reported for 3D-Var and 4DEnVar experiments.
We confirm preceding findings concerning the reanalyses capabilities of
4DEnVar, which provides superior results to 3D-Var for all species. Results
are particularly good for species strongly related to emissions (CO and
NO2), which show reduced average RMSE values compared to 3D-Var results.
This is a consequence of precisely accounting for emission uncertainties
within 4DEnVar. A similar result was obtained by . They
showed that by using an ensemble forecast of the meteorology, thus partially
accounting for model error, the root mean square error of IEnKS (a nonlinear
4DEnVar method) on the low-order online tracer model L95-T was improved by 25
to 50 %. Moreover, the 4DEnVar maximum RMSE is also similar or lower than
in 3D-Var reanalyses, which suggests that the occurrence of degraded results
compared to 3D-Var (negative bars in Fig. ) is not
systematic.
Similar conclusions can be drawn for the RMSE of the next-day forecast. We
remind readers that the bias correction procedure has been used with 4DEnVar,
whereas no correction is applied with 3D-Var. CO and NO2 forecasts, show
significantly lower RMSE with 4DEnVar, due to the forecast bias correction.
O3 forecast improvements are also observed, even if smaller than those for
the two precursors. With 4DEnVar, forecasts of CO2 are slightly worse than
with 3D-Var. Since CO2 is not affected by the considered model error and
it is not chemically coupled to other species, the relative bias correction
term should be strictly equal to zero. However, the small ensemble size and
use of localization introduce statistical noise in the effective model error
estimation, which can impact the forecast correction. This issue can be
mitigated by selectively setting to zero the chemical localization
coefficient between species that are chemically uncoupled. However, results
remained on par with 3D-Var in this study.
We can conclude that the forecast of species related to surface emissions,
either directly or through chemical couplings, can be significantly improved
when the model error is considered within DA. The forecasts of CO2, which
is a passive tracer in our study, seem instead dominated by the number of
assimilated observation, no matter which DA algorithm is employed. However,
this is not the case in real applications, where CO2 concentration is
modulated by uncertain anthropogenic emissions and natural sinks.
We finally remark that EnKF, which also takes advantage of the information
available from the ensembles, could have represented an alternative and
possibly more accurate reference than 3D-Var for this study. However, it is
significantly more costly than 3D-Var, it introduces some difficulties like
the definition of an optimal inflation procedure
and was not yet available in the OOPS
DA library at the time of the study. A more comprehensive comparison of DA
schemes of increasing complexity and cost within QG-Chem is left for a future
study, the present being focused especially on the bias correction procedure
using 4DEnVar.
Conclusions
The objectives of this study were (i) to develop a new toy model framework to
test advanced DA algorithms for atmospheric chemistry and
(ii) demonstrate the potential of the 4DEnVar method for air quality or more
general tropospheric chemistry applications. In particular, we addressed
the questions of how to jointly assimilate chemical species with very
different lifetimes and possible chemical couplings, and how to account for
model error to improve forecasts for the next day.
An atmospheric chemistry reduced-order model (QG-Chem) has been developed,
based on quasi-geostrophic meteorology and a detailed tropospheric chemistry
scheme. It has been used to simulate the complex spatiotemporal patterns of
reactive gases (NOx, O3) and long-lived species (CO, CO2) under
the effects of emissions, chemistry and transport. QG-Chem has been coupled
to a generic library for data assimilation (OOPS) and has been proven to be
well suited to perform a large number of DA experiments. Concerning temporal
aspects and assimilated observations, the experiments have been designed
based on the implementation of chemical DA in operational air quality
forecast centers.
A number of DA experiments have been conducted to compare 4DEnVar analyses
with 3D-Var analyses in a perfect model hypothesis, in a univariate and a
multivariate setting. The sensitivity of 4DEnVar results to the ensemble size
and localization method was also assessed. Results with 4DEnVar are generally
better for all chemical species even when using a small ensemble size of 16
members, provided that ensemble localization, even if basic, is applied. This
suggests that considering the linearized model dynamics to derive a flow-dependent
background error covariance can be beneficial for chemical
reanalyses. Thanks to 4DEnVar, this can be obtained without need of tangent
linear and adjoint codes of the complex CTM. Multivariate effects were found
to be not significant when a perfect model is used, suggesting that
multivariate DA goes together with model error for atmospheric chemistry
applications.
Clear advantages of using an ensemble method, which is significantly more
costly than 3DVar, have been found when model errors were introduced. It has
been shown that 4DEnVar is able to take into account heterogeneous errors in
chemical emissions and complex chemical couplings to cross correct precursor
species (e.g., NO and NO2), based only on hourly observations of secondary
species (O3).
Using DA windows that are long enough (24 h in this study), 4DEnVar allowed to
account for the different timescales of the chemical mechanism and the
corresponding effects in DA. For example, the delayed impact of NO emission
errors in afternoon O3 concentrations was correctly accounted for by 4DEnVar.
The contribution of model errors to the 4-D chemical state can be estimated
at the cost of an additional forecast, providing quantitative information on
possible forecast biases, and, in the case of stationary errors, a method to
correct next-day forecasts. This has been tested with success in several
independent DA windows, showing that 24 h forecasts of NO2 and CO can be
twice as accurate with 4DEnVar as with 3D-Var. Better results have been
also found for O3 forecasts.
We conclude that 4DEnVar is potentially of high interest for chemical DA. We
determine the main benefits to be the implicit specification of a
flow-dependent and multivariate error covariance matrix, the possibility of
accounting for model errors through stochastic perturbation and the
opportunity for using DA windows long enough to catch typical features of
model errors in air quality models. The computational cost of 4DEnVar is
higher than 3D-Var, but a small ensemble of at least 15 to 20 members remain
affordable within most operational centers.
The application to a real CTM remains necessary to evaluate if the advantages
of 4DEnVar shown in this study hold in real applications, where model
uncertainties are not perfectly known, as well as observational ones. Aspects
related to the vertical propagation of the information, which have been
neglected with QG-Chem, could also represent an additional challenge in real
systems. Finally, research on algorithmic aspects of 4DEnVar is needed to
implement more accurate localization operators, to account for nonlinear
chemical regimes and to correctly propagate the analysis covariance through
successive DA windows. QG-Chem could represent a useful tool for these type
of studies.