Construction of Non-diagonal Background Error Covariance Matrices for Global Chemical Data Assimilation

Chemical data assimilation attempts to optimally use noisy observations along with imperfect model predictions to produce a better estimate of the chemical state of the atmosphere. It is widely accepted that a key ingredient for successful data assimilation is a realistic estimation of the background error distribution. Particularly important is the specification of the background error covariance matrix, which contains information about the magnitude of the background errors and about their correlations. As models evolve toward finer resolutions, the use of diagonal background co-variance matrices is increasingly inaccurate, as they captures less of the spatial error correlations. This paper discusses an efficient computational procedure for constructing non-diagonal background error covariance matrices which account for the spatial correlations of errors. The correlation length scales are specified by the user; a correct choice of correlation lengths is important for a good performance of the data assimilation system. The benefits of using the non-diagonal covariance matrices for variational data assimilation with chemical transport models are illustrated.


Introduction
Chemical data assimilation attempts to optimally use noisy observations along with imperfect model predictions to produce a better estimate (in some optimal sense) of the chemical state of the atmosphere.This optimally estimated state better defines the spatial and temporal fields of key Correspondence to: A. Sandu (sandu@cs.vt.edu) chemical components in relation to their sources and sinks.This information is critical for improved studies of the atmospheric composition.Chemical data assimilation could also, in principle, improve estimates of emission inventories, of model boundary conditions, or of important model parameters like wet deposition velocities or photolysis rates.
The close integration of observational data is recognized as essential in weather/climate analysis and forecast activities.Consequently, considerable experience with data assimilation have been accumulated in the field of numerical weather prediction (Daley, 1991;Courtier et al., 1998;Rabier et al., 2000;Kalnay, 2002;Navon, 2009).In this work we focus on chemical data assimilation, i.e., on assimilation of observations of pollutant levels in the atmosphere.Chemical data assimilation poses specific challenges related to the multiphysics nature of the system, the stiffness of chemical kinetic equations, the sparseness of chemical observations, and the uncertainty in the levels of anthropogenic and natural pollutants emitted into the atmosphere.
Previous studies have employed various approaches to assimilating observations of trace gases for improved tropospheric chemistry representations.The base concepts of the variational approach to chemical data assimilation, and the construction of adjoint chemical transport models are discussed in detail in Sandu et al. (2005); Carmichael et al. (2008).Early work in chemical data assimilation using variational techniques has been reported in Fisher and Lary (1995); Elbern andSchmidt (1999, 2001).Since then there is a growing body of literature with applications of 4D-Var chemical data assimilation.Adjustment of gas phase chemical tracer initial conditions has been studied in Chai et al. (2007), Zhang et al. (2008) and Purser et al. (2003).Adjustment of pollutant Published by Copernicus Publications on behalf of the European Geosciences Union.
emissions through 4D-Var chemical data assimilation has been discussed in Chai et al. (2009).Data assimilation studies involving particle measurements to improve aerosol fields have been performed in Hakami et al. (2005); Henze et al. (2009).
Suboptimal Kalman filters have been employed successfully for chemical data assimilation (Menard et al., 2000;Lamarque et al., 2002;Segers et al., 2005;Clark et al., 2006;Pierce et al., 2007;Parrington et al., 2009).The use of the ensemble Kalman filter (EnKF) in chemical data assimilation has been studied in Constantinescu et al. (2007b,c,d).Data assimilation has been used to improve initial conditions, emissions, and boundary values.Besides the initial conditions, improvements in boundary values lead to improved air quality forecasts.Comparisons of the the performance of different techniques for chemical data assimilation have been performed (Lahoz et al., 2007;Wu et al., 2008).
Other studies and observations relevant for chemical data assimilation include (Palmer et al., 2003;Bowman et al., 2009;Jones et al., 2009).It is widely accepted that a key ingredient of successful data assimilation is a realistic estimation of the background error distribution.Particularly important is the specification of the background error covariance matrix, which contains information about the magnitude of the background errors and about their correlations.Background covariance matrices impact how the information from observations is spread both spatially and among the different types of analysis variables.
The construction of background covariance matrices is challenging due to poorly characterized background errors, and to the very large dimension of the state space of realistic atmospheric models.As a consequence, many chemical data assimilation studies to date have used diagonal background covariance matrices.An early covariance error modeling approach Hollingsworth and Lönnberg (1986) partitions differences between observations and the background into background errors and observation errors.A popular approach to approximate the background covariance matrix is the NMC method (Parrish and Derber, 1992), in which the differences between several forecasts verifying at the same time are used to approximate the background error.This method has been successfully applied to chemical data assimilation (Chai etal., 2006).The analysis-ensemble method (Fisher, 2003) runs the analysis system several times for the same period with randomly-perturbed observations.Differences between background fields for different runs provide a surrogate for a sample of background error.Additional methods for covariance modeling include the estimation of background error statistics from innovation statistics, digital filters, and the diffusion operator (Fisher, 2006).The statistical structure of forecast errors is used to construct error covariances in Ingleby (2001).Derber and Bouttier (1999) consider errors in the spectral space, split covariances into vertical and horizontal, and construct correlations are homogeneous, isotropic, and non-separable.An alternative approach con-structs autoregressive models of background errors based on the short-term linearized model dynamics (Constantinescu et al., 2007a).Multivariate, multidimensional background error covariance matrices that maintain the geostrophic and hydrostatic balance have been proposed in Akella and Navon (2009).
A popular ansatz is that the background error correlations decay exponentially in space.This ansatz allows the construction of simple error correlation models, and is the basis of the covariance localization technique used in ensemble Kalman filtering (Gaspari and Cohn, 1999;Ott et al., 2004;Constantinescu et al., 2007b).Experimental studies with chemical transport models support this assumption; for example, in Chai etal. (2006) it has been shown that ozone error correlations decrease follow, on average, an exponential decay curve.Sub-optimal Kalman filters based on covariance models that impose an exponential decay of correlations with distances have been used in the assimilation of chemical constituents (Khattatov et al., 1999;Pierce et al., 2007).
In the troposphere, ozone is an important greenhouse gas and a major pollutant, which adversely impacts air quality.Its distribution is highly heterogeneous, reflecting the combined influence of atmospheric transport and local chemical sources and sinks.Until recently, observations of the threedimensional structure of tropospheric ozone have been limited.The Tropospheric Emission Spectrometer (TES) satellite instrument, launched in 2004, produced the first continuous, global profile retrievals of tropospheric ozone.Similar observations are now available from other satellite instruments, such as the Infrared Atmospheric Sounding Interferometer (IASI).Assimilating these data into atmospheric models provides a powerful means to obtain an improved understanding of the processes controlling tropospheric ozone.Parrington et al. (2008) was the first to assimilate the TES ozone profile retrievals, but they did not account for horizontal correlations in the background error.
We propose here a computationally efficient approach for constructing (background) error covariances that account for spatial correlations in both horizontal and vertical directions, and assess its impact on the assimilation of tropospheric ozone profiles from TES.The construction is based but not restricted to the ansatz of exponential decay of error correlations.The correlation lengths in the latitudinal, longitudinal, and vertical directions can be specified according to the application requirements.Due to the large number of state variables an explicit representation of the full background covariance matrix is impractical.The proposed strategy constructs a multi-dimensional correlation matrix from tensor products of one-dimensional correlation matrices.This avoids the explicit construction and storage of full covariance matrices, and allows the needed linear algebra operations to be performed very efficiently.
The paper is organized as follows.Section 2 reviews variational data assimilation techniques, and Sect. 3 presents the GEOS-Chem global chemical transport model.The algorithm for constructing multidimensional covariance matrices is discussed in Sect. 4. Section 5 presents assimilation results of TES ozone profiles with the global chemical transport model GEOS-Chem, and illustrates the benefits of nondiagonal covariances in both three and four dimensional variational data assimilation settings.

Variational data assimilation
Variational methods solve the data assimilation problem in an optimal control framework (Sasaki, 1958;LeDimet and Talagrand, 1986;Courtier and Talagrand, 1987;Lions, 1971;Sandu and Zhang, 2008;Khattatov et al., 2000).Specifically, they attempt to find the control variable values (e.g., initial conditions) which minimize the discrepancy between the model forecast and observations; the minimization is constrained by the governing dynamic equations.In this discussion, for simplicity of presentation, we focus on discrete models (in time and space) where the initial conditions are the control variables.
Data assimilation combines the following three sources of information.
1.The apriori, or background state x b represents the best estimate of the true state x t available before any measurements are taken.This estimate is assumed unbiased, and the random background (estimation) errors ε b are typically assumed to have a normal probability density with a background error covariance matrix B 2. The model encapsulates our knowledge about physical and chemical laws that govern the evolution of the system.The model evolves an initial state x 0 ∈ R n at the initial time t 0 to future state values x i ∈ R n at future times t i , The size of the state space in realistic chemical transport models is very large.For example, a GEOS-Chem simulation at the 2 • × 2.5 • horizontal resolution has n ∈ O 10 8 variables.

Observations x obs
i ∈ R m of the state are taken at times t i , 1 = 1,...,N The observation operator H maps the state space onto the observation space.In many practical situations H is a highly nonlinear mapping (as is the case, e.g., with satellite observation operators).Usually the observations are sparsely distributed, and the number of observations is small compared to the dimension of the state space, m n.
The observations are corrupted by measurement and representativeness errors ε obs i .The observation errors at each time are assumed to be independent of background errors, and independent of the observation errors at other times.They are typically assumed to have a normal distribution with mean zero and covariance R i , Based on these three sources of information data assimilation computes the posterior estimate x a of the true state; x a is called the "analysis".

Three dimensional variational (3D-Var) data assimilation
In the 3D-Var data assimilation the observations (3) are considered successively at times t 1 ,...,t N .The background state (i.e., the best state estimate at time t i ) is given by the model forecast, starting from the previous analysis (i.e., best estimate at time t i−1 ): The discrepancy between the model state x i and observations at time t i , together with the departure of the state from the model forecast x b i , are measured by the 3D-Var cost function: While in principle a different background covariance matrix should be used at each time, in practice the same matrix is reused throughout the assimilation window.The 3D-Var analysis is computed as the state which minimizes (5) Typically a gradient-based numerical optimization procedure is employed to solve (6).The gradient ∇J of the cost function (5) is Note that the gradient requires computation of the linearized observation operator H about the current state.
Preconditioning is often used to improve convergence of the numerical optimization problem (6).A change of variables is performed, for example, by shifting the state and scaling it with the square root of covariance: The optimization is then carried out on the new variables x i . www

Four dimensional variational (4D-Var) data assimilation
In strongly-constrained 4D-Var data assimilation all observations (3) at all times t 1 ,...,t N are simultaneously considered.The control parameters are the initial conditions x 0 ; they uniquely determine the state of the system at all future times via the model equation ( 2).The discrepancy between model predictions and observations at all future times t 1 ,...,t N , together with the departure of the initial state from the background state, are measured by the 4D-var cost function: Note that the departure of the initial conditions from the background is weighted by the inverse background covariance matrix, B −1 , while the differences between the model predictions H(x i ) and observations x obs i are weighted by the inverse observation error covariances, R −1 i .The 4D-Var analysis is computed as the initial condition which minimizes (9) subject to the model equation constraints (2) The model (2) propagates the optimal initial condition (9) forward in time to provide the analysis at future times, x a i = M t 0 →t i x a 0 .The optimization problem (10) is solved numerically using a gradient-based technique.The gradient of (9) reads The 4D-Var gradient requires not only the linearized observation operator H , but also the transposed derivative of future states with respect to the initial conditions.The 4D-Var gradient can be obtained effectively by forcing the adjoint model with observation increments, and running it backwards in time.The construction of an adjoint model requires considerable effort, time, and know-how.

GEOS-Chem
In this paper we specifically consider GEOS-Chem (http: //geos-chem.org),a global three-dimensional chemical transport model (CTM) driven by assimilated meteorological fields from Goddard Earth Observing System(GEOS-4) at the NASA Global Modeling and Assimilation Office (GMAO).The model was first described in Bey et al. (2001).GEOS-Chem accounts in detail for emissions from both natural and anthropogenic sources, for gas phase chemistry, aerosol processes, long range transport of pollutants, troposphere-stratosphere exchanges, etc. GEOS-Chem is widely used by research groups world-wide for performing global atmospheric chemistry studies.

Construction of the background error covariance matrix
A correct characterization of the background errors is necessary for obtaining a meaningful analysis, i.e., for the success of the data assimilation procedure.Under the usual assumption, the background errors are normally distributed where their probability density is described by the background state x b and the background error covariance matrix B. In variational data assimilation both x b and B enter directly into the formulation of the cost function; errors in their specification directly impact the analysis result (Daescu, 2008).
A non-diagonal background error covariance matrix allows the information from local observations to spread out in space to contribute to corrections of state variables in neighboring locations; similarly, it allows observations of certain components of the state vector to contribute to corrections of other components.These contributions are more prominent in case of 3D-Var since state variables are corrected every subsequent observation interval.The spread of information results in a smooth analysis state, and allows different sets of observations to complement each other.In case of 4D-Var, the state variables are corrected only at the start of the assimilation window using information from all observations combined.The spreading of information is achieved through background error covariance matrix and transport of adjoint sensitivities.
Despite these advantages, most chemical data assimilation studies to date have employed diagonal background covariances.Little work has been devoted to date to modeling off-diagonal terms (Chai etal., 2006;Constantinescu et al., 2007a).This is due to a number of practical difficulties that arise in the construction of background covariance matrices.The "true" state is, fundamentally, unknown, and so are the "true" errors; surrogate states have to be used to mimic forecast errors.Ensembles can be employed to estimate error correlations; however, the number of ensemble members is necessarily very small and only low rank approximations of the covariance matrix can be obtained.Localization is often employed to remove spurious correlations and to improve the rank of the resulting matrix (Gaspari and Cohn, 1999).The large number of state variables make the construction and storage of full covariance matrices impractical.
We next discuss the proposed approach to constructing a background error covariance matrix B that accounts for both vertical and horizontal correlations without explicitly constructing the full covariance matrix.We explain the construction of the matrix in the two-dimensional case, i.e., for capturing horizontal correlations; the extensions to correlations in three dimensions and to correlations among multiple state variables are immediate.Our target application is global chemical data assimilation using GEOS-Chem.
Consider a uniform latitude-longitude grid and denote by x the longitude and by y the latitude level.A grid point (x i , y j ) has longitude coordinate x i , i = 1,...,n x , and latitude coordinate y j , i = 1,...,n y .The state vector contains the state values at all gridpoints ordered latitude-first: (x 1 ,y 1 ),..., x 1 ,y n y ,(x 2 ,y 1 ),..., x 2 ,y n y , x n x ,y 1 ,..., x n x ,y n y (12)

Directional error correlation matrices
The one-dimensional correlation between errors at two locations (x i , y k ) and (x j , y k ) situated at the same latitude y k is modeled as where x is the correlation distance in the longitude direction.For a uniform lat-lon grid the distance between x i and x j depends only on min(|i − j |,n x − |i − j |).This distance also depends on the y k ; for this reason Eq. ( 13) defines a different longitudinal correlation matrix C k x ∈ R n x ×n x for each latitude y k .Due to the periodicity along each latitude circle the point x 1 is strongly correlated with both x 2 and x n x , etc.The periodicity is captured by the distance function in (13).
Similarly, the one-dimensional correlation between errors at two locations (x k , y i ) and (x k , y j ) situated at the same longitude x k is modeled as where y is the correlation distance in the latitude direction.Equation ( 14) defines a single latitudinal correlation matrix C y ∈ R n y ×n y .For a uniform lat-lon grid this correlation matrix is the same for each longitude x k ; consequently the superscript k is dropped.To simplify the construction the correlations due to the periodicity along a meridional circle are ignored.Otherwise, error correlations across the poles would lead to correlations between errors at all longitudes; such cross-correlations are not captured by ( 14).
The cost function and gradient calculations described in Eqs. ( 5), ( 7), (9), and (11) require the inverse of the background error covariance matrix; this involves the inverses of the correlation matrices in longitudinal and latitudinal directions.The construction of the directional correlation matrices C x and C y does not guarantee that they are non-singular.To avoid a possible singularity we take a convex combination between the identity matrix and tensor product correlations as follows: and The above procedure brings a shift in the spectrum and ensures the positive definiteness of C x and C y .In all our experiments both θ x and θ y are chosen to be equal to 0.2.The longitudinal correlation matrix between all the points on the two-dimensional grid ( 12) can be constructed from the one-dimensional longitudinal correlation matrices as follows With some abuse of notation we extend the use of the Kronecker product symbol ⊗ in the above equation in order to highlight the structure of the two-dimensional longitudinal correlation matrix.
Similarly, the latitudinal correlation matrix between all the points on the two-dimensional grid (12) can be constructed from the one-dimensional latitudinal correlation matrices as follows  The structure of the one-dimensional correlation matrices is represented in Fig. 1.The longitudinal correlation C k x is represented at latitude y k = 20 • N; note that due to the periodicity along each latitude circle not only the elements near the diagonal, but also the elements in the corners of the matrix have non-zero values.The latitudinal correlation C y does not account for periodicity (along each meridian the grids 1 and n y correspond to the South and to the North pole, respectively).
Figure 2 represents contour lines of the longitudinal correlation C x for points at different latitudes.The correlation length x is short (top panel), medium (middle panel), and large (bottom panel).Note that the same correlation length x translates into a larger number of correlated grid points at higher latitudes.

Two-dimensional covariance matrices
Formally the full background error correlation matrix C ∈ R n x n y ×n x n y (which accounts for both latitudinal and longitudinal correlations) is constructed via the following relation Note that this (huge) matrix is never explicitly formed.One needs to form and store only n y one-dimensional longitudinal correlation matrices ( 13) and a single one-dimensional latitudinal correlation matrix ( 14).Note that the diagonal entries of the tensor product matrix ( 17) are all equal to one.The tensor product matrix ( 17) is not symmetric.
A symmetric version of the two-dimensional correlation matrix can be constructed as follows.Any symmetric positive definite matrix C has a matrix square root C 1/2 such that The matrix square root is not uniquely defined; in particular it can be symmetric or not depending on the decomposition method used as described in Eqs. ( 21) and ( 22).Let I n x ×n x ⊗ C y 1/2 be a square root of the longitudinal correlation matrix.The symmetric two-dimensional correlation matrix can be constructed as: Let σ i,j be the standard deviation of the error at location (x i ,y j ) and = diag 1≤i≤n x ,1≤j ≤n y σ i,j the diagonal matrix with all standard deviations at all grid points ordered according to ( 12).The two-dimensional covariance matrix is constructed from the correlation matrix ( 17) by scaling it from left and right with the diagonal matrix of standard deviations Similarly, a symmetric version of the covariance matrix can be constructed from the symmetric correlation (18) as follows: • . (20)

Efficient covariance matrix function calculations
The symmetric positive definite one-directional longitudinal correlation matrix has a matrix square root C 1/2 y .A symmetric square root, the inverse of the symmetric square root, and the matrix inverse can be obtained via the singular value decomposition (SVD) while a nonsymmetric square root can be obtained via a Cholesky decomposition By the properties of the Kronecker product we have that the square root, the inverse square root, and the inverse of the two-dimensional longitudinal correlation matrix can be constructed in terms of the same matrix functions applied to the one dimensional longitudinal correlations: Consequently, the symmetric covariance (20) can be implemented as • using either of the one-dimensional square roots.
Similarly, different powers of each one-dimensional latitudinal correlation matrix can be obtained via a singular value decomposition: By the properties of the extended Kronecker product we have that the square root, the inverse square root, and the inverse of the two-dimensional latitudinal correlation matrix can be constructed in terms of the same matrix functions applied to the one dimensional latitudinal correlations: ( We now use these relations to build functions of the covariance matrices.The inverse of the background covariance is needed in the formulation of the variational cost function.The inverse of the non-symmetric covariance ( 19) is The inverse of the symmetric covariance (20) matrix is Finally, the symmetric covariance (20) has a (nonsymmetric) matrix square root

Efficient linear algebra operations involving the covariance matrix
Matrix vector operations involving B can be performed effectively by exploiting its structure.Consider a vector of concentrations (or concentration errors) u i,j -indexed by latitude and longitude, but stored as a state vector with the convention (12).Consider the non-symmetric covariance matrix.The covariance matrix-vector product v = B • u can be computed in stages.Each stage produces a temporary result which is a two-dimensional vector.
Similarly the inverse covariance matrix-vector product v=B −1 • u can be computed as follows

Expression Computation
α= −1 • u α i,j = u i,j /σ i,j , for i = 1,...,n x , j = 1,...,n y . β= Similar procedures can be developed for the symmetric covariance time vector products.The square root (27) times vector product v = (B sym ) 1/2 u is computed as: y • α i,1:ny , for i = 1,...,n x v= • β v i,j = σ i,j γ i,j , for i = 1,...,n x , j = 1,...,n y . (31) All the above implementations are based on repeated operations involving the one-dimensional covariance matrices and their square roots.These operations are very efficient since all the linear algebra operations (matrix-vector multiplication, SVD, Choesky factorization, the solution of linear systems) are performed on small dimensional matrices (n x × n x or n y × n y ).

Numerical experiments
For numerical experiments, we employ GEOS-Chem v7-04-10 adjoint code (Singh et al., 2009b), capable of performing both 3D-Var and 4D-Var data assimilations with real data.We assimilate Tropospheric Emission Spectormeter (TES) satellite ozone profile retrievals into the model and validate the generated analyses through an independent observation dataset provided by direct ozone profile measurements from Ozonesondes.The numerical optimization method used in all experiments is the limited memory bound-constrained BFGS (Zhu et al., 1997).This quasi-Newton approach has become the "gold standard" in solving large scale chemical data assimilation problems (Sandu et al., 2005).

Experimental setting
Simulations with GEOS-Chem v7 adjoint can be carried out at 4 • × 5 • and 2 • × 2.5 • resolutions.We have used 4 • × 5 • resolution in all our experiments.There are 46 × 72 latitudelongitude grid boxes at this resolution, and 55 vertical levels; near the equator and at ground level each grid box covers an area of about 400 km × 500 km.The current GEOS-Chem model does not capture well the dynamics of the upper troposphere and of the stratosphere.Therefore, we performed data assimilation for only the first 23 model levels (for up to about 50 hPa).The model has been modified to use the linearized ozone (linoz) scheme for a better estimation of ozone exchanges at troposphere-stratosphere boundary.This scheme is going to be used in the next release of the standard GEOS-Chem model.
The 3D-Var data assimilation experiments were performed over the months of July and August 2006, starting at 00:00(GMT) on 1 July.The TES satellite data was read once every 4 simulation h; the observation operator called at model time t (h) reads in all the measurements collected within the interval t − 2 (h) to t + 2 (hours).3D-Var data assimilation treats all observations in this interval as instantaneous, and assimilates them in the same optimization run.In all our 3D-Var experiments, we performed 8 iterations per analysis since the cost function decreased significantly within the first few iterations.It is important to note that 3D-Var does not involve any model adjoint calculations; gradients require only the adjoint of the observation operator.The optimization adjusts ozone concentrations.
The 4D-Var data assimilation experiments were performed over a 5-day assimilation window starting at 00:00(GMT) on 1 August 2006 and ending at 00:00(GMT) on 6 August of the same year.The background initial conditions were generated through a free GEOS-Chem model run.There were 12 optimization iterations performed in order to improve the ozone initial condition.Each iteration during 4D-Var assimilation includes a forward model and a backward model adjoint run.TES satellite profile retrievals are read every 4 h during the model adjoint run, and the cost function and adjoint gradients accumulate the impact of all 4-h data sets throughout the assimilation window.
NOTE: The TES data can be biased by as much as 10% (Nassar et al., 2008).We removed this bias as estimated by Nassar et al. (2008) before assimilating the data.

Computational costs
As described in Sect. 1, the construction of the background error covariance matrix B impacts the result of the data assimilation.If one considers no correlation among different model grid points, or among different chemical species, B turns out to be diagonal.However, this approximation is inaccurate as the ozone errors are highly correlated spatially (Constantinescu et al., 2007a,c,d) and correlated to errors in other chemical species; this correlation is not discussed in this paper.In Sect.4, we have introduced an efficient methodology to construct a non-diagonal background error covariance matrix, B. Its inverse, B −1 , needed in 3D-Var (5) and in 4D-Var (9) cost function formulations, can be obtained either via a Cholesky decomposition or via a singular value decomposition.(Note that by the "computation of the inverse" we mean the solution of a linear system).
Table 1 illustrates the computational cost of data assimilation compared to the cost of free running model for a 24-h simulation.All the simulations are performed on a Dell Precision T5400 workstation with 2 quadcore Intel(R) Xeon(R) processors with clock speed 2.33 GHz and a RAM of 16 GB shared between the two processors.Performing 3D-Var with a non-diagonal error covariance matrix whose inverse is computed by Cholesky decomposition is only about 1.4% more expensive than the 3D-Var with a diagonal covariance matrix.The calculation of the inverse via the Cholesky decomposition is considerably more efficient than the calculation via a singular value decomposition, as expected.
3D-Var framework is built on top of GEOS-Chem v7 package which uses Sparse Matrix Vectorized GEAR (SMVGEAR) solver for chemistry.However, to construct the adjoint of chemistry required by the 4D-Var, we implemented KPP solver (Damian et al., 2002) into GEOS-Chem which not only provides a suite of high performance chemical solvers to choose from but also generates automatically the continuous and discrete adjoint codes (Daescu, 2000(Daescu, , 2003;;Sandu et al., 2003).A detailed discussion on interfacing KPP with GEOS-Chem and comparison with na- tive SMVGEAR solver for accuracy and computational performance is presented in Eller et al. (2009).As pointed out in Henze et al. (2007), the computational cost of Rosenbrock solver increases significantly with the tolerance levels; higher tolerances use smaller internal time steps requiring more computation.In our experiments, we have set RTOL = 10 −3 and ATOL = 10 −2 to achieve moderate to high accuracy.
The 4D-Var assimilation is considerably more expensive than the 3D-Var.The use of the non-diagonal B (with Cholesky decomposition) in 4D-Var causes a minimal to zero increase in the computational time when compared to the diagonal B case.

Tropospheric Emission Spectrometer (TES) observations
TES (Beer et al., 2001;TES, 2006), one of four science instruments aboard NASA's Aura satellite, measures the infrared-light energy (radiance) emitted by Earth's surface, and by the chemical tracers in the atmosphere (http://tes.jpl.nasa.gov).Vertical profiles of chemical concentrations are retrieved from the radiance measurements using an off-line inversion process.In this work we assimilate the retrieved ozone vertical profiles.Figure 3 shows the location of TES profiles for two days.A-priori information about the vertical concentration profile of the species of interest is needed to solve the retrieval inverse problem (the prior information does not come from the measurement).Let x prior be the prior vertical ozone concentration profile (in volume mixing ratio units), and let z prior = logx prior .Let z radiance (= logx true ) be the atmospheric profile as resulting directly from the radiances.
The vertical ozone profile retrieval can be expressed according to the formula Here A is the averaging kernel matrix, G is the gain matrix, and η is the spectral measurement error (assumed to have mean zero and covariance S η ).More details can be found in Bowman et al. (2002); Jones et al. (2003); Worden et al. (2004).The corresponding TES observation operator 3 is linear with respect to the logarithm of the concentrations, but nonlinear with respect to the concentration profile: where L is an interpolation operator that transforms x from the GEOS-Chem N-level vertical grid to the TES profile retrieval P -level grid.
For this reason several chemical data assimilation studies based on TES retrieved profiles (Jones et al., 2003;Bowmann et al., 2006;Parrington et al., 2009) have opted to perform the suboptimal Kalman filtering step in the logarithm of the concentrations: For variational data assimilation the forcing calculation is carried out in concentrations.For this reason, an adjoint of the observation operator needs to be derived to update the gradients as described in Eqs. ( 7) and ( 11) Here, H (x) T is a matrix and v = R −1 H(x) − x obs .The TES averaging kernel A is usually a non-symmetric matrix, and the result of A T • v is fed to the interpolation operator to construct the diagonal matrix with the i-th element being 1/(Lx) i .The term (∂L/∂x) T is the adjoint of the interpolation operator and brings entities from the TES profile retrieval domain back to the GEOS-Chem model domain.

Ozonesonde observations
For validation, we use the ozonesonde profiles measured by the INTEX Ozonesonde Network Study 2006 (IONS-6) (http://croc.gsfc.nasa.gov/intexb/ions06.html,Thompson et al., 2007a, b) for the month of August, assuming that these measurements provide values close to the true state of the atmosphere.The ozonesonde observations are not used in data assimilation, and therefore provide an independent data set against which the analysis results are validated.There are 418 ozonesondes launched from 22 stations across North America as shown in the Fig. 3.A detailed description of the number of ozonesondes launched per station with longitude and latitude information can be found in Parrington et al. (2008).

Impact of non-diagonal background error covariance in 3D-Var assimilation
In order to demonstrate the benefits of including spatial correlations in the background error covariance matrix, we first compare the tropospheric ozone concentrations generated through 3D-Var analysis against the ozonesonde observations.The left panel of Fig. 4  of Fig. 4 shows the mean relative errors in model predicted ozone concentrations (the relative differences between the forecast/analysis profiles and the ozonesonde profiles), averaged over all ozonesonde launches.The rightmost panel provides an estimate of the variability of ozonesonde against the variability of ozone concentration predicted through different assimilation techniques.
In all our experiments, correlation lengths in latitudinal direction varied in proportion with correlation lengths in longitudinal direction.A value of 500 for x implicitly indicates y is 400, and refers to correlation between two neighboring grid boxes both in East/West and North/South directions.
The results indicate that 3D-Var is sensitive to the correlation length used in the construction of the background error covariance matrix (a zero correlation length corresponds to a diagonal matrix).Note that the assimilation results using a non-diagonal B with higher correlation length are superior to those using a diagonal B in the lower and mid troposphere.Above 180 hPa, however, the errors in the assimilated ozone fields are larger for the non-diagonal case.This could be attributed to the fact that a uniform correlation length across all vertical levels is only a very coarse approximation of the real error correlations.Higher correlation lengths might be smearing off the ozone in the upper troposphere leading to an overestimate.
To further understand the effect of using non-diagonal background error covariance matrices in 3D-Var we consider the corrections obtained with different correlation lengths (i.e., the differences between the assimilated ozone fields and forecast, or the non-assimilated ozone fields).Panels (a)-(d) of Fig. 5 show the global spatial distribution plots of these differences.The assimilation with non-diagonal covariance matrices generate much smoother analyses; note that the point-wise values of the increments is smaller, and that the corrections are distributed over larger areas.Panels (e)-(f) of Fig. 5 compare directly the 3D-Var analyses obtained using a diagonal B and a non-diagonal B with a correlation length of 1000 km.The corrections in the non-diagonal case are smoother.

Impact of non-diagonal background error covariance in 4D-Var assimilation
We now study the effects of using non-diagonal background error covariance matrix in 4D-Var data assimilation.We compare the analyses ozone concentrations generated by 4D-Var with different background error correlation lengths against the ozonesonde observations.The left panel of Fig. 6 shows the forecast, analysis and ozonesonde measured ozone concentrations averaged over the two months assimilation window.The model ozone fields are interpolated to the space-time location of each ozonesonde launch for comparison.The center panel shows the relative errors of model predictions with respect to ozonesonde data, averaged over all ozonesonde launches.The right panel provides the standard deviations of these errors.
The results indicate that 4D-Var is also sensitive to the structure of the background error covariance matrix.The use of non-diagonal correlations can lead to improved analyses.The best analysis is obtained with a correlation length of 500 km (about one grid cell near the equator).Note that 4D-Var accounts for all the data available within the assimilation window.analyses obtained using a diagonal B and a non-diagonal B with a correlation length of 500 km; the large localized corrections over North America provided by the diagonal B are smoothened out when the non-diagonal B is employed.

Determining the correlation length through experiments
The correlation length is a very important parameter that impacts the quality of the assimilation when using non-diagonal error covariance matrix.The value of the correlation length depends on various factors such as the lifetime of the tracer under consideration, the grid resolution, the pressure level, and the wind velocity.We propose a method to determine experimentally a value of the correlation length that is appropriate for the model and data at hand.Recall the construction of one dimensional correlation matrices in Eqs. ( 13) and ( 14).Our aim is to determine the number of grid cells (in each direction) where the errors are correlated.For this we use adjoint sensitivity analysis.Specifically, we initialize the adjoint variable to 1 in a specific cell at the end of a given window (and to zero everywhere else), perform a backwards adjoint simulation, and analyze the adjoint fields at the beginning of the window.The error in the specific cell is correlated with errors in those grid cells where the adjoint values are above 1/e.The length of the time window depends on the time scale of the model under consideration.
Here we consider a time window of 8 h.We run the forward GEOS-Chem model starting on 1 July, 00:00 GMT for 20 h.The adjoint variable for ozone at 20:00 GMT are initialized to 1 in a subset of the grid points ((i, j ) chosen such that i mod 10 = 1 and j mod 10 = 1).Adjoint variables for all other grid points and species are initialized to zero.The gap in the initialization helps avoid the interactions between adjoint "plumes" initialized at different locations.The ozone adjoint variable field is analyzed at 12:00 GMT to find out the number of grid cells where the value is greater than or equal to 1/e.In our current setup, we use the same correlation length for all pressure levels and thus consider the spread only at ground level.
The procedure can be easily extended to considering different correlation lengths for different vertical levels and for different geographic areas.
Figure 8 shows that the ozone adjoint variables have spread (on average) over one to two grid cells in both longitudinal and latitudinal direction.For the 4 • × 5 • model resolution each grid box is of size 400 km × 500 km near the equator.The adjoint sensitivity analysis indicates that the correlation lengths should be chosen in the ranges of x ∈ [500 km, 1000 km] and y ∈ [400 km, 800 km], respectively.This confirms the best correlation lengths empirically observed in the data assimilation results reported in Figs. 4  and 6.

Conclusions
This paper presents an efficient methodology to construct non-diagonal background error covariance matrices for data assimilation.The two-or three-dimensional covariance matrices are not formed explicitly.Rather, multi-dimensional correlations are represented by tensor products of one dimensional correlation matrices along longitudinal and latitudinal directions.The technique can be easily extended to include correlations in the vertical direction as well.Highly efficient linear algebra operations are obtained by performing successive matrix-vector products, Cholesky decompositions, etc. with one-dimensional correlation matrices.The correlation lengths are important parameters that need to be www.geosci-model-dev.net/4/299/2011/Geosci.Model Dev., 4, 299-316, 2011  specified for each directional correlation.The performance of the data assimilation system depends on a correct specification of the correlation lengths.We propose an adjoint sensitivity analysis approach to guide the choice of proper correlation lengths; the approach implicitly accounts for factors such as chemical activity, grid resolution, etc.
The proposed methodology can model non-isotropic error correlations, since different correlation lengths can be used for different directions.Moreover, different correlation lengths can be used in different parts of the computational domain.The covariances can be recomputed periodically (e.g., at the beginning of each assimilation window)  after re-evaluating the correlation lengths estimates.The model does not account for correlations across different variables, e.g., for correlations among error in different chemical species.
The approach to construct non-diagonal covariance matrices has been tested using the 3D-Var and 4D-Var data assimilation frameworks developed for GEOS-Chem.The experiments assimilate observations from TES satellite ozone profile retrievals, and validate the results against an independent data set provided by IONS ozonesondes.The change of the covariance matrix formulation in data assimilation from diagonal to non-diagonal adds only a negligible computational overhead.At the same time, the inclusion of spatial correlations with appropriately chosen correlation lengths leads consistently to improved analyses in both the 3D-Var and the 4D-Var settings.
Future work will compare the proposed background covariances against other modeling methods available in the literature.We will also consider the vertical error correlations which can help spread the information from higher altitudes, where the TES sensitivity is greater, to lower altitudes, where accurate ozone predictions are of interest from an air quality perspective.

Fig. 1 .Fig. 2 .
Fig. 1.Mesh representation of the one-dimensional longitudinal and latitudinal correlation matrices.The latitude-longitude model grid resolution is 4 • × 5 • (about 400 km × 500 km near the equator) and the correlation lengths are x = 1500 km and y = 1200 km.

Fig. 4 .
Fig.4.The impact of non-diagonal background error covariances in 3D-Var data assimilation.Left panel: mean ozone concentrations at ozonesonde locations for 3D-Var analyses and free model trajectories.Center panel: relative mean errors of predicted ozone concentrations with respect to ozonesonde measurements.Right panel: standard deviation of absolute values of errors with respect to ozonesonde measurements.The data is averaged over all ozonesonde launches.These plots were generated from 2 months simulation from 00:00 GMT on 1 July to 23:00 GMT in August 2006 and compared against ozonesonde data available for the month of August.

( a )
Absolute difference between the 4D-Var analysis using (b) Absolute difference between the 4D-Var analysis using diagonal B ( x = 0 km) and the free model run.non-diagonalB ( x = 500 km) and the free model run.(c)Absolute difference between the 4D-Var analysis using (d) Absolute difference between the 4D-Var analysis using non-diagonal B ( x = 1000 km) and the free model run.non-diagonalB ( x = 1500 km) and the free model run.(e)Absolute difference between the 4D-Var analyses (f) Relative difference between the 4D-Var analyses using diagonal B and non-diagonal B ( x = 500 km).diagonalB and non-diagonal B ( x = 500 km).

Fig. 5 .Fig. 6 .
Fig. 5. Differences in global ozone concentrations at 23:00 GMT on 31 August 2006 averaged over the first 10 GEOS-Chem vertical levels.(a)-(d) Differences between the 3D-Var analysis fields and the model forecast (solution without data assimilation); the analyses use different correlation lengths between 0 km and 1.500 km.(e)-(f) Absolute and relative differences between 3D-Var analyses using diagonal and non-diagonal background covariance matrices.

( a )
Absolute difference between the 4D-Var analysis using (b) Absolute difference between the 4D-Var analysis diagonal B ( x = 0 km) and the free model run.usingnon-diagonal B ( x = 500 km) and the free model run.(c)Absolute difference between the 4D-Var analysis using (d) Absolute difference between the 4D-Var analysis non-diagonal B ( x = 1000 km) and the free model run.usingnon-diagonal B ( x = 1500 km) and the free model run.(e)Absolute difference between the 4D-Var analyses using (f) Relative difference between the 4D-Var analyses using diagonal B and non-diagonal B ( x = 500 km).diagonalB and non-diagonal B ( x = 500 km).

Fig. 7 .
Fig. 7. Differences in global ozone concentrations at 00:00 GMT on 6 August 2006 (end of assimilation window) averaged over the first 10 GEOS-Chem vertical levels.(a)-(d) Differences between the 4D-Var analysis fields and the model forecast (solution without data assimilation); the analyses use different correlation lengths between 0 km and 1500 km.(e)-(f) Absolute and relative differences between 4D-Var analyses using diagonal and non-diagonal background covariance matrices.

( a )
Adjoint ozone variables are initialized to one (b) Adjoint ozone variables spread after in selected grid cells.8 h of backward sensitivity run.

Fig. 8 .
Fig. 8. Ground level ozone adjoint variable values are initialized to one on 1 July 2006, 20:00 GMT, every tenth grid point in longitudinal and latitudinal directions.An 8-h backward adjoint integration spreads the adjoint fields, and helps identify grid cells where ozone errors are correlated.
Singh et al.:Non-diagonal background error covariance matrices for data assimilation This is built out of tensor products involving the square roots of the one-dimensional correlation matrices.The inverse of the square root matrix (27) is ⊗ I n y ×n y .(27) www.geosci-model-dev.net/4/299/2011/Geosci.Model Dev., 4, 299-316, 2011 306 K.

Table 1 .
Timing results for GEOS-Chem free model run, 3D-Var and 4D-Var data assimilations with diagonal and non-diagonal B for a 24-h simulation starting on 1 July 2006.