Balance constraints are important for background error covariance (BEC) in
data assimilation to spread information between different variables and
produce balanced analysis fields. Using statistical regression, we develop a
balance constraint for the BEC of aerosol variables and apply it to a
three-dimensional variational data assimilation system in the WRF/Chem
model; 1-month forecasts from the WRF/Chem model are employed for BEC
statistics. The cross-correlations between the different species are
generally high. The largest correlation occurs between elemental carbon and
organic carbon, with a value as large as 0.9. After using the balance constraints,
the correlations between the unbalanced variables reduce to less than 0.2. A
set of data assimilation and forecasting experiments is performed. In these
experiments, surface PM

Aerosol data assimilation in chemical transport models has received an increasing amount of attention in recent years as a basic methodology for improving aerosol analysis and forecasting. In a data assimilation system, the background error covariance (BEC) plays a crucial role in the success of an assimilation process. The BEC and the observation error determine analysis increments from the assimilation process (Derber and Bouttier, 1999; Chen et al., 2013).

However, accurate estimation of the BEC remains difficult due to a lack of
information about the true atmospheric states and also due to the computational
requirements arising from the large dimension of the BEC (typically

One important role that the BEC plays in meteorological data assimilation is to spread information between different variables to produce balanced analysis fields. This is achieved with balance constraints, which convert the original variables into new independent variables. Balance constraints such as geostrophic balance or temperature–salinity balance have been employed in atmospheric and oceanic data assimilation (Bannister, 2008a, b). To incorporate balance constraints, the model variables are usually transformed into balanced and unbalanced parts. The unbalanced parts, used as control variables, can be assumed to be independent, and the balanced parts are determined by the balance constraints (Derber and Bouttier, 1999). Instead of using an empirical function, balance constraints can also be derived using regression techniques (Ricci and Weaver, 2005). Even where distinct empirical relations between some variables (such as temperature and humidity) may not exist, a regression equation can still be estimated and used as a balance constraint (Chen et al., 2013).
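The way cross-covariances spread information between variables can be illustrated with a minimal two-variable sketch. It assumes the standard optimal linear analysis increment, B Hᵀ (H B Hᵀ + R)⁻¹ d, and uses made-up values (unit error variances, one observed variable, an innovation of 2.0); none of these numbers come from this study.

```python
import numpy as np

def analysis_increment(B, H, R, innovation):
    """Optimal linear analysis increment: B H^T (H B H^T + R)^-1 d."""
    S = H @ B @ H.T + R                      # innovation covariance
    return B @ H.T @ np.linalg.solve(S, innovation)

def covariance(sigma, rho):
    """2x2 background error covariance with cross-correlation rho."""
    C = np.array([[1.0, rho], [rho, 1.0]])
    return np.outer(sigma, sigma) * C

# Hypothetical setup: two variables, only the first one is observed.
sigma = np.array([1.0, 1.0])                 # background error std (assumed)
H = np.array([[1.0, 0.0]])                   # observation operator
R = np.array([[1.0]])                        # observation error variance (assumed)
d = np.array([2.0])                          # innovation y - H x_b (assumed)

inc_independent = analysis_increment(covariance(sigma, 0.0), H, R, d)
inc_correlated = analysis_increment(covariance(sigma, 0.8), H, R, d)
print(inc_independent, inc_correlated)
```

Without cross-correlation the unobserved variable receives no increment; with a cross-correlation of 0.8 it receives 80 % of the increment of the observed variable.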

In current aerosol variational data assimilation with multiple variables,
balance constraints are not yet incorporated in the BEC. The state variables
are assumed to be independent variables without cross-correlation. However,
the aerosol species are frequently highly correlated due to their common
emission sources and diffusion processes. For example, the correlations in
terms of the

Recently, several studies have suggested that a BEC with balanced cross-correlations should be introduced into aerosol variational data assimilation (Kahnert, 2008; Liu et al., 2011; Li et al., 2013; Saide et al., 2013). Kahnert (2008) presented cross-correlations of the 17 aerosol variables from the Multiple-scale Atmospheric Transport and Chemistry (MATCH) model. He found that the statistical cross-correlations between aerosol components are primarily influenced by the interrelations between emissions and, to a much lesser degree, by interrelations due to chemical reactions. Saide et al. (2012, 2013) incorporated the capacity to add cross-correlations between aerosol size bins in GSI for assimilating observations of aerosol optical depth (AOD). The cross-correlations between two adjacent size bins of each species were modeled using recursive filters, whereas cross-correlations between non-adjacent size bins were not considered.

In this paper, we explore incorporating cross-correlations between different
species in BEC using balance constraints. The balance constraints are
established using statistical regression. We apply the BEC with the balance
constraints to a data assimilation and forecasting system with the MOSAIC
scheme in WRF/Chem. The MOSAIC scheme includes a large number of variables
with eight species, and flexibility of eight or four size bins. The scheme
of four size bins is used in our studies. The four bins are located between
0.039–0.1, 0.1–1.0, 1.0–2.5, and 2.5–10

The paper is organized as follows: Sect. 2 describes the 3DVAR system and the
formulation of the BEC. Section 3 describes the WRF/Chem configuration and
estimates the correlations among the emissions. The statistical
characteristics of the BEC, including the regression coefficient of the
cross-correlation, are discussed in Sect. 4. Using the BEC, experiments of
assimilating surface PM

In this section, we present a formulation of the BEC with cross-correlations between different species using a regression technique. Then, the cost function with the new BEC is derived, and the factorization used to compute the BEC is described.

The control variables of the data assimilation are obtained from the MOSAIC
(4-bin) aerosol scheme in the WRF/Chem model (Zaveri et al., 2008). The
MOSAIC scheme includes eight aerosol species, that is, elemental carbon or
black carbon (EC

For a 3DVAR system, the cost function (

In Eq. (1) or Eq. (2),

In this study, the cross-correlations between different species are
considered by introducing control variable transforms (Derber and Bouttier,
1999; Barker et al., 2004; Huang et al., 2009). We divide the model aerosol variables into
balanced components (

These unbalanced parts can be considered to be independent because they are
residual and random.
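The split into balanced and unbalanced parts can be sketched with ordinary least squares. The following is only an illustration under stated assumptions: the variable ordering, the choice to regress each variable on the unbalanced parts of all preceding variables, and the synthetic data are hypothetical and are not taken from the system described here.

```python
import numpy as np

def unbalanced_parts(X):
    """Sequential regression on zero-mean perturbations X (n_samples, n_vars).

    U[:, 0] = X[:, 0]; for j > 0, X[:, j] is regressed on U[:, :j] and the
    residual becomes U[:, j]. K holds the regression coefficients, so that
    X = U @ (I + K).T (balanced part = U @ K.T).
    """
    n, m = X.shape
    U = np.zeros_like(X, dtype=float)
    K = np.zeros((m, m))
    U[:, 0] = X[:, 0]
    for j in range(1, m):
        coef, *_ = np.linalg.lstsq(U[:, :j], X[:, j], rcond=None)
        K[j, :j] = coef
        U[:, j] = X[:, j] - U[:, :j] @ coef   # residual = unbalanced part
    return U, K

# Synthetic perturbations sharing a common signal (hypothetical data).
rng = np.random.default_rng(0)
common = rng.standard_normal(5000)
X = np.column_stack([a * common + 0.5 * rng.standard_normal(5000)
                     for a in (1.0, 0.8, 0.6)])
X -= X.mean(axis=0)

U, K = unbalanced_parts(X)
corr_full = np.corrcoef(X, rowvar=False)
corr_unbalanced = np.corrcoef(U, rowvar=False)
```

By construction, least-squares residuals are orthogonal to their predictors, so the sample cross-correlations of `U` vanish for the data used to fit the regression; on independent data they would only be small.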

According to Li et al. (2013), the correlation matrix of the unbalanced
parts (

In this section, we describe the configuration of WRF/Chem, whose forecasting products will be employed in the following BEC statistics and data assimilation experiments. In addition, the cross-correlations of emission species from the WRF/Chem emission data are investigated to understand the cross-correlation between different species of the BEC.

WRF/Chem (V3.5.1) is employed in our study. This is a fully coupled online
model with a regional meteorological model that is coupled to aerosol and
chemistry models (Grell et al., 2005). The three nested model domains are
shown in Fig. 1. The horizontal grid spacings for these three
domains are 36 km (80

Geographical display of the three-nested model domains. The innermost domain covers the Los Angeles basin; the black point denotes the location of Los Angeles.

The emission source is necessary for running the WRF/Chem model. It is an
important factor for the distribution of the aerosol forecasts. The analysis
of the correlations among the emission species can help us to understand the
BEC statistics. The emission species are derived from the emission file that
is produced from the NEI'05 data for each model domain. Only the emission data
for the innermost domain are used to calculate the correlations among the
emission species. The emission file contains 37 variables, including gas
species and aerosol species. An aerosol species also comprises nuclei mode
and accumulation mode species (Peckham et al., 2013). From these aerosol
emission species, five lumped aerosol species are calculated, which are
consistent with the variables in the data assimilation. These five lumped
species are E_EC (sum of the nuclei mode and the accumulation mode of
elemental carbon PM

Cross-correlations between emission species of E_EC, E_ORG, E_NO3, E_SO4 and E_PM25. The emission species data are derived from the NEI'05 emissions set for the innermost domain of the WRF/Chem model.

Figure 2 shows the cross-correlations of the five lumped aerosol emission species. All cross-correlations exceed 0.5. This result reveals that the emission species are correlated, which may be attributed to the common emission sources and diffusion processes that are controlled by the same atmospheric circulation. The largest cross-correlation is between E_EC and E_ORG, with a value of approximately 0.8. This high correlation indicates that the emission distributions of these two species are very similar. Their emissions are located primarily in urban and suburban areas, with small emissions in rural areas and along roadways (not shown). As shown in Fig. 2, the lowest cross-correlation is between E_ORG and E_SO4; the latter is emitted primarily in urban and suburban areas, with few emissions in rural areas and along roadways (not shown).
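A cross-correlation matrix of this kind can be computed as in the sketch below. It assumes that each lumped species is available as a gridded field and that the correlation is the Pearson correlation over grid points; that choice, the function name and the synthetic fields are assumptions for illustration, not the NEI'05 data.

```python
import numpy as np

def emission_cross_correlations(fields):
    """Pearson correlations between gridded fields defined on the same grid.

    fields: dict mapping a species name to a 2-D array.
    Returns the list of names and the correlation matrix in that order.
    """
    names = list(fields)
    stacked = np.stack([np.asarray(fields[n], dtype=float).ravel()
                        for n in names])
    return names, np.corrcoef(stacked)       # rows are variables

# Synthetic fields sharing one spatial pattern (hypothetical data).
rng = np.random.default_rng(1)
pattern = rng.random((20, 20))
fields = {
    "E_EC": pattern + 0.1 * rng.random((20, 20)),
    "E_ORG": 2.0 * pattern + 0.1 * rng.random((20, 20)),
    "E_SO4": 0.5 * pattern + 0.3 * rng.random((20, 20)),
}
names, corr = emission_cross_correlations(fields)
```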

Cross-correlations between the five variables of the BEC. These
variables are

With the configuration of the WRF/Chem model described in Sect. 3.1, forecasts for 1 month (from 00:00 UTC on 15 May to 00:00 UTC on 14 June 2010) were performed for the balance constraints and the BEC statistics. Forecast differences between 24 h forecasts and 48 h forecasts are available at 00:00 UTC; 30 forecast differences are employed as inputs in the NMC method. For this method, 30 forecast differences are sufficient; however, a longer time series may be more beneficial for the BEC statistics (Parrish and Derber, 1992).
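As a rough sketch of how NMC-type statistics are formed from such forecast pairs, the following code builds the differences between 48 h and 24 h forecasts valid at the same times and pools them into a standard-deviation profile. The array layout, the removal of the time mean and the synthetic inputs are assumptions made for illustration.

```python
import numpy as np

def nmc_differences(fc24, fc48):
    """Differences of forecasts valid at the same times.

    fc24, fc48: arrays of shape (n_times, nz, ny, nx).
    The time mean is removed (an assumption of this sketch).
    """
    diff = np.asarray(fc48, dtype=float) - np.asarray(fc24, dtype=float)
    return diff - diff.mean(axis=0, keepdims=True)

def std_profile(diff):
    """Standard deviation per level, pooled over time and horizontal grid."""
    return np.sqrt((diff ** 2).mean(axis=(0, 2, 3)))

# Synthetic example: 30 forecast pairs on a small grid (hypothetical data).
rng = np.random.default_rng(2)
scale = np.array([1.0, 0.5, 0.25, 0.1])      # assumed error std per level
fc24 = rng.random((30, 4, 10, 10))
noise = rng.standard_normal((30, 4, 10, 10))
fc48 = fc24 + scale[None, :, None, None] * noise

diff = nmc_differences(fc24, fc48)
profile = std_profile(diff)
```

With only 30 samples the time mean itself is noisy, which is one reason a longer series could give more robust statistics.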

Using the 30 forecast differences between 24 and 48 h forecasts, we can
obtain

Regression coefficients of balance operator

Figure 3 shows the cross-correlations of the five full variables and the unbalanced variables. In Fig. 3a, the cross-correlations of the full variables exceed 0.3 and most of them exceed 0.5. In Fig. 3b, however, the cross-correlations of the unbalanced variables are less than 0.2. Some of the cross-correlations are close to zero, which indicates that these unbalanced variables are approximately independent and can be employed as control variables in the data assimilation system.

Using the original full variables and the unbalanced variables obtained by
the regression equations, the BEC statistics are obtained. Figure 4 shows the
vertical profiles of the standard deviations of the original

Vertical profiles of the standard deviation of the variables.

For the correlation matrix of

Same as Fig. 4, with the exception of the horizontal
auto-correlation curves of the variables. The horizontal thin line is the
reference line of

For the vertical correlation between

Vertical correlations of the five variables of the BEC. The left column represents the full variables, and the right column represents the unbalanced variables.

To exhibit the effect of the balance constraint of the BEC, data
assimilation experiments and 24 h forecasts for nine cases are run using
the WRF/Chem model. The surface PM

Two types of observation data are employed in our experiments. The first type
of observation data consists of hourly surface PM

The topography of the innermost domain and the locations of surface monitoring stations (black dots). The red square is the location of Los Angeles.

Aircraft flight tracks during the time window of data assimilation for nine cases. The color of the track indicates the aircraft height.

The periods of flight during CalNex 2010 and the initial time of assimilation.

The initial times of the data assimilation cases are chosen according to the
flight periods, as shown in Table 2. The time window of assimilation for
the flight data is

Figure 9 shows the horizontal increments of EC, OC, NO

Figure 10 shows the vertical increments along 35.0

Surface distributions of increments of the five variables of EC, OC,
NO

Same as Fig. 9, with the exception of the vertical sections along
35

Figure 11 shows the scatter plots of the initial model fields vs. the
surface observation for all nine cases. In Fig. 11a, the simulated
concentrations of the Control experiment display a significant
underestimation with a BIAS of

Scatter plots of observed concentrations of PM

To evaluate the effects of the data assimilation, the CORR, RMSE and BIAS during the forecast time are calculated for each case, and their averaged results are shown in Fig. 12. The CORRs of the DA-balance and DA-full experiments are initially very close (Fig. 12a), but the difference increases after the first hour, with a higher CORR in the DA-balance experiment. The CORR of the DA-balance experiment is substantially higher than that of the DA-full experiment from the 2nd hour to the 16th hour. Similar improvements in the RMSE and the BIAS of the DA-balance experiment are observed in Fig. 12b and c, respectively. The improvement in the BIAS is the most significant among these three statistical measures. It peaks at the 4th hour (Fig. 12c) and remains distinct until the end of the forecasts. These improvements indicate that the balance constraint benefits the subsequent forecasts, which derives from the balanced initial distribution among species.
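The three verification measures can be computed as in the sketch below. It assumes the common definitions (Pearson correlation, root-mean-square error, and mean of model minus observation, so that a negative BIAS means underestimation); the function name and the sample values are hypothetical.

```python
import numpy as np

def verification_scores(model, obs):
    """Return (CORR, RMSE, BIAS) for paired model and observed values.

    Pairs containing NaN or infinite values are skipped.
    """
    model = np.asarray(model, dtype=float)
    obs = np.asarray(obs, dtype=float)
    valid = np.isfinite(model) & np.isfinite(obs)
    m, o = model[valid], obs[valid]
    corr = np.corrcoef(m, o)[0, 1]
    rmse = np.sqrt(np.mean((m - o) ** 2))
    bias = np.mean(m - o)                    # negative: model underestimates
    return corr, rmse, bias

# Hypothetical values: the model is uniformly 2 units too low.
obs = [10.0, 20.0, 30.0, 40.0, np.nan]
model = [8.0, 18.0, 28.0, 38.0, 50.0]
corr, rmse, bias = verification_scores(model, obs)
```

Averaging such scores over cases at each forecast hour gives curves of the kind described for Fig. 12.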

The averaged

We examined the BEC in a 3DVAR system, which uses five control variables
(EC, OC, NO

A set of balance constraints was developed using a regression technique and incorporated in the BEC to account for the large cross-correlations. We employ the balance constraints to separate the original full variables into balanced and unbalanced parts. The regression technique is used to express the balanced parts in terms of the unbalanced parts. These unbalanced parts can be assumed to be independent and are employed as control variables in the BEC statistics. Accordingly, the standard deviations of these unbalanced variables are smaller than the standard deviations of the original variables. Under the effect of the balance constraints, the horizontal correlation scales of the unbalanced variables are closer than those of the full variables, and the vertical correlations of the unbalanced variables show a similar trend.

To evaluate the impact of the balance constraints on the analyses and
forecasts, three groups of experiments, including a control experiment
without data assimilation and two data assimilation experiments with and
without balance constraints (DA-full and DA-balance), were performed. In the
data assimilation experiments, the observations of surface PM

Compared with the forecasts of the DA-full experiments, the improvements in the DA-balance experiments increase after the first forecasting hour, persist to the end of the forecasts, and are substantial from the 2nd hour to the 16th hour (Fig. 12). These results suggest that the balance constraints can play an important role in continually improving the skill of subsequent forecasts. Note that the aircraft data are relatively sparse in some cases, and some flight tracks are not near Los Angeles (Fig. 8). With more aircraft observations, the improvements of the DA-balance experiments may be more significant and longer lasting.

The developed method for incorporating balance constraints in aerosol data
assimilation can be employed in other areas or other applications for
different aerosol models. For the aerosol variables in different models,
some cross-correlations between different species or size bins should exist
because their common emissions and diffusion processes are controlled by the
same atmospheric circulation. Although these cross-correlations may be
stronger than the cross-correlations of atmospheric or oceanic model
variables, theoretical balance constraints, such as geostrophic balance or
temperature–salinity balance, do not exist for aerosols. We hope to discover a
universal balance constraint that can describe the physical or chemical
balanced relationship of aerosol variables, and to utilize it in the data
assimilation system. In addition, we hope to extend the balance
constraint to include gaseous pollutants, such as nitrite (NO

This data assimilation system was developed by the authors. The code can be obtained on request from the first author (zzlqxxy@163.com).

This research was supported by the National Natural Science Foundation of
China (41275128). We gratefully thank the California Air Resources Board
(