Articles | Volume 15, issue 4
Geosci. Model Dev., 15, 1769–1788, 2022
Geosci. Model Dev., 15, 1769–1788, 2022

Development and technical paper 02 Mar 2022

Development and technical paper | 02 Mar 2022

Implementation of aerosol data assimilation in WRFDA (v4.0.3) for WRF-Chem (v3.9.1) using the RACM/MADE-VBS scheme

Implementation of aerosol data assimilation in WRFDA (v4.0.3) for WRF-Chem (v3.9.1) using the RACM/MADE-VBS scheme
Soyoung Ha Soyoung Ha
  • National Center for Atmospheric Research, Boulder, CO, USA

Correspondence: Soyoung Ha (


The Weather Research and Forecasting model data assimilation (WRFDA) system, initially designed for meteorological data assimilation, is extended for aerosol data assimilation for the WRF model coupled with chemistry (WRF-Chem). An interface between WRF-Chem and WRFDA is built for the Regional Atmospheric Chemistry Mechanism (RACM) chemistry and the Modal Aerosol Dynamics Model for Europe (MADE) coupled with the Volatility Basis Set (VBS) aerosol schemes. This article describes the implementation of the new interface for assimilating PM2.5 and PM10 as well as four gas species (SO2, NO2, O3, and CO) on the ground. The effects of aerosol data assimilation are briefly examined through a month-long case study during the Korea–United States Air Quality (KORUS-AQ) period. It is demonstrated that the improved chemical initial conditions through the 3D-Var analysis can lead to consistent forecast improvements up to 26 %, reducing systematic bias errors in surface PM2.5 (PM10) concentrations to 0.0 (1.9) µg m−3 over South Korea for 24 h.

1 Introduction

Regional air quality forecasting is mainly concerned with air pollutants confined to the boundary layer (up to 1 km from the ground), which is predominantly characterized by near-surface concentrations of particulate matter (PM) with particle diameters less than 2.5 and 10 µm (e.g., PM2.5 and PM10, respectively). Many processes, such as the transport and dispersion of chemical species, that directly affect surface concentrations, strongly depend on weather conditions (Baklanov et al.2017). In particular, frequent haze events with high PM concentrations over Korea are often associated with long-range transport of pollutants, so the changes in local emissions in the upstream areas can affect the chemical composition and the PM concentrations in the region to a great extent (Jo et al.2020). Given the considerable effect of meteorological simulations on the chemical processes and the large uncertainty in the chemical transport models, chemical (or aerosol) data assimilation in the numerical weather prediction (NWP) system coupled with chemistry can make encouraging contributions to short-range air quality forecasting, with better representation of the atmospheric composition at the initial time.

The Weather Research and Forecasting model's community data assimilation system (WRFDA; Barker et al.2012) developed by the National Center for Atmospheric Research (NCAR) was initially designed for meteorological data assimilation to initialize the WRF model (Skamarock et al.2008) using a variational or a hybrid data assimilation technique. Since the NWP model was coupled with chemistry and aerosol dynamics as an integrated forecasting system (WRF-Chem; Grell et al.2005), it has been widely used for regional air quality forecasting, representing two-way real-time interactions between meteorology and chemistry (Grell and Baklanov2011), but WRFDA has remained for use in weather data assimilation until very recently.

Unlike meteorological data assimilation, where most prognostic variables (except hydrometeors) remain the same regardless of the physics schemes in use, aerosol data assimilation is tied to the chemistry parameterization employed in the chemical transport model. That is because each chemical option (for both gas and aerosol chemistry) defines an entirely different set of chemical and aerosol prognostic variables. It implies that PM concentrations consist of different aerosol species, depending on the chemical option used in the WRF-Chem model. Therefore, to assimilate PM measurements, a new interface has to be developed in WRFDA for the particular chemistry option (chem_opt), containing new observation operators (that compute the model correspondents from the specific aerosol species), their tangent linear and adjoint models as well as the background error covariance estimation. In other words, even if the WRF-Chem model supports numerous chemical parameterization options, they cannot be used interchangeably in the WRFDA system because the analysis variables are tied to the aerosol species defined in the chemistry scheme. In practice, users should use the same chemical option between the forecast model and the analysis system, meaning that the particular chemical option should be implemented in the assimilation system in advance. That is why chemical or aerosol data assimilation studies have used a minimal set of chemical options so far and why it is challenging to use advanced chemistry in aerosol data assimilation studies within the variational analysis framework.

Other challenges for chemical data assimilation are large model uncertainties due to the complexity of chemical processes, significant uncertainties in forcing parameters such as emissions, highly nonlinear and non-Gaussian error distribution of chemical species, and expensive computations ascribed to a long list of chemical species – typically with dozens or hundreds of prognostic variables. The latter makes the three-dimensional variational data assimilation (3D-Var) algorithm still attractive, especially in the operational environment, even with its limitations such as static background error covariance and the use of the linearized forecast model during the minimization procedure.

Because of the simplicity and the effective cost, the bulk Goddard Chemistry Aerosol Radiation and Transport (GOCART; Chin et al.2002) aerosol scheme has long been used for aerosol data assimilation (Liu et al.2011, Saide et al.2014, and Ha et al.2020, just to name a few), although it is well known to underestimate ground PM concentrations due to the lack of secondary organic aerosol (SOA) formulation (McKeen et al.2009, Hallquist et al.2009).

To use a more sophisticated chemistry option in aerosol data assimilation, Sun et al. (2020) recently implemented a new interface between WRF-Chem and WRFDA for the Carbon Bond Mechanism version Z (CBMZ; Zaveri and Peters1999) gas chemistry and Model for Simulating Aerosol Interactions and Chemistry (MOSAIC; Zaveri et al.2008) aerosol schemes. They assimilated surface measurements using the 3D-Var technique and demonstrated systematic improvements of air quality forecasting over China up to 24 h.

This study extends the WRF-Chem/WRFDA 3D-Var system for the Regional Atmospheric Chemistry Mechanism (RACM; Stockwell et al.1997) gas-phase chemistry, coupled with the Modal Aerosol Dynamics Model for Europe (MADE; Ackermann et al.1998) and the secondary organic aerosol (SOA) scheme based on a four-bin Volatility Basis Set (VBS) (Ahmadov et al.2012) in version 3.9.1 of the WRF-Chem model. This chemical option is expected to provide a more realistic representation of organic carbon, nitrate, and SOA that often resulted in the PM2.5 underestimation over the East Asian region (Lee et al.2020; Jo et al.2020).

The main goal of the new system development is to facilitate aerosol data assimilation using the RACM/MADE-VBS scheme (chem_opt = 108) in the WRF-Chem/WRFDA system so that the initial conditions can lead to better air quality forecasting, especially in surface PM2.5 concentrations over Korea. This study introduces the new system and examines how the assimilation of surface observations can affect air quality forecasts through month-long cycling experiments for May 2016, the Korea–United States Air Quality (KORUS-AQ) campaign period. During early summertime, air quality was measured in various platforms over the Korean Peninsula and its surroundings, and long-range transport of air pollutants resulted in haze development over Korea for 25–31 May 2016. For the details of the case or the field campaign, readers are referred to several other papers (Ha et al.2020, Peterson et al.2019, and Miyazaki et al.2019).

An overview of the new WRF-Chem/WRFDA system, including new forward operators and background error statistics, is presented in Sect. 2, followed by cycling experiments and the forecast verification against independent observations described in Sect. 3. Finally, conclusions are made in Sect. 4, followed by a discussion on the limitations of this study and suggestions for future research.

2 The WRF-Chem analysis and forecasting system

The WRF-Chem model has many options for gas and aerosol chemistry parameterizations that are fully coupled with meteorology, facilitating aerosol direct and indirect effects through interactions with radiation, photolysis, and microphysical processes in real time (Fast et al.2006). Within the WRF infrastructure, it is coupled with the WRF preprocessing system (WPS) and data assimilation (WRFDA), so it can fully support the analysis and forecasting with the coupled evolution of weather and chemistry. As the processes like advection and diffusion are applied for both chemical and meteorological variables, chemical species are transported based on the model time step (which depends on the grid resolution).

Data assimilation pulls model trajectories toward the observed information at the initial time, but when the model integration starts from the initial state, the forecast model tries to restore its own climatology based on the assumptions and parameters defined in the system. Hence, to prevent the model state from drifting away from the observed state, the analysis (or data assimilation) should be conducted repeatedly, incorporating various observations into the model and updating initial conditions at certain time intervals (e.g., every 6 h). By conducting the analysis and the forecast consecutively (so-called “cycling”), initial conditions and subsequent forecasts can be systematically improved in the long term. For such a unified, synthetic cycling system, the analysis and the forecast systems communicate through an interface that converts between observed variables, analysis (or control) variables, and model prognostic variables.

By default, chemical boundaries are reset based on the idealized profiles specified in the chemistry routines in WRF-Chem. But if the chemical lateral boundary option (e.g., “have_bcs_chem”) is turned on to provide more realistic inflows of chemicals, wrfbdy files also need to be updated for chemical species, typically using the output from global chemical forecasts.

Unlike the cycling for weather forecasting, chemical simulations typically recycle chemical variables from the previous forecasts, which are provided through an auxiliary input stream (e.g., wrf_chem_input) for the initialization (with real.exe). At this time, WRF-Chem also needs several additional input files for various emissions data because the chemical transport model is strongly driven by the forcing parameters throughout the model integration and heavily relies on the quality of the emissions data produced for the region. Once the initial conditions (e.g., wrfinput files) are produced by the initialization, they are used as background (e.g., a priori) for the analysis update at the cycle. (If data assimilation is not carried out, the initial conditions are not updated any further and directly used to initialize the model simulation with the chemistry fields predicted from the previous cycle.) In the presented work, chemical observations are assimilated along with in situ meteorological measurements. But even if weather data are not assimilated, meteorological fields are updated through initial and boundary conditions and the online interactions between aerosol and radiation during the forecast step. Once the assimilation is done, wrfbdy files are updated in the mother domain before the model prediction starts in order to become consistent with the analysis fields in the relaxation zone. But such boundary updates are not applied to the chemical fields because chemical or aerosol observations within five grids (x) from the boundary cells are not assimilated (e.g., no analysis updates near the lateral boundaries).

2.1 The WRF-Chem configuration

The model simulations cover the East Asian region and the Korean Peninsula with 27 and 9 km grid resolution, respectively, in a one-way nesting mode, as shown in Fig. 1. Vertically, 31 model levels are configured up to 50 hPa, with the lowest level located around 173 m in domain 2. Such a coarse vertical resolution may not resolve the observed spatial and temporal variability of atmospheric aerosols, but the configuration is adopted from the current operational setting in the National Institute of Environment Research (NIER) in South Korea.

Figure 1The surface observation network in two model domains. A black box indicates domain 2 (D2) over South Korea, nested down from domain 1 (D1). Colored dots indicate surface PM2.5 observations assimilated on 26 May 2016 at 00:00:00 UTC.

Static geographical fields such as land use, vegetation fraction, albedo, soil temperature, and soil moisture are obtained from the 20-class, 30 arcsec MODIS data through the geogrid program of the WRF preprocessing system (WPS). The initial and lateral boundary conditions for meteorological variables are produced by global forecasts from the UK Met Office's Unified Model (UM) operated by the Korean Meteorological Administration (KMA) every 6 h. For meteorological data assimilation, conventional observations in the National Centers for Environmental Prediction (NCEP) prepbufr data (; last access: 4 March 2021) are employed.

This study focuses on the RACM gas chemistry and the MADE-VBS aerosol parameterization (e.g., chem_opt = 108) in WRF version 3.9.1. The MADE-VBS aerosol scheme defines a superposition of three log-normal modes – Aitken, accumulation, and coarse modes – based on the particle size distribution: an Aitken mode with a median diameter of 0.01 µm, an accumulation mode ranging between 0.01 and 1 µm, and a coarse mode for particles typically larger than 1 µm (with a median around 10 µm). All aerosol particles are assumed to be spherical and internally mixed (Aquila et al.2011). The aerosol species treated are sulfate (SO4=), nitrate (NO3+), ammonium (NH4+), elemental carbon (EC), primary organic matter (POA), anthropogenic and biogenic secondary organic aerosol (SOA), chloride (Cl), sodium (Na), unspeciated PM2.5, unspeciated coarse fraction of PM10 (antha), soil dust, and sea salt. The unspeciated PM2.5 includes the fine fraction of sea salt and mineral dust aerosols.

(Stockwell et al.1997)(Ackermann et al.1998)(Madronich1987)(Chen and Sun2002)(Grell and Dévényi2002)(Iacono et al.2008)(Iacono et al.2008)(Hong et al.2006)(Jimenez et al.2012)(Chen and Dudhia2001)

Table 1Summary of WRF-Chem physics configuration.

Download Print Version | Download XLSX

The dust and sea salt emissions are simulated following the GOCART mechanism (e.g., dust_opt = 13 and seas_opt = 2). Photolysis rates of chemical species are computed in a simplified version of the National Center for Atmospheric Research (NCAR) tropospheric ultraviolet–visible (TUV) model (named the Madronich scheme) (Madronich1987), and the Rapid Radiative Transfer Model for GCMs (RRTMG) is used for both shortwave (ra_sw_physics = 4) and longwave (ra_lw_physics = 4) radiation (Iacono et al.2008). The direct aerosol effect is accounted for through interactions with atmospheric radiation and photolysis. A list of physics and chemistry schemes used in this study is summarized in Table 1.

Anthropogenic emission data are obtained from the KORUS version 2 inventory, originally developed based on the Comprehensive Regional Emissions for Atmospheric Transport Experiment (CREATE-2015) emissions dataset and updated for the KORUS-AQ campaign (Woo et al.2012; Choi et al.2019). They were all emitted at the surface, i.e., without any plume rise or specified vertical distribution. Note that anthropogenic emission data should be produced for the chemistry variables defined in the chemical option. Biogenic emissions are built up online using the Model of Emission of Gases and Aerosol from Nature (MEGAN; Version 2) (Guenther et al.2006), but biomass burning emissions are not used in this study. All the WRF files including anthropogenic and biogenic emissions are processed based on the MODIS land-use datasets (Friedl et al.2002).

For chemical lateral boundary conditions, 6-hourly global outputs from the Community Atmosphere Model with Chemistry (CAM-Chem) model, a component of the Community Earth System Model (CESM) version 2.1, were used (Buchholz et al.2019). These simulations were configured at 0.9× 1.25 horizontal resolution and 56 vertical levels up to 1.9 hPa using an updated tropospheric chemistry mechanism (MOZART-T1; Emmons et al.2020), the Modal Aerosol Model with four modes (MAM4; Liu et al.2016), the anthropogenic and biomass burning emissions from the inventories specified for Climate Model Intercomparison Project 6 (CMIP6), and meteorological fields specified from Modern-Era Retrospective analysis for Research and Applications (MERRA)-2 reanalysis (Molod et al.2015). To make chemical boundary conditions for domain 1, chemical species in CAM-Chem are converted to the RACM gas species in WRF-Chem through the “mozbc” utility (, last access: 28 December 2020).

2.2 WRFDA for WRF-Chem

A new interface for the RACM/MADE-VBS scheme is developed based on version 4.0.3 of the WRFDA system to assimilate PM2.5, PM10, SO2, NO2, O3, and CO measurements on the ground. The variational data assimilation system seeks an analysis solution as the best estimate of the true state by minimizing deviations of model variables (x) from the corresponding observations (y) based on the error statistics of background forecasts and observations. The variational scheme assumes Gaussian and unbiased error distributions, which can be characterized by covariances alone; its solution is thus found a least-squares best fit using the covariances. In practice, when the cost function J(x) is reached to a minimum through an iterative minimization process, the resulting state vector x becomes the analysis solution (Lorenc1986).

(1) J ( x ) = J b + J o = 1 2 ( x - x b ) T B - 1 ( x - x b ) + 1 2 ( y - H ( x ) ) T R - 1 ( y - H ( x ) ) ,

where xb is the background state vector, which is usually obtained from a deterministic forecast from the previous assimilation cycle, and B and R represent background and observation error covariance matrices, respectively. An observation operator H(x) transforms the model states (x) to the observed quantities (y) at observation locations and can be nonlinear.

The WRFDA employs an incremental formulation (Courtier et al.1994) where the model state vector (x) is replaced by increments δx(=x-xb) and the minimization algorithm is constructed as a pair of nested loops. The full, nonlinear model is used at each iteration of the outer loop, while its linearized version – tangent linear and adjoint models – is used in the inner loop to adjust the model trajectory and minimize J iteratively. This approach can keep analysis imbalance to a minimum, making the minimization procedure more efficient. The final analysis xa (=xb+δx) is then used as the initial condition for the model simulation.

In most NWP models, the state vector (x) that contains all the prognostic variables lies in the huge dimensional state space (with typical degrees of freedom greater than O(107)), which makes the computation of the inverse matrix (B−1) prohibitive. As a practical way of solving Jb, a control vector (v) that consists of analysis variables is defined as δx=B1/2v. While forecast errors of model variables are typically correlated through governing equations, control variables are designed to have no cross correlations such that the error matrix is diagonalized. With the control variable transformation, the cost function is rewritten as below.

(2) J ( v ) = 1 2 v T v + 1 2 ( d - HB 1 / 2 v ) T R - 1 ( d - HB 1 / 2 v ) ,

where the innovation vector is defined as d=y-H(xb) and H is a linearized version of H. The square root of the B matrix (B=B1/2(B1/2)T) is decomposed into a series of submatrices for the control variable transform so that the cost function can avoid the inverse calculation of the large B matrix.

(3) B 1 / 2 = U p SU v U h ,

where the Up matrix is called physical or balance transformation (via linear regression), S a diagonal matrix of forecast error standard deviation, Uv the vertical transform, and Uh the horizontal transform matrix.

In weather data assimilation, the control variable transformation has been broadly practiced because meteorological variables follow physical balance equations (such as geostrophic or hydrostatic equations) at large scales (Bannister2008). But it is not straightforward to define multivariate correlations between chemical species or between chemical and meteorological variables due to their complex interactions and chemical reactions that are highly nonlinear and often transient. Therefore, chemical or aerosol species in the model states (x) are directly used as control variables (v) with univariate error covariances in chemical data assimilation.

To compute the background error covariance matrix (B) for atmospheric constituents defined in the RACM/MADE-VBS scheme, the GEN_BE v2.0 (Descombes et al.2015) software is expanded for 39 three-dimensional chemical variables: aerosol sulfate (so4ai and so4aj), nitrate (no3ai and no3aj), ammonium (nh4ai and nh4aj), chloride (clai and claj), primary organic matter (orgpai and orgpaj), elemental carbon (eci and ecj), sodium (naai and naaj), unspeciated PM2.5 (p25ai and p25aj), four-bin anthropogenic and biogenic SOA (asoa1i, asoa1j, asoa2i, asoa2j, …, bsoa4i, bsoa4j) with i and j at the end of each variable name indicating Aitken and accumulation modes, respectively. Also included are three coarse-mode variables – non-reactive anthropogenic aerosol (antha), marine aerosol concentration (seas), and soil-derived aerosol particles such as dust (soila) – and four gas species (SO2, NO2, O3, and CO).

The WRFDA provides various options for estimating the background error covariance through “cv_option” in the namelist. Here, cv_option = 7 is chosen for no balance transformation in the regional simulations, meaning that the chemical and aerosol species are control variables as full fields and no cross correlations are considered between the variables such that Up becomes an identity operator. The horizontal transform matrix Uh is performed using recursive filters (Purser et al.2003), while the vertical transform Uv is carried out via an empirical orthogonal function (EOF) decomposition of the vertical component of the background error covariance.

In the 3D-Var algorithm, the estimation of background error covariance is critical, especially in data-sparse regions. As most surface stations are concentrated in urban areas, the structure of background error covariance determines how to spread out the observed information horizontally and vertically. In aerosol data assimilation, it is of particular importance as the atmospheric constituents are adjusted according to their background errors (e.g., B1/2 in δx=B1/2v).

In this study, chemical simulations are carried out in the WRF-Chem model, starting at 00:00 UTC every day for one month in May 2016, to compute background error covariance statistics for chemical and aerosol species defined in the RACM/MADE-VBS parameterization. Differences between 24 and 48 h forecasts at the same validation time are then used as a proxy for forecast errors in each domain, and a total of 29 sample forecasts for 3–31 May 2016 were used to construct the B matrix using the National Meteorological Center (NMC) method (Parrish and Derber1992), assuming the same model bias and uncorrelated model errors. There are five sequential stages (e.g., stage0–stage4) implemented with different options in the GENBE software. In this study, all the grid points are binned together for each model level, with no latitudinal or longitudinal dependencies in the background error covariance.

Figure 2Vertical profile of background error standard deviations for aerosol species in (a) accumulation, (b) Aitken, and (c) coarse modes, and (d) gas species over domain 2.


Figure 2 displays the vertical profile of the background error standard deviation of each species over domain 2. During the analysis procedure, the error is used to weigh the analysis increment for a given variable, affecting how much the observed information will change the model variable. Depending on the aerosol size distribution, Aitken-, accumulation-, and coarse-mode variables are compared separately. Most aerosol species in the accumulation mode have relatively large background error standard deviations in the boundary layer, and their counterparts in the Aitken mode show 1 order magnitude smaller values, mostly with the maximum at the surface. Among the species, large background errors are found in sulfate, nitrate, ammonium, and unspecified PM2.5, contributing most to PM2.5 concentrations. In the coarse mode, sea salt linearly reduces with height, indicating large contributions (to PM10) at the sea level, but soila is characterized by the high peak in the mid-troposphere, which might be associated with the long-range transport of dust aerosols. In the gas species, ozone (O3) represents large errors near the top (e.g., low stratosphere), while carbon monoxide (CO) is concentrated in the low troposphere. Due to the trivial values, the vertical error structures of SO2 and NO2 are hard to see, but their standard deviations are also relatively large in the boundary layer.

Figure 3Vertical autocorrelations in four major aerosol species in accumulation and Aitken modes (in the first and second rows, respectively), three coarse-mode aerosols (in the third row), and four gas species (in the bottom row) over domain 2, contouring from 0.1 to 0.9 every 0.1 in different colors.


The vertical spread of the observed information at the surface is determined by vertical error correlations, closely associated with the simulated boundary layer height. As the static background error covariance cannot simulate the diurnal variability of the boundary layer, this becomes one of the main limitations of the 3D-Var analysis for air quality applications. Figure 3 depicts the normalized vertical autocorrelations derived from the time-lagged forecasts for four major aerosol species in accumulation and Aitken modes, three coarse-mode aerosols, and four trace gases (from top to bottom panels). Generally, correlation contours tend to spread more in the lower levels, implying that the analysis updates in the lowest level can affect the entire boundary layer. The accumulation- and coarse-mode particles have a wider vertical spread than the Aitken-mode particles with more localized effects. The round pattern around level 22 in most species could be related to the advection with strong upper-level jets. While all the trace gases have relatively large correlations near the surface, ozone and nitrogen dioxide show the largest correlations near the tropopause and stratosphere, respectively.

Figure 4The horizontal length scales of the same species as in Fig. 3.


To examine the horizontal propagation of the increments from point observations, the horizontal correlation length scales of the same species are illustrated in Fig. 4. In accumulation and coarse modes (in the top and the third rows, respectively), the overall vertical structure is similar, with the linear increase down to the surface. The length scale at the surface is specified around 36 km for so4aj, for example, corresponding to four grids in the 9 km domain, meaning that an observation at a point location can affect four surrounding grid points radially. On the other hand, Aitken-mode aerosols have short length scales near the surface, which tend to increase in the upper levels, but their maximum values are smaller than those of their counterparts in the accumulation mode, representing more localized effects horizontally. Trace gases show different vertical distributions with the maximum near the top, except for ozone.

When the RACM/MADE-VBS option (e.g., chem_opt = 108) is chosen, the model equivalent of the observed PM2.5 (XPM2.5) is computed as a total sum of three-dimensional mass mixing ratios of 32 aerosol species in accumulation (j) and Aitken (i) modes predicted in the WRF-Chem model, as below.

(4) X PM 2.5 = ρ d p = 1 N m = i j x m p ,

where ρd is dry density ([kg m−3]) for the unit conversion from aerosol mixing ratios (µg kg−1) to mass concentrations (µg m−3), and N= 16. To be consistent with the way the MADE-VBS aerosol scheme estimates PM2.5 concentrations in the model, the observation operator in WRFDA uses the same equation as in Eq. (4), having individual species in different modes contributing to the PM concentrations equally. If the observed atmospheric composition significantly differs from the one in the model, or particular species change predominantly at certain times, this approach can lead to erroneous results (in both the analysis and forecast).

When PM10 is assimilated alone, the model correspondent is computed by adding three coarse-mode variables – anthropogenic primary aerosol (antha), marine aerosol concentration (seas), and soil-derived aerosol particles such as dust (soila) – into the simulated PM2.5. But in the concurrent assimilation of PM10 and PM2.5, the residuals from PM10PM2.5 are assimilated as a sum of three coarse-mode aerosols, following Peng et al. (2018) and Sun et al. (2020).

Unlike the aerosol analysis that has to update dozens of aerosol species (e.g., unobserved variables) from PM concentrations, the assimilation of trace gases is straightforward because each gas species is the model prognostic variable in most chemical options. Thus, the control variables are simply expanded for four gas species (SO2, NO2, O3, and CO), and the observation operator becomes a simple horizontal interpolation (e.g., bilinear interpolation) of the corresponding variable at the lowest model level.

Table 2A list of new namelist parameters in WRFDA.

Download Print Version | Download XLSX

2.3 Observation processing and measurement errors

In this study, hourly surface observations of six major pollutants (PM2.5, PM10, SO2, NO2, O3, and CO) are used from 379 Korean sites operated by the NIER (, last access: 21 December 2020) and around 1600 sites from the China National Environmental Monitoring Center (CNEMC;, last access: 21 December 2020) during the KORUS-AQ period. All the gas species measured in Korean stations have the same ppmv unit as in WRF-Chem, but all the Chinese sites report the data in µg m−3, requiring a unit conversion as part of observation processing. Using the molecular weight of each gas species (wgas) and the molar volume of a gas (Vm= 22.4 L mol−1) at 1 standard temperature and pressure (STP), the unit of the Chinese data is converted as [ppmv] = [µg m−3] ×Vm/wgas/1000.

As the surface stations are mostly concentrated in the large cities, the hourly data that belong to the same 9 km model grid are randomly split for assimilation and verification; each dataset is then averaged over each grid. As a result, 279 Korean sites are averaged into 219 stations (or grids) for assimilation, while the other 100 sites are averaged to 71 independent observations for evaluation over South Korea. The Chinese data have a lot of missing values, especially for the period of heavy pollution events (24–26 May 2016), and because the verification is made over Korean sites only, they are used without such processing.

Data quality control (QC) is done by setting maximum thresholds of observation values and innovations ((o  f)'s​​​​​​​) during the assimilation procedure. Surface PM2.5 and PM10 observations are rejected when they are greater than 300, 500 µg m−3, respectively, or are different from their model equivalent (e.g., H(xb)) by more than 100 µg m−3. Gas species are also checked with the maximum threshold of 2, 2, 2, and 20 ppmv for the observed SO2, NO2, O3, and CO, respectively. They are also rejected based on the threshold of 0.2 ppmv for the innovations.

Gas-phase pollutants on the ground are assimilated together, as opposed to individual species, using the corresponding model variables as their analysis (or control) variables. Before assimilation, observations for all the gas species are processed to have the same ppmv unit as the model variables, as needed.

For the new assimilation capability, several new parameters are added to namelist.input in WRF-Chem, as summarized in Table 2. To demonstrate the capability of all the new observation operators (that are independent of each other), this study only presents the simultaneous assimilation of all six pollutants using chemicda_opt = 4, as listed in Table 2.

In this 3D-Var analysis, observation errors are assumed uncorrelated such that the observation error covariance matrix R in Eq. (1) becomes diagonal with the observation error standard deviations as diagonal elements. For the gas species, the observation error is simply assigned as 10 % of the observed value regardless of the location. For surface PM concentrations, the observation error is estimated as a sum of the measurement error (ϵo) and the representative error (ϵr) as ϵx=ϵo2+ϵr2, following Elbern et al. (2007). The measurement error increases linearly with the observed value (xo) as ϵo=1.5+0.0075xo, while the representative error is formulated as ϵr=γϵoΔxL, where γ is set to be 0.5, Δx is grid spacing (here, 27 km for domain 1 and 9 km for domain 2), and the scaling factor L is defined as 3 km, as in Ha et al. (2020).

3 Results from cycling experiments

A month-long cycling experiment is conducted, assimilating all the surface observations for six pollutants (PM2.5, PM10, SO2, NO2, O3, and CO) (e.g., chemicda_opt = 4) in both domains every 6 h. The baseline experiment (“NODA”) is first conducted, recycling 6 h forecasts from the previous cycle. The background error statistics are computed from the extended forecasts (e.g., up to 48 h) in NODA. Then, the DA experiment assimilates all the observations to update the analysis every 6 h based on the background error covariance, using the same input data and the same lateral boundary conditions for both meteorological and chemical fields as in NODA.

The UM global forecasts are initialized from the UM analysis at 18:00 UTC every day so that the UM global analysis is used for 18:00 UTC cycles, while the following 6–18 h UM forecasts are employed at 00:00–12:00 UTC cycles the next day, respectively. The UM simulations run by KMA define surface fields at 1.5 m and the soil moisture content at 0–0.1, 0.1–0.35, 0.35–1.0, and 1–3 m soil layers in units of [kgm-2TS-1], where TS indicates the thickness of each soil layer. To ungrib the data correctly, the Vtable and the ungrib source codes in WPS are modified accordingly.

Figure 5Time series of hourly PM2.5 concentration on the ground. The baseline experiment (“NODA”) is plotted in black, while the DA experiment with the analysis every 6 h in red. Observations averaged over all the evaluated stations in South Korea are marked as blue dots, enclosed with the shaded area in light blue for the standard deviation in observations.


Figure 6Time series of observation minus background (omb; dotted gray line) and observation minus analysis (oma; solid black line) in surface PM2.5 (a) and PM10 (b) in the “DA” experiment, in terms of root-mean-square error over all the stations assimilated in domain 1 at each cycle. The averages over cycles are shown in the legend, and the total number of stations at each cycle is marked in blue dots with the y axis on the right.


Figure 5 depicts the time series of hourly surface PM2.5 concentrations, averaged over 71 evaluation sites in South Korea for the last week of May 2016 with heavy pollution events. Observations are marked in blue dots, and the hourly forecasts in DA and NODA are drawn in solid red and black lines, respectively. The NODA experiment concatenates 0–6 h forecasts every cycle, while DA presents the analysis for every 6 and 1–5 h forecasts for other times. The 3D-Var analysis and the subsequent forecasts in DA follow the observations closely, but without data assimilation, 0–6 h forecasts in NODA largely deviate from the measurements, even beyond the observation uncertainty across stations (shaded in light blue).

Figure 7Same as Fig. 6 but for gas species.


Figures 6 and 7 illustrate observation minus background (omb; dotted gray line) and observation minus analysis (oma; solid black line) in DA, as a time series of surface PM concentrations and gas species, respectively, for the entire month. The total number of observations (blue dots in Fig. 6) varies from cycle to cycle, but the time series of omb and oma indicates that the analysis system gets spun up quickly (with the steady trend of oma) and runs reliably throughout the month-long period with the analyses closer to observations than background forecasts for all six pollutants. The number of observations in trace gases is omitted in Fig. 7 because it is very similar to that of PM observations, and the omb in gas species is greatly fluctuating with cycles. Such large oscillations of omb and large differences between analysis and background are often attributable to the considerable errors in the forecast model and/or forcing parameters, which prompt the model state to return to its own equilibrium quickly (e.g., within 6 h). For the rest of the figures, the evaluation is made only for 7–31 May 2016, discarding the first week of cycling as a spin-up period.

Figure 8(a) Pie chart showing the percentage contribution by aerosol species in Seoul, South Korea, in the analysis averaged over 97 cycles from 00:00 UTC on 7 May to 00:00 UTC on 31 May 2016 and (b) deviations from the mean analysis in the analysis at 00:00 UTC on 26 May 2016 over domain 2 in the DA experiment. Surface PM2.5 consists of aerosol sulfate (SO4; so4ai and so4aj), ammonium (NH4; nh4ai and nh4aj), nitrate (NO3; no3ai and no3aj), primary organic matter (POA; orgpai and orgpaj), elemental carbon (EC; eci and ecj), unspeciated PM2.5 (P25; p25ai and p25aj), sodium chloride (NaCl; naai, naaj, clai, and claj), and four-bin anthropogenic and biogenic secondary organic aerosol (ASOA and BSOA, respectively) at the lowest model level.


The MADE-VBS aerosol parameterization has been reported to simulate the chemical composition over the East Asian region reasonably well (Lee et al.2020; Saide et al.2020). As shown in Fig. 8a, the surface PM2.5 analysis is dominated by nitrates (NO3), sulfate (SO4), ammonium (NH4), unspeciated PM2.5, and anthropogenic secondary organic aerosols (ASOA) in Seoul, South Korea (in that order), consistent with the background error estimates shown in Fig. 2. Compared to the analysis averaged over the whole evaluation period, the analysis in the heavy pollution event on 26 May 2016 (Fig. 8b) indicates that major constituents' contributions further increase, particularly by nitrate. Due to the limited information content of atmospheric composition measurements as well as the scarcity of such observations, it is hard to evaluate the fractional aerosol contribution by all the species defined in the MADE-VBS scheme. Hence, these figures are presented only to show the aerosol composition represented by the analysis using the aerosol chemistry and how it changes when surface PM2.5 concentrations are high due to long-range transport of air pollutants. In the WRFDA system, several tuning parameters (such as var_scaling and len_scaling) are supported for further adjustments of each aerosol species, as needed. Using the default setting (e.g., without tuning), the fractional contribution is not substantially changed by assimilation in this particular case study. The overall composition with major constituents seems consistent with those from previous studies (Jo et al.2020; Tian et al.2019).

Figure 9Horizontal distribution of the analysis increments in particulate matter concentrations and four gas species at the lowest model level over domain 1 in the DA experiment, averaged for 7–31 May 2016. The domain average is shown in the top right corner of each panel.

Figure 9 presents the horizontal distribution of analysis increments (equal to the difference between the analysis and the background) in the assimilated variables at the lowest model level, averaged for the evaluation period. Surface PM2.5 concentrations are reduced by assimilation, especially over central eastern China (along 30 N), indicating that they were mostly overpredicted in background forecasts, likely due to the systematic overestimation of anthropogenic emission data. Given that air pollutants in the emission data constitute the majority of the precursors of PM2.5 pollution, surface PM2.5 concentrations could strongly depend on emissions, which might have led to the overestimation in the background forecasts (Chen et al.2019). Therefore, the assimilation of surface PM2.5 tends to counteract the overestimation driven by the emission data over China. On the contrary, PM10 concentrations are predominantly enhanced by assimilation over most areas, presumably because coarse-mode aerosols might not be sufficiently described in both the emission data (through “E_PM_10”) and the model estimate. Among the coarse-mode species, dust aerosols (soila) show the most significant analysis increments over the Jing–Jin–Ji (an abbreviation of the Chinese names of Beijing, Tianjin, and Hebei) metropolitan region (not shown). On the other hand, the analysis does not make meaningful changes in SO2 and NO2 but tends to decrease ozone and increase carbon monoxide over South Korea.

Figure 10 examines how the vertical distribution of aerosol species systematically changes with the assimilation over domain 2. Even if only the surface observations were assimilated, the entire boundary layer is affected by continuous cycling, based on the aerosol forecast error structure. The DA experiment mainly reduces most aerosol species contributing to PM2.5 in the boundary layer. But soil-derived dust and sea salt aerosols are significantly increased in the low troposphere, due to their large standard deviations and eigenvalues estimated in the background error covariance. These coarse-mode variables could be also affected by weather conditions to a greater extent as they could be more sensitive to the large-scale advection like low-level jets. The role of meteorological fields or their interactions with aerosols will be examined in the context of concurrent data assimilation of chemical and meteorological observations in the following study.

Figure 10Vertical profile of aerosol species ([µg kg−1]) averaged over domain 2. The 6 h forecasts in NODA (black) and DA (red) are depicted with solid lines, while the analyses in DA are indicated by the dotted orange line.


The reduction of PM2.5 and the large increase of PM10 in the boundary layer, as shown in Fig. 11, are consistent with previous results. PM2.5 shows that the analysis (orange) and the following 6 h forecast (red) are not much different in the climatological sense (e.g., mean over time and space). But PM10 displays relatively large discrepancies between the two and even bigger differences between NODA and DA, mainly due to the large changes in soila, as shown in the previous figure. Systematic disparities between observations and the model estimates typically imply the deficiencies in the model simulation and/or the forcing parameters. As the focus of this study is the prediction of surface PM2.5 concentrations, no further investigation is made on the systematic errors in PM10 simulations in this study. But generally, the larger the model error gets, the harder it is to make an optimal solution in the analysis. On the other hand, SO2 and NO2 near the surface are slightly increased by assimilation over domain 2, which does not last for 6 h because of their short lifetime. While the changes in the vertical structure of those two fields are confined to the boundary layer, ozone and carbon monoxide experience the adjustments for the entire profile through the cycling.

Figure 11Same as Fig. 10 but for six pollutants assimilated in DA.


Table 3The RMSE and bias errors of 24 h forecasts in NODA and DA experiments.

Download Print Version | Download XLSX

Figure 12Time series of (a, b) root-mean-square error (RMSE) and (c, d) bias as (forecast minus observation) in surface PM2.5 concentration of the hourly forecasts from the analysis at 00:00 UTC from 7 to 31 May 2016. The left panels (a, c) show the errors of forecasts at 27 km resolution verified against 1188 stations over domain 1, and the right panels (b, d) present the 9 km forecast errors with respect to surface PM2.5 observations from 71 independent stations in Korea.


Figure 12 illustrates the time series of rms and bias errors of 0–48 h forecasts with respect to independent PM2.5 observations at the surface. The large initial errors in NODA imply that aerosol species are not properly initialized without assimilation, even if they are recycled every 6 h for the whole month. With data assimilation, initial conditions in the DA experiment are substantially improved over both domains, leading to smaller forecast errors throughout 48 h forecasts. In domain 1, a large overestimation in NODA is significantly reduced by assimilation, but the positive bias remains for 48 h. In domain 2, the systematic bias is mostly corrected in DA up to 36 h forecasts, and the RMSEs are consistently small compared to NODA. The forecast errors are mostly distinguishable for the first 24 h, and the analysis impact typically lasts no longer than 6 h in trace gases like NO2 and SO2, 24 h forecast mean errors are thus summarized in Table 3 for all six pollutants. Compared to the baseline run (NODA), the DA experiment systematically improves surface PM2.5 forecasts in both domains, with the RMSEs decreased by 26 % and 20 % over domains 1 and 2, respectively. The RMSEs of PM10 are reduced only by  14 % in DA, but the systematic underestimation gets mostly diminished over both domains. The assimilation is not very effective in the prediction of gas species except for carbon monoxide, partly due to the model errors and partly due to the observation errors that might need to be further adjusted for better results.

Table 4Air quality index values, as defined in the NIER in Korea.

Download Print Version | Download XLSX

Table 5Categorical forecasts for different air pollution events.

Download Print Version | Download XLSX

Figure 13Same as Fig. 12 but for the forecast accuracy (%) for categorized events, as defined in Tables 4 and 5. The 48 h mean forecast accuracy of each experiment is shown in the legend.


The forecast errors depicted in Fig. 12 are dominated by moderate (or clear-sky) cases in Korea, but air quality forecasting becomes more crucial for heavy pollution events, making the categorical forecast verification important in a practical sense. Table 4 categorizes four different events based on hourly surface concentrations in six pollutants, and Table 5 defines categorical forecasts for different air pollution events. Figure 13 highlights the forecast accuracy for categorized events, verified against independent observations, based on the formulae described below.


where I=a1+a2+b1+b2, II=c1+c2+d1+d2, III=a3+a4+b3+b4, IV=c3+c4+d3+d4, and N=I+II+III+IV. The air quality forecasting operated by the NIER in South Korea is currently evaluated in the same way daily, except for daily mean values (rather than hourly averages). The forecast accuracy rates defined here can be considered as skill scores for categorized events so that the higher they are, the better they become. First of all, NODA shows very stable accuracy rates between 40 % and 50 % for all events. As forecast errors usually grow with time due to the model uncertainty, this means that the forcing parameters consistently constrain the chemical forecasting. With data assimilation (red), the initial accuracy gets doubled up in both domains (up to 80 %), even for high-pollution cases. But the benefit of the analysis quickly disappears with time, implying the challenges with chemical data assimilation using the 3D-Var technique and large uncertainties in aerosol simulations. Note that the DA algorithm used here cannot produce an optimal solution when there are significant errors in the model and/or the forcing parameters as the strong-constraint variational system assumes a perfect model in the optimization. As Bocquet et al. (2015) pointed out, even with the improved analysis, it is hard to compete with forcing parameters such as emissions by which the chemical transport model is strongly driven, making the chemical analysis impact typically limited to the first-day forecasts. The results shown here are consistent with previous studies, illustrating that most of the benefits from data assimilation are limited to the first 24 h forecasts, although the overall forecast accuracy in DA still remains higher than NODA up to 48 h. In domain 2, due to the small sample size, the forecast accuracy for high concentrations is not as consistent (and smooth) as in domain 1, and the results may not be statistically significant. But the false alarm rates for all events are also reduced for 0–24 h forecasts, indicating that the assimilation systematically improves the 9 km simulations (in D2).

4 Conclusions and discussion

This study introduced a new extended version of the WRFDA 3D-Var analysis system for the RACM chemistry and the MADE-VBS aerosol parameterization (chem_opt = 108) in the WRF-Chem forecast model. It is demonstrated that the new analysis capability is successfully implemented for surface observations for six pollutants (PM2.5, PM10, SO2, NO2, O3, and CO) through the cycling experiments over the East Asian region for May 2016.

As specified in the background error covariance estimation, the assimilation of the ground measurements affects the entire boundary layer and the surrounding area (up to four grid points from the observation sites in domain 2). In the assimilation of surface PM concentrations, each aerosol species is adjusted according to its background errors, contributing to the atmospheric composition differently. Inorganic aerosols in the accumulation mode (no3aj, so4aj, and nh4aj), the major constituents of PM2.5 in the RACM/MADE-VBS scheme, are adjusted more than the Aitken-mode aerosols, spreading more widely in both horizontal and vertical. Dust and sea salt aerosols in the coarse mode significantly increase in the boundary layer, leading to a substantial increase in PM10 concentrations on the ground over East Asia. In the assimilation of trace gases, carbon monoxide is mainly increased near the surface, while the surface ozone slightly decreases over the Korean Peninsula. The analysis does not make meaningful changes in SO2 and NO2 because of their short lifetime.

The month-long cycling experiments confirmed that the aerosol assimilation could improve air quality forecasts for 24 h, verified against independent observations. The improvements were evident even in the heavy pollution events (24–26 May 2016) over South Korea, suggesting that the new system can be useful for predicting exceedance and non-exceedance events. Given the lack of interoperability between chemical parameterization schemes, these results validate that the MADE/VBS aerosol scheme can improve air quality forecasting in the context of chemical weather cycling, especially over the East Asian region. The new codes developed here will be included in the next release of the WRFDA system.

Even with the successful demonstration of the new implementation, however, the 3D-Var analysis system can be further improved or examined to maximize the impact of chemical observations. In an effort to optimize the system design, conventional weather data were assimilated at the same time, and chemical boundary conditions were updated every 6 h. Many processes and inputs (e.g., emissions) depend on meteorological conditions, and the large-scale chemical forcing can affect the quality of background forecasts, but their roles on aerosol prediction were not examined, focusing on the demonstration of the new development. Large uncertainties in the forecast model and the emission data were also not examined, as well as the observation errors that may need to be tuned, especially for trace gases. Those important aspects are left behind for future studies. A more appropriate choice of control variables, for example, can enhance the conditioning of the 3D-Var problem (Courtier and Talagrand1990). The observation operator used in this study treated each aerosol species in each mode as an individual control variable, but because the Aitken-mode variables contributed to the particulate matter concentrations only for about 10 %, it might be worth trying to reduce the total number of control variables by either using major constituents or combining Aitken and accumulation modes for each species.

The error cross correlations between meteorological variables and chemical species or between chemical species could not be specified in the current variational data assimilation framework but might also play a significant role in improving air quality forecasting, particularly for long-range transport of air pollutants that often cause heavy pollution events over Korea. For a dynamical estimation of such cross correlations, ensemble-based methods should be introduced (Bocquet et al.2015; Miyazaki et al.2020; Sandu and Chai2011).

Code and data availability

The WRF-Chem v3.9.1 codes are freely available from (NCAR2021)​​​​​​​. The WRFDA source codes developed for this study and the WPS codes modified for the UM grib data can be downloaded from (Ha2021). Real-time air quality observations are available at (Air Korea2020)​​​​​​​ for Korean sites and at (China National Environmental Monitoring Centre2020) for Chinese stations. The NCEP prepbufr data are archived and available at the National Center for Atmospheric Reserach (, National Centers for Environmental Prediction et al.2008). CAM-Chem model outputs for lateral boundary condition files can be downloaded from (last access: 28 December 2020, Buchholz et al.2019)​​​​​​​. The WRF-Chem preprocessor tools such as mozbc and megan_bio_emiss are available at (ACOM/NCAR2020).

Competing interests

The contact author has declared that there are no competing interests.


Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


All the experiments presented here were performed on the Cheyenne supercomputer at the National Center for Atmospheric Research (NCAR). This research was jointly supported by the National Center for Atmospheric Research, which is a major facility sponsored by the National Science Foundation, and a grant from the National Institute of Environment Research (NIER), funded by the Ministry of Environment (MOE) of the Republic of Korea. The author acknowledges the use of the WRF-Chem preprocessor tool (mozbc and megan_bio_emiss) provided by the Atmospheric Chemistry Observations and Modeling Lab (ACOM) of NCAR. The UM grib files were read by kwgrib2, provided by Yonghee Lee at NIER.

Financial support

This research has been supported by the National Science Foundation (grant no. 1852977) and the National Institute of Environmental Research (grant no. NIER-2019-01-02-087).

Review statement

This paper was edited by David Topping and reviewed by two anonymous referees.


Ackermann, I. J., Hass, H., Memmesheimer, M., Ebel, A., Binkowski, F. S., and Shankar, U.: Modal aerosol dynamics model for Europe: development and first applications, Atmos. Environ., 32, 2981–2999,, 1998. a, b

ACOM/NCAR: WRF-Chem preprocessor, NCAR [software],, last access: 28 December 2020. a

Ahmadov, R., McKeen, S. A., Robinson, A. L., Bahreini, R., Middlebrook, A. M., de Gouw, J. A., Meagher, J., Hsie, E.-Y., Edgerton, E., Shaw, S., and Trainer, M.: A volatility basis set model for summertime secondary organic aerosols over the eastern United States in 2006, J. Geophys. Res. Atmos., 117, D06301,, 2012. a

Air Korea: Homepage, Korea Environment Corporation,, last access: 21 December 2020. a

Aquila, V., Hendricks, J., Lauer, A., Riemer, N., Vogel, H., Baumgardner, D., Minikin, A., Petzold, A., Schwarz, J. P., Spackman, J. R., Weinzierl, B., Righi, M., and Dall'Amico, M.: MADE-in: a new aerosol microphysics submodel for global simulation of insoluble particles and their mixing state, Geosci. Model Dev., 4, 325–355,, 2011. a

Baklanov, A., Brunner, D., Carmichael, G., Flemming, J., Freitas, S., Gauss, M., Hov, o., Mathur, R., Schlünzen, K. H., Seigneur, C., and Vogel, B.: Key Issues for Seamless Integrated Chemistry–Meteorology Modeling, B. Am. Meteorol. Soc., 98, 2285–2292,, 2017. a

Bannister, R. N.: A review of forecast error covariance statistics in atmospheric variational data assimilation. II: Modelling the forecast error covariance statistics, Q. J. Roy. Meteor. Soc., 134, 1971–1996,, 2008. a

Barker, D., Huang, X.-Y., Liu, Z., Auligné, T., Zhang, X., Rugg, S., Ajjaji, R., Bourgeois, A., Bray, J., Chen, Y., Demirtas, M., Guo, Y.-R., Henderson, T., Huang, W., Lin, H.-C., Michalakes, J., Rizvi, S., and Zhang, X.: The Weather Research and Forecasting Model's Community Variational/Ensemble Data Assimilation System: WRFDA, B. Am. Meteorol. Soc., 93, 831–843,, 2012. a

Bocquet, M., Elbern, H., Eskes, H., Hirtl, M., Žabkar, R., Carmichael, G. R., Flemming, J., Inness, A., Pagowski, M., Pérez Camaño, J. L., Saide, P. E., San Jose, R., Sofiev, M., Vira, J., Baklanov, A., Carnevale, C., Grell, G., and Seigneur, C.: Data assimilation in atmospheric chemistry models: current status and future prospects for coupled chemistry meteorology models, Atmos. Chem. Phys., 15, 5325–5358,, 2015. a, b

Buchholz, R. R., Emmons, L. K., Tilmes, S., and the CESM2 Development Team: CESM2.1/CAM-chem Instantaneous Output for Boundary Conditions, UCAR/NCAR – Atmospheric Chemistry Observations and Modeling Laboratory,, 2019. a, b

Chen, D., Liu, Z., Ban, J., Zhao, P., and Chen, M.: Retrospective analysis of 2015–2017 wintertime PM2.5 in China: response to emission regulations and the role of meteorology, Atmos. Chem. Phys., 19, 7409–7427,, 2019. a

Chen, F. and Dudhia, J.: Coupling an advanced land-surface/hydrology model with the Penn State/NCAR MM5 modeling system. Part I: Model implementation and sensitivity, Mon. Weather Rev., 129, 569–585,<0587:CAALSH>2.0.CO;2, 2001. a

Chen, S.-H. and Sun, W.-Y.: A One-dimensional Time Dependent Cloud Model, J. Meteorol. Soc. Jpn. Ser. II, 80, 99–118,, 2002. a

Chin, M., Ginoux, P., Kinne, S., Torres, O., Holben, B., Duncan, B., Martin, R., Logan, J., Higurashi, A., and Nakajima, T.: Tropospheric Aerosol Optical Thickness from the GOCART Model and Comparisons with Satellite and Sun Photometer Measurements, J. Atmos. Sci., 59, 461–483,<0461:TAOTFT>2.0.CO;2, 2002. a

China National Environmental Monitoring Centre: Homepage,, last access: 21 December 2020. a

Choi, J., Park, R. J., Lee, H.-M., Lee, S., Jo, D. S., Jeong, J. I., Henze, D. K., Woo, J.-H., Ban, S.-J., Lee, M.-D., Lim, C.-S., Park, M.-K., Shin, H. J., Cho, S., Peterson, D., and Song, C.-K.: Impacts of local vs. trans-boundary emissions from different sectors on PM2.5 exposure in South Korea during the KORUS-AQ campaign, Atmos. Environ., 203, 196–205,, 2019. a

Courtier, P. and Talagrand, O.: Variational assimilation of meteorological observations with the direct and adjoint shallow-water equations, Tellus A, 42, 531–549,, 1990. a

Courtier, P., Thépaut, J.-N., and Hollingsworth, A.: A strategy for operational implementation of 4D-Var, using an incremental approach, Q. J. Roy. Meteor. Soc., 120, 1367–1387,, 1994. a

Descombes, G., Auligné, T., Vandenberghe, F., Barker, D. M., and Barré, J.: Generalized background error covariance matrix model (GEN_BE v2.0), Geosci. Model Dev., 8, 669–696,, 2015. a

Elbern, H., Strunk, A., Schmidt, H., and Talagrand, O.: Emission rate and chemical state estimation by 4-dimensional variational inversion, Atmos. Chem. Phys., 7, 3749–3769,, 2007. a

Emmons, L. K., Schwantes, R. H., Orlando, J. J., Tyndall, G., Kinnison, D., Lamarque, J.-F., Marsh, D., Mills, M. J., Tilmes, S., Bardeen, C., Buchholz, R. R., Conley, A., Gettelman, A., Garcia, R., Simpson, I., Blake, D. R., Meinardi, S., and Pétron, G.: The Chemistry Mechanism in the Community Earth System Model Version 2 (CESM2), J. Adv. Model. Earth Sy., 12, e2019MS001882,, 2020. a

Fast, J. D., Gustafson Jr., W. I., Easter, R. C., Zaveri, R. A., Barnard, J. C., Chapman, E. G., Grell, G. A., and Peckham, S. E.: Evolution of ozone, particulates, and aerosol direct radiative forcing in the vicinity of Houston using a fully coupled meteorology-chemistry-aerosol model, J. Geophys. Res., 111, D21305,, 2006. a

Friedl, M., McIver, D., Hodges, J., Zhang, X., Muchoney, D., Strahler, A., Woodcock, C., Gopal, S., Schneider, A., Cooper, A., Baccini, A., Gao, F., and Schaaf, C.: Global land cover mapping from MODIS: algorithms and early results, Remote Sens. Environ., 83, 287–302,, 2002. a

Grell, G. and Baklanov, A.: Integrated modeling for forecasting weather and air quality: A call for fully coupled approaches, Atmos. Environ., 45, 6845–6851,, 2011. a

Grell, G., Peckham, S., Schmitz, R., McKeen, S., Frost, G., Skamarock, W. C., and Eder, B.: Fully coupled “online” chemistry within the WRF model., Atmos. Environ., 39, 6957–6975,, 2005. a

Grell, G. A. and Dévényi, D.: A generalized approach to parameterizing convection combining ensemble and data assimilation techniques, Geophys. Res. Lett., 29, 38-1–38-4,, 2002. a

Guenther, A., Karl, T., Harley, P., Wiedinmyer, C., Palmer, P. I., and Geron, C.: Estimates of global terrestrial isoprene emissions using MEGAN (Model of Emissions of Gases and Aerosols from Nature), Atmos. Chem. Phys., 6, 3181–3210,, 2006. a

Ha, S.: WRFDA V3.9.1 modified for WRF-Chem with the RACM-MADE-VBS parameterization, Version 1, Zenodo [code],, 2021. a

Ha, S., Liu, Z., Sun, W., Lee, Y., and Chang, L.: Improving air quality forecasting with the assimilation of GOCI aerosol optical depth (AOD) retrievals during the KORUS-AQ period, Atmos. Chem. Phys., 20, 6015–6036,, 2020. a, b, c

Hallquist, M., Wenger, J. C., Baltensperger, U., Rudich, Y., Simpson, D., Claeys, M., Dommen, J., Donahue, N. M., George, C., Goldstein, A. H., Hamilton, J. F., Herrmann, H., Hoffmann, T., Iinuma, Y., Jang, M., Jenkin, M. E., Jimenez, J. L., Kiendler-Scharr, A., Maenhaut, W., McFiggans, G., Mentel, Th. F., Monod, A., Prévôt, A. S. H., Seinfeld, J. H., Surratt, J. D., Szmigielski, R., and Wildt, J.: The formation, properties and impact of secondary organic aerosol: current and emerging issues, Atmos. Chem. Phys., 9, 5155–5236,, 2009. a

Hong, S.-Y., Noh, Y., and Dudhia, J.: A New Vertical Diffusion Package with an Explicit Treatment of Entrainment Processes, Mon. Weather Rev., 134, 2318–2341,, 2006. a

Iacono, M. J., Delamere, J. S., Mlawer, E. J., Shephard, M. W., Clough, S. A., and Collins, W. D.: Radiative forcing by long-lived greenhouse gases: Calculations with the AER radiative transfer models, J. Geophys. Res., 113, D13103,, 2008. a, b, c

Jimenez, P., Dudhia, J., Gonzalez-Ruoco, J. F., Navarro, J., Montavez, J. P., and Garcia-Bustamente, E.: A revised scheme for the WRF surface layer formulation, Mon. Weather Rev., 140, 898–918, 2012. a

Jo, Y.-J., Lee, H.-J., Jo, H.-Y., Woo, J.-H., Kim, Y., Lee, T., Heo, G., Park, S.-M., Jung, D., Park, J., and Kim, C.-H.: Changes in inorganic aerosol compositions over the Yellow Sea area from impact of Chinese emissions mitigation, Atmos. Res., 240, 104948,, 2020. a, b, c

Lee, H.-J., Jo, H.-Y., Song, C.-K., Jo, Y.-J., Park, S.-Y., and Kim, C.-H.: Sensitivity of Simulated PM2.5 Concentrations over Northeast Asia to Different Secondary Organic Aerosol Modules during the KORUS-AQ Campaign, Atmosphere, 11, 1004,​​​​​​​, 2020. a, b

Liu, X., Ma, P.-L., Wang, H., Tilmes, S., Singh, B., Easter, R. C., Ghan, S. J., and Rasch, P. J.: Description and evaluation of a new four-mode version of the Modal Aerosol Module (MAM4) within version 5.3 of the Community Atmosphere Model, Geosci. Model Dev., 9, 505–522,, 2016. a

Liu, Z., Liu, Q., Lin, H.-C., Schwartz, C. S., Lee, Y.-H., and Wang, T.: Three-dimensional variational assimilation of MODIS aerosol optical depth: Implementation and application to a dust storm over East Asia, J. Geophys. Res., 116, D23206,, 2011. a

Lorenc, A.: Analysis methods for numerical weather prediction, Q. J. Roy. Meteor. Soc., 112, 1177–1194, 1986. a

Madronich, S.: Photodissociation in the atmosphere: 1. Actinic flux and the effect of ground reflections and clouds, J. Geophys. Res., 92, 9740–9752, 1987. a, b

McKeen, S., Grell, G., Peckham, S., Wilczak, J., Djalalova, I., Hsie, E.-Y., Frost, G., Peischl, J., Schwarz, J., Spackman, R., Holloway, J., de Gouw, J., Warneke, C., Gong, W., Bouchet, V., Gaudreault, S., Racine, J., McHenry, J., McQueen, J., Lee, P., Tang, Y., Carmichael, G. R., and Mathur, R.: An evaluation of real-time air quality forecasts and their urban emissions over eastern Texas during the summer of 2006 Second Texas Air Quality Study field study, J. Geophys. Res., 114, D00F11,, 2009. a

Miyazaki, K., Sekiya, T., Fu, D., Bowman, K. W., Kulawik, S. S., Sudo, K., Walker, T., Kanaya, Y., Takigawa, M., Ogochi, K., Eskes, H., Boersma, K. F., Thompson, A. M., Gaubert, B., Barre, J., and Emmons, L. K.: Balance of Emission and Dynamical Controls on Ozone During the Korea–United States Air Quality Campaign From Multiconstituent Satellite Data Assimilation, J. Geophys. Res. Atmos., 124, 387–413,, 2019. a

Miyazaki, K., Bowman, K. W., Yumimoto, K., Walker, T., and Sudo, K.: Evaluation of a multi-model, multi-constituent assimilation framework for tropospheric chemical reanalysis, Atmos. Chem. Phys., 20, 931–967,, 2020. a

Molod, A., Takacs, L., Suarez, M., and Bacmeister, J.: Development of the GEOS-5 atmospheric general circulation model: evolution from MERRA to MERRA2, Geosci. Model Dev., 8, 1339–1356,, 2015. a

National Center for Atmospheric Research (NCAR): WRF source codes and graphics software downloads, NCAR [software],, last access: 30 November 2021. a

National Centers for Environmental Prediction, National Weather Service, NOAA, and U.S. Department of Commerce: NCEP ADP Global Upper Air and Surface Weather Observations (PREPBUFR format), Research Data Archive at the National Center for Atmospheric Research, Computational and Information Systems Laboratory [data set],, 2008. a

Parrish, D. F. and Derber, J. C.: The National Meteorological Center's spectral statistical-interpolation analysis system, Mon. Weather Rev., 120, 1747–1763, 1992. a

Peng, Z., Lei, L., Liu, Z., Sun, J., Ding, A., Ban, J., Chen, D., Kou, X., and Chu, K.: The impact of multi-species surface chemical observation assimilation on air quality forecasts in China, Atmos. Chem. Phys., 18, 17387–17404,, 2018. a

Peterson, D., Hyer, E., Han, S.-O., Crawford, J. H., Park, R., Holz, R., Kuehn, R., Eloranta, E., Knote, C., Jordan, C. E., and Lefer, B.: Meteorology influencing springtime air quality, pollution transport, and visibility in Korea, Elementa Sci. Ant., 7, 57,, 2019. a

Purser, R. J., Wu, W.-S., Parrish, D. F., and Roberts, N. M.: Numerical aspects of the application of recursive filters to variational statistical analysis. Part I: Spatially homogeneous and isotropic Gaussian covariances, Mon. Weather Rev., 131, 1524–1535, 2003. a

Saide, P. E., Kim, J., Song, C. H., Choi, M., Cheng, Y., and Carmichael, G. R.: Assimilation of next generation geostationary aerosol optical depth retrievals to improve air quality simulations, Geophys. Res. Lett., 41, 9188–9196,, 2014.  a

Saide, P. E., Gao, M., Lu, Z., Goldberg, D. L., Streets, D. G., Woo, J.-H., Beyersdorf, A., Corr, C. A., Thornhill, K. L., Anderson, B., Hair, J. W., Nehrir, A. R., Diskin, G. S., Jimenez, J. L., Nault, B. A., Campuzano-Jost, P., Dibb, J., Heim, E., Lamb, K. D., Schwarz, J. P., Perring, A. E., Kim, J., Choi, M., Holben, B., Pfister, G., Hodzic, A., Carmichael, G. R., Emmons, L., and Crawford, J. H.: Understanding and improving model representation of aerosol optical properties for a Chinese haze event measured during KORUS-AQ, Atmos. Chem. Phys., 20, 6455–6478,, 2020. a

Sandu, A. and Chai, T.: Chemical Data Assimilation – An Overview, Atmosphere, 2, 426–463,, 2011. a

Skamarock, W. C., Klemp, J. B., Dudhia, J., Gill, D. O., Barker, D. M., Duda, M. G., Huang, X.-Y., Wang, W., and Powers, J. G.: A Description of the Advanced Research WRF Version 3, NCAR Tech. Note, NCAR/TN-475+STR, 113 pp.,, 2008. a

Stockwell, W. R., Kirchner, F., Kuhn, M., and Seefeld, S.: A new mechanism for regional atmospheric chemistry modeling, J. Geophys. Res. Atmos., 102, 25847–25879,, 1997. a, b

Sun, W., Liu, Z., Chen, D., Zhao, P., and Chen, M.: Development and application of the WRFDA-Chem three-dimensional variational (3DVAR) system: aiming to improve air quality forecasting and diagnose model deficiencies, Atmos. Chem. Phys., 20, 9311–9329,, 2020. a, b

Tian, M., Liu, Y., Yang, F., Zhang, L., Peng, C., Chen, Y., Shi, G., Wang, H., Luo, B., Jiang, C., Li, B., Takeda, N., and Koizumi, K.: Increasing importance of nitrate formation for heavy aerosol pollution in two megacities in Sichuan Basin, southwest China, Environ. Pollut., 250, 898–905,, 2019. a

Woo, J.-H., Choi, K.-C., Kim, H. K., Baek, B. H., Jang, M., Eum, J.-H., Song, C. H., Ma, Y.-I., Sunwoo, Y., Chang, L.-S., and Yoo, S. H.: Development of an anthropogenic emissions processing system for Asia using SMOKE, Atmos. Environ., 58, 5–13,, 2012. a

Zaveri, R. A. and Peters, L. K.: A new lumped structure photochemical mechanism for large-scale applications, J. Geophys. Res.-Atmos., 104, 30387–30415,, 1999. a

Zaveri, R. A., Easter, R. C., Fast, J. D., and Peters, L. K.: Model for Simulating Aerosol Interactions and Chemistry (MOSAIC), J. Geophys. Res., 113, D13204,, 2008. a

Short summary
In an effort to improve air quality forecasting, the WRFDA 3D-Var system is newly extended for the assimilation of surface PM2.5 and PM10 using the RACM/MADE-VBS chemistry in the WRF-Chem model. Through a case study during the Korea–United States Air Quality (KORUS-AQ) period, it is demonstrated that the analysis can lead to improving the prediction of surface PM concentrations up to 26 % for 24 h, diminishing most bias errors.