Development of Korean Air Quality Prediction System version 1 (KAQPS v1) with focuses on practical issues

For the purpose of providing reliable and robust air quality predictions, an air quality prediction system was developed for the main air quality criteria species in South Korea (PM10, PM2.5, CO, O3 and SO2). The main caveat of the system is to prepare the initial conditions (ICs) of the Community Multiscale Air Quality (CMAQ) model simulations using observations from the Geostationary Ocean Color Imager (GOCI) and ground-based monitoring networks in northeast Asia. The performance of the air quality prediction system was evaluated during the Korea-United States Air Quality Study (KORUS-AQ) campaign period (1 May– 12 June 2016). Data assimilation (DA) of optimal interpolation (OI) with Kalman filter was used in this study. One major advantage of the system is that it can predict not only particulate matter (PM) concentrations but also PM chemical composition including five main constituents: sulfate (SO2− 4 ), nitrate (NO−3 ), ammonium (NH + 4 ), organic aerosols (OAs) and elemental carbon (EC). In addition, it is also capable of predicting the concentrations of gaseous pollutants (CO, O3 and SO2). In this sense, this new air quality prediction system is comprehensive. The results with the ICs (DA RUN) were compared with those of the CMAQ simulations without ICs (BASE RUN). For almost all of the species, the application of ICs led to improved performance in terms of correlation, errors and biases over the entire campaign period. The DA RUN agreed reasonably well with the observations for PM10 (index of agreement IOA = 0.60; mean bias MB =−13.54) and PM2.5 (IOA = 0.71; MB =−2.43) as compared to the BASE RUN for PM10 (IOA = 0.51; MB =−27.18) and PM2.5 (IOA = 0.67; MB =−9.9). A significant improvement was also found with the DA RUN in terms of bias. For example, for CO, the MB of −0.27 (BASE RUN) was greatly enhanced to −0.036 (DA RUN). In the cases of O3 and SO2, the DA RUN also showed better performance than the BASE RUN. Further, several more practical issues frequently encountered in the air quality prediction system were also discussed. In order to attain more accurate ozone predictions, the DA of NO2 mixing ratios should be implemented with careful consideration of the measurement artifacts (i.e., inclusion of alkyl nitrates, HNO3 and peroxyacetyl nitrates – PANs – in the ground-observed NO2 mixing ratios). It was also discussed that, in order to ensure accurate nocturnal prePublished by Copernicus Publications on behalf of the European Geosciences Union. 1056 K. Lee et al.: Air quality prediction system in Korea dictions of the concentrations of the ambient species, accurate predictions of the mixing layer heights (MLHs) should be achieved from the meteorological modeling. Several advantages of the current air quality prediction system, such as its non-static free-parameter scheme, dust episode prediction and possible multiple implementations of DA prior to actual predictions, were also discussed. These configurations are all possible because the current DA system is not computationally expensive. In the ongoing and future works, more advanced DA techniques such as the 3D variational (3DVAR) method and ensemble Kalman filter (EnK) are being tested and will be introduced to the Korean air quality prediction system (KAQPS).

Abstract. For the purpose of providing reliable and robust air quality predictions, an air quality prediction system was developed for the main air quality criteria species in South Korea (PM 10 , PM 2.5 , CO, O 3 and SO 2 ). The main caveat of the system is to prepare the initial conditions (ICs) of the Community Multiscale Air Quality (CMAQ) model simulations using observations from the Geostationary Ocean Color Imager (GOCI) and ground-based monitoring networks in northeast Asia. The performance of the air quality prediction system was evaluated during the Korea-United States Air Quality Study (KORUS-AQ) campaign period (1 May-12 June 2016). Data assimilation (DA) of optimal interpolation (OI) with Kalman filter was used in this study. One major advantage of the system is that it can predict not only particulate matter (PM) concentrations but also PM chemical composition including five main constituents: sulfate (SO 2− 4 ), nitrate (NO − 3 ), ammonium (NH + 4 ), organic aerosols (OAs) and elemental carbon (EC). In addition, it is also capable of predicting the concentrations of gaseous pollutants (CO, O 3 and SO 2 ). In this sense, this new air quality prediction system is comprehensive. The results with the ICs (DA RUN) were compared with those of the CMAQ simulations without ICs (BASE RUN). For almost all of the species, the application of ICs led to improved performance in terms of correlation, errors and biases over the entire campaign period. The DA RUN agreed reasonably well with the observations for PM 10 (index of agreement IOA = 0.60; mean bias MB = −13.54) and PM 2.5 (IOA = 0.71; MB = −2.43) as compared to the BASE RUN for PM 10 (IOA = 0.51; MB = −27.18) and PM 2.5 (IOA = 0.67; MB = −9.9). A significant improvement was also found with the DA RUN in terms of bias. For example, for CO, the MB of −0.27 (BASE RUN) was greatly enhanced to −0.036 (DA RUN). In the cases of O 3 and SO 2 , the DA RUN also showed better performance than the BASE RUN. Further, several more practical issues frequently encountered in the air quality prediction system were also discussed. In order to attain more accurate ozone predictions, the DA of NO 2 mixing ratios should be implemented with careful consideration of the measurement artifacts (i.e., inclusion of alkyl nitrates, HNO 3 and peroxyacetyl nitrates -PANs -in the ground-observed NO 2 mixing ratios). It was also discussed that, in order to ensure accurate nocturnal pre-dictions of the concentrations of the ambient species, accurate predictions of the mixing layer heights (MLHs) should be achieved from the meteorological modeling. Several advantages of the current air quality prediction system, such as its non-static free-parameter scheme, dust episode prediction and possible multiple implementations of DA prior to actual predictions, were also discussed. These configurations are all possible because the current DA system is not computationally expensive. In the ongoing and future works, more advanced DA techniques such as the 3D variational (3DVAR) method and ensemble Kalman filter (EnK) are being tested and will be introduced to the Korean air quality prediction system (KAQPS).

Introduction
Air quality has long been considered an important issue in climate change, visibility and public health, and it is strongly dependent upon meteorological conditions, emissions and the transport of air pollutants. Air pollutants typically consist of atmospheric particles and gases such as particulate matter (PM), carbon monoxide (CO), ozone (O 3 ), nitrogen dioxide (NO 2 ) and sulfur dioxide (SO 2 ). These aerosols and gases play important roles in anthropogenic climate forcing, both directly (Bellouin et al., 2005;Carmichael et al., 2009;IPCC, 2013;Scott et al., 2014) and indirectly (Breon et al., 2002;IPCC, 2013;Penner et al., 2004;Scott et al., 2014) influencing the global radiation budget. Among the various air pollutants, PM and surface O 3 are the most notorious health threats, as has been stated by several previous studies (Carmichael et al., 2009;Dehghani et al., 2017;Khaniabadi et al., 2017).
With the stated importance of atmospheric aerosols and gases, considerable research efforts have been made to monitor and quantify their amounts in the atmosphere through satellite-, airborne-and ground-based observations as well as chemistry-transport model (CTM) simulations. In South Korea, the Korean Ministry of the Environment (KMoE) provides real-time chemical concentrations as measured by ground-based observations for six air quality criteria air pollutants (PM 10 , PM 2.5 , O 3 , CO, SO 2 and NO 2 ) at the Air Korea website (https://www.airkorea.or.kr, last access: 8 March 2020). PM 10 or (PM 2.5 ) refers to the atmospheric particulate matter that has an aerodynamic diameter less than 10 (or 2.5) µm. In addition, the National Institute of Environmental Research (NIER) of South Korea provides air quality predictions using multiple CTM simulations. Air quality predictions are another crucial element for protecting public health through the forecasting of high air pollution episodes in advance and alerting citizens about these high episodes. In this context, reliable and robust air quality forecasts are necessary to avoid any confusion caused by poor predictions given by CTM simulations.
Although there are various datasets representing air quality, limitations remain in the observations and model outputs. Specifically, observation data are, in general, known to be more accurate than model outputs, but they have spatial and temporal limitations. These limitations will be overcome by improving spatial and temporal coverage via future geostationary satellite instruments such as the Geostationary Environment Monitoring Spectrometer (GEMS) over Asia, the Tropospheric Emissions: Monitoring of Pollution (TEMPO) over North America and the Sentinel-4 over Europe. In addition, the TROPOspheric Monitoring Instrument (TROPOMI) on board the Copernicus Sentinel-5 Precursor satellite was successfully launched into low earth orbit (LEO) on 13 October 2017 and is providing information on the chemical composition in the atmosphere with a higher spatial resolution of 3.5 km × 7 km.
Unlike observational data, models can provide meteorological and chemical information without any spatial and temporal data discontinuity, but they do have an issue of inaccuracy. The major causes of uncertainty in the results of CTM simulations are introduced from imperfect emissions, meteorological fields, initial conditions (ICs), and physical and chemical parameterizations in the models . In order to minimize the limitations and maximize the advantages of observation data and model outputs, there have been numerous attempts to provide accurate and spatially as well as temporally continuous information on chemical composition in the atmosphere by integrating observation data with model outputs via data assimilation (DA) techniques.
Although the Korean numerical weather prediction (NWP) carried out by the Korea Meteorological Administration (KMA) employs various DA techniques, almost no previous efforts have been made to develop an air quality prediction system with DA in South Korea. Therefore, in the present study, the air quality prediction system named as Korean Air Quality Prediction System version 1 (KAQPS v1) was developed by preparing ICs via DA for the Community Multiscale Air Quality (CMAQ) model (Byun and Schere, 2006;Byun and Ching, 1999) using satellite-and ground-based observations for particulate matter (PM) and atmospheric gases such as CO, O 3 and SO 2 . The performances of the system were then demonstrated during the period of the Korea-United States Air Quality Study (KORUS-AQ) campaign (1 May-12 June 2016) in South Korea.
In this study, the optimal interpolation (OI) method with the Kalman filter was applied in order to develop the air quality prediction system, since this method is still useful and viable in terms of computational cost and performance. The performance of the method is almost comparable to that of the 3D variational (3DVAR) method, as shown in Tang et al. (2017). More complex and advanced DA techniques are currently being and will continue to be applied to current air quality prediction systems. These works are now in progress.
In addition, this paper also discusses several practical issues frequently encountered in the air quality predictions such as (i) DA of NO 2 mixing ratios for accurate ozone prediction with a careful consideration of measurement artifacts; (ii) the issue of the nocturnal mixing layer height (MLH) for nocturnal predictions; (iii) predictions of dust episodes; (iv) the use of non-static free parameters; and (v) the influences of multiple implementations of the DA before the actual predictions.
The details of the datasets and methodology used in this study are described in Sect. 2. The results of the developed air quality prediction system are discussed in Sect. 3, and then a summary and conclusions are given in Sect. 4.

Methodology
The air quality prediction system was developed using the CMAQ model along with meteorological inputs provided by the Weather Research and Forecasting (WRF) model . The ICs for the CMAQ model simulations were prepared via the DA method using satelliteretrieved and ground-based observations. The performances of the developed prediction system were evaluated using ground in situ data. The models, data and DA technique are described in detail in the following sections.

WRF model simulations
The WRF model has been developed for providing mesoscale numerical weather prediction (NWP). It has also been used to provide meteorological input fields for CTM simulations Chemel et al., 2010;Foley et al., 2010;Lee et al., 2016;Park et al., 2014). In this study, WRF v3.8.1 with the Advanced Research WRF (ARW) dynamical core was applied to prepare the meteorological inputs for the CMAQ model simulations. Dynamical and physical configurations for the WRF model simulations were selected as follows: the Yonsei University (YSU) scheme for planetary boundary layer ; the WRF single-moment 6-class (WSM6) scheme for the microphysics (Hong and Lim, 2006); the Grell-Freitas ensemble scheme for cumulus parameterization (Grell and Freitas, 2014); the Noah-MP land surface model Yang et al., 2011); the Rapid Radiative Transfer Model for Global Circulation Models (RRTMG) for shortwave/longwave options (Iacono et al., 2008); and the revised MM5 scheme for surface layer options (Jimenez et al., 2012). The National Centers for Environmental Prediction (NCEP) Final (FNL) Operational Global Analysis data on 1 • ×1 • grids were chosen for the ICs and boundary conditions (BCs) for the WRF simulations. In order to minimize meteorological field errors for the applications of ICs and BCs to the WRF simulations, the objective analysis (OBSGRID) nudging was conducted using the NCEP Automated Data Processing (ADP) Global Upper Air Observations/Surface Observational weather data via the Cressman's (1959) successive correction method. The adjusted meteorological variables were temperature, geopotential height, relative humidity and zonal/meridional winds.
The model domain for the WRF simulations covers northeast Asia with a horizontal resolution of 15 km × 15 km, having a total of 223 latitudinal and 292 longitudinal grid cells. The size of the WRF domain is slightly larger than that of the CMAQ domain, as shown in Fig. 1. The meteorological data have 27 vertical layers from the surface (1000 hPa) to 50 hPa. The WRF meteorological fields (e.g., temperature, pressure, wind, humidity, and clouds) were then transformed into the CMAQ-ready format via the Meteorology-Chemistry Interface Processor (MCIP; Otte and Pleim, 2010) v4.3, which is a software to serve for transforming horizontal and vertical coordinates while trying to maintain dynamic consistency between WRF and CMAQ model simulations.

CMAQ model simulations
The CMAQ v5.1 model was used to estimate the concentrations of the atmospheric chemical species over the domain, as shown in Fig. 1. The CMAQ domain has 204 latitudinal and 273 longitudinal grid cells in total and also has a 15 km × 15 km horizontal resolution and 27 vertical layers. The CMAQv5.1 model was configured to use. Chemical and physical configurations for the CMAQ model simulations were selected as follows: SAPRC07tc for the gasphase chemical mechanism (Hutzell et al., 2012); AERO6 for aerosol thermodynamics ; Euler Backward Iterative (EBI) chemistry solver (Hertel et al., 1993), which is a numerically optimized photochemistry mechanism solver; M3DRY for dry deposition velocity (Pleim and Xiu, 2003;Xiu and Pleim, 2001); global mass-conserving scheme (Yamartino and WRF) for horizontal and vertical advection (Colella and Woodward, 1984); MULTISCALE (Louis, 1979), which is a simple first-order eddy diffusion scheme for horizontal diffusion; and the Asymmetric Convective Model, version 2 (ACM2; Pleim, 2007a, b), for vertical diffusion.
For anthropogenic emissions, KORUS v1.0 emissions (Woo et al., 2012) were used. The emissions cover almost all of Asia and are based on three emission inventories: the Comprehensive Regional Emissions inventory for Atmospheric Transport Experiment (CREATE) for East Asia excluding Japan; the Model Inter-Comparison Study for Asia (MICS-Asia) for Japan; and the Studies of Emissions and Atmospheric Composition, Clouds and Climate Coupling by Regional Surveys (SEAC4RS) for South and Southeast Asia.
Biogenic emissions were prepared by running the Model of Emissions of Gases and Aerosols from Nature (MEGAN v2.1; Guenther et al., 2006Guenther et al., , 2012 with a grid size identical to that of the CMAQ model simulations. For the MEGAN simulations, the MODIS land cover data (Friedl et al., 2010)   obtained from the Fire Inventory from NCAR (FINN; Wiedinmyer et al., , 2011. The lateral BCs for the CMAQ model simulations were prepared using the global model results of the Model for Ozone and Related chemical Tracers, version 4 (MOZART-4; Emmons et al., 2010) at every 6 h. The mapping and regridding of the MOZART-4 data were conducted by matching the CMAQ grid information.

Satellite-based observations
A Korean geostationary satellite of Communication, Ocean and Meteorological Satellite (COMS) was launched on 26 June in 2010 over the Korean Peninsula. The COMS is a geostationary orbit satellite, and it is stationed at an altitude of approximately 36 000 km at a latitude of 36 • N and a longitude of 128.2 • E, with a horizontal coverage of 2500 km × 2500 km (refer to Fig. 1). Among the three payloads of the COMS, Geostationary Ocean Color Image (GOCI) is the first multichannel ocean color sensor with visible and nearinfrared channels. The GOCI instrument provides hourly spectral images with a spatial resolution of 500 m × 500 m from 00:30 to 07:30 coordinated universal time (UTC) for eight spectral (six visible and two near-infrared) channels at 412, 443, 490, 555, 660, 680, 745 and 865 nm.
The Yonsei aerosol retrieval (YAER) algorithm for the GOCI sensor was initially developed by Lee et al. (2010) to retrieve the aerosol optical properties (AOPs) over ocean areas and was then improved by expanding to consider nonspherical aerosol optical properties (Lee et al., 2012). Choi et al. (2016) further extended the algorithm for application to land surfaces, and the algorithm was referred to as the GOCI YAER version 1 algorithm. With the GOCI YAER algorithm, hourly aerosol optical depths (AODs) at 550 nm were produced over East Asia. Choi et al. (2016) compared the retrieved GOCI AODs with other satellite-retrieved and ground-based observations and found several errors in the cloud masking and surface reflectances. These errors were corrected in the recently updated second version of the GOCI YAER algorithm (Choi et al., 2018), which used the updated cloud masking and more accurate surface reflectances. In this study, the most recent GOCI AOD products from the GOCI YAER version 2 algorithm were used.

Ground-based observations
In addition to the satellite data, ground-based observations in South Korea and China were also collected for use in the air quality prediction system for PM and gas-phase pollutants. The orange, red and blue dots in Fig. 1 represent the ground-based observation sites in China, Air Korea and supersite stations in South Korea, respectively. These observations provide real-time concentrations of air quality criteria species such as PM 10 , PM 2.5 , CO, O 3 , SO 2 and NO 2 .
Throughout the period of the KORUS-AQ campaign, ground-based observation data were obtained from 1514 stations in China, 264 Air Korea stations and seven supersite stations in South Korea. In this study, 80 % of the groundbased observations in China and Air Korea stations in South Korea were randomly selected for use in the prediction system. The other 20 % of the data and supersite observations were used to evaluate the performances of the developed air quality prediction system.
In addition, AErosol RObotic NETwork (AERONET) AODs were used to conduct an independent evaluation of the air quality prediction system. AERONET is a federated global ground-based sun photometer network (Holben et al., 1998). Cloud-screened and quality-assured level 2.0 AODs for the AERONET were used in this study.

Air quality prediction system
In the present study, the air quality prediction system was developed by adjusting the ICs for the CMAQ model simulations based on DA with satellite-retrieved and groundmeasured observations. Two parallel WRF-CMAQ model runs were conducted. The first experiment that involved adjusting ICs via DA is referred to as DA RUN (see Fig. 2). In order to evaluate the prediction system, a second experiment, in which the ICs were originated from the previous CMAQ model simulations without assimilations, was also conducted. This CMAQ run is referred to as BASE RUN.

AOD calculations
CMAQ AODs are calculated by integrating the aerosol extinction coefficient (σ ext ) using the following equation: where z represents the vertical height; σ ext is defined as the sum of the absorption coefficient (σ abs ) and the scattering coefficient (σ sca ); and σ abs and σ sca can be estimated by Eqs. (3) and (4), respectively, as shown below.
where i and j denote the particulate species and size bin (or particle mode), respectively; ω ij (λ) is the single scattering albedo; β ij (λ) is the mass extinction efficiency (MEE) of particulate species i for the size bin or particle mode j ; [C] ij is the concentration of particulate species including (NH 4 ) 2 SO 4 , NH 4 NO 3 , black carbon, organic aerosols (OA), mineral dust and sea-salt aerosols; RH is the relative humidity; f ij (RH) is the hygroscopic factor; and the single scattering albedo (ω) implies to the fraction (portion) of scattering in the total extinction. Using Eqs.
(2)-(4), AODs were calculated from the aerosol composition and RH. There have been intensive tests using different β and f (RH) values in the following three previous studies: (1) Chin et al.'s (2002) study with the Goddard Chemistry Aerosol Radiation and Transport (GOCART) model; (2) Martin et al.'s (2003) study with the GEOS-Chem model; and (3) Malm and Hand's (2007) study with the CMAQ model. Lee et al. (2016) tested these methods and then found that Chin et al.'s (2002) method reproduced the best results in estimating AODs at 550 nm over East Asia. On the basis of Lee et al.'s (2016) work, σ ext was estimated with the β and f (RH) values suggested by Chin et al. (2002). After that, σ ext was integrated with respect to altitude, in order to calculate the AODs. The calculated AODs were used in the air quality prediction system in order to prepare the ICs for the PM predictions.

Data assimilation (DA)
The ground-based observations, together with GOCI-derived AODs, were used to prepare the ICs for the air quality predictions with the CMAQ model simulations. In order to achieve this, the following steps were taken: (i) the CMAQ-calculated concentrations of CO, O 3 and SO 2 were combined with the concentrations of CO, O 3 and SO 2 obtained from groundbased observations in South Korea (Air Korea) and China; (ii) the CMAQ-calculated AODs were assimilated with the GOCI AODs; (iii) the assimilated AODs were converted into PM 10 ; (iv) the converted PM 10 was again assimilated at the surface in South Korea and China; and (v) after the DA at the surface, the ratios of the assimilated species concentrations to the original CMAQ-simulated concentrations were applied so as to the adjust vertical profiles of the chemical species above the surface. In the air quality prediction system, the DA cycle is 24 h, and the assimilation takes place every day at 00:00 UTC (refer to Fig. 3).
The optimal interpolation (OI) method with the Kalman filter was chosen in the air quality prediction system. The OI method was originally used for meteorological applications (Lorenc, 1986) and has also been used in the assimilations for trace gases (Khattatov et al., , 2000Lamarque et al., 1999;Levelt et al., 1998). Recently, the OI technique has also been applied to aerosol fields (Collins et al., 2001;Yu et al., 2003;Generoso et al., 2007;Adhikary et al., 2008;Carmichael et al., 2009;Chung et al., 2010;Park et al., 2011Park et al., , 2014Tang et al., 2015Tang et al., , 2017. Aerosol assimilation using the OI method was first applied by Collins et al. (2001) as follows: where τ m , τ m and τ o represent the assimilated products by the OI method, the modeled values and the observed values, respectively; K is the Kalman gain matrix; H is the observation operator (or forward operator), which is an interpolator from model to observation space; B and O are the background and observation error covariance matrices, respectively; (·) T denotes the transpose of a matrix; f o is the frac- Figure 2. Schematic diagram of the Korean air quality prediction system developed in this study. The initial conditions (ICs) of the CMAQ model simulations are prepared by assimilating CMAQ outputs with satellite-retrieved and ground-measured observations. The data process for preparing the ICs is shown in the box with dashed gray lines. Figure 3. Schematic diagram of the Korean air quality prediction system for particulate matter (PM) and gas-phase pollutants. The data assimilation (DA) cycle is 24 h for both PM and gas-phase pollutants such as CO, O 3 and SO 2 . The DA of NO 2 is excluded in the current study, the reason for which is discussed in the text.
tional error in the observation-retrieved value; ε o is the minimum root-mean-square error in the observation-retrieved values; I denotes the unit matrix; f m is the fractional error in the model estimates; ε m is the minimum root-mean-square error in the model estimates; d x is the horizontal distance between two model grid points; l mx is the horizontal correlation length scale for the errors in the model; d z is the vertical distance between two model grid points; and l mz is the vertical correlation length scale for the errors in the model. In this work, the OI technique was applied for the DA of atmospheric gaseous species as well as particulate species.
Six free parameters (f m , f o , ε m , ε o , l mx and l mz ) were used to calculate the error covariance matrices of the observations and model, the mathematical formalisms of which are described in Eqs. (7) and (8), respectively. Several previous studies have used fixed values for free parameters (Collins et al., 2001;Yu et al., 2003;Adhikary et al., 2008;Chung et al., 2010). These runs are called "static" runs. In contrast to those previous studies, "non-static" free parameters were applied in this study by minimizing the differences between the assimilated values and observations via an iterative process at each assimilation time step. This non-static free-parameter scheme is possible due to the fact that the OI technique with the Kalman filter is much less costly in terms of computation time than other DA techniques, such as the 3D or 4D variational methods. This is another advantage of using the OI technique in this system. It typically takes less than 20 min with a workstation environment (dual Intel Xeon 2.40 GHz processor).

Allocation of the assimilated PM 10 & PM 2.5 to particulate composition
In the procedure of DA, PM 10 was assimilated in this study, because the PM 10 data were more plentiful than PM 2.5 . The assimilated PM 10 then needs to be allocated to the PM composition for the CMAQ-model prediction runs. In order to achieve this, the differences between the assimilated PM 10 and background PM 10 ( PM 10 ) were first calculated. Then, PM 2.5 was estimated using the ratios of PM 2.5 to PM 10 from the background CMAQ model runs (i.e., PM 2.5 = PM 10 × PM 2.5 /PM 10 ). PM 2.5 was then allocated to the  Fig. 4. In Fig. 4, the PM "OTHERS" indicates the remaining particulate matter species after excluding sulfate, nitrate, ammonium, organic aerosol (OA) and elementary carbon (EC). The PM OTH-ERS occupies 25 % of the total PM 2.5 observed at supersites. The other fraction, PM 10 ×(1−PM 2.5 /PM 10 ), was also distributed to the coarse-mode particles (PM 2.5-10 ) as crustal elements.

Results and discussions
The performances of the air quality prediction system were evaluated by comparing them with ground-based observations from the Air Korea network and supersite stations in South Korea. Several sensitivity analyses were also conducted in order to assess the influences of the DA time intervals on the accuracy of the air quality prediction.  the DA at 00:00 UTC. For the forecast hours from 01:00 to 23:00 UTC, all of the ground observations (254 Air Korea and seven supersite stations) were used to evaluate the performances of the developed air quality prediction system. As shown in Fig. 5, we achieved some improvements in the prediction performances by applying the ICs to the CMAQ model simulations. The BASE RUN significantly underpredicted PM 10 , PM 2.5 and CO, while the DA RUN produced concentrations that were more consistent with the observations than those of the BASE RUN.

Time-series analysis
In the case of CO, the observed CO mixing ratios were about 3 times higher than those from the BASE RUN. These large differences are well known and have been attributed to the underestimated emissions of CO (Heald et al., 2004). However, when the DA was applied, the predictions of the CO mixing ratios improved. Similarly, the performances of the PM 10 and PM 2.5 predictions improved with the application of the DA. Unlike PM 10 , PM 2.5 and CO, the O 3 mixing ratios and its diurnal trends from both the BASE RUN and DA RUN tend to be well matched with the observations. By contrast, the poorest performances of the BASE RUN and the DA RUN were shown for SO 2 .
In addition, a dust event took place between 6 and 8 May. This event is captured by the DA RUN (check red peaks in Fig. 5a and b), while the BASE RUN cannot capture this dust event. This demonstrates the capability of the current system to possibly predict dust events in South Korea. In the DA RUN, dust information is provided to the CMAQ model runs  through both/either GOCI AOD and/or ground PM observations measured along the dust plume tracks.
The effectiveness of the DA with respect to prediction time was also analyzed by calculating the aggregated average concentrations of atmospheric species (see Figs. 6, 7 and 9). Figure 6 depicts the CMAQ-calculated average concentrations of PM 10 , PM 2.5 , CO and SO 2 against the Air Korea observations. Our air quality prediction system regenerated relatively well-matched concentrations for PM 10 , PM 2.5 and CO from the DA RUN. Figure 7 shows the case of ozone from the DA RUN by assimilating CMAQ outputs with Air Korea-observed (a) O 3 mixing ratios and (b) both O 3 and NO 2 mixing ratios for a preliminary test run. The ozone mixing ratios from the DA RUN in Fig. 7a were reasonably consistent with the observations at 00:00 UTC but disagreed with those between 04:00 and 09:00 UTC (13:00 and 18:00 KST), when solar insolation is the most intense. This may be attributed to the chemical imbalances between ozone production and ozone destruction (or titration). However, if CMAQ NO 2 was assimilated with ground-based observations in South Korea (Air Korea) and China, the predicted ozone mixing ratios became substantially closer to the observations, as shown in Fig. 7b. This is clearly due to the fact that NO x is an important precursor of ozone. In the prediction of the ozone mixing ratios, both 1 h peak ozone (around 15:00 KST) and 8 h averaged ozone mixing ratios (between 09:00 and 17:00 KST) are important. Figure 7 clearly shows that the prediction accuracies of both the ozone mixing ratios were improved after the DA of NO 2 mixing ratios.
Although the DA for NO 2 provided better ozone predictions, one should take caution in using the NO 2 observations. The NO 2 mixing ratios measured at Air Korea sites are known to be contaminated by other nitrogen gases such as nitric acid (HNO 3 ), peroxyacetyl nitrates (PANs) and alkyl nitrates (ANs), since the Air Korea NO 2 mixing ratios are measured through a chemiluminescent method with catalysts of gold or molybdenum oxide at high temperatures. These are known to be "NO 2 measurement artifacts" (Jung et al., 2017), which is one of the reasons that the DA of NO 2 was not shown in Fig. 6. The NO 2 mixing ratios are corrected from the Air Korea NO 2 data and are then used to prepare the ICs via the DA for more accurate ozone and NO 2 predictions. Currently, such corrections of the observed NO 2 mixing ratios are being standardized with more sophisticated year-long NO 2 measurements. After the corrections of the NO 2 measurement artifacts, more evolved schemes of ozone and NO 2 predictions will be possible in the future. As shown in Fig. 7, about a 20 % reduction (average fraction of non-NO 2 mixing ratios in the observed NO 2 mixing ratios) was made for these demonstration runs (Jung et al., 2017).
Another practical issue is now discussed. Although the assimilation with the observed NO 2 mixing ratios can enhance the accuracy of the predictions of the daytime ozone mixing ratios, the nighttime ozone mixing ratios tend to be consistently overpredicted in the aggregated plot of the ozone mixing ratios at the observation sites (see Fig. 7). This can be caused by underestimated NO 2 mixing ratios and thus not enough nighttime ozone titration. As mentioned before, reliable NO 2 prediction via the correction of the NO 2 mea- Figure 9. Aggregated average concentrations of (a) PM 10 , (b) PM 2.5 , (c) OC, (d) EC, (e) sulfate, (f) nitrate and (g) ammonium as predicted by CMAQ model during the period of the KORUS-AQ campaign. The others are the same as those shown in Fig. 7, except for the fact that the observation data used here were obtained from the seven supersite stations in South Korea. surement artifacts will be made in the future. Another possible reason of the overpredicted ozone mixing ratios during the nighttime can be underestimation of the mixing layer height (MLH). Figure 8 shows a comparison between lidarmeasured MLH (black dashed line) and WRF-calculated MLH (with the option of the Yonsei University planetary boundary layer scheme by ; see red line). As shown in Fig. 8, the nocturnal lidar-measured MLH is about 2 times higher than the nocturnal WRF-calculated MLH as measured at a lidar site inside the campus of Seoul National University (SNU) in Seoul. Such underestimated MLH in the model tends to compress the ozone molecules within the mixing layer during the nighttime, which leads to consistently overpredicted nocturnal ozone mixing ratios. Based on this discrepancy shown in Fig. 8, more intensive comparison study is being carried out by comparing lidar-measured MLH with model-calculated MLH at multiple sites in South Korea.
In this work, the aerosol composition (including EC, OA, sulfate, nitrate and ammonium) was further compared with the composition observed at the supersites shown in Fig. 9. As shown in Fig. 9, agreement was observed between the DA RUN and observations for all of the major PM constituents. Again, a strong capability of our DA system is to improve the predictions of the aerosol composition. Figure 10 shows the spatial distributions and bias of PM and chemical species throughout the entire period of the KORUS-AQ campaign over the Seoul metropolitan area (SMA). Noticeable improvements are observed to have been achieved in the spatial distributions by applying the ICs to the CMAQ model simulations, particularly for PM 10 (Fig. 10a), PM 2.5 (Fig. 10b) and CO (Fig. 10c). As shown in Fig. 10, the underpredicted concentrations of PM 10 , PM 2.5 and CO were adjusted to concentrations closer to the observations. In the case of SO 2 (see Fig. 10d), the DA RUN produced better agreement with the observations than the BASE RUN, but there were still underpredicted SO 2 concentrations over the northeastern part of the SMA.

Spatial distribution
By contrast, relatively lower ozone mixing ratios from the DA RUN against the BASE RUN were found in the southwestern part of the SMA (see Fig. 10e). Due to the nonlinear relationship between NO x and O 3 , high mixing ratios (or emissions) of NO x in the SMA can lead to depletion of ozone. In these runs, the precursors of ozone such as NO x and volatile organic compounds (VOCs) were excluded in the preparation of the ICs for CMAQ model simulations. Again, this is because the Air Korea NO 2 mixing ratios are contaminated by several reactive nitrogen species, so the data cannot be directly used in the assimilation procedures. In the case of VOCs, a limited number of datasets are available in South Korea for the DA. Improvements in the prediction of ozone mixing ratios can be achieved when the NO 2 mixing ratios are corrected and a sufficient amount of VOC data (possibly from satellite data in the future) are available.

Statistical analysis
In order to achieve a better understanding of the performances of the DA RUN, analyses of statistical variables such as index of agreement (IOA), Pearson's correlation coefficient (R), root-mean-square error (RMSE) and mean bias (MB) were conducted using observations from the Air Korea stations for PM 10 , PM 2.5 , CO, SO 2 and O 3 (see Fig. 11). Definitions of the statistical variables are given in Appendix A.
After the applications of the ICs, both RMSE and MB became lower, while the correlation coefficient became higher for the entire set of predictions. In addition, it was found that the differences between the BASE RUN and the DA RUN tended to diminish as the prediction time progressed. The results of the statistical analysis are listed in Table 1  was found for C, with MB = -0.036 for the DA RUN and MB = −0.27 for the BASE RUN. Regarding O 3 and SO 2 , the DA RUN showed slightly better performances than the BASE RUN.  3.2 Sensitivity test of DA time interval

AOD
In this section, a sensitivity analysis was conducted with different implementation time intervals of the DA (i.e., 24, 6 and 3 h) for AOD (refer to Fig. 12). As shown in Fig. 12, more frequent implementation of the DA is expected to make the predicted results closer to the observations. Although the DA RUN with a shorter assimilation time interval tends to produce a better prediction, it is not always the most appropriate choice, since the shorter assimilation time interval results in increased computational cost. Therefore, an optimized assimilation time interval should be found to achieve the best performances from the given DA system with the consideration of its own computational ability.

PM and gases
In addition, sensitivity analyses of the developed air quality prediction system applied to multiple implementations of the DA with different time intervals were also investigated for (a) PM 10 , (b) PM 2.5 , (c) CO, (d) SO 2 and (e) O 3 , shown in Fig. 13. Figure 13 shows a soccer plot analysis for BASE RUN (blue crosses) and DA RUNs with different DA time intervals of 24 h (OI; red circles), 2 h (2 h OI; black diamonds) and 1 h (1 h OI; dark-green triangles). This set of testing was designed based on the fact that the performances are expected to improve if the DAs are implemented multiple times prior to the actual predictions at 00:00 UTC. Here, for the 2 h OI run, the DA was implemented three times a day at 20:00, 22:00 and 00:00 UTC, while for the 1 h OI run, the DA was implemented at 22:00, 23:00 and 00:00 UTC. The performances of all of the chemical species excluding ozone improved, as expected, with DA RUNs with more frequent and longer DA time intervals (i.e., three-time implementation with a 2 h time interval in our cases). In the case of ozone, the best performance was found for the air quality prediction system with the DA time interval of 24 h. Unsurprisingly, more frequent DAs prior to the actual prediction mode (i.e., before 00:00 UTC in our system) with a longer time interval (such as 2 h) will be computationally costly. There will certainly be a tradeoff between the precision of air quality prediction and the computational cost. The system should be designed under the consideration of these two factors.

Summary and conclusions
In this study, the air quality prediction system was developed by preparing the ICs for CMAQ model simulations using GOCI AODs and ground-based observations of PM 10 , CO, ozone and SO 2 during the period of the KORUS-AQ campaign (1 May-12 June 2016) in South Korea. The major advantages of the developed air quality prediction system are its comprehensiveness in predicting the ambient concentrations of both gaseous and particulate species (including PM composition) and its powerfulness in terms of computational cost. Figure 12. Variations in three performance metrics (R, RMSE and MB) with time intervals of data assimilations. For these tests, the GOCI AODs were used in the DA to update the initial conditions of the CMAQ model simulations. The results from the three CMAQ model simulations were compared with AERONET AODs ("ground truth"). The blue squares represent the performances from the BASE RUNs and the red squares indicate the performances from the DA RUNs. The three experiments were carried out with the assimilation time intervals of 24, 6 and 3 h, respectively. Here, the DA RUN with the 24 h time interval is referred to as "air quality prediction", and the DA RUNs with the 6 and 3 h time interval are referred to as "air quality reanalysis".
The performances of the developed prediction system were evaluated using near-surface in situ observation data. The CMAQ model runs with the ICs (DA RUN) showed higher consistency with the observations of almost all of the chemical species, including PM composition (sulfate, nitrate, ammonium, OA and EC) and atmospheric gases (CO, ozone and SO 2 ), than the CMAQ model runs without the ICs (BASE RUN). Particularly for CO, the DA was able to remarkably improve the model performances, while the BASE RUN significantly underpredicted the CO concentrations (predicting about one-third of the observed values). In the case of ozone, both the BASE RUN and DA RUN were in close agreement with observations. More reliable predictions of ozone mixing ratios will be achieved via the DA of the observed NO 2 mixing ratios and the corrections of model-simulated mixing layer height (MLH). For SO 2 , the performances of both the BASE RUN and the DA RUN were somewhat poor. Regarding this issue, more accurate SO 2 emissions are required to achieve better SO 2 predictions, and these can be estimated through inverse modeling using satellite data (e.g., Lee et al., 2011). The adjustments of both ICs and emissions may be able to improve the performances of the air quality prediction system, and this will be examined in future studies. The CMAQ-predicted concentrations were compared with the Air Korea observations. Blue crosses, red circles, dark-green triangles and black diamonds represent the performances calculated from the BASE RUN, the DA RUNs with the OI system, the 1 h OI system and the 2 h OI system, respectively. Moreover, the developed air quality prediction system will be upgraded by using the new observation data that will be retrieved after 2020 from the Geostationary Environment Monitoring Spectrometer (GEMS) with a high spatial resolution of 7 km×8 km as well as a high temporal resolution of 1 h over a large part of Asia. In addition, the current DA technique of the OI with the Kalman filter can also be upgraded with the use of more advanced DA methods such as variational techniques of 3DVAR and 4DVAR methods, as well as with the ensemble Kalman filter (EnK) method. These research endeavors are currently underway.
In conjunction with improving the air quality modeling system, artificial intelligence (AI)-based air quality prediction systems are also currently being developed in several ways (e.g., Kim et al., 2019). Actually, Kim et al. (2019) developed an AI-based PM prediction system based on a deep recurrent neural network (RNN) in South Korea. The AIbased prediction system was optimized by iterative model trainings with the inputs of ground-observed PM 10 , PM 2.5 , and meteorological fields including wind speed, wind direction, relative humidity, and precipitation. The AI-based prediction system showed better performances at several sites than the CMAQ model simulations. However, it works only for the observation sites in South Korea where ground-based observations are available. By taking advantage of both the CTM-based air quality prediction and the AI-based prediction systems, both systems will be eventually combined so as to create a more accurate hybrid air quality prediction system over South Korea. This will be the ultimate goal of the series of our research work.

Appendix A: Formulas for statistical evaluation indices
The formulas used to evaluate the performances of the air quality prediction system are defined as follows: Index of agreement (IOA) =