A comparative study of two-way and offline coupled WRF v3.4 and CMAQ v5.0.2 over the contiguous US: performance evaluation and impacts of chemistry–meteorology feedbacks on air quality

The two-way coupled Weather Research and Forecasting and Community Multiscale Air Quality (WRF-CMAQ) model has been developed to more realistically represent the atmosphere by accounting for complex chemistry–meteorology feedbacks. In this study, we present a comparative analysis of two-way (with consideration of both aerosol direct and indirect effects) and offline coupled WRF v3.4 and CMAQ v5.0.2 over the contiguous US. Long-term (5 years from 2008 to 2012) simulations using WRF-CMAQ with both offline and two-way coupling modes are carried out with anthropogenic emissions based on multiple years of the U.S. National Emission Inventory and chemical initial and boundary conditions derived from an advanced Earth system model (i.e., a modified version of the Community Earth System Model/Community Atmospheric Model). The comprehensive model evaluations show that both two-way WRF-CMAQ and WRF-only simulations perform well for major meteorological variables such as temperature at 2 m, relative humidity at 2 m, wind speed at 10 m, precipitation (except for against the National Climatic Data Center data), and shortwave and longwave radiation. Both two-way and offline CMAQ also show good performance for ozone (O3) and fine particulate matter (PM2.5). Due to the consideration of aerosol direct and indirect effects, two-way WRF-CMAQ shows improved performance over offline coupled WRF and CMAQ in terms of spatiotemporal distributions and statistics, especially for radiation, cloud forcing, O3, sulfate, nitrate, ammonium, elemental carbon, tropospheric O3 residual, and column nitrogen dioxide (NO2). For example, the mean biases have been reduced by more than 10 W m−2 for shortwave radiation and cloud radiative forcing and by more than 2 ppb for max 8 h O3. However, relatively large biases still exist for cloud predictions, some PM2.5 species, and PM10 that warrant follow-up studies to better understand those issues. The impacts of chemistry–meteorological feedbacks are found to play important roles in affecting regional air quality in the US by reducing domain-average concentrations of carbon monoxide (CO), O3, nitrogen oxide (NOx), volatile organic compounds (VOCs), and PM2.5 by 3.1% (up to 27.8%), 4.2% (up to 16.2%), 6.6% (up to 50.9%), 5.8% (up to 46.6%), and 8.6% (up to 49.1%), respectively, mainly due to reduced radiation, temperature, and wind speed. The overall performance of the two-way coupled WRF-CMAQ model achieved in this work is generally good or satisfactory and the improved performance for two-way coupled WRF-CMAQ should be considered along with other factors in developing future model applications to inform policy making.


Introduction
The Community Multiscale Air Quality (CMAQ) modeling system developed by the U.S. Environmental Protection Agency (EPA) (Byun and Schere, 2006;Scheffe et al., 2016;San Joaquin Valley Air Pollution Control District, 2018;Pye et al., 2020;U.S. EPA, 2020) has been extensively used by both the scientific community and governmental agencies over various geographical regions and under different meteorological and air pollution conditions to address major key air quality issues such as atmospheric ozone (O 3 ), acid rain, regional haze, and trans-boundary or long-range transport of air pollutants during the past decades over North America (Zhang et al., 2009a, b;Hogrefe et al., 2015), Asia (Wang et al., 2009Liu et al., 2010;Zheng et al., 2015;Li et al., 2017;Xing et al., 2017;Yu et al., 2018;Mehmood et al., 2020), and Europe (Kukkonen et al., 2012;Mathur et al., 2017;Solazzo et al., 2017). The CMAQ model is traditionally driven offline by the three-dimensional meteorology fields generated separately from other meteorological models, such as the Weather Research and Forecasting (WRF) model, and the dynamic feedbacks of chemistry predictions on meteorology are neglected. However, more recently (IPCC, 2018), chemistry-meteorology feedbacks have been found to play important roles in affecting both global and regional climate change and air quality (Jacobson et al., 1996;Mathur et al., 1998;Ghan et al., 2001;Zhang, 2008;Zhang et al., , b, 2017Grell and Baklanov, 2011;Wong et al., 2012;Baklanov et al., 2014;Yu et al., 2014;Gan et al., 2015a;Wang et al., 2015a;Xing et al., 2015a, b;Yahya et al., 2015a, b;Hong et al., 2017;Jung et al., 2019). Feedbacks of aerosols on radiative transfer through aerosol-radiation interactions (i.e., aerosol direct forcing) and aerosol-cloud interactions (i.e., aerosol indirect forcing) are especially important (Zhang, 2008;Zhang et al., 2015a, b;Baklanov et al., 2014;Wang et al., 2015a;Yahya et al., 2015a, b). Recognizing this importance, as well as the recent advances in knowledge on chemistrymeteorology interactions and computational resources, the U.S. EPA developed a two-way coupled WRF-CMAQ model that accounts for the aerosol direct effect alone (Wong et al., 2012). This version of CMAQ has been applied for both regional and hemispheric studies Hogrefe et al., 2015;Xing et al., 2016Xing et al., , 2017Hong et al., 2017Hong et al., , 2020Sekiguchi et al., 2018;Yoo et al., 2019). For example, Xing et al. (2016) showed that aerosol direct feedbacks may further improve air quality resulting from emission controls in the US and also indicated that coupled models are key tools for quantifying such feedbacks. Reduction in atmospheric ventilation resulting from aerosol-induced surface cooling can exacerbate ground level air pollution. Hong et al. (2017) estimated an increase by 4.8%-9.5% in concentrations of major air pollutants over China in winter due to incorporation of such effects. Xing et al. (2017) reported that the aerosol direct effects could reduce daily max 1 h O 3 by up to 39 μg m −3 over China in January through reducing solar radiation and photolysis rates. Hong et al. (2020) found that the benefits of reduced pollutant emissions through weakening aerosol direct effects can largely offset the additional deaths caused by the warming effect of greenhouse gases over China. Some of those studies have also found that the missing aerosol indirect effects in WRF-CMAQ may introduce large model biases on their simulations of radiation and thus air quality Sekiguchi et al., 2018;Yoo et al., 2019). There has been a growing awareness that both aerosol effects should be considered together to provide greater fidelity in coupling complex atmospheric processes among chemistry, aerosols, cloud, radiation, and precipitation (Grell and Baklanov, 2011). To address this issue and better represent the one-atmosphere modeling capability of CMAQ, Yu et al. (2014) further extended the two-way coupled WRF-CMAQ model by including aerosol indirect effects and improved WRF-CMAQ's capability for predicting cloud and radiation variables.
Different from the traditional online integrated air quality models, such as the Gas, Aerosol, Transport, Radiation, General Circulation, and Mesoscale Meteorological (GATOR-GCMM) model (Jacobson, 2001), the WRF model coupled with chemistry (WRF/Chem; Grell et al., 2005), and the WRF model coupled with the Community Atmosphere Model version 5 (WRF-CAM5; Ma et al., 2013;Zhang et al., 2015aZhang et al., , b, 2017, in which atmospheric dynamics and chemistry are integrated and simulated altogether without an interface between meteorology and atmospheric chemistry (Zhang et al., 2013), two-way WRF-CMAQ (also referred to as the online access model) is created by combining existing meteorology (i.e., WRF) and atmospheric chemistry (i.e., CMAQ) models with an interactive interface (Yu et al., 2014). As pointed out by Yu et al. (2014), the main advantage of two-way CMAQ is to allow the existing numerical techniques to be used in both WRF and CMAQ to facilitate future independent development of both models while also maintaining CMAQ as a stand-alone model (the offline capability). In the past, a number of studies have compared and evaluated online vs. offline coupled model performance (Pleim et al., 2008;Matsui et al., 2009;Wilczak et al., 2009;Lin et al., 2010;Herwehe et al., 2011;Yu et al., 2011;Wong et al., 2012;Zhang et al., 2013Zhang et al., , 2016aChoi et al., 2019). However, due to the missing offline coupled mode or component for most online coupled models, many of those intercomparison studies are subject to some key limitations such as inconsistent model treatments in chemical options (Matsui et al., 2009;Lin et al., 2010;Zhang et al., 2013;Choi et al., 2019) or in both physical and chemical options (Wilczak et al., 2009;Herwehe et al., 2011;Zhang et al., 2016a), different domain projection methods or resolutions (Wilczak et al., 2009;Lin et al., 2010;Zhang et al., 2013), or disunified model inputs (Wilczak et al., 2009;Lin et al., 2010;Zhang et al., 2013). Due to the unique coupling approach, two-way WRF-CMAQ can be used to overcome those limitations and set up ideal intercomparisons between online and offline simulations using consistent model treatments (Pleim et al., 2008;Yu et al., 2011;Wong et al., 2012).
In this study, we provide a robust examination of model improvements by considering chemistry-meteorology feedbacks and their impacts on US air quality using the two-way WRF-CMAQ model (the same version as in Yu et al., 2014) with both aerosol direct and indirect effects. Long-term (5 years from 2008 to 2012) simulations using both two-way and offline coupled WRF and CMAQ models are carried out and compared (to the best of our knowledge) for the first time over the contiguous US (CONUS) with anthropogenic emissions based on multiple years of the U.S. National Emission Inventory (NEI) and chemical initial and boundary conditions (ICONs/BCONs) downscaled from the advanced Earth system model, i.e., an updated version of the Community Earth System Model/CAM5 (CESM/CAM5; He and Zhang, 2014;Glotfelty et al., 2017). Our objectives include (1) performing a comprehensive model evaluation for major meteorological variables and chemical species from this long-term application of the two-way coupled WRF-CMAQ and (2) conducting a comparative study of two-way and offline coupled WRF and CMAQ to examine the impacts of chemistry-meteorology interactions on US air quality.
Compared to previous studies in the literature, there are a few key features of this work. First, the intercomparisons between two-way (or online) and offline WRF-CMAQ are performed here using consistent model configurations including both physical and chemical options and inputs. Second, unlike a few previous intercomparison studies (Pleim et al., 2008;Yu et al., 2011;Wong et al., 2012) using two-way WRF-CMAQ with only aerosol direct effects for relatively short episodes, the model version in this work includes both aerosol direct and indirect effects and simulations are conducted for multiple years to provide more robust assessments. Third, compared to other studies (e.g., Yahya et al., 2015a, b;Choi et al., 2019) focusing on the impacts of chemistry-meteorology feedbacks on meteorology only or limited chemical species, this study performs comprehensive and extensive evaluation and comparison to demonstrate importance of chemistry-meteorology feedbacks on regional meteorology and air quality.
consists of 34 layers from the surface (~ 38 m) to 100 hPa (~ 15 km). The two-way coupled WRF-CMAQ includes estimations of aerosol optical properties based on prognostic aerosol size distributions and composition. These aerosol optical properties are then used to modulate the shortwave radiation budget estimated using the Rapid and accurate Radiative Transfer Model for General circulation (RRTMG) radiation scheme (Iacono et al., 2008) in WRF. Additionally, aerosol indirect effects, including the first (cloud albedo) and second (cloud lifetime) indirect aerosol forcing and the glaciation (ice-and mixed-phase cloud lifetime) indirect aerosol forcing are also modeled. More details on the model development of this version of WRF-CMAQ can be found in Yu et al. (2014). On the other hand, the WRF-only model calculates the radiation budgets by using prescribed aerosol optical properties such as aerosol optical depth, single-scattering albedo and asymmetry parameters and cloud formation by assuming default droplet number concentration and fixed cloud effective radius, which may not be representative for the large regions with complex air pollution conditions. Both the two-way and offline coupled WRF-CMAQ use the same model configurations as shown in Table S1 in the Supplement, except that prognostic aerosol impacts on radiation and clouds are fully treated in two-way WRF-CMAQ. The physics options include the RRTMG shortwave and longwave radiation schemes, the Asymmetric Convective Model (ACM2) planetary boundary layer (PBL) scheme (Pleim, 2007), the Pleim-Xiu (PX) land-surface scheme (Xiu and Pleim, 2001), the Morrison two-moment microphysics scheme (Morrison et al., 2009), and version 2 of the Kain-Fritsch (KF2) cumulus scheme (Kain, 2004). The chemical options include the Carbon Bond 2005 (CB05) chemical mechanism (Yarwood et al., 2005) with additional chloride chemistry (Sarwar et al., 2008), the sixth-generation CMAQ aerosol module (AERO6) (Appel et al., 2013), and CMAQ's aqueous-phase chemistry (AQCHEM). In addition, the time steps of dynamics and radiation for two-way WRF-CMAQ are set as 1 and 15 min, respectively, and the call frequency for CMAQ in the two-way coupled model is set to be 5 min.
The meteorological ICONs/BCONs are generated from the National Centers for Environmental Prediction Final Analysis (NCEP-FNL) datasets, and the chemical ICONs/ BCONs are downscaled from a modified version of CESMv1.2.2/CAM5 Glotfelty et al., 2017). The chemical ICONs/BCONs generated from CESM simulations consider the year-to-year variation. The CESM simulations have been comprehensively evaluated against surface data, remote sensing (including satellite) data, and reanalysis data for major meteorological and chemical variables over Europe, Asia, North America, and the globe. The results are also compared with other existing global model results and show generally satisfactory or superior performance. The anthropogenic emissions are based on two versions of NEI. NEI 2008 and NEI 2011 are used to cover the 5-year period, i.e., NEI 2008for 2008and NEI 2011-2012. Biogenic emissions are calculated online using the Biogenic Emissions Inventory System (BEIS) v3 (Schwede et al., 2005). The sea salt and dust emissions are also generated online by CMAQ's inline modules (Zender et al., 2003;Zhang et al., 2005). Two-way coupled WRF-CMAQ simulations are reinitialized every 5 d for meteorology fields only. We have conducted sensitivity simulations in the past (Wang et al., 2021) and found that a 5 d reinitialization frequency is more suitable to improve the overall simulation quality while preserving chemistry-meteorology feedbacks. The WRF-only simulations apply the same reinitialization method to make sure any deviation between two simulations is determined more by the feedback processes.
The model evaluation in this work mainly focuses on the long-term climatological type of performance in representative seasons (i.e., winter and summer) by comparing 5-year average spatially and temporally matched model predictions of major surface meteorological and radiation-cloud variables and surface and column chemical species against various surface and satellite observations and reanalysis data (the 5-year annual results can be found in the Supplement). A brief inter-annual comparison between observations and two-way CMAQ simulations is also performed for selected major meteorological and chemical variables to examine the model's capability in reproducing the year-to-year variations of those variables. The surface meteorological data include temperature at 2 m (T2), relative humidity at 2 m (RH2), wind speed at 10 m (WS10), and wind direction at 10 m (WD10) from the National Climatic Data Center (NCDC), and precipitation from the NCDC, the National Acid Deposition Program (NADP), the Global Precipitation Climatology Project (GPCP), the Parameter-elevation Regressions on Independent Slopes Model (PRISM), and the Tropical Rainfall Measuring Mission Multisatellite Precipitation Analysis (TMPA). The radiation and cloud data include downward shortwave radiation at the ground surface (SWDOWN), net shortwave radiation at the ground surface (GSW), downward longwave radiation at the ground surface (GLW), outgoing longwave radiation at the top of the atmosphere (OLR), and shortwave and longwave cloud forcing (SWCF and LWCF) from the Clouds and the Earth's Radiant Energy System (CERES); aerosol optical depth (AOD), cloud fraction (CF), cloud water path (CWP), and cloud optical thickness (COT) from the MODerate resolution Imaging Spectroradiometer (MODIS); and cloud droplet number concentration (CDNC) derived based on MODIS data by Bennartz (2007). The chemical data include surface O 3 from the Aerometric Information Retrieval System-Air Quality Subsystem (AIRS-AQS) and the Clean Air Status and Trends Network (CASTNET); surface particulate matter with 2.5 μm or less (PM 2.5 ) and its constituents including sulfate (SO 4 2 − ), nitrate (NO 3 − ), ammonium (NH 4 + ), elemental carbon (EC), organic carbon (OC), and total carbon (TC = EC+OC) from the Interagency Monitoring of Protected Visual Environments (IMPROVE) and the Chemical Speciation Network (CSN); surface particulate matter with diameters of 10 μm or less (PM 10 ) from the AQS; and column abundance variables such as column carbon monoxide (CO) from the Measurements of Pollution in the Troposphere (MOPITT), tropospheric ozone residual (TOR) from the Ozone Monitoring Instrument (OMI), and column nitrogen dioxide (NO 2 ) and formaldehyde (HCHO) from the Scanning Imaging Absorption Spectrometer for Atmospheric Chartography (SCIAMACHY).
The satellite datasets used in this study are all level 3 gridded monthly averaged data with various resolutions (i.e., 0.25° for OMI and PRISM; 0.5° for SCIAMACHY; 1° for CERES, GPCP, MODIS, and MOPITT). For the calculation of model performance statistics, the satellite data with different resolutions are mapped to CMAQ's Lambert conformal conic projection using bi-linear interpolation in the NCAR command language. CMAQ model outputs at approximate time of the satellite overpass are paired with the satellite retrievals to facilitate a consistent comparison. Note that only those grid points with valid satellite observations are considered when paring model results with observations, and the averaging kernels are not considered when analyzing the column CO and NO 2 results, which may introduce some uncertainties . Modeled CDNC is calculated as the average value of the layer of low-level warm clouds between 950 and 850 hPa as suggested by Bennartz (2007). Following the approach of Wielicki et al. (1996), the SWCF and LWCF are calculated as the difference between the clear-sky and the all-sky reflected radiation at the top of atmosphere for both simulations and observations.
The statistical performance evaluation follows a protocol similar to that of Zhang et al. (2006Zhang et al. ( , 2009a and Yahya et al. (2016) and uses well-accepted statistical measures such as correlation coefficient (R), mean bias (MB), root-mean-square error (RMSE), normalized mean biases (NMB), and normalized mean error (NME) (S. . Because of different sampling protocols among monitoring networks, the evaluation is conducted separately for individual networks for the same simulated variables and species. 3 Comprehensive model evaluation of two-way WRF-CMAQ 3.1 Meteorological evaluation 3.1.1 Surface meteorological variables- Figure 1 shows the spatial distribution of 5-year average MBs for T2, RH2, WS10, and hourly precipitation from two-way WRF-CMAQ against the NCDC data in winter and summer 2008-2012, and Tables 1 and 2 summarize the statistics for the same variables. Most variables except for precipitation show overall moderate to good spatial performance, with many sites showing MBs within ±1.0°C for T2, ±10% for RH2, ±1 m s −1 for WS10, and ±0.2 mm h −1 for precipitation in both seasons. WRF-CMAQ tends to overpredict T2 (i.e., warm bias) over widespread areas of domain, especially along the US Atlantic coast, the eastern and southeastern US, the central US, and the US Pacific coast in winter and underpredict T2 (i.e., cold bias) over the eastern US, the central US, and mountainous US in summer, which leads to an overall small warm bias in the whole year (see Fig. S1 in the Supplement). Similar warm biases of T2 in winter have been previously reported by Cohen et al. (2015) and are found to be associated with the relatively deep PBL depth using the non-local ACM2 PBL scheme. The relatively larger warm and cold biases over coastal and mountainous areas are likely due to the coarse grid spacing of 36 km that cannot resolve the complex topography well (Yahya et al., 2016). Compared to many previous WRF studies Brunner et al., 2015;Yahya et al., 2016), which typically show cold T2 biases, the overall small warm biases in this study can be attributed to the soil moisture nudging technique used in the PX land surface scheme (Pleim and Gilliam, 2009). The spatial patterns of MBs for RH2 show a general anti-correlation compared to T2 (i.e., RH2 is overpredicted where T2 is underpredicted and vice versa) due to the way RH2 is calculated based on T2. The spatial distribution of MBs for WS10 also shows dominant overpredictions in both winter and summer, especially along coastlines, indicating the prescribed sea surface temperature might not be sufficient to resolve the air-sea interactions. Systematic overpredictions of hourly precipitation against NCDC data in both seasons are found to be mainly caused by low non-convective precipitation events and can be attributed to the Morrison microphysics scheme (Yahya et al., 2016).
The precipitation performance is further examined by comparing WRF-CMAQ with TMPA and PRISM as shown in Fig. 2. The spatial distribution of precipitation is well simulated by WRF-CMAQ, especially over the CONUS, against observations by capturing the hot spots along the Pacific Northwest coast in winter and some areas over the central US and Florida in summer. Moderate overpredictions of precipitation against TMPA over the Atlantic Ocean and Gulf of Mexico in summer are also evident, possibly caused by overprediction of convective precipitation by the Kain-Fritsch scheme (Hong et al., 2017) over ocean. As shown in Tables 1 and 2 Tables 1 and 2 summarize the domain-average model performance statistics. WRF-CMAQ predicts the longwave radiation variables GLW and OLR very well with domain-average NMBs of −0.3% and 1.8% in winter and −3.6% and 0.9% in summer, respectively, and Rs of 0.96 to 0.99 for both. The shortwave radiation variables SWDOWN and GSW are slightly overpredicted on average with NMBs of 11.3% and 7.5% in winter and 17.1% and 15.1% in summer, respectively, and Rs ranging from 0.75 to 0.99 for both. The simulations also reliably reproduce the spatial distribution of both longwave and shortwave radiation compared to observations in both seasons. The relatively large overpredictions for shortwave radiation especially in summer are very likely caused by the large underpredictions of aerosol direct radiative forcing reflected from the underpredictions of AOD (Fig. 5), as well as underprediction of indirect cloud radiative forcing (see Fig. 8). It has been reported that WRF v3.4 does not treat the subgrid cloud feedback to radiation, which could also contribute to the overpredictions in shortwave radiation, especially in summer (Alapaty et al., 2012;Hong et al., 2017). The model largely underpredicts the magnitude of AOD in both seasons (NMBs of −59.8% in winter and −67.8% in summer), while providing a reasonable representation of the spatial distribution of AOD over the US, with generally higher values over the Midwest in winter and over the eastern US in summer. The model also underpredicts the elevated AODs over oceans and the northern part of domain in both seasons. Similar AOD underpredictions have been reported in previous studies over the US using two-way coupled WRF-CMAQ Hogrefe et al., 2015;Xing et al., 2015a). The relatively large underpredictions of AOD may be caused by several factors. First, underprediction of PM 2.5 concentrations, particularly SO 4 2 − in both seasons and OC in summer (Tables 3 and 4), can contribute significantly to the underprediction of AOD, especially over the eastern US. Second, the underestimation of dust emissions may contribute to missing hot spots from the model over arid areas in California and Arizona (Zender et al., 2003) and underestimates of sea salt emissions may lead to missing elevated AODs over oceans . Third, challenges in adequately representing prescribed and wildfire emissions in the NEI (Kelly et al., 2019) may cause many missing hot spots over large areas of the Pacific Northwest, California, Canada, and the eastern US, especially in summer. Fourth, uncertainties in BCONs of PM 2.5 concentrations may further contribute to underpredictions of AOD over oceans and the northern part of the domain. For example, Kaufman et al. (2001) found that the background AOD could reach 0.1 over the Pacific Northwest using Aerosol Robotic Network (AERONET) data. The AODs in the current simulation seem to be biased low (between 0.02-0.06 in both seasons over the Pacific Ocean) and indicate potential underpredictions of PM 2.5 BCONs, especially in the free troposphere. Finally, there are uncertainties associated with MODIS retrievals. Remer et al. (2005) found that the uncertainty of level 3 MODIS monthly AODs can be up to ±0.05 ± 0.15 AOD over the land due to clouds and surface reflectance. More AOD data from other satellites or AERONET might be considered in the future work to provide more robust ensemble type of evaluation for AOD. in summer, which is consistent with the performance reported in Yu et al. (2014). The model can reproduce the high CFs over the northern and northeastern parts of domain, as well as over oceans, while capturing the low CFs over the mountainous and plateau regions in the US and Mexico, especially in winter. In addition to the underprediction of PM 2.5 (thus underestimating CCN), the large underpredictions of cloud variables (especially CDNC and COT) can be attributed to uncertainties in aerosol microphysics schemes (Yahya et al., 2016), as well as missing aerosol indirect effects on subgrid convective clouds (Yu et al., 2014). Gantt et al. (2014) and Zhang et al. (2015b) also showed the aerosol activation scheme (i.e., Abdul-Razzak and Ghan, 2000) used in the current version of WRF-CMAQ may have underestimated CDNC and thus CWP and COT due to some missing processes such as insoluble aerosol adsorption and giant cloud condensation nuclei. Overall, the relatively poor model performance for cloud variables reflects current limitations in representing aerosol indirect effects and aerosol-cloud interactions in state-of-science online coupled models. Further model improvements that incorporate new knowledge from emerging studies should be conducted in the future.
As shown in Fig. 8, WRF-CMAQ predictions of SWCF and LWCF generally agree well with the satellite observations in both seasons. The model can capture the elevated SWCF and LWCF over the Atlantic Ocean and widespread areas over the eastern US in winter and those over the Pacific Northwest, northern part of the domain, and Atlantic Ocean in summer. The domain-average NMBs are −11.1% for SWCF and −15.1% for LWCF in winter and −41.3% for SWCF and −33.3% for LWCF in summer. The relatively larger biases in summer compared to winter are correlated with larger biases associated with radiation and cloud predictions potentially caused by larger underpredictions of aerosol predictions. As discussed earlier, the underpredictions of SWCF may partially contribute the overprediction of SWDOWN (more shortwave radiation reaching the ground), and those of LWCF may further lead to the overpredictions in OLR (more longwave radiation emitted into the space). The performance of SWCF and LWCF is consistent with the 12 km simulation reported in Yu et al. (2014) and even slightly better in terms of NMBs, which might be associated with the long-term vs. short-term simulations. It is also worth noting that SWCF (LWCF) is calculated as the difference between the clear-sky and all-sky shortwave (longwave) radiation at the top of the atmosphere, and thus performance for SWCF and LWCF depends on performance for both radiation and cloud properties. The generally better performance in terms of model bias for SWCF and LWCF compared to the cloud variables seems to be driven by the relatively good performance of shortwave and longwave radiation in the model. Figure 9a shows the spatial distribution of simulated average daily maximum 8 h O 3 in summer (2008-2012) from two-way WRF-CMAQ overlaid with observations from both the AIRS-AQS and CASTNET networks. WRF-CMAQ shows good performance by capturing the spatial distribution of max 8 h O 3 over widespread areas of the domain. The model tends to overpredict O 3 along coastlines in the southeastern US, the Gulf of Mexico, and US Pacific coast, which can be attributed to a poor representation of coastal boundary layers (Yu et al., 2007) and lack of O 3 sink via halogen chemistry (Sarwar et al., 2015) and deposition to water . The simulation also underpredicts O 3 in widespread areas in the Midwest, central US, and mountainous regions of the US, which is consistent with the results of 36 km simulations from  that used an earlier version of CMAQ v4.6 with the same CB05 gas-phase mechanism. In addition to cold T2 biases over those areas (Fig. 1), the underpredictions are also believed to be associated with inaccurate representations of precursor emissions and elevated/complex terrain due to the coarse grid spacing of 36 km over those regions.  found that their 12 km simulation showed improved performance over similar regions especially in summer. Figure 9c shows the monthly variation of domain-average 5-year average O 3 mixing ratios between observations from AIRS-AQS and simulations from two-way WRF-CMAQ, and Figure 9d shows the diurnal variation of domain-average 5-year average hourly O 3 mixing ratios between observations from CASTNET and simulations from two-way WRF-CMAQ for winter and summer. As shown in Fig. 9c, the O 3 mixing ratios are overpredicted throughout the year, which is consistent with overprediction of T2 (figure not shown). The largest overprediction occurs in the relatively cold months such as September to December. It is interesting that the observations show the largest monthly O 3 mixing ratios in spring and early summer while the simulation shows the peak during the summer. The difference in timing of peak O 3 between observations and simulations during the year might be associated with uncertainties in the BCONs of O 3 that reflect impacts of the long-range transport and associated stratosphere-troposphere exchange of O 3 . As shown in Fig. 9d, WRF-CMAQ tends to overpredict O 3 during most hours (i.e., 02:00-18:00 LT) in summer and throughout the whole day in winter partially due to the overprediction of T2, especially in winter (Fig. 1). The diurnal pattern of O 3 is captured much better during summer with much less prediction bias, especially during the nighttime, indicating that the model does a better job in predicting the evolution of nocturnal boundary layer and atmospheric chemistry in the warm season than the cold season. The overall overpredictions in this work are also consistent with previous studies Appel et al., 2007;, although our results show much better nighttime performance owing to the application of the ACM2 scheme that treats both local and non-local closure (Pleim, 2007). As also shown in Table  4, the domain-average NMBs and NMEs for max 8 h O 3 in summer are 10.6% and 13.2% against AIRS-AQS and −3.0% and 11.5% against CASTNET, respectively. The statistics are also consistent with previous studies using the CMAQ model (Zhang et al., 2009a;Appel et al., 2013Appel et al., , 2017Penrod et al., 2014) and can be considered to have good performance according to the criteria suggested by Zhang et al. (2013) and Emery et al. (2017).  Figure 10a and c shows the spatial distribution of simulated 5-year average PM 2.5 from two-way WRF-CMAQ overlaid with observations from both the CSN and IMPROVE networks in winter and summer, 2008-2012. As shown, WRF-CMAQ performs well for PM 2.5 over widespread areas of the Midwest and northeastern US in both seasons, while PM 2.5 is underpredicted over the southeastern and western US, especially in winter. The model also misses some hot spots of observed concentrations in the western US, which are mainly caused by TC underpredictions (Fig. S6) that are likely linked to poorly allocated and underestimated wildfire emissions in the NEI (Wiedinmyer et al., 2006;Roy et al., 2007;Kelly et al., 2019). The relatively large underpredictions over the eastern US are mainly caused by the combined effects from SO 4 2 − , NH 4 + , and TC. As shown in Fig. S6, WRF-CMAQ largely underpredicts SO 4 2 − in the Midwest and southeastern US mainly due to the underprediction of oxidants such as O 3 (see Fig. 9a) (which leads to less production from the gaseous oxidation), overprediction of precipitation (see Fig. 2) (which leads to more wet deposition and removal), and large underprediction of cloud fields (see Figs. 6-7) (which leads to less aqueous-phase formation), over the same area.

Aerosols-
On the other hand, NH 4 + and NO 3 − are either underpredicted or overpredicted, respectively, over the similar areas mainly due to underprediction of SO 4 2 − . According to the aerosol thermodynamics, when SO 4 2 − is underpredicted, NH 4 + tends to be underpredicted due to its major role as cation. More gaseous NH 3 will be available to neutralize NO 3 − , thus leading to overprediction of NO 3 − especially over the sulfate-poor regions (West et al., 1999).
Other potential reasons include the inaccurate assumptions in the thermodynamic module (for example, the internally mixed aerosol state and equilibrium assumption may not be representative over some regions and different time periods; S. , uncertainties in emissions of key species such as NH 3 and non-volatile cations that affect particle acidity (Mebust et al., 2003;Wang and Zhang, 2014;Vasilakos et al., 2018;Pye et al., 2020), and measurement errors, especially for NO 3 − and NH 4 + (X.-Y. Karydis et al., 2007;. TC underpredictions over most sites of the domain can be attributed to the underprediction of emissions (e.g., wildfire and primary OC) and underestimation of secondary organic aerosol (SOA) formation Pye et al., 2017) since EC (a chemically inert species) is overpredicted, which suggests that atmospheric mixing did not drive the TC underpredictions.
Figure 10e and f show the monthly variation of 5-year average PM 2.5 between observations from CSN and IMPROVE, respectively, and simulations from two-way WRF-CMAQ. constituents well with majority of data within the 1 : 2 ratio lines in both seasons.
Systematic underpredictions of SO 4 2 − and NH 4 + in winter and overpredictions of NO 3 − in summer are shown, which are consistent with their spatial distributions. Relatively large underpredictions and overpredictions of TC especially in winter compensate each other and lead to relatively low overall model biases. As also shown in Fig. S6, the model fails to reproduce high concentrations of PM 10 (those > 20 μg m −3 ) over widespread areas of the domain, especially over dust source areas in California, Arizona, and New Mexico. Hong et al. (2017) found the similar large underprediction of dust using CMAQ v5.0.2 over China and attributed it to a too-high threshold for friction velocity in the current dust module (Dong et al., 2016). Sea salt also seems to be underpredicted by WRF-CMAQ, although sea salt predictions are better than dust as shown along the coastlines. Figure 3 shows the bar charts of annual averaged observations and simulations for PM 2.5 over the CSN and IMPROVE sites. Overall, the model performs well for PM 2.5 for most of years and better over CSN than IMPROVE sites with general underpredictions in most years. The observations for both CSN and IMPROVE show a general decreasing trend, except for 2009 over CSN with a strong drop of PM 2.5 concentrations. According to EPA (2012), the strong drop of PM 2.5 in 2009 is due to a few reasons, including the many national and local regulations that are imposed, the contribution of economic slowdown to cleaner air conditions, and favorable meteorological conditions to lower air pollution levels in 2009. The impacts are more apparent over CSN sites mainly composed of urban and suburban areas than IMPROVE sites mainly composed of remote areas and national parks. Two-way WRF-CMAQ is able to reproduce the declining trend well particularly over IMPROVE sites and again demonstrate its capability in accurately simulating the year-to-year variations of not only meteorology but air quality.
As recommended by some previous studies Emery et al., 2017), generally ±15% and ±30% for model biases and 30% and 50% for model errors can be considered as good and acceptable performance. As shown in Tables 3  and 4 Figures 12 and 13 show the spatial distribution of 5-year average column abundances between various satellite products and two-way WRF-CMAQ for column CO, TOR, column NO 2 , and column HCHO in winter and summer 2012, and Tables 3 and 4 summarize the statistics. As shown, WRF-CMAQ can reproduce the spatial distribution of the column abundances of gases quite well in both seasons except for column HCHO in winter with Rs ranging from 0.70 to 0.87. TOR in both seasons, column NO 2 in winter and column HCHO in summer are also generally well predicted in terms of magnitudes with NMBs of 4.7% for TOR and 0.3% for NO 2 , respectively, in winter and −8.0% for TOR and 15.0% for HCHO, respectively, in summer. Systematic underpredictions for column CO occur in both seasons over the whole domain with NMBs of −20.5% in winter and −27.8% in summer for a few reasons. First, the BCONs of CO may be significantly underestimated from the CESM model. Using WRF/Chem or its variant, Zhang et al. (2016b found that the column CO performance could be greatly improved by adjusting the BCON using the satellite observation. A similar approach could be applied in future WRF-CMAQ simulations as well. Second, as pointed by Heald et al. (2003), the regional emissions, especially biomass burning, could be a significant source for elevated CO concentrations, and thus underestimation of these emissions could contribute to the CO underprediction. A more robust set of fire emissions from FINN generated by NCAR based on satellite retrievals has been applied to the similar time period recently but using the WRF-Chem model  and were found to improve the column CO performance. Finally, Emmons et al. (2009) showed positive biases (i.e., 19%) of MOPITT retrievals over the land when compared to in situ measurements, and the biases may have been increasing over time due to the MOPITT bias drift (e.g., 0.5% yr −1 for version 7 retrieval). The predicted TOR can capture the observed high values over the eastern US and oceans and the low values in elevated terrain, especially in summer, and it shows the best performance among all gas species. Both satellite observations and simulations can capture the elevated column NO 2 over the industrial and metropolitan areas in the domain where large nitrogen oxide (NO x ) emission sources are located, especially in winter. The model shows moderate underprediction with an NMB of −27.8% in summer, which can be attributed to both uncertainties in the emissions and satellite retrievals. For example, the lightning emissions of NO x are missing from this study, which have been found by previous studies (Allen et al., 2012) to contribute up to 2.0 × 10 15 molec. cm −2 over the southern US, the Gulf of Mexico, and northern Atlantic Ocean during the summer. Boersma et al. (2004) also found that different column NO 2 retrieval approaches may lead to large errors (> 25%) over polluted areas. Column HCHO over the CONUS, especially the southeastern US, is well predicted in summer in terms of both magnitude and spatial distribution and correlates well with the biogenic emission source regions. The underprediction of column HCHO in winter may indicate potential underestimation of anthropogenic emissions. Other reasons including potential low yield of HCHO from isoprene and terpene in the CB05 mechanism and uncertainties in satellite retrievals (Stavrakou et al., 2009;Lorente et al., 2017). For example, According to Stavrakou et al. (2009), the air mass factors used for HCHO column calculation may bear ~ 18% error under clear-sky conditions to ~ 50% error for very cloudy conditions. The winter typically has higher cloud cover than summer (see Figs. 6 and 7) and thus higher uncertainties for HCHO column.

Impacts of chemistry-meteorology feedbacks
In this section, the impacts of chemistry-meteorology feedbacks including aerosol direct and indirect effects on regional meteorology and air quality over the US are further examined by comparing results from two-way WRF-CMAQ and offline coupled WRF and CMAQ. Model performance from the two sets of simulations is first compared to demonstrate the potential performance improvements of the two-way model, and the impacts on regional meteorology and air quality are further investigated via the spatial difference plots for selected variables and species.

Meteorology
Figures 2 and 8 compare observations and simulations from the two-way WRF-CMAQ and WRF-only models for precipitation and SWCF/LWCF, respectively. Tables 1 and 2 also summarize the model performance statistics for all major meteorological variables for the two simulations. The statistics of some cloud variables from the WRF-only simulation are not available due to missing model outputs. Overall, good performance is evident for both simulations for surface meteorological variables with slightly better performance for most of variables (except for RH2 in both seasons and T2 in summer) for the two-way WRF-CMAQ simulation than the WRF-only simulation.  Yahya et al. (2015a, b) that showed similar improvements in meteorological and radiative variables when comparing predictions from WRF-Chem with those from WRF only. Since identical inputs and physics options are used in both simulations, the differences in performance for meteorological variables is due to the consideration of feedback processes among chemistry, aerosol, cloud, and radiation in the two-way coupled WRF-CMAQ simulation. Figure 15 shows the 5-year average difference plots of selected major meteorological variables including SWDOWN, T2, RH2, WS10, PBL height, and precipitation between two-way WRF-CMAQ and WRF-only in 2008-2012. As shown, the incoming shortwave radiation is reduced by up to 24.8 W m −2 (13.6%) with a domain average of 13.0 W m −2 (6%) due to the combined aerosol direct and indirect radiative effects over the domain. The reduction is predominant over the eastern US where both aerosol loading and cloud cover are high and over the oceans where cloud cover is high. The magnitude of shortwave radiation reduction in this work is consistent with other studies. For example, Wang et al. (2015a) found that the combined aerosol direct and indirect effects using the WRF/Chem model, which includes the sub-scale cloud forcing not treated in the current WRF-CMAQ model, may decrease the incoming shortwave radiation by 16.0 W m −2 in the summer over the US. Hogrefe et al. (2015) reported the reduction of shortwave radiation may reach up to 20 W m −2 over the eastern US by only considering the aerosol direct effect using an older version of WRF-CMAQ v5.0.1. Xing et al. (2015b) showed that the aerosol direct forcing may cause the surface shortwave radiation to decrease by up to 10 W m −2 over the eastern US over a decadal time period using WRF-CMAQ v5.0. The reduction of shortwave radiation further reduces the surface temperature by up to 0.25 °C over the eastern US, which is much larger than the reduction of 0.1 °C reported by Hogrefe et al. (2015), mainly due to the inclusion of aerosol indirect effects. However, there are smaller reductions in T2 over the Pacific Ocean and even increases (by up to 0.1 °C) over large areas of Atlantic Ocean and Gulf of Mexico where much larger reductions of shortwave radiation occur. As pointed by Wang et al. (2015a), due to the much larger heat capacity of ocean, the response of sea surface temperature is less sensitive to the change of shortwave radiation for ocean compared to the land. The large increase of incoming longwave radiation and latent heat (figures not shown) caused by the aerosol indirect effects and other complex feedback processes over the ocean compensates for the reduction of shortwave radiation, especially over the Atlantic Ocean and Gulf of Mexico, and thus leads to less reduction or even increases of T2. RH2 is found to mostly increase by 3.4% over the land, caused by the decrease of temperature while decrease by 2.6% over the ocean caused by either the increase of temperature or large decrease of water vapor. Over the land, the decreases in shortwave radiation and temperature along with the latent heat (figure not shown) lead to a more stable PBL and thus suppress the wind (by reducing the wind speed as shown). Over the ocean, the changes lead to a more unstable PBL and thus enhance the wind over the ocean. The wind speed and PBL height are reduced by up to 0.05 m s −1 and 25 m, respectively, over the US. The aerosol feedbacks on precipitation are also mixed with relatively large decreases by up to 0.4 mm d −1 over the US and increases by up to 0.4 mm d −1 over oceans. The suppression of precipitation over the land is mainly due to the formation of more small-sized CCNs caused by aerosol indirect effects and align well with areas with high aerosol loadings while the enhancement of precipitation, especially along coastlines and over oceans, might be associated with the larger CCN formation via more activated sea salt particles, as indicated by Zhang et al. (2010) and Wang et al. (2015a).

Air quality
Figures 9-11 compare observations and simulations from two-way WRF-CMAQ and offline CMAQ for O 3 , PM 2.5 , and PM 2.5 constituents. Tables 3 and 4 summarize the statistics for all major chemical variables for the two simulations. As shown in Fig. 9, two-way WRF-CMAQ shows better performance for both the monthly variation of O 3 (throughout the whole year) over AQS sites and the diurnal pattern of O 3 (especially during winter) over CASTNET sites due to better performance of T2 and radiation compared to offline WRF and CMAQ. As shown in Fig. 10, two-way WRF-CMAQ shows better spatial distribution of PM 2.5 in winter and similar one in summer and better performance for PM 2.5 for most of months over CSN sites and for cold seasons across IMPROVE sites compared to offline CMAQ. Figure 11 shows systematically better performance for SO 4 2 − , NO 3 − , NH 4 + , and TC with more data within 1 : 2 or closer to 1 : 1 ratio lines of scatterplots in both seasons. Overall, as shown in Tables 3 and 4, both simulations show generally good performance for all major chemical species except for PM 10 . For example, the domain-average NMBs are 10.6% (AQS) and −3.0% (CASTNET) vs. 14.2% (AQS) and 0.2% (CASTNET) for O 3 in summer, −7.2% (CSN) and 8.6% (IMPROVE) vs. 1.8% (CSN) and 23.7% (IMPROVE) for PM 2.5 in winter and −13.2% (CSN) and −26.9% (IMPROVE) vs. −14.0% (CSN) and −22.8% (IMPROVE) for PM 2.5 in summer for two-way WRF-CMAQ and offline coupled CMAQ, respectively. The two-way WRF-CMAQ shows better domain-wide statistics in terms of both correlation and biases for many variables including O 3 , SO 4 2 − , NO 3 − , and EC as well as TOR and column NO 2 in both seasons, apparently due to the treatment of chemistry-meteorology feedbacks. Offline CMAQ performs better for total PM 2.5 , especially in the western US due to higher dust emissions from higher wind speed and higher SOA due to stronger radiation and higher temperatures. However, more robust comparisons are needed in the future with improved dust emissions and the use of FINN wildfire emissions. Figure 16 shows the 5-year average difference plots of selected chemical variables including CO, O 3 , NO x , volatile organic compounds (VOCs), SO 4 2 − , SOA, PM 2.5 , and PM 10 between two-way WRF-CMAQ and offline coupled CMAQ. As shown, the CO mixing ratios decrease by up to 79.2 ppb (27.8%), especially over the western US, with a domain-average reduction of 3.0 ppb (3.1%) due to reduced formation of CO from the oxidation of VOCs caused by reduced solar radiation as indicated by Zhang et al. (2017). Such reductions seem to dominate over the increases caused by reduced PBL height, especially in the western US, where PBL height reductions are at a minimum. The O 3 mixing ratios decrease by up to 5.2 ppb (16.2%) with domain average of 1.7 ppb (4.2%) mainly due to the reduced solar radiation and T2. The change of O 3 is consistent with other studies such as Makar et al. (2015) and Wang et al. (2015a) that also reported lower O 3 mixing ratios caused by aerosol direct and indirect effects. On the other hand, both NO x and VOC mixing ratios increase over the eastern US, while they decrease over the western US. The increase should be caused by the combination of the large reduction of PBL mixing and reduced solar radiation, which reduces NO 2 photolysis and VOC oxidation to SOA. For aerosol species, SO 4 2 − concentrations increase by up to 0.38 μg m −3 (26.6%) especially over the eastern US. In fact, the decrease of O 3 mixing ratios caused by feedbacks is expected to reduce SO 4 2 − production via the gas-phase oxidation pathway due to the influence of O 3 on OH but increase SO 4 2 − production via the aqueous-phase chemistry pathway due to more clouds in the two-way WRF-CMAQ simulation. Thus, the net increase of SO 4 2 − is more dominated by the aqueous-phase chemistry instead of the gas-phase oxidation. This net increase of

Summary and conclusion
In this study, two sets of long-term simulations for 2008-2012 using the two-way coupled WRF-CMAQ and offline coupled WRF and CMAQ, respectively, are conducted, evaluated, and compared to investigate the performance improvements due to chemistry-meteorology feedbacks and impacts of those feedbacks on the reginal air quality in the US. First, the two-way coupled WRF-CMAQ simulation with both aerosol direct and indirect radiative forcing is comprehensively evaluated in both winter and summer seasons, and the annual trend is examined between observations and simulations for selected major variables. The results show that WRF-CMAQ performs well for major surface meteorological variables such as temperature at 2 m, relative humidity at 2 m, wind speed at 10 m, and precipitation with domain-average MBs of −1.1 to 1.1 °C, 2.2%-3.7%, 0.38-0.57 m s −1 , and 0.13-0.23 mm d −1 (except for 0.71-0.75 mm d −1 against NCDC), respectively, in winter and summer. The relatively large positive biases for precipitation are found to be more apparent when observed precipitation is low (dominated more by the non-convective precipitation) and are thus believed to be more associated with uncertainties in the Morrison microphysics scheme. The long-term simulation also shows generally good performance for major radiation and cloud radiative variables. Relatively large model biases still exist for cloud variables such as CDNC, COT, and CWP, indicating that the processes associated with aerosol indirect effects are still not well understood and an accurate simulation of those effects is still challenging using state-of-the-science models. WRF-CMAQ can also capture the observed year-to-year variations well for almost all the major meteorological and chemical variables. and overprediction of NO 3 − in the cold season. In addition to biases inherited from the meteorology, the model performance for chemistry also suffers from uncertainties associated with emissions, the use of a coarse spatial resolution, and representation of aerosol formation pathways in the model. For example, the relatively large biases for EC might be associated with poorly allocated anthropogenic or wildfire emissions, and those for OC might be due to underestimation of SOA formation in version 5.0.2 of CMAQ. WRF-CMAQ also predicts the column abundances of chemical species well and the relatively large model biases for CO are found to be associated with an underestimation of BCONs. The model better reproduces the observed number of exceedance days for O 3 than PM 2.5 mainly due to better performance for O 3 than PM 2.5 concentrations.
The performance comparison between two-way WRF-CMAQ and WRF-only simulations shows that two-way WRF-CMAQ model performs better for major surface meteorological, radiation, and cloud radiative variables due to the consideration of chemistry-meteorology feedbacks associated with aerosol direct and indirect forcing. In summary, the two-way coupled WRF-CMAQ modeling in this study shows generally satisfactory and consistent performance for the long-term prediction of regional meteorology and air quality when compared to other studies in the literature. Possible causes for the meteorological and chemical biases that were identified through this work can provide valuable information for future model development to improve the two-way coupled WRF-CMAQ model and those biases should also be considered when making future climate and air quality projections. Non-negligible model improvements for many major meteorological and chemical variables compared to the traditional application of offline coupled WRF and CMAQ suggest the importance of chemistry-meteorology feedbacks, especially aerosol direct and indirect effects. The feedbacks should be considered along with other factors in developing future model applications to inform policy making.

Supplementary Material
Refer to Web version on PubMed Central for supplementary material.               Spatial difference plots (two-way WRF-CMAQ-WRF-only) for major meteorological variables between two-way WRF-CMAQ and WRF-only in 2008-2012. Spatial difference plots (two-way WRF-CMAQ -offline CMAQ) for major chemical species between two-way WRF-CMAQ and offline CMAQ in 2008-2012. The 5-year average performance statistics for meteorological variables between two-way WRF-CMAQ and WRF-only simulations in summer 2008-2012.