Intercomparison of multiple two-way coupled meteorology and air quality models (WRF v4.1.1–CMAQ v5.3.1, WRF–Chem v4.1.1, and WRF v3.7.1–CHIMERE v2020r1) in eastern China

Two-way coupled meteorology and air quality models, which account for aerosol–radiation–cloud interactions, have been employed to simulate meteorology and air quality more realistically. Although numerous related studies have been conducted, none have compared the performances of multiple two-way coupled models in simulating meteorology and air quality over eastern China. Thus, we


Introduction
Aerosols in the atmosphere, originating from anthropogenic and natural emissions, cause air pollution and also have climate and meteorological impacts through aerosol–radiation interactions (ARI) and aerosol–cloud interactions (ACI) (Carslaw et al., 2010; Rosenfeld et al., 2014; Fan et al., 2016; IPCC, 2021). The feedbacks of aerosols to meteorology have been widely investigated using two-way coupled meteorology and air quality models over the past two decades (Jacobson, 1994, 1997, 1998, 2001, 2002; Grell et al., 2005; Wong et al., 2012; Wang et al., 2014; Zhou et al., 2016; Briant et al., 2017; Feng et al., 2021). In these models, two-way interactions between meteorology and aerosols are enabled by including all the processes involving ARI and/or ACI (Grell and Baklanov, 2011; Wang et al., 2014; Briant et al., 2017; Wang et al., 2021). The fundamental theories, modeling techniques, developments, and applications of two-way coupled meteorology and air quality models in North America, Europe, and Asia have been systematically reviewed (Zhang, 2008; Baklanov et al., 2014; Gao et al., 2022a).
C. Gao et al.: Intercomparison of multiple two-way meteorology and air quality models in eastern China
As these review papers point out, the treatments of, and parameterization schemes for, the physicochemical processes involving ARI and ACI can differ greatly among two-way coupled models, so the results from these models can vary in many respects. At the same time, the configurations of the coupled models, such as the meteorological and chemical initial and boundary conditions (ICs and BCs), the horizontal and vertical resolutions, and the emission inventories and processing tools, play important roles in model simulations. Model intercomparison projects targeting various two-way coupled meteorology and air quality models have been carried out. For example, the Air Quality Model Evaluation International Initiative Phase II focused on the performances of multiple two-way coupled models and the effects of aerosol feedbacks in Europe and the United States (Brunner et al., 2015; Im et al., 2015a, b; Makar et al., 2015a, b). In Asia, the Model Inter-Comparison Study for Asia Phase III was conducted to evaluate ozone (O3) and other gaseous pollutants, fine particulate matter (PM2.5), and acid and reactive nitrogen deposition using various models with or without ARI and/or ACI enabled (Li et al., 2019; Chen et al., 2019; Itahashi et al., 2020; Ge et al., 2020; Kong et al., 2020). Within this project, Gao et al. (2018, 2020) reviewed in detail the performances of seven two-way coupled models from different research groups in simulating a heavy air pollution episode in the North China Plain during January 2010, and how aerosol feedbacks affected the simulated meteorological variables and PM2.5 concentrations. Targeting India, a heavily polluted region, Govardhan et al. (2016) compared the aerosol optical depth (AOD) and various aerosol species (black carbon, mineral dust, and sea salt) simulated by WRF-Chem (the Weather Research and Forecasting model coupled with chemistry, which includes ARI) with those simulated by the Spectral Radiation-Transport Model for Aerosol Species (which includes both ARI and ACI), although the two models had different configurations.
So far, there have been no comprehensive comparisons of multiple coupled models using the same model configuration over the high-aerosol-loading region of eastern China, which has experienced rapid growth in its economy, urbanization, and population as well as severe air quality problems in recent decades (He et al., 2002; Wang and Hao, 2012; Gao et al., 2017; Geng et al., 2021). In the eastern China region (ECR), several open-source and proprietary two-way coupled models have been applied to investigate the effects of ARI and/or ACI, but most such studies have focused on short-term episodes of heavy air pollution, and no year-long simulations have been performed (Xing et al., 2017; Ding et al., 2019; Ma et al., 2021). The open-source models commonly used in the ECR are WRF-Chem and WRF-CMAQ (Grell et al., 2005; Wong et al., 2012); the two-way coupled WRF-CHIMERE model, which has been used to examine aerosol–radiation–cloud interactions in Europe and Africa (Briant et al., 2017; Tuccella et al., 2019), has not yet been applied there. In addition, model simulations should be compared not only against surface measurements but also against satellite data (Zhao et al., 2017; Hong et al., 2017; Campbell et al., 2017; Wang et al., 2018). Finally, although the running time of individual modeling systems (e.g., WRF-CMAQ or WRF-CHIMERE) has been evaluated by comparing their online and offline versions and by applying various computing configurations (Wong et al., 2012; Briant et al., 2017), the computational efficiencies of multiple two-way coupled models also need to be compared under the same computing conditions.
In this paper, we conduct a comparative evaluation of three open-source two-way coupled meteorology and air quality models (WRF-CMAQ, WRF-Chem, and WRF-CHIMERE) applied to the ECR. The remainder of the paper is organized as follows. Section 2 describes the study methods, including the model configurations and evaluation protocols. Sections 3 and 4 present the analyses and intercomparisons of the simulations from the three two-way coupled models for meteorology and air quality, respectively. The major findings of this work are summarized in Sect. 5.

Model configurations and data sources
One-year simulations of meteorology and air quality in eastern China were performed using the two-way coupled WRF-CMAQ, WRF-Chem, and WRF-CHIMERE models with or without ARI and/or ACI. The horizontal grid resolution was 27 km (the east–west direction comprised 110, 120, and 120 grid cells and the north–south direction comprised 150, 160, and 170 grid cells for the WRF-CMAQ, WRF-Chem, and WRF-CHIMERE models, respectively). All three coupled models have 30 levels (i.e., 29 layers) from the surface to 100 hPa, with 11 layers in the bottom 1 km and a bottom-layer thickness of 23.2 m. Anthropogenic emissions from the Multi-resolution Emission Inventory for China (MEIC) (Li et al., 2017) and biomass burning emissions (Wiedinmyer et al., 2011) from the Fire INventory from the University Corporation for Atmospheric Research, version 1.5 (FINN v1.5), were included in our simulations, and their spatial, temporal, and chemical-species allocations were performed using Python (Wang et al., 2023). Biogenic emissions were calculated using the Model of Emissions of Gases and Aerosols from Nature, version 3.0 (MEGAN v3.0; Gao et al., 2019). Dust and sea salt emissions were calculated by the inline modules of each model (Table 1). For all three coupled models, the meteorological ICs and lateral BCs were derived from the National Centers for Environmental Prediction Final Analysis (NCEP-FNL) datasets (http://rda.ucar.edu/datasets/ds083.2, last access: 20 October 2020) with a horizontal resolution of 1° × 1° at 6 h intervals, and the flux at the model top boundary was set to zero. To improve the long-term accuracy of the meteorological variables simulated by WRF, we turned on the options for observational and grid four-dimensional data assimilation (FDDA), using observations of pressure, station height, relative humidity, wind speed (WS), and wind direction made 4 times per day (00:00, 06:00, 12:00, and 18:00 UTC) at 2168 stations (https://doi.org/10.5281/zenodo.6975602; Gao et al., 2022b). However, turning on FDDA in two-way coupled models can dampen the simulated aerosol feedbacks (Wong et al., 2012; Forkel et al., 2012; Hogrefe et al., 2015; Zhang et al., 2016). To mitigate this effect in the long-term simulations, we set the nudging coefficients above the planetary boundary layer to 0.0001 s⁻¹ for the u and v winds, 0.0001 s⁻¹ for temperature, and 0.00001 s⁻¹ for the water vapor mixing ratio. The chemical ICs and lateral BCs for WRF-CMAQ and WRF-Chem were downscaled from the Whole Atmosphere Community Climate Model (WACCM) using the mozart2camx and mozbc tools, respectively. WRF-CHIMERE employed the climatology from a general circulation model developed at the Laboratoire de Météorologie Dynamique (LMDz) coupled to a global chemistry and aerosol model, INteractions between Chemistry and Aerosols (INCA; Mailler et al., 2017). For the model-top chemical BCs, the WRF-CMAQ and WRF-Chem models account for stratosphere–troposphere O3 exchange using an O3–potential vorticity parameterization (Safieddine et al., 2014; Xing et al., 2016), and equivalent options were used in the two models in this study, whereas WRF-CHIMERE employs the LMDz-INCA climatology (Mailler et al., 2017).
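Grid and observational nudging in FDDA add a Newtonian relaxation term, G(X_obs − X), to the tendency of each nudged variable X, so the inverse of the nudging coefficient G sets the e-folding time over which the model is pulled toward the analysis. The sketch below is illustrative only: it assumes a single scalar variable, ignores the spatial and temporal weighting functions that WRF's FDDA applies, and uses hypothetical function names. It converts the coefficients quoted above into relaxation timescales and integrates the relaxation term on its own.

```python
"""Illustrative Newtonian relaxation (nudging) with the coefficients quoted above.

Assumption: one scalar variable relaxed toward a fixed target, with no model
physics and none of the spatial/temporal weights used by WRF's FDDA.
"""

# Nudging coefficients above the PBL (s^-1), as stated in the text.
NUDGING_COEFFS = {
    "u_v_wind": 1.0e-4,
    "temperature": 1.0e-4,
    "water_vapor": 1.0e-5,
}


def relaxation_timescale_hours(g: float) -> float:
    """e-folding time 1/G of the nudging term, converted to hours."""
    return 1.0 / g / 3600.0


def nudge(x: float, x_target: float, g: float, dt: float, n_steps: int) -> float:
    """Forward-Euler integration of dx/dt = G * (x_target - x) only."""
    for _ in range(n_steps):
        x += g * (x_target - x) * dt
    return x


for name, g in NUDGING_COEFFS.items():
    print(f"{name}: tau = {relaxation_timescale_hours(g):.1f} h")
```

With these values, winds and temperature relax toward the analysis over roughly 2.8 h, whereas water vapor relaxes over roughly 28 h; the weaker the nudging, the more freedom the model has to respond to aerosol-induced perturbations, which is one way to read the choice of coefficients described above.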
Table 1 lists the parameterization options for aerosol–radiation–cloud interactions. To keep the physical schemes consistent, the same Rapid Radiative Transfer Model for General Circulation Models (RRTMG) shortwave (SW) and longwave (LW) radiation schemes and the Morrison microphysics scheme were adopted in WRF-Chem and WRF-CMAQ. WRF-CHIMERE uses the same radiation schemes but the Thompson microphysics scheme. The other schemes (cumulus, surface, and land surface) used in WRF-CMAQ and WRF-Chem follow the widely used options of Gao et al. (2022a), as outlined in Table S1 in the Supplement, and the other schemes employed in WRF-CHIMERE are the same as those used in WRF-Chem. To account for the effects of clouds on the radiative transfer calculations, the fractional cloud cover and cloud optical properties were included in the RRTMG SW/LW radiation schemes of all three coupled models (Xu and Randall, 1996; Iacono et al., 2008). The coupled WRF-CMAQ model, which uses the Kain–Fritsch cumulus scheme, includes the impacts of the cumulus cloud fraction (CF) on RRTMG radiation (Alapaty et al., 2012), whereas WRF-Chem and WRF-CHIMERE, which use the Grell–Freitas cumulus scheme, do not. In the Fast-JX photolysis scheme employed by the three coupled models, the impacts of clouds were included through the cloud cover and cloud optical properties; however, these quantities were calculated differently in each model (Table S1). For the aerosol size distribution, we used the modal approach with Aitken, accumulation, and coarse modes in WRF-CMAQ and the 4- and 10-bin sectional approaches in WRF-Chem and WRF-CHIMERE, respectively (Binkowski and Roselle, 2003; Zaveri et al., 2008; Archer-Nicholls et al., 2014; Menut et al., 2013, 2016).
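The modal and sectional approaches describe the same physical size distribution differently: a mode is a lognormal function of diameter, whereas a sectional scheme carries the amount of aerosol within fixed diameter bins. As a minimal illustration, assuming a single lognormal mode with hypothetical parameters and bin edges that are intended to mirror a common 4-bin MOSAIC-style setup in WRF-Chem (0.039-10 µm dry diameter; these should be checked against the configuration actually used), the sketch below converts the number median diameter to a volume median diameter with the Hatch-Choate relation and integrates the mode over each bin.

```python
"""Sketch: distribute one lognormal aerosol mode over sectional size bins.

Assumptions: a single lognormal mode defined by its number median diameter
and geometric standard deviation; hypothetical mode parameters; bin edges
intended to mirror a 4-bin MOSAIC-style setup (micrometres, dry diameter).
"""
import math

MOSAIC_4BIN_EDGES_UM = [0.039, 0.156, 0.625, 2.5, 10.0]


def volume_median_diameter(d_gn: float, sigma_g: float) -> float:
    """Hatch-Choate conversion from number to volume median diameter."""
    return d_gn * math.exp(3.0 * math.log(sigma_g) ** 2)


def lognormal_cdf(d: float, d_median: float, sigma_g: float) -> float:
    """Cumulative fraction of a lognormal distribution below diameter d."""
    z = math.log(d / d_median) / (math.sqrt(2.0) * math.log(sigma_g))
    return 0.5 * (1.0 + math.erf(z))


def mode_to_bins(d_gn, sigma_g, edges, moment="volume"):
    """Fraction of the mode's number or volume (mass at fixed density) per bin."""
    d_med = volume_median_diameter(d_gn, sigma_g) if moment == "volume" else d_gn
    return [
        lognormal_cdf(hi, d_med, sigma_g) - lognormal_cdf(lo, d_med, sigma_g)
        for lo, hi in zip(edges[:-1], edges[1:])
    ]


# Hypothetical accumulation mode: D_gn = 0.1 um, sigma_g = 1.8.
fractions = mode_to_bins(0.1, 1.8, MOSAIC_4BIN_EDGES_UM)
print([round(f, 3) for f in fractions])
```

For this hypothetical mode, most of the volume falls in the second bin (0.156-0.625 µm), and the small fraction lying outside the bin range is simply not counted; such mapping choices are one way in which modal and sectional representations of the same aerosol can diverge.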
To demonstrate the capabilities of the three two-way coupled models with and without aerosol feedbacks in simulating meteorology and air quality, we evaluated the strengths and weaknesses of each coupled model against extensive ground-based and satellite measurements. The ground-based data included hourly meteorological observations from 572 stations (air temperature (T2) and relative humidity (RH2) at 2 m above the surface, WS at 10 m above the surface (WS10), and precipitation (PREC); http://data.cma.cn, last access: 20 October 2020); hourly observations from 327 national environmental monitoring stations (PM2.5, O3, nitrogen dioxide (NO2), sulfur dioxide (SO2), and carbon monoxide (CO); https://quotsoft.net/air, last access: 20 October 2020); hourly surface SW radiation (SSR) measurements at 109 sites (Tang et al., 2019); and twice-daily soundings from 74 radiosonde stations, which were used to calculate the planetary boundary layer height at 08:00 and 20:00 local time (PBLH08 and PBLH20) (Guo et al., 2019). Figure 1 shows the locations of these observations. Because no observed water vapor mixing ratio (Q2) data were available, Q2 was calculated as Q2 = RH2 × w_s, where RH2 is the relative humidity (expressed as a fraction) and w_s is the saturation mixing ratio (Wallace and Hobbs, 2006).
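The relation Q2 = RH2 × w_s requires a saturation mixing ratio at the observed temperature and pressure. The text does not specify which saturation vapor pressure formula was used, so the sketch below assumes the Bolton (1980) fit over liquid water and w_s = ε e_s / (p − e_s) with ε ≈ 0.622, taking T2 in °C, station pressure in hPa, and RH2 in percent; the function names are hypothetical.

```python
"""Sketch: diagnose Q2 from RH2, T2, and station pressure.

Assumptions: Bolton (1980) saturation vapour pressure over liquid water and
the exact mixing-ratio form w_s = eps * e_s / (p - e_s).
"""
import math

EPSILON = 0.622  # ratio of the molar masses of water vapour and dry air


def saturation_vapor_pressure_hpa(t_c: float) -> float:
    """Saturation vapour pressure over liquid water (hPa), T in degC."""
    return 6.112 * math.exp(17.67 * t_c / (t_c + 243.5))


def saturation_mixing_ratio(t_c: float, p_hpa: float) -> float:
    """Saturation mixing ratio w_s (kg kg^-1)."""
    e_s = saturation_vapor_pressure_hpa(t_c)
    return EPSILON * e_s / (p_hpa - e_s)


def q2_from_rh2(rh2_percent: float, t2_c: float, p_hpa: float) -> float:
    """Q2 = RH2 * w_s, returned in g kg^-1 (RH2 given in percent)."""
    return rh2_percent / 100.0 * saturation_mixing_ratio(t2_c, p_hpa) * 1000.0


print(round(q2_from_rh2(60.0, 25.0, 1000.0), 2))
```

Station pressure, not sea-level pressure, is the appropriate p here, because w_s depends on the actual pressure at 2 m.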
The satellite data included the following: the monthly average downwelling SW/LW fluxes at the surface and SW/LW fluxes at the top of the atmosphere (TOA) from the Clouds and the Earth's Radiant Energy System (CERES) (https://ceres.larc.nasa.gov, last access: 20 October 2020); PREC from the Tropical Rainfall Measuring Mission (TRMM); CF, liquid-water path (LWP), and AOD from the Moderate Resolution Imaging Spectroradiometer (MODIS); the tropospheric NO2 and SO2 columns in the planetary boundary layer (PBL) from the Ozone Monitoring Instrument; the total CO column from the Measurements of Pollution in the Troposphere instrument (https://giovanni.gsfc.nasa.gov/giovanni, last access: 20 October 2020); total column ozone (TCO) from the Infrared Atmospheric Sounding Interferometer on Meteorological Operational Satellite A (IASI-METOP-A) (https://cds.climate.copernicus.eu/cdsapp/#!/dataset/satellite-ozone?tab=form, last access: 20 October 2020); and the total ammonia (NH3) column from IASI-METOP-B (https://iasi.aeris-data.fr/nh3_iasi_a_arch, last access: 20 October 2020). These data were downloaded and interpolated to the same horizontal resolution as the model results using the Rasterio library (Gillies et al., 2013), after which the model and observed values at each grid point were extracted.
Geosci. Model Dev., 17, 2471-2492, 2024, https://doi.org/10.5194/gmd-17-2471-2024
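Rasterio's reprojection routine (`rasterio.warp.reproject`) supports several resampling methods, and the text does not state which one was used. As a dependency-free stand-in rather than the authors' workflow, the sketch below applies bilinear interpolation from a regular latitude-longitude satellite grid to model grid-cell centres and then pairs each interpolated value with the model value. The grids, values, and function names are hypothetical; when a fine satellite product is coarsened to 27 km, an area-weighted average may be the more appropriate choice.

```python
"""Dependency-free stand-in for regridding satellite data to model cells.

Assumptions: the satellite field lies on a regular lat-lon grid with
ascending coordinates; missing retrievals are stored as None.
"""
from bisect import bisect_right


def bilinear(lats, lons, field, lat, lon):
    """Bilinearly interpolate field[i][j] (i: latitude, j: longitude) to (lat, lon)."""
    if not (lats[0] <= lat <= lats[-1] and lons[0] <= lon <= lons[-1]):
        return None  # point lies outside the satellite grid
    i = min(bisect_right(lats, lat) - 1, len(lats) - 2)
    j = min(bisect_right(lons, lon) - 1, len(lons) - 2)
    corners = (field[i][j], field[i][j + 1], field[i + 1][j], field[i + 1][j + 1])
    if any(v is None for v in corners):
        return None  # at least one surrounding retrieval is missing
    f00, f01, f10, f11 = corners
    ty = (lat - lats[i]) / (lats[i + 1] - lats[i])
    tx = (lon - lons[j]) / (lons[j + 1] - lons[j])
    return (1 - ty) * ((1 - tx) * f00 + tx * f01) + ty * ((1 - tx) * f10 + tx * f11)


def pair_with_model(lats, lons, field, model_cells):
    """Return (model, observed) pairs for cells with a valid interpolated value.

    model_cells: iterable of (lat, lon, model_value) for each model grid cell.
    """
    pairs = []
    for lat, lon, model_value in model_cells:
        obs = bilinear(lats, lons, field, lat, lon)
        if obs is not None:
            pairs.append((model_value, obs))
    return pairs
```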

Scenario setup
To comprehensively assess the performances of WRF v4. In all of the simulations performed in this study, a spin-up period of 1 month was used to reduce the influence of the initial conditions. Multiple statistical metrics, including the correlation coefficient (R), mean bias (MB), normalized mean bias (NMB), normalized gross error (NGE), and root mean square error (RMSE), were used to compare each scenario simulation with the ground-based and satellite-borne observations; their mathematical definitions are provided in Sect. S3 in the Supplement. To make the simulations from the three coupled models comparable, their physics and chemistry configurations were set to be as consistent as possible. We systematically analyzed the annual and seasonal statistical metrics of the meteorological and air quality variables simulated by the three two-way coupled models with and without aerosol feedbacks, and then quantified the respective contributions of ARI and ACI effects to model performance.
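For reference, the metrics listed above can be computed from paired model and observed values as sketched below. This is an illustrative implementation, not the authors' code: it assumes the common ratio-of-sums forms for NMB and NGE, whereas the definitions actually used are those in Sect. S3 of the Supplement (some evaluation guidelines instead average |M − O|/O over the pairs for NGE).

```python
"""Paired model-observation evaluation metrics (illustrative definitions).

Assumption: NMB and NGE use the summed (ratio-of-sums) form; the definitions
actually used are given in Sect. S3 of the Supplement and may differ.
"""
import math


def evaluation_metrics(model, obs):
    """Return R, MB, NMB (%), NGE (%), and RMSE for paired values."""
    if len(model) != len(obs) or len(obs) < 2:
        raise ValueError("need at least two paired model/obs values")
    n = len(obs)
    diff = [m - o for m, o in zip(model, obs)]
    mean_m = sum(model) / n
    mean_o = sum(obs) / n
    cov = sum((m - mean_m) * (o - mean_o) for m, o in zip(model, obs))
    var_m = sum((m - mean_m) ** 2 for m in model)
    var_o = sum((o - mean_o) ** 2 for o in obs)
    return {
        "R": cov / math.sqrt(var_m * var_o),  # Pearson correlation
        "MB": sum(diff) / n,  # mean bias
        "NMB": 100.0 * sum(diff) / sum(obs),  # normalized mean bias (%)
        "NGE": 100.0 * sum(abs(d) for d in diff) / sum(obs),  # normalized gross error (%)
        "RMSE": math.sqrt(sum(d * d for d in diff) / n),  # root mean square error
    }


print(evaluation_metrics([2.0, 4.0, 6.0], [1.0, 3.0, 5.0]))
```

R is undefined when either series is constant; a production version would guard against a zero variance.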

Multimodel meteorological evaluations
This section presents the annual and seasonal (spring: March-May; summer: June-August; autumn: September-November; winter: December-February) statistical metrics of the simulated meteorological variables and air quality and compares them with the ground-based and satellite observations. The running times of the eight simulation scenarios are also discussed.
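Seasonal statistics require grouping the paired values by the season definitions given above. The sketch below assumes that, for the 2017 simulation year, winter comprises January, February, and December of that year (the text does not say whether December 2016 was used instead) and computes a per-season mean bias; the records are hypothetical (month, model, observation) tuples.

```python
"""Group paired values by season (DJF/MAM/JJA/SON) and compute mean bias."""
from collections import defaultdict

SEASON_OF_MONTH = {
    12: "winter", 1: "winter", 2: "winter",
    3: "spring", 4: "spring", 5: "spring",
    6: "summer", 7: "summer", 8: "summer",
    9: "autumn", 10: "autumn", 11: "autumn",
}


def seasonal_mean_bias(records):
    """records: iterable of (month, model_value, obs_value) tuples."""
    diffs = defaultdict(list)
    for month, model_value, obs_value in records:
        diffs[SEASON_OF_MONTH[month]].append(model_value - obs_value)
    return {season: sum(d) / len(d) for season, d in diffs.items()}


example = [(1, 2.0, 1.0), (12, 4.0, 1.0), (7, 1.0, 1.0)]
print(seasonal_mean_bias(example))
```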

Ground-based observations
Figures 2 and S1-S7 show the spatial distributions of R, MB, and RMSE for the hourly SSR, T2, Q2 (water vapor mixing ratio), RH2, WS10, PREC, PBLH08, and PBLH20 from WRF-CMAQ, WRF-Chem, and WRF-CHIMERE with and without aerosol feedbacks, evaluated against the ground-based observations at each site throughout 2017. The annual evaluation metrics for all sites in eastern China are summarized in Table S3, and the corresponding seasonal R and MB values are presented in Fig. 3. Here, we mainly focus on SSR, T2, RH2, and WS10; analyses of Q2, PREC, PBLH08, and PBLH20 are presented in Sect. S1.1.
Accurate radiation prediction is essential for evaluating ARI. The annual and seasonal average simulated SSR values were compared with the ground-based observations (Figs. 3-4 and Table S3), and the models simulated SSR over eastern China reasonably well, with R values of 0.61-0.78. SSR was overestimated at both annual and seasonal scales, with larger MBs in spring and summer than in autumn and winter. The annual SSR overestimations were 19.98, 14.48, and 9.24 W m⁻² for WRF-CMAQ, WRF-Chem, and WRF-CHIMERE, respectively. Brunner et al. (2015) also reported that most two-way coupled models overestimated SSR over Europe and North America. Such overestimations could be caused by multiple factors, namely the uncertainties in cloud development arising from PBL and convection parameterizations (Alapaty et al., 2012) and the diversity in the treatment of land-surface processes (Brunner et al., 2015), which tend to play more important roles than the enabling of two-way were significantly smaller than those with ARI effects enabled.
Generally, the simulated magnitudes and temporal variations of T2 were highly consistent with the observations (R = 0.88-0.97). The models tended to underestimate T2 at annual and seasonal scales, with larger underestimations in spring and winter than in summer and autumn (Figs. 3 and 4). According to Makar et al. (2015a), over North America in 2010, WRF-Chem and GEM-MACH with ACI and ARI effects enabled produced negative MBs in summer and positive MBs in winter, whereas WRF-CMAQ with only ARI effects enabled produced negative MBs in summer; notably, that study did not evaluate winter meteorology simulated by WRF-CMAQ. In terms of MB magnitude, the models ranked as follows: WRF-CHIMERE > WRF-CMAQ > WRF-Chem. The annual and seasonal MBs of WRF-CMAQ and WRF-Chem were approximately −1 °C, whereas those of WRF-CHIMERE ranged from −2 to −1 °C. The RMSE values of WRF-CMAQ (2.71-3.05 °C) and WRF-Chem (2.82-3.27 °C) were almost equal, whereas those of WRF-CHIMERE (3.39-4.53 °C) were larger at annual and seasonal scales. Notably, enabling ARI effects reduced the underestimations of annual and seasonal T2 in eastern China in all three coupled models. With ACI effects also enabled, the T2 MBs of WRF-Chem_BOTH did not change significantly compared with those of WRF-Chem_NO, whereas WRF-CHIMERE_BOTH showed greater underestimations of T2 than WRF-CHIMERE_NO for the full year (−1.30 °C), spring (−0.12 °C), and winter (−0.40 °C).
Regarding RH2, the annual and seasonal simulations of WRF-CMAQ exhibited the highest correlations with the observations, followed by WRF-Chem and WRF-CHIMERE, and all three models had their lowest correlation coefficients in autumn (∼0.5). The spatial distributions of the MBs of the three models were generally opposite to those for T2 (i.e., RH2 was overestimated where T2 was underestimated and vice versa), which can be explained by the models' calculation of RH2 from T2 (Wang et al., 2021). The annual and seasonal MBs were 0.65 %-71.03 % and −21.30 % to 60.00 %, respectively (Fig. 4 and Table S3); only WRF-Chem produced negative MBs in summer. The RMSE magnitudes showed a pattern inverse to that of R for the three models, with the maximum (28.48 %-29.52 %) and minimum (12.57 %-16.07 %) values in autumn and summer, respectively. Table S3 shows that WRF-CMAQ_ARI further reduced the overestimations of the annual and seasonal RH2 in the ECR, whereas WRF-Chem_ARI (except in summer) and WRF-CHIMERE_ARI displayed the opposite trend. Moreover, the variations in the annual and seasonal RH2 MBs simulated by WRF-Chem_BOTH and WRF-CHIMERE_BOTH were further reduced compared with those simulated by WRF-Chem_ARI (except in summer) and WRF-CHIMERE_ARI, respectively.
Similar analyses were performed for WS10, and the results revealed that WRF-CMAQ captured the WS10 patterns better than WRF-Chem and WRF-CHIMERE did. The R values for all three models ranged from 0.47 to 0.60; WRF-CMAQ and WRF-Chem overestimated WS10 by ∼0.5 m s⁻¹, whereas WRF-CHIMERE overestimated it by ∼1.0 m s⁻¹ (Table S3 and Figs. 3-4). The overestimation of WS10 under real-world low-wind conditions is common in existing weather models and is mainly caused by outdated geographic data, coarse model resolution, and the lack of a good physical representation of the urban canopy (Gao et al., 2015, 2018). The three models exhibited lower correlations (0.31-0.54) and smaller MBs (0.20-0.86 m s⁻¹) in summer than in the other seasons, with RMSEs of ∼2.0 m s⁻¹. Enabling ARI effects mitigated the overestimations of all three models, particularly WRF-CMAQ_ARI.
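The opposite signs of the T2 and RH2 biases discussed above follow from RH2 being calculated from T2: at a given vapor pressure, a cold bias lowers the saturation vapor pressure and therefore raises RH2. The sketch below illustrates this under two assumptions: the vapor pressure itself is simulated correctly, and the saturation vapor pressure follows the Bolton (1980) fit. The input values and function names are hypothetical.

```python
"""How a T2 bias alone shifts diagnosed RH2 (vapour pressure held fixed)."""
import math


def saturation_vapor_pressure_hpa(t_c: float) -> float:
    """Bolton (1980) saturation vapour pressure over liquid water (hPa)."""
    return 6.112 * math.exp(17.67 * t_c / (t_c + 243.5))


def rh_with_t_bias(rh_obs_pct: float, t_obs_c: float, t_bias_c: float) -> float:
    """RH2 (%) diagnosed at T2 = t_obs_c + t_bias_c with the observed vapour pressure."""
    e = rh_obs_pct / 100.0 * saturation_vapor_pressure_hpa(t_obs_c)  # hPa
    # Values above 100 % would imply supersaturation; they are not clipped here.
    return 100.0 * e / saturation_vapor_pressure_hpa(t_obs_c + t_bias_c)


# Hypothetical case: observed RH2 = 70 % at 15 degC, model T2 bias of -1 degC.
print(round(rh_with_t_bias(70.0, 15.0, -1.0), 1))
```

In this hypothetical case, a 1 °C cold bias alone raises RH2 by about 4.7 percentage points, which is comparable in sign to the converse T2 and RH2 bias patterns described above.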
To quantify how our results compare with those of previous studies using two-way coupled models, we compared our evaluation results for meteorology and air quality with those reported in the literature; meteorology is discussed in this section and air quality in Sect. 4.1. We used box-and-whisker plots based on the 5th, 25th, 75th, and 95th percentiles, in which the dashed lines in the boxes represent the mean values and the circles represent outliers. Previous studies mainly used WRF-Chem and WRF-CMAQ to evaluate meteorology and air quality, whereas WRF coupled with the Nested Air Quality Prediction Modeling System (WRF-NAQPMS) and the Global-Regional Assimilation and Prediction System coupled with the Chinese Unified Atmospheric Chemistry Environment forecasting system (GRAPES-CUACE) have rarely been applied. As mentioned in Sect. 1, WRF-CHIMERE with or without aerosol feedbacks has not previously been used to investigate meteorology and air quality in the ECR. Therefore, only the evaluation results of previous aerosol-feedback studies using WRF-Chem and WRF-CMAQ were analyzed.
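The box-and-whisker summaries described above can be reproduced from a list of compiled metric values. The sketch below makes several assumptions: the box spans the 25th-75th percentiles, the whiskers end at the 5th and 95th percentiles, values beyond the whiskers are treated as outliers, and percentiles use linear interpolation between order statistics (the same convention as NumPy's default). The sample values are hypothetical. In Matplotlib, a comparable plot can be drawn with `boxplot(..., whis=(5, 95), showmeans=True, meanline=True)`.

```python
"""Box-and-whisker statistics with 5th/95th-percentile whiskers."""
import math


def percentile(sorted_vals, q):
    """Percentile q (0-100) of pre-sorted values, linear interpolation."""
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo = math.floor(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])


def box_stats(values):
    """Box/whisker percentiles, mean, and outliers for one metric sample."""
    vals = sorted(values)
    if not vals:
        raise ValueError("empty sample")
    p5, p25, p75, p95 = (percentile(vals, q) for q in (5, 25, 75, 95))
    return {
        "p5": p5, "p25": p25, "p75": p75, "p95": p95,
        "mean": sum(vals) / len(vals),
        "outliers": [v for v in vals if v < p5 or v > p95],
    }


# Hypothetical sample: R values compiled from earlier studies.
print(box_stats([0.62, 0.71, 0.75, 0.80, 0.84, 0.86, 0.90, 0.93]))
```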
Figure S8 compares the statistical metrics for T2, RH2, Q2, and WS10 in this study with the evaluation results of previous studies. Judging by the number of samples for each meteorological variable, most previous studies simulated and evaluated T2, WS10, and RH2, and relatively few focused on Q2. The ranges of our statistical metrics were roughly similar to those of previous studies, although there were some notable differences. The R values of WRF-CMAQ and WRF-Chem in our study were higher than those of the previous studies; for T2, the MBs of WRF-CMAQ were smaller, whereas those of WRF-Chem were larger, and the RMSEs of WRF-CMAQ were larger, whereas those of WRF-Chem were smaller. For RH2, the R values of WRF-CMAQ and WRF-Chem were larger than the average of the previous studies; the MBs and RMSEs of WRF-CMAQ were larger than that average, whereas those of WRF-Chem were smaller. For Q2, WRF-CMAQ generally performed better than the average reported in previous studies, and WRF-Chem also showed a higher R value and a lower MB and RMSE than that average.
We also conclude that the simulation results of WRF-CMAQ and WRF-Chem reproduced the variations in WS10 better than the simulations reported by previous studies did.

Satellite-borne observations
To further evaluate the performances of WRF-CMAQ, WRF-Chem, and WRF-CHIMERE, we compared the annual and seasonal statistical metrics for SW and LW radiation at the surface, PREC, cloud cover, and LWP simulated by the three coupled models with and without aerosol feedbacks against satellite-borne observations. As shown in Table 3 and Fig. 5, the three coupled models performed relatively poorly for SSR, with annual MBs of 8.21-30.74 W m⁻² and correlations of 0.61-0.78. A similarly poor performance for SW radiation over the United States was reported for the coupled WRF-CMAQ and offline WRF models (Wang et al., 2021). The three coupled models reproduced the overall seasonal characteristics of SSR (Fig. S10 in the Supplement). At the same time, regardless of whether aerosol feedbacks were enabled, the three models overestimated the seasonal SSR (except for WRF-Chem_ARI in winter), with higher MBs in spring and summer than in autumn and winter. These seasonal SSR overestimations might be directly due to the underestimation of the calculated AOD when ARI effects were considered (Wang et al., 2021). Compared with SSR, the surface LW radiation (SLR) was predicted well by the three coupled models (R values up to 0.99), with annual domain-average MBs of −9.97 to −6.05 W m⁻². Furthermore, the LW radiation simulated by the three coupled models showed significant seasonal differences: the WRF-CMAQ and WRF-CHIMERE scenarios underestimated SLR, with the largest and smallest underestimations in winter and summer, respectively, whereas the largest underestimations of WRF-Chem occurred in autumn, particularly for WRF-Chem_BOTH (Fig. S9).
Because the three coupled models adopted the same grid resolution (27 × 27 km) and the same SW and LW radiation schemes (RRTMG), the above analysis suggests that the configuration differences in aerosol components, size distributions, and mechanisms contributed to the diverse seasonal MBs (Tables 1 and S2). Moreover, enabling ARI feedbacks effectively improved the annual and seasonal SSR performances of all three two-way coupled models; however, for SLR, the performance changes varied much more across the three coupled models and across the scenarios with and without ARI and/or ACI feedbacks (Table S4). When ARI effects were enabled, the different refractive indices assigned to the aerosol species groups led to discrepancies in the aerosol optical properties calculated online in the different SW and LW bands of the RRTMG schemes of WRF-CMAQ, WRF-Chem, and WRF-CHIMERE (Tables S5-S6). The cloud optical properties affected by aerosol absorption, also calculated online in the RRTMG radiation schemes, likewise differed because the three coupled models treat the aerosol species groups differently. With ACI effects enabled, the activation of cloud droplets from aerosols based on Köhler theory was considered in WRF-Chem and WRF-CHIMERE, and these simulations were compared with those without aerosol feedbacks (Table S7). Prognostic ice-nucleating particles (INPs) formed via the heterogeneous nucleation of dust particles (diameters > 0.5 µm) and the homogeneous freezing of hygroscopic aerosols (diameters > 0.1 µm) were treated only in WRF-CHIMERE; WRF-CMAQ and WRF-Chem do not include prognostic INPs. These discrepancies ultimately contributed to the differences in the simulated aerosol-induced radiation changes.
According to the Intergovernmental Panel on Climate Change reports (IPCC, 2007, 2021), the effects of aerosol feedbacks (particularly ACI effects) on PREC and cloud processes remain unclear. In this study, we further assessed the annual and seasonal simulated PREC, cloud cover, and LWP over the high-aerosol-loading regions of the ECR against satellite observations (Table 3 and Figs. S12-S14) to provide new insights into enabling online feedbacks in year-long two-way coupled simulations.
The results indicated that PREC simulated by WRF-CMAQ exhibited higher correlations (0.51-0.89) than that simulated by WRF-Chem (0.61-0.73) and WRF-CHIMERE (0.54-0.70). WRF-CMAQ had its best correlation in winter, whereas WRF-Chem and WRF-CHIMERE had their best correlations in spring; all three models had their worst correlations in summer, as numerical models struggle to capture the enhanced convective activity in that season. Huang and Gao (2018) also showed that an accurate representation of lateral boundaries was crucial to improving summer PREC simulations in China. WRF-CMAQ underestimated annual PREC, with MBs of −76.49 to −51.93 mm, whereas WRF-Chem and WRF-CHIMERE produced large PREC overestimations ranging from +108.04 to +207.05 mm (Table 3), particularly in southern China (Fig. S11). The seasonal PREC biases of WRF-CMAQ ranged from −27.89 to +42.08 mm and were negative except for WRF-CMAQ_ARI in winter. WRF-Chem and WRF-CHIMERE underestimated seasonal PREC only in autumn (−31.39 to −26.89 mm) and winter (−7.12 to −4.43 mm), respectively (Fig. S12). The variations in the annual and seasonal MBs of PREC were consistent with the changes in CF and LWP (Zhang et al., 2016), which are discussed in detail below.
When aerosol feedbacks were considered, the ARI-induced decreases in the annual MB of PREC for WRF-CMAQ, WRF-Chem, and WRF-CHIMERE were 24.56, 12.11, and 4.70 mm, respectively. WRF-Chem_BOTH (24.9 mm) and WRF-CHIMERE_BOTH (3.41 mm) showed greater overestimations of annual PREC than WRF-Chem_ARI and WRF-CHIMERE_ARI, respectively. Relative to the simulations without feedbacks, enabling ARI effects produced a significant increase in MB in WRF-CMAQ (+53.15 mm) and decreases in the other two models (−6.3 to −3.41 mm). With both ARI and ACI effects enabled, WRF-Chem and WRF-CHIMERE produced larger MB increases (+3.54 to +7.46 mm) at the seasonal scale (Fig. S12). Notably, the discrepancies in simulated PREC were mainly attributable to the different microphysics and cumulus schemes selected in WRF-CMAQ (Morrison and Kain–Fritsch), WRF-Chem (Morrison and Grell–Freitas), and WRF-CHIMERE (Thompson and Grell–Freitas).
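The ARI- and ACI-induced changes quoted in this section are differences between paired scenarios. A minimal sketch, assuming the scenario naming used in the text (<model>_NO, <model>_ARI, and, where a run exists, <model>_BOTH) and hypothetical MB values that are not results from this study, attributes the ARI effect to ARI minus NO and the additional ACI effect to BOTH minus ARI. Because no ACI-only run enters this difference, any nonlinear interaction between ARI and ACI is folded into the ACI term.

```python
"""Attribute metric changes to ARI and ACI by differencing scenario runs."""


def feedback_contributions(metric_by_scenario, model):
    """Return ARI and (if a BOTH run exists) ACI contributions to a metric.

    ARI contribution: <model>_ARI - <model>_NO
    ACI contribution: <model>_BOTH - <model>_ARI (None without a BOTH run)
    """
    no = metric_by_scenario[f"{model}_NO"]
    ari = metric_by_scenario[f"{model}_ARI"]
    both = metric_by_scenario.get(f"{model}_BOTH")
    return {"ARI": ari - no, "ACI": None if both is None else both - ari}


# Hypothetical annual PREC MBs (mm), not values from this study.
annual_mb = {
    "WRF-Chem_NO": 120.0, "WRF-Chem_ARI": 110.0, "WRF-Chem_BOTH": 115.0,
    "WRF-CMAQ_NO": -50.0, "WRF-CMAQ_ARI": -60.0,
}
print(feedback_contributions(annual_mb, "WRF-Chem"))
print(feedback_contributions(annual_mb, "WRF-CMAQ"))
```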
CF and LWP can significantly influence the spatiotemporal distributions of PREC; our simulated annual and seasonal CFs in the ECR are presented in Table 3 and Fig. S13. Overall, WRF-CMAQ performed best at simulating CF. The R values of WRF-Chem in summer (0.69) and winter (0.70) were larger than those of WRF-CMAQ (0.59 and 0.64) and WRF-CHIMERE (0.56 and 0.66), whereas WRF-CMAQ and WRF-CHIMERE obtained better simulation results in winter and autumn, with correlations of up to 0.89 and 0.67, respectively. The three coupled models underestimated annual and seasonal CFs, with MBs of −16.83 % to −6.18 % and −21.13 % to −4.13 %, respectively; these results are consistent with those of previous two-way coupled modeling studies in China using WRF-CMAQ (−19.7 %) and WRF-Chem (−32 % to −9 %) (Hong et al., 2017; Zhao et al., 2017). The models simulated the annual LWP in the ECR reasonably well, with R values of > 0.55 and negative biases from −57.36 to −31.29 g m⁻². These underestimations were closely related to the lack of cloud homogeneity (Wang et al., 2015; Dionne et al., 2020) and the excessive conversion of liquid water to ice in the selected cloud microphysics schemes (Klein et al., 2009). As shown in Fig. S14, the models performed best for LWP in spring (R = 0.51-0.79), and their largest underestimations occurred in winter (MBs = −54.82 to −40.89 g m⁻²), except for WRF-Chem, which had its maximum bias in autumn.
Regarding the impacts of aerosol feedbacks on CF and LWP, the simulated scenarios revealed that WRF-CMAQ_ARI consistently reduced the annual and seasonal underestimations of CF (by 0.48 %-1.05 %) and LWP (by 3.03-4.29 g m⁻²), whereas WRF-Chem_ARI and WRF-CHIMERE_ARI slightly increased them (CF: 0.02 %-0.39 %; LWP: 0.03-0.58 g m⁻²). Compared with WRF-CHIMERE_ARI, WRF-CHIMERE_BOTH produced larger variations in the annual and seasonal MBs of CF (0.23 %-0.93 %) and LWP (−2.96 to 7.38 g m⁻²), whereas WRF-Chem_BOTH and WRF-Chem_ARI exhibited equivalent variations (CF: 0.03 %-0.71 %; LWP: 0.02-2.89 g m⁻²). These differences could be explained by the different parameterizations of the cloud droplet number concentration (CDNC) in the three coupled models with and without ACI effects enabled. Cloud condensation nuclei (CCN) activated from aerosol particles can increase CDNC and thereby affect LWP and CF. When no aerosol feedbacks or only ARI are enabled, CDNC is prescribed by default as a constant 250 cm⁻³ in the Morrison schemes of WRF-CMAQ and WRF-Chem and 300 cm⁻³ in the Thompson scheme of WRF-CHIMERE. When only ACI or both ARI and ACI effects are enabled, the prognostic CDNC is calculated online in the two-way coupled WRF-Chem and WRF-CHIMERE models, with aerosol particles activated when the maximum cloud supersaturation exceeds the aerosol critical supersaturation (Abdul-Razzak and Ghan, 2002; Chapman et al., 2009; Tuccella et al., 2019). Although we have obtained preliminary quantitative results for the ACI effects on regional PREC, CF, and LWP, several limitations remain in the representation of ACI effects in state-of-the-art two-way coupled models. These include the lack of consideration of convective-cloud responses to ACI (Tuccella et al., 2019) and the numerical descriptions of giant CCN (Wang et al., 2021) and heterogeneous ice nuclei (Keita et al., 2020).
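The activation criterion described above compares a cloud's maximum supersaturation with each particle's critical supersaturation. The Abdul-Razzak and Ghan (2002) parameterization used by the models has its own formulation of both quantities. As a simpler stand-in, the sketch below uses the single-parameter κ-Köhler approximation for the critical supersaturation of a dry particle of diameter D_d, s_c ≈ exp(sqrt(4A³/(27κD_d³))) − 1 with A = 4σM_w/(RTρ_w), and treats s_max as a given input rather than computing it from parcel dynamics. The constants are approximate values for water near 298 K, and the function names are hypothetical.

```python
"""Approximate kappa-Koehler critical supersaturation and activation test.

Assumptions: single-parameter kappa-Koehler approximation; s_max is supplied
externally (no parcel model); approximate properties of water near 298 K.
"""
import math

R_GAS = 8.314        # universal gas constant, J mol^-1 K^-1
M_WATER = 0.018015   # molar mass of water, kg mol^-1
RHO_WATER = 997.0    # density of water, kg m^-3
SIGMA_WATER = 0.072  # surface tension of water, N m^-1


def kelvin_parameter(temp_k: float) -> float:
    """A = 4 sigma M_w / (R T rho_w), in metres (diameter form)."""
    return 4.0 * SIGMA_WATER * M_WATER / (R_GAS * temp_k * RHO_WATER)


def critical_supersaturation(d_dry_m: float, kappa: float, temp_k: float = 298.15) -> float:
    """Critical supersaturation as a fraction (0.002 means 0.2 %)."""
    a = kelvin_parameter(temp_k)
    return math.exp(math.sqrt(4.0 * a**3 / (27.0 * kappa * d_dry_m**3))) - 1.0


def activates(s_max: float, d_dry_m: float, kappa: float, temp_k: float = 298.15) -> bool:
    """A particle activates when the maximum supersaturation exceeds its s_c."""
    return s_max > critical_supersaturation(d_dry_m, kappa, temp_k)


# Hypothetical 100 nm particle with kappa = 0.61 (a value often used for ammonium sulfate).
print(f"s_c = {100.0 * critical_supersaturation(100e-9, 0.61):.3f} %")
```

For this particle, s_c is about 0.15 %, so it would activate in a cloud whose maximum supersaturation exceeds that value; smaller or less hygroscopic particles require higher supersaturations, which is why the prognostic CDNC depends on both the aerosol size distribution and the composition.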

Multimodel air-quality evaluations
As for the meteorological variables, ground-based and satellite-borne observations were used in the following evaluation to quantify how enabling aerosol feedbacks affects the simulation accuracy of the air-quality variables in the ECR. The computing resources used by each simulation were also assessed (Sect. 4.3).

Ground-based observations
Table 4 and Fig. 7 present the statistical metrics of the annual and seasonal air pollutant concentrations (PM2.5, O3, NO2, SO2, and CO) simulated by the three coupled models. The comparisons between the surface measurements and simulations of PM2.5 and O3 are presented below, and performance assessments of the other gaseous pollutants are presented in Sect. 2 of the Supplement.
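Tables 3-5 report R, MB, NMB, NGE, and RMSE. A minimal sketch using the conventional definitions (NMB = Σ(M − O)/ΣO × 100 %, NGE = Σ|M − O|/ΣO × 100 %) is given below; the pairing of sites and hours and the handling of missing data in the actual evaluation are not described here, so dropping NaN pairs is an assumption.

```python
import numpy as np


def evaluation_metrics(sim, obs):
    """R, MB, NMB (%), NGE (%), and RMSE for paired simulated/observed values.

    Pairs with a missing (NaN) value on either side are dropped first.
    """
    m = np.asarray(sim, dtype=float)
    o = np.asarray(obs, dtype=float)
    valid = ~(np.isnan(m) | np.isnan(o))
    m, o = m[valid], o[valid]
    diff = m - o
    return {
        "R": float(np.corrcoef(m, o)[0, 1]),                 # Pearson correlation
        "MB": float(diff.mean()),                             # mean bias
        "NMB": float(diff.sum() / o.sum() * 100.0),           # normalized mean bias
        "NGE": float(np.abs(diff).sum() / o.sum() * 100.0),   # normalized gross error
        "RMSE": float(np.sqrt(np.mean(diff ** 2))),           # root-mean-square error
    }


# Illustrative values only; the fourth pair is dropped because sim is NaN
stats = evaluation_metrics([40.0, 55.0, 80.0, float("nan")], [50.0, 60.0, 70.0, 65.0])
print({k: round(v, 2) for k, v in stats.items()})
```

NMB and NGE are normalized by the summed observations rather than averaged per pair, which keeps them stable when individual observed values are small.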
The R values of the annual PM2.5 concentrations simulated by WRF-CMAQ (0.68) were the highest, followed by those obtained by WRF-Chem (0.65-0.68) and WRF-CHIMERE (0.52-0.53). All three models exhibited higher correlations in winter than in the other seasons (Fig. 7). Table 4 and Figs. 6-7 reveal that WRF-CMAQ underestimated the annual and seasonal (except autumn) PM2.5 concentrations, with NMBs of −9.78 % to −6.39 % and −17.68 % to +5.17 %, respectively. WRF-Chem both overestimated and underestimated PM2.5 at the annual and seasonal scales, with NMBs ranging from −39.11 % to +24.72 %. In contrast, WRF-CHIMERE substantially overestimated the annual and seasonal PM2.5 concentrations (NMB: +19.51 % to +75.47 %). These biases could be related to differences among the three two-way coupled models in their aerosol and gas-phase mechanisms, dust and sea salt emission schemes, chemical ICs and BCs, and aerosol size distribution treatments. Based on the NMB differences between the simulations with ARI and those without aerosol feedbacks, the ARI-induced annual and seasonal NMB variations ranged from +3.01 % to +4.21 % in WRF-CMAQ_ARI and from +3.07 % to +5.02 % in WRF-Chem_ARI, indicating that enabling ARI feedbacks slightly reduced the annual and seasonal (except autumn) underestimations of PM2.5 concentration. Notably, WRF-CHIMERE_ARI further increased its overestimation of the annual and seasonal PM2.5 concentrations, with NMB increases of up to 10.04 %. The ARI-induced increases in PM2.5 concentrations were attributable to the combined decreases in SSR, T2, WS10, and PBLH and increases in RH2. When ACI feedbacks were also enabled, WRF-Chem_BOTH greatly underestimated the annual and seasonal PM2.5, with NMBs varying from −24.15 % to −14.44 % compared with WRF-Chem_ARI. WRF-CHIMERE_BOTH tended to decrease (−2.1 % to −0.51 %) the annual and autumn-winter NMBs and increase (+0.35 % to +3.04 %) the spring-summer ones. A further comparison of the ARI- and ACI-induced NMB variations demonstrated that the ARI-induced variations in PM2.5 concentrations were smaller than the ACI-induced ones in WRF-Chem, whereas the reverse was true for WRF-CHIMERE. This might be explained by the treatment of dust aerosols as ice nuclei (IN) in WRF-CHIMERE, which was not included in WRF-Chem in this study.
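The ARI- and ACI-induced variations above are differences between scenario NMBs. Assuming ARI-induced = NMB(_ARI) − NMB(no feedback) and ACI-induced = NMB(_BOTH) − NMB(_ARI), as the comparisons in the text imply, the bookkeeping can be sketched as follows (the function name and input layout are hypothetical, and the example NMBs are made-up numbers, not results of this study):

```python
def feedback_induced_changes(nmb_base, nmb_ari, nmb_both):
    """Per-period ARI- and ACI-induced NMB changes (percentage points).

    Each argument maps a period name (e.g. "annual", "winter") to the NMB (%)
    of one scenario: no feedbacks, ARI only (_ARI), and ARI + ACI (_BOTH).
    ARI-induced = NMB_ARI - NMB_base; ACI-induced = NMB_BOTH - NMB_ARI.
    """
    changes = {}
    for period, base in nmb_base.items():
        ari = nmb_ari[period] - base
        aci = nmb_both[period] - nmb_ari[period]
        changes[period] = {
            "ARI": ari,
            "ACI": aci,
            "larger": "ACI" if abs(aci) > abs(ari) else "ARI",
        }
    return changes


# Hypothetical NMBs (%), not values from this study
result = feedback_induced_changes({"annual": -10.0}, {"annual": -6.0}, {"annual": -25.0})
print(result["annual"])  # {'ARI': 4.0, 'ACI': -19.0, 'larger': 'ACI'}
```

Comparing the magnitudes of the two terms for each period is what separates the WRF-Chem pattern (larger ACI-induced changes) from the WRF-CHIMERE pattern (larger ARI-induced changes).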
For O3, WRF-CHIMERE (R = 0.62) exhibited the highest correlation, followed by WRF-CMAQ (R = 0.55) and WRF-Chem (R = 0.45) (Table 4 and Fig. S16). WRF-CMAQ slightly underestimated the annual O3 concentration, with NMBs of −12.57 % to −11.52 %; conversely, WRF-Chem and WRF-CHIMERE significantly overestimated it, with NMBs of 47.82 %-48.10 % and 29.46 %-29.75 %, respectively. The seasonal statistical metrics displayed patterns consistent with the annual results, and summer O3 pollution levels were simulated better than those in the other seasons (Fig. 6). Enabling ARI feedbacks slightly decreased the annual and seasonal O3 NMBs and NGEs, with changes of −3.02 % to +0.85 % (the only positive value, +0.85 %, was produced by WRF-CMAQ in summer) and −1.42 % to −0.75 %, respectively. With ACI effects enabled, WRF-Chem and WRF-CHIMERE exhibited increases in the annual O3 NMBs and NGEs of 0.12 %-0.65 % and 0.40 %-0.55 %, respectively. The ACI-induced seasonal NMB variations differed between the two models: the WRF-Chem NMBs increased in spring-summer and decreased in autumn-winter, whereas the WRF-CHIMERE NMBs increased in all seasons except winter (Fig. 7). These diverse NMB and NGE variations can be explained by two model differences. For the model-top BCs, WRF-CMAQ and WRF-Chem employed an O3-potential vorticity parameterization, whereas WRF-CHIMERE employed climatological data from LMDz-INCA. Regarding the gas-phase chemistry mechanisms, the three coupled models incorporate different photolytic reactions; this is discussed in more detail in Sect. 4.2. Knote et al. (2015) comprehensively assessed the effects of seven gas-phase chemical mechanisms (the second-generation Regional Acid Deposition Model, RADM2; the Regional Acid Deposition Model Version Karlsruhe, RADMKA; the Regional […] to atmospheric dynamics and stability (as mentioned in the PBLH evaluation part of Sect. S1.1 in the Supplement) as well as photochemistry and heterogeneous reactions; thus, they eventually influenced O3 formation (Xing et al., 2017; Qu et al., 2021; Zhu et al., 2021). As for the meteorological variables presented above, we conducted quality assurance for the statistical metrics via further comparisons with the PM2.5 and O3 results from previous model evaluations (Fig. S20). In this study, WRF-CMAQ and WRF-Chem simulated PM2.5 better than the average levels reported by previous studies of the ECR. For O3, WRF-Chem performed worse than the average level reported by previous studies. Although the R values of O3 simulated by WRF-CMAQ in this study were lower than the average reported in previous studies, our RMSEs were smaller.
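Figure 7 summarizes these comparisons as Taylor diagrams. The sketch below computes the two coordinates such a diagram rests on, R and the standard deviation normalized by the observed one, plus the normalized centered RMS difference that follows from them geometrically; NMB, which Fig. 7 also displays, is a separate bias metric. Using population standard deviations is an assumption made here so that the geometric identity holds exactly, and the function name is illustrative.

```python
import numpy as np


def taylor_statistics(sim, obs):
    """Taylor-diagram quantities, normalized by the observed standard deviation.

    Population (1/N) standard deviations are used so that the centered RMS
    difference E' satisfies E'^2 = s_m^2 + s_o^2 - 2 s_m s_o R exactly.
    """
    m = np.asarray(sim, dtype=float)
    o = np.asarray(obs, dtype=float)
    sd_m, sd_o = m.std(), o.std()
    r = float(np.corrcoef(m, o)[0, 1])
    # Centered RMS difference: RMSE after removing each series' own mean (bias)
    crmsd = float(np.sqrt(np.mean(((m - m.mean()) - (o - o.mean())) ** 2)))
    return {"R": r, "norm_sd": float(sd_m / sd_o), "norm_crmsd": crmsd / float(sd_o)}


t = taylor_statistics([1.0, 3.0, 2.0, 5.0, 4.0], [2.0, 2.0, 3.0, 4.0, 6.0])
print({k: round(v, 3) for k, v in t.items()})
```

Because the centered RMS difference removes the mean bias, a model can sit close to the reference point on a Taylor diagram while still having a large NMB, which is why the two are shown together.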

Satellite-borne observations
In this section, we further investigate the discrepancies among the models in the simulated AOD and column concentrations of gases (O3, NO2, SO2, CO, and NH3) by comparing them with various satellite observations. As WRF-CHIMERE was not configured to output simulated NH3 concentrations, the NH3 discussion includes only the WRF-CMAQ and WRF-Chem results.
Regarding the seasonal variations, we observed relatively high correlations (0.71-0.88) for AOD in autumn and lower values (0.53-0.84) in the other seasons (Fig. 8). WRF-CMAQ and WRF-Chem tended to underestimate AOD in summer (MBs of −0.1 to −0.4) and overestimate it in the other seasons (MBs of 0.01-0.05). WRF-CHIMERE exhibited positive (0.03-0.04) and negative (−0.10 to −0.01) biases in winter and the other seasons, respectively. Regarding TCO (Fig. S24), the performances of the WRF-CMAQ and WRF-Chem models in spring and winter were slightly better than their performances in sum[…] S23). Regarding NH3, the only primary alkaline gas in the atmosphere, the WRF-CMAQ and WRF-Chem models performed better in summer (R: 0.81-0.87; MB: −3.42 to 2.07 × 10¹⁵ molecules cm⁻²) (Fig. S25). NH3 emissions from fertilizers and livestock have been substantially underestimated in China (Zhang et al., 2017), and their peak values occur in spring and summer (Huang et al., 2012). Additionally, the bidirectional exchange of fertilizer-induced NH3 was not considered in our simulations. Moreover, after comparing the performances of the models for each pollutant described in Sects. 4.1 and 4.2, the only disparity found between the evaluations against ground-based and satellite-borne observations was for CO. The formation of CO via methane oxidation, an important source of atmospheric CO (Stein et al., 2014), was not considered in the three coupled models, and methane emissions were not included in the MEIC inventory. Furthermore, the contribution of CO to the atmospheric oxidation capacity (OH radicals) was nonnegligible, e.g., approximately 20.54 %-38.97 % in Beijing (Liu et al., 2021) and 26 %-31 % in Shanghai (Zhu et al., 2020). In addition, the discrepancies in model performance when simulating AOD and column concentrations of gases can be explained by differences among the three coupled models in the representation of aerosol species groups, the Fast-JX photolysis scheme, and the gas-phase mechanisms. More detailed interpretations are grouped into four aspects: (1) AODs are calculated via Mie theory using the refractive indices of different numbers of aerosol species groups (5, 6, and 10 in WRF-CMAQ, WRF-Chem, and WRF-CHIMERE, respectively) (Tables S5-S6); (2) seven (294.6, 303.2, 310.0, 316.4, 333.1, 382.0, and 607.7 nm), four (300, 400, 600, and 999 nm), and five (200, 300, 400, 600, and 999 nm) effective wavelengths were used to calculate the actinic fluxes and photolysis rates in the Fast-JX photolysis modules of WRF-CMAQ, WRF-Chem, and WRF-CHIMERE, respectively; (3) the Fast-JX schemes of the three coupled models use different methods to calculate aerosol and cloud optical properties (Tables S1 and S5-S6); and (4) the CB6, CBMZ, and MELCHIOR2 mechanisms contain 77, 52, and 40 gas-phase species participating in 218, 132, and 120 gas-phase reactions, respectively. When only ARI effects were enabled in the three models, relatively limited improvements were observed in the simulated annual AOD and NO2 columns. The AOD simulations improved in spring and summer but worsened in autumn and winter (Table 5 and Fig. 9).
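Aspect (2) matters because a photolysis frequency is an integral of actinic flux, absorption cross section, and quantum yield over wavelength, J = ∫ F(λ) σ(λ) φ(λ) dλ, so the number and placement of wavelength bins change the result. The sketch below shows only the generic bin summation under that assumption; it is not the Fast-JX implementation, the function name is hypothetical, and the two-bin inputs are illustrative numbers.

```python
import numpy as np


def photolysis_rate(actinic_flux, cross_section, quantum_yield):
    """First-order photolysis frequency J (s^-1) as a sum over wavelength bins.

    Discretizes J = integral of F(lambda) * sigma(lambda) * phi(lambda):
    actinic_flux  : bin-integrated actinic flux (photons cm^-2 s^-1)
    cross_section : bin-effective absorption cross section (cm^2 molecule^-1)
    quantum_yield : bin-effective quantum yield (dimensionless)
    """
    F, sigma, phi = (np.asarray(a, dtype=float)
                     for a in (actinic_flux, cross_section, quantum_yield))
    if not (F.shape == sigma.shape == phi.shape):
        raise ValueError("inputs must be given on the same wavelength bins")
    return float(np.sum(F * sigma * phi))


# Hypothetical two-bin example (illustrative numbers only)
j = photolysis_rate([1e14, 2e14], [1e-19, 2e-19], [1.0, 0.5])
print(j)
```

Coarser bins force each bin-effective σ and φ to represent a wider spectral range, so two schemes with different bin layouts can yield different J values from the same underlying spectra.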
The ARI-induced variations in the seasonal MBs of the NO2 columns were larger in WRF-CMAQ (−0.18 to 0.13 petamolecules cm⁻²) than in WRF-Chem and WRF-CHIMERE (0-0.01 petamolecules cm⁻²). When both ARI and ACI effects were enabled in WRF-Chem, the seasonal AOD simulations worsened considerably. The annual and seasonal NO2 simulations by WRF-Chem became slightly worse, whereas those by WRF-CHIMERE became slightly better. Unlike the AOD and NO2 column concentrations, the annual and seasonal column simulations of total ozone, PBL SO2, and NH3 by all the two-way coupled models improved only marginally when ARI, ACI, or both were enabled.
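The satellite comparisons above require converting simulated mixing ratios into column densities such as petamolecules cm⁻². A minimal sketch, assuming layer-mean volume mixing ratio, pressure, temperature, and thickness are available and air behaves as an ideal gas; retrieval averaging kernels, tropopause selection for tropospheric columns, and the PBL limit used for SO2 are omitted here and may be handled differently in this study. The function name is hypothetical.

```python
import numpy as np

K_B = 1.380649e-23  # Boltzmann constant (J K^-1)


def column_density(vmr, pressure_pa, temperature_k, dz_m):
    """Vertical column density (molecules cm^-2) from layer-mean model fields.

    vmr           : volume mixing ratio in each layer (mol mol^-1)
    pressure_pa   : layer pressure (Pa)
    temperature_k : layer temperature (K)
    dz_m          : layer thickness (m)
    """
    x, p, t, dz = (np.asarray(a, dtype=float)
                   for a in (vmr, pressure_pa, temperature_k, dz_m))
    n_air = p / (K_B * t)            # ideal-gas air number density (molecules m^-3)
    col_m2 = np.sum(x * n_air * dz)  # molecules m^-2
    return float(col_m2 * 1e-4)      # 1 m^2 = 1e4 cm^2


# 1 ppb of NO2 in a single 1 km layer at 1000 hPa and 288 K
col = column_density([1e-9], [1.0e5], [288.0], [1000.0])
print(f"{col / 1e15:.2f} petamolecules cm^-2")  # 2.51 petamolecules cm^-2
```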

Computational performance
Table 6 summarizes the central processing unit (CPU) time required per simulated day by WRF-CMAQ, WRF-Chem, and WRF-CHIMERE with and without aerosol feedbacks in 2017. WRF-CMAQ required the least CPU time to simulate 1 day of meteorology and air quality, with or without aerosol feedbacks enabled, followed by WRF-CHIMERE and WRF-Chem. Compared with the simulations without aerosol feedbacks, the processing time of WRF-CMAQ with ARI enabled in

Conclusions
Two-way coupled meteorology and air-quality models have been applied to the ECR in recent years. However, no study has comprehensively assessed multiple coupled models over this region. To the best of our knowledge, this is the first study to perform comprehensive intercomparisons of the open-source two-way coupled meteorology and air-quality models WRF-CMAQ, WRF-Chem, and WRF-CHIMERE.
We have systematically evaluated the hindcast simulations for 2017 and explored the impacts of ARI and/or ACI on model performance and computational efficiency in the ECR.
Detailed comparisons with ground-based and satellite-borne observations revealed that the three coupled models performed well for meteorology and air quality, particularly for surface temperature (R values of up to 0.97) and PM2.5 concentrations (R values of up to 0.68). The effects of aerosol feedbacks on model performance varied with the two-way coupled model, variable, and timescale. Enabling aerosol feedbacks increased the computational time by 20 %-70 % relative to simulations without aerosol–radiation–cloud interactions. Notably, the three coupled models effectively reproduced the spatiotemporal distributions of the satellite-retrieved CO column concentrations but not those of the ground-observed CO concentrations.
The intercomparisons revealed several sources of uncertainty in the evaluation of aerosol feedback effects. Because numerous schemes can be combined in the configurations of different coupled models, we evaluated only simulations with specific settings. Future comparisons considering more combinations of schemes within the same or different coupled models are needed. Among the three coupled models, the numerical representations of specific variables in the same scheme (e.g., the treatments of cloud cover and cloud optical properties in the Fast-JX photolysis scheme) differ. More accurate representations of photolysis processes are needed to reduce evaluation uncertainties. Additionally, the FDDA nudging technique can attenuate ARI effects during severe air pollution episodes, so optimal nudging coefficients for different regions must be determined. Finally, the actual mechanisms underlying ACI effects remain unclear, and new advances in the measurement and parameterization of CCN/IN activation and PREC should be incorporated into coupled models.
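Why nudging can attenuate ARI effects can be illustrated with a toy Newtonian-relaxation model. This is a conceptual sketch, not the WRF FDDA code: a temperature anomaly is relaxed toward an analysis that does not contain the aerosol feedback, and the function name, forcing, and nudging coefficients are hypothetical values chosen for illustration.

```python
def nudged_anomaly(forcing, g_nudge, dt=60.0, n_steps=20000):
    """Toy Newtonian relaxation (FDDA-style nudging) of a feedback anomaly.

    Integrates dT'/dt = forcing - g_nudge * T' with forward Euler, where T'
    is the departure from an analysis that does not contain the aerosol
    feedback and `forcing` is a constant ARI-like tendency (K s^-1).
    The anomaly approaches forcing / g_nudge, so stronger nudging
    (larger g_nudge, s^-1) leaves a smaller feedback signal.
    """
    if not 0.0 < g_nudge * dt < 1.0:
        raise ValueError("choose dt so that 0 < g_nudge * dt < 1")
    anomaly = 0.0
    for _ in range(n_steps):
        anomaly += dt * (forcing - g_nudge * anomaly)
    return anomaly


cooling = -1e-5  # hypothetical ARI cooling tendency, about -0.86 K per day
for g in (3e-5, 3e-4):  # illustrative nudging coefficients (s^-1)
    print(g, round(nudged_anomaly(cooling, g), 3))
```

In this toy setting the equilibrium anomaly is forcing/g_nudge, so a tenfold larger coefficient leaves a tenfold smaller ARI signal; this is one reason region-specific nudging coefficients matter.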
SSR through the effects of ARI and ACI. When the three models incorporated ARI effects, the simulation accuracy for SSR improved for the whole year and all four seasons, whereas enabling ACI effects yielded relatively limited improvement. Additionally, the MB variations of the WRF-CMAQ and WRF-Chem simulations were larger in spring and winter than in summer and autumn, whereas the maximum and minimum MBs of the WRF-CHIMERE simulations occurred in summer (−10.33 W m⁻²) and autumn (−7.64 W m⁻²), respectively. The annual and seasonal decrease in SSR simulated by WRF-Chem and WRF-CHIMERE with ACI effects enabled Figures 3-4 and Table

Figure 2. Statistical metrics (R, MB, and RMSE) for annual simulations and observations of SSR in eastern China.

Figure 3. Time series of the hourly SSR, T2, RH2, and WS10 over the ECR in 2017 simulated by the coupled models WRF-CMAQ, WRF-Chem, and WRF-CHIMERE with/without aerosol feedbacks and the corresponding observed time series.

Figure 5. Seasonal SSR in the ECR from CERES observations and simulations using WRF-CMAQ, WRF-Chem, and WRF-CHIMERE with and without aerosol feedbacks.

Figure 6. Time series of the hourly PM2.5 and O3 concentrations over the ECR in 2017 simulated by WRF-CMAQ, WRF-Chem, and WRF-CHIMERE with/without aerosol feedbacks and the corresponding observed time series.

Figure 7. Taylor diagrams (R, normalized standard deviation, and NMB) of the seasonal PM2.5, O3, NO2, SO2, and CO in the ECR simulated using the three two-way coupled models (WRF-CMAQ, WRF-Chem, and WRF-CHIMERE) with/without aerosol feedbacks compared with the surface observations.

Figure 8. Spatial distributions of seasonal AOD in the ECR from MODIS observations and simulations using the WRF-CMAQ, WRF-Chem, and WRF-CHIMERE models with and without aerosol feedbacks.

Table 2. Summary of scenario settings in the three coupled models.

Table 3. Statistical metrics (R, MB, NMB, NGE, and RMSE) between the annual simulations and satellite retrievals of SSR and SLR, TOA SW and LW radiation, PREC, CF, and LWP in the ECR. The best results are shown in bold, and the mean simulations and observations are in italics.

Table 4. Statistical metrics (R, MB, NMB, NGE, and RMSE) of the annual simulations of surface PM2.5, O3, NO2, SO2, and CO in the ECR with respect to the corresponding observations. The best results are in bold, while the mean simulations and observations are in italics.

Table 5. Statistical metrics (R, MB, NMB, NGE, and RMSE) for the simulated AOD, TCO, tropospheric column NO2, PBL column SO2, total column CO, and total column density of NH3 in the ECR with respect to the corresponding satellite-retrieved data. The best results are shown in bold, and the annual mean simulations and observations are in italics.
N/A indicates that the NH3 column concentrations were not extracted as output from the WRF-CHIMERE simulations with/without aerosol feedbacks.

Table 6. Summary of the running times for the different coupled models.