Evaluation of global EMEP MSC-W (rv4.34)-WRF (v3.9.1.1) model surface concentrations and wet deposition of reactive N and S with measurements

Abstract. Atmospheric pollution has many profound effects on human health, ecosystems, and the climate. Of concern are high concentrations and deposition of reactive nitrogen (Nr) species, especially of reduced N (gaseous NH3, particulate NH4+). Atmospheric chemistry and transport models (ACTMs) are crucial to understanding sources and impacts of Nr chemistry and its potential mitigation. Here we undertake the first evaluation of the global version of the EMEP MSC-W ACTM driven by WRF meteorology (1° × 1° resolution), with a focus on surface concentrations and wet deposition of N and S species relevant to investigation of atmospheric Nr and secondary inorganic aerosol (SIA). The model-measurement comparison is conducted both spatially and temporally, covering 9 monitoring networks worldwide. Model simulations for 2010 compared use of both HTAP and ECLIPSEE (ECLIPSE annual total with EDGAR monthly profile) emissions inventories; those for 2015 used ECLIPSEE only. Simulations of primary pollutants are somewhat sensitive to the choice of inventory in places where regional differences in primary emissions between the two inventories are apparent (e.g. China), but much less so for secondary components. For example, the difference in modelled global annual mean surface NH3 concentration using the two 2010 inventories is 18 % (HTAP: 0.26 μg m−3; ECLIPSEE: 0.31 μg m−3) but only 3.5 % for NH4+ (HTAP: 0.316 μg m−3; ECLIPSEE: 0.305 μg m−3). Comparisons of 2010 and 2015 surface concentrations between model and measurement demonstrate that the model captures well the overall spatial and seasonal variations of the major inorganic pollutants NH3, NO2, SO2, HNO3, NH4+, NO3−, SO42−, and their wet deposition in East Asia, Southeast Asia, Europe and North America. 
The model shows better correlations with annual average measurements for networks in Southeast Asia (mean R for 7 species = 0.73), Europe (mean R = 0.67) and North America (mean R = 0.63) than in East Asia (mean R = 0.35) (data for 2015), which suggests potential issues with the measurements in the latter network. Temporally, both model and measurement agree on higher NH3 concentrations in spring and summer, and lower concentrations in winter. The model slightly underestimates annual total precipitation measurements (by 13–34 %) but agrees well with the spatial variations in precipitation in all four world regions (R in the range 0.65–0.78). High correlations between measured and modelled NH4+ precipitation concentrations are also observed in all regions except East Asia. For annual total wet deposition of reduced N, the greatest consistency is in North America (R = 0.75), followed by Southeast Asia (R = 0.68) and Europe (R = 0.61). Model-measurement bias varies between species in different networks; for example, bias for NH4+ and NO3− is largest in Europe and North America and smallest in East and Southeast Asia. The greater uniformity in spatial correlations than in biases suggests that the major drivers of model-measurement discrepancies (aside from differing spatial representativeness and uncertainties and biases in measurements) are shortcomings in absolute emissions rather than in modelling the atmospheric processes. The comprehensive evaluations presented in this study support the application of this model framework for global analysis of current and potential future budgets and deposition of Nr and SIA.



Introduction
In view of growing global anthropogenic emissions, the physical and chemical behaviour of reactive nitrogen (Nr) species, especially those that contain reduced N (i.e. gaseous NH3 and particulate NH4+), has been explored in both experimental and modelling studies (Wagner et al., 2020; Ciarelli et al., 2019; Tang et al., 2021). As the predominant alkaline gas, NH3 exerts significant control on the formation of ambient particles and the acidity of deposition. It readily reacts with H2SO4 and HNO3 (respectively derived from emissions of SO2 and NOx), and the ammonium sulphate ((NH4)2SO4) and ammonium nitrate (NH4NO3) particles formed in these reactions are important in Earth's radiation budget (Laskin et al., 2015) due to their capacity to act as cloud condensation nuclei and to absorb/scatter solar radiation. Crucially, the (NH4)2SO4 and NH4NO3 secondary inorganic aerosols (SIA) typically constitute at least a third of the fine particulate matter (PM2.5) surface concentration (Li et al., 2017), exposure to which causes substantial premature mortality globally (Burnett et al., 2018).
For half the world's population, the PM2.5 air pollution burden is increasing (Shaddick et al., 2020). In addition, NH3 and NH4+ enter aquatic and terrestrial ecosystems through wet and dry deposition, where they are powerful nutrients for many plants and microorganisms. As a result, excessive anthropogenic reduced N emissions to the atmosphere can lead to severe eutrophication and formation of hypoxic zones, with their consequent threats to ecosystem diversity (Erisman et al., 2005).
The surface concentrations and deposition fluxes of atmospheric pollutants are influenced by many spatial and temporal factors such as emissions, meteorology, long-distance transport and chemical transformations. Ambient measurements play a vital role in assessing existing concentrations but can generally only represent the air quality in the local area and cannot immediately distinguish between the influence of local and remote sources. Speciated gas and particle-phase sampling and analysis is challenging and expensive (Tang et al., 2018b). Consequently, measurements are generally sparsely located and often not very well temporally resolved, even in regions of the world with well-developed air pollution monitoring networks (Tang et al., 2021), which again limits the interpretation of atmospheric chemical and meteorological processes. Moreover, different world regions have monitoring networks that are subject to different analytical and data handling protocols, potentially leading to systematic differences. Non-identical sampling durations and frequencies within these networks also add uncertainties and complexities to global comparison studies.
Compared with measurements, global and regional-scale atmospheric chemistry transport models such as EMEP MSC-W (Simpson et al., 2012), CMAQ (Byun and Schere, 2006) and WRF-Chem (Chapman et al., 2009) can provide comprehensive simulations of air pollutant concentrations and depositions with greater spatial-temporal resolution and coverage. These models also facilitate insight into the chemical and meteorological linkages between diverse emission sources and the concentration and deposition of pollutants at locations away from initial emissions. Such models are essential when it comes to simulating the impacts of possible future policy actions. A number of global models have already been utilized to investigate sulphate, nitrate or ammonia budgets, including GISS II-prime (Adams et al., 1999), GEOS-Chem (Pye et al., 2009), LMDz-INCA (Hauglustaine et al., 2014) and STOCHEM-CRI (Khan et al., 2020). Bian et al. (2017) presented a budget analysis of global nitrate simulations from 9 models and found wide variation in the tropospheric burdens of HNO3, NO3−, NH3 and NH4+ between the models. However, global simulations and evaluation of Nr species in atmospheric chemistry transport models remain rare.
In particular, there has been little comparison between modelled surface concentrations and wet deposition of Nr species, especially NH3 and NH4+, with regional ground-based measurement networks worldwide, which is the motivation for this work. Here, we present for the first time a detailed evaluation of the global simulation performance of the EMEP MSC-W chemical transport model coupled with the WRF numerical weather model. Our aim was to compare model output temporally and spatially with available ambient measurements from 9 monitoring networks in 4 global regions. A further aim was to examine the sensitivities of the model-measurement comparison to two different global emission inventories (HTAP v2 and
https://doi.org/10.5194/gmd-2021-166 Preprint. Discussion started: 29 June 2021 c Author(s) 2021. CC BY 4.0 License.
The WRF model included data assimilation (Newtonian nudging) of the numerical weather prediction model meteorological reanalysis from the US National Center for Environmental Prediction (NCEP)/National Center for Atmospheric Research (NCAR) Global Forecast System (GFS) at 1° resolution, every 6 hours (Saha et al., 2010). A higher resolution UK/Europe regional version of the EMEP-WRF modelling system has been well evaluated previously against field measurements (Vieno et al., 2010; Vieno et al., 2014; Vieno et al., 2016). However, an assessment of the global version has not yet been undertaken. Moreover, integrating WRF with the EMEP MSC-W model is still an innovative application, as most studies utilize meteorological data from the IFS model as described above.
Two global emission inventories were used in this work. The ECLIPSE (Evaluating the CLimate and Air Quality ImPacts of Short-livEd Pollutants) inventory version V6 (https://iiasa.ac.at/web/home/research/researchPrograms/air/ECLIPSEv6.html) contains annual gridded emissions of SO2, NO2, NH3, CO, CH4, NMVOC (non-methane volatile organic compounds), primary fine particulate matter (PM2.5) and primary coarse particulate matter (PMco) (Klimont et al., 2017) at 0.5° × 0.5° spatial resolution. Its emission sectors include energy, industry, solvent use, transport, domestic combustion, agriculture, open burning of agricultural waste, and waste treatment. We used ECLIPSE emission inventories for 2010 and 2015 to permit comparison between model and measurements for two self-consistent years of emissions, meteorology and measurements. The HTAP (Task Force on Hemispheric Transport of Air Pollution) inventory version V2 (https://edgar.jrc.ec.europa.eu/dataset_htap_v2) consists of 0.1° × 0.1° gridded monthly emissions of SO2, NO2, NH3, CO, CH4, NMVOC, PM2.5, PM10, black carbon (BC) and organic carbon (OC) for 2010 (2015 was not available at the time of this work) from 7 sectors (international and domestic air, shipping, energy, industry, transport, residential, and agriculture) and was used to investigate the sensitivity of model outputs to different global inventories. The HTAP inventory utilises nationally reported emissions together with regional scientific inventories (e.g. from US-EPA, the MICS-Asia group, EMEP/TNO, the REAS and the EDGAR group) for those regions where national emissions are not available (Janssens-Maenhout et al., 2015; Gusev et al., 2012; West et al., 2010).
Both inventories were aggregated to 1° × 1° resolution internally in the model. All inventory emission sector-layers were re-assigned to 11 Selected Nomenclature for sources of Air Pollution (SNAP) sectors: (1) combustion in energy and transformation industries, (2) non-industrial combustion plants, (3) combustion in manufacturing industry, (4) production processes, (5) extraction and distribution of fossil fuels and geothermal energy, (6) solvent and other product use, (7) road transport, (8) other mobile sources and machinery, (9) waste treatment and disposal, (10) agriculture, (11) other sources and sinks.
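The aggregation of a finer inventory grid to the 1° × 1° model grid can be illustrated with a simple block sum. This is only a sketch: the model's internal regridding is not described here, and the function name `aggregate_to_coarse` and the assumption that emissions are stored as mass per grid cell (rather than as a flux per unit area) are introduced purely for illustration.

```python
import numpy as np

def aggregate_to_coarse(emis_fine, factor):
    """Sum per-cell emission totals over factor x factor blocks.

    Assumes emis_fine holds emitted mass per cell (not mass per unit area),
    so that summing blocks conserves the global total.
    """
    nlat, nlon = emis_fine.shape
    if nlat % factor or nlon % factor:
        raise ValueError("grid dimensions must be divisible by factor")
    # Split each axis into (coarse index, position inside the block)
    blocks = emis_fine.reshape(nlat // factor, factor, nlon // factor, factor)
    return blocks.sum(axis=(1, 3))

# Hypothetical 0.5-degree global grid (360 x 720 cells), one mass unit per cell
fine = np.ones((360, 720))
coarse = aggregate_to_coarse(fine, 2)  # 0.5 degree -> 1 degree
```

For a 0.1° grid such as HTAP the same call would use `factor=10`. If the inventory were stored as a flux per unit area, an area-weighted mean would be needed instead of a sum.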
In addition, monthly emission time series by sector and country derived from EDGAR (Emission Database for Global Atmospheric Research, v4.3.2 datasets) temporal emission profiles (Crippa et al., 2020) (https://edgar.jrc.ec.europa.eu/dataset_temp_profile) were applied to the ECLIPSE annual total emissions for all pollutants. Therefore, from here on we refer to the inventory with ECLIPSE annual emissions and EDGAR monthly temporal profiles as ECLIPSEE. All EDGAR emission subsectors (~33) are assigned to the 11 SNAP sectors. The time-splitting factor (TSNAP) for a given pollutant for a given country/region was computed from the annual average emission of the pollutant from each EDGAR v4.3.2 subsector j and the monthly time-splitting factor of the pollutant from subsector j.

Simpson et al. (2017; 2020) as are the wind-derived emissions of dust and sea salt (Simpson et al., 2012; Tsyro et al., 2011).
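As a sketch of the time-splitting factor described above, one form consistent with the two quantities named in the text is an emission-weighted mean of the subsector factors within each SNAP sector. The symbols $E_j$ (annual average emission from subsector $j$) and $T_{j,m}$ (monthly time-splitting factor of subsector $j$ in month $m$) are labels introduced here for illustration, and the weighted-mean form itself is an assumption rather than the authors' stated expression:

```latex
T_{\mathrm{SNAP},m} \;=\; \frac{\sum_{j \in \mathrm{SNAP}} E_j \, T_{j,m}}{\sum_{j \in \mathrm{SNAP}} E_j}
```

Under this assumed form, subsectors with larger annual emissions contribute proportionally more to the monthly profile of the SNAP sector they are assigned to.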

Measurement Datasets
Ambient measurement data were compiled from the 9 regional and national monitoring networks in East Asia, Southeast Asia, Europe, and North America listed in Table 1. The number of monitoring sites in each network varies with year and with species, but Fig. 1 shows the monitoring sites for NH4+ in 2015 as an example. The frequency and duration (i.e. averaging) of sampling, and the sampling and analytical methods used, including the size fraction of PM sampled, vary across the measurement networks. Some measurement locations are also deliberately sited to be close to particular industrial or agricultural sources, in which case a model grid average concentration may not reflect the measurement. Although much of this information is presented in official network reports, much useful metadata is absent from the data portals and addition of this information directly to the portals is a recommendation for improvement. In this work, only measurement data with at least 75% data capture in the year are used to avoid bias. A full data mining of global measurement data was not undertaken here, but we believe we have captured the major networks of long-running, multi-species SIA gas and particle composition and wet deposition measurements.
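The 75% data-capture criterion above can be illustrated with a short filter. This is a minimal sketch: the site names, sample counts and the definition of data capture as valid samples divided by expected samples are assumptions for illustration, not details taken from the network documentation.

```python
def data_capture(n_valid, n_expected):
    """Fraction of the expected samples in a year that are valid."""
    if n_expected <= 0:
        raise ValueError("n_expected must be positive")
    return n_valid / n_expected

def passes_capture(n_valid, n_expected, threshold=0.75):
    """True if the site-year meets the minimum data-capture threshold."""
    return data_capture(n_valid, n_expected) >= threshold

# Hypothetical site-years: (valid samples, expected samples in the year)
sites = {
    "site_A": (50, 52),  # weekly sampling, nearly complete
    "site_B": (30, 52),  # weekly sampling, many gaps
    "site_C": (9, 12),   # monthly sampling, exactly 75 %
}
kept = sorted(name for name, (v, e) in sites.items() if passes_capture(v, e))
```

Here `site_B` would be excluded from the annual comparison, while a site sitting exactly on the threshold is retained because the criterion is "at least 75%".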
The Chinese national nitrogen deposition monitoring network (NNDMN) was established in 2010 to measure inorganic N concentrations and deposition fluxes. The first database, NNDMN 1.0, which compiles monthly air concentration and deposition data for NH3, NO2, HNO3, NH4+ and NO3− up to 2015, was released in May 2019.
The acid deposition monitoring network in East Asia and Southeast Asia (EANET) involves 13 countries and provides annual and monthly concentration and acid deposition data for more than 10 species.
The European Monitoring and Evaluation Programme/Chemical Co-ordinating Centre (EMEP/CCC) is a collaborative programme for measuring air pollutants across Europe (Tørseth et al., 2012). The measurement frequency varies from hourly and daily to weekly and biweekly, or is intermittent, such as every 6 days. It also varies between species. This makes it difficult to derive consistent annual and monthly averages for comparisons between measurement and model.
The Air Data of the United States Environmental Protection Agency (EPA) provides access to annual outdoor air quality data including SO2, NO2, NH4+, NO3− and SO42−, collected from state, local and tribal monitoring agencies across the United States.
The Ammonia Monitoring Network (AMoN) and National Trends Network (NTN) are two further US networks that provide long-term records of weekly/biweekly NH3 gas concentrations and annual precipitation chemistries respectively.
In Canada, the National Air Pollution Surveillance (NAPS) program is the main source of ambient air quality data and consists of continuous and time-integrated monitoring of several species. Continuous measurements are implemented for CO, NO2, NO, NOx, O3, SO2, PM2.5, and PM10 at hourly resolution. The time-integrated samples, collected once every 6 days over a 24-h period, encompass fine (PM2.5) and coarse (PM2.5-10) aerosol components (e.g., inorganic ions, metals), semi-volatile organic

Emissions
The global map of 2010 annual NH3 emissions from ECLIPSEE is shown in Fig. 2.

Clear differences between the two emission inventories are observed in China, India, and several Southeast Asian countries, but differences in other world regions are relatively small: more than 70% of the relative differences in ECLIPSEE − HTAP emissions, the majority of which are positive, are within ±10% of the average inventory emission for that grid. The ECLIPSEE inventory NH3 emissions are larger than the HTAP inventory emissions in north and southeast parts of China, the western coastal area of continental Europe, central Africa, Brazil and Argentina. The largest difference of 6496 mg m−2, which is 73% of the inventory mean emission of 8956 mg m−2 for that model grid, is in the east of China (Fig. 2 bottom). In contrast, HTAP reports larger NH3 emissions than ECLIPSEE in areas of Southeast Asia, India, and the western United States. The largest negative difference of −4281 mg m−2 (equating to 124% of the grid mean of 3452 mg m−2) is located on the west coast of the United States.
Relative NH3 emission differences that are outside of ±100% of the average NH3 emissions from the two inventories for that grid only account for 13% of total gridded differences, and the majority of instances where the relative difference is large are for grids that have only low emissions, for which a small absolute difference equates to a large relative difference.
Aside from the instances of quite localised discrepancies in the NH3 emissions between the two inventories, the small median positive (7.90 mg m−2) and negative (−12.0 mg m−2) differences, together with the global area-weighted average difference of only 16.0 mg m−2 (14% relative to the mean emission of the two inventories), indicate that ECLIPSEE and HTAP provide very similar annual NH3 emissions in most grids over the whole global domain.
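The gridded relative differences and the area-weighted global average discussed above can be sketched as follows. The function names are hypothetical, and the regular latitude-longitude grid with exact spherical band areas is an assumption about how such weights might be formed; the source does not state how the area weighting was implemented.

```python
import numpy as np

def cell_area_weights(nlat):
    """Normalised area of each latitude band on a regular lat-lon grid.

    Band area is proportional to sin(north edge) - sin(south edge).
    """
    edges = np.deg2rad(np.linspace(-90.0, 90.0, nlat + 1))
    weights = np.sin(edges[1:]) - np.sin(edges[:-1])
    return weights / weights.sum()

def area_weighted_mean(field):
    """Area-weighted mean of a (nlat, nlon) field; all cells in a row share one area."""
    weights = cell_area_weights(field.shape[0])
    return float(np.sum(field.mean(axis=1) * weights))

def relative_difference(a, b):
    """(a - b) relative to the mean of a and b; NaN where both are zero."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    mean = 0.5 * (a + b)
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(mean > 0, (a - b) / mean, np.nan)

# Grid values chosen to reproduce the east-China example in the text:
# difference 6496 mg m-2 on a two-inventory mean of 8956 mg m-2
rel = relative_difference([12204.0], [5708.0])  # about 0.73
```

Normalising by the mean of the two inventories bounds the relative difference to ±200%, which is why cells with very low emissions can show large relative differences from small absolute ones.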
The seasonal profile of spatially averaged monthly NH3 emissions of the two inventories in 2010 was also investigated for East Asia, Southeast Asia, Europe and North America separately. The detail is presented in Supplementary Material. Clear NH3 emission peaks in spring and summer are observed in both inventories for all four global regions. In general, ECLIPSEE shows greater monthly variations than HTAP in East Asia, Southeast Asia and Europe, but not in North America, which is strongly indicative of different monthly (or day-of-week) temporal factors applied to annual totals in different inventories.

Similar observations derive from comparisons of emissions of NOx and SOx in the two inventories (Supplementary Material Fig. S1 and S2). For example, the global area-weighted average difference in annual NOx emissions between the two inventories is only 11.0 mg m−2 (2.9%), whilst the maximum positive and negative differences for an individual model grid (ECLIPSEE − HTAP) are 15389 mg m−2 (162%) and −26815 mg m−2 (−186%) respectively. These large local differences in NOx emissions are presumably due to the inclusion or exclusion of a specific point source in one emission inventory but not the other. The shipping emission profiles included in the two inventories are also slightly different. For instance, ECLIPSEE provides higher NOx emissions in the Yellow Sea, South China Sea and Bay of Bengal than HTAP (Fig. S1). Therefore, the differences between the two inventories may not have a large influence on global simulations but may have a larger impact on regional modelling at higher spatial resolution.

(Hauglustaine et al., 2014; Xu and Penner, 2012; Pringle et al., 2010). The model-measurement comparisons we carry out for this study cover the majority of these hot spot regions.

Figure 3. (Top) Annual mean surface concentrations and (bottom) annual total (wet + dry) depositions of reduced N (NH3 + NH4+) for 2010 based on the ECLIPSEE inventory.
The influence of the two emission inventories on model-simulated surface concentrations differs between primary and secondary components and varies from one region to another. Globally, the difference in modelled area-weighted annual mean surface NH3 concentration using the two 2010 inventories is 18% (HTAP: 0.26 µg m−3; ECLIPSEE: 0.31 µg m−3). The relative difference is the same when considering land-only area-weighted mean surface NH3 concentration (0.83 and 0.99 µg m−3 for HTAP and ECLIPSEE respectively). In contrast, the difference in global area-weighted mean surface NH4+ concentration is only 3.5% (HTAP: 0.316 µg m−3; ECLIPSEE: 0.305 µg m−3), or 5.0% for the land-only area-weighted NH4+ concentrations of 0.755 and 0.718 µg m−3, respectively.
For a regional perspective, Fig. 4 and Fig. S5 respectively compare the modelled NH3 and NH4+ concentrations using the two emission inventories for the grids in which there are also available measurements from the monitoring networks. Considering all measurement locations globally, the model-simulated concentrations using the two inventories are extremely well spatially correlated with each other at R = 0.95 for NH3 and 0.98 for NH4+. The average difference in global surface NH3 concentration between model simulations using ECLIPSEE and HTAP based on measurement locations is 0.34 µg m−3, which corresponds to only 15% of the model average concentration of 2.30 µg m−3 using the ECLIPSEE inventory or 17% of the model average concentration of 1.96 µg m−3 using the HTAP inventory.
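The global percentages quoted above are reproduced if the difference between the two simulations is normalised by their mean. This normalisation is inferred from the numbers rather than stated explicitly in the text, so the following worked arithmetic should be read as a consistency check under that assumption:

```latex
\begin{aligned}
\text{NH}_3\ \text{(global)}:\quad & \frac{0.31 - 0.26}{\tfrac{1}{2}(0.31 + 0.26)} = \frac{0.05}{0.285} \approx 17.5\,\% \approx 18\,\% \\
\text{NH}_3\ \text{(land only)}:\quad & \frac{0.99 - 0.83}{\tfrac{1}{2}(0.99 + 0.83)} = \frac{0.16}{0.91} \approx 17.6\,\% \approx 18\,\% \\
\text{NH}_4^+\ \text{(global)}:\quad & \frac{0.316 - 0.305}{\tfrac{1}{2}(0.316 + 0.305)} = \frac{0.011}{0.3105} \approx 3.5\,\% \\
\text{NH}_4^+\ \text{(land only)}:\quad & \frac{0.755 - 0.718}{\tfrac{1}{2}(0.755 + 0.718)} = \frac{0.037}{0.7365} \approx 5.0\,\%
\end{aligned}
```

By contrast, the 15% and 17% figures for the measurement-location averages are stated relative to each inventory's own model mean (0.34/2.30 and 0.34/1.96).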
The model concentrations using the two emission inventories are similarly linearly correlated with measurements (Fig. 4).
As discussed above, systematic differences between modelled and measured concentrations of NH3 in East Asia and Southeast Asia can be attributed at least in part to local differences in NH3 emissions among different inventories. The average modelled NH3 concentrations in China derived from ECLIPSEE and HTAP (based on measurement locations) are 12.3 and 7.9 µg m−3 respectively. The systematically greater modelled NH3 concentrations using ECLIPSEE compared to HTAP are consistent with the ECLIPSEE inventory's larger NH3 emissions over eastern and southern China (Fig. 2), where the majority of the NNDMN measurement sites are located (Fig. 1).
For measurement locations in Southeast Asia, Fig. 4 shows that modelled NH3 concentrations are generally lower than their respective measured concentrations, for simulations using both emissions inventories. However, as for China, model simulations of NH3 using the two inventories are spatially well correlated with each other (R = 0.92). The overall average modelled NH3 concentration (based on grids containing EANET sites) of 1.99 µg m−3 using the HTAP inventory is slightly greater than the average concentration of 1.50 µg m−3 using the ECLIPSEE inventory. Using the HTAP inventory also gives a slightly larger range in simulated NH3 concentrations (0.00–9.14 µg m−3) for the grids with measurement sites than the range (0.01–6.54 µg m−3) when using the ECLIPSEE inventory. This is again consistent with the smaller emissions for ECLIPSEE in most Southeast Asian countries in 2010 (Fig. 2).
In North America and Europe there are similar linearities between the modelled and measured NH3 concentrations when using either of the HTAP and ECLIPSEE inventories (Fig. 4). In general, both inventories produce smaller concentrations than measurements in Europe, with ECLIPSEE underestimating more, and higher concentrations than measurements in North America, with ECLIPSEE overestimating more. In other words, the ECLIPSEE inventory yields smaller NH3 concentrations in Europe but higher concentrations in North America compared with the HTAP inventory. The differences in NH3 emissions between the two inventories are very similar in these two regions: Fig. 2 shows that the differences in emissions are generally close to zero and that differences are both positive and negative. Therefore, it is the location of the measurement site that likely influences the model evaluation statistics. The modelled NH3 concentrations in North America (based on network locations) are in the ranges 0.01–3.30 µg m−3 and 0.04–3.64 µg m−3 for simulations with HTAP and ECLIPSEE inventories respectively, while in Europe the equivalent modelled NH3 concentration ranges are 0.00–4.36 µg m−3 and 0.00–3.95 µg m−3. The average NH3 concentration difference (based on network locations) in North America between the two emission inventories is 0.47 µg m−3 (ECLIPSEE − HTAP), whilst this difference in Europe is only 0.03 µg m−3.
The impact of emission inventory differences on concentrations of secondary pollutants is much smaller than for primary pollutants since the former are influenced by multiple emissions and the timescales for their formation act to smooth out spatial differentials in primary emissions. This is illustrated by the generally better agreement between model outputs for both the HTAP and ECLIPSEE emissions inventories and the network measurements of annual NH4+ concentrations in Fig

The differences in NH4+ concentrations in simulations using the two emission inventories (Fig. S5) are also smaller than for NH3 (Fig. 4), as shown by concentrations that are closer to 1:1 in all regions. For example, whilst modelled NH3 concentrations in China derived using the ECLIPSEE inventory are on average 56% higher than those derived using the HTAP inventory, the NH4+ concentrations are very similar. The annual average NH4+ concentrations (based on network locations) in China are 7.30 and 7.15 µg m−3 for HTAP and ECLIPSEE respectively, which is a difference of only 2%. More detail is presented in Supplementary Material. In summary, whilst there are some spatial differences in annual emissions between the HTAP and ECLIPSEE inventories,

Comparisons of modelled surface concentrations of Nr and SIA species with measurements
Evaluations of modelled versus measured concentrations were undertaken for both 2010 and 2015. The comparisons for the two years show similar characteristics. To avoid repetition, the following section presents and discusses the comparisons for 2015, using the ECLIPSEE inventory, as more measurement data were available for 2015. Figure 5 shows the spatial distribution of modelled and measured 2015 annual average NH3 concentrations for regions covered by the NNDMN (China) and EANET (East Asia) networks. Scatter plots of the paired model versus measurement annual concentrations for NH3, NH4+ and other gaseous and particle-phase inorganic components are shown in Fig. 6, illustrating the extent of model-measurement spatial correlations. A summary of model evaluation statistics is presented in Table 2.

consistent with them being regions of intensive agricultural activities that apply large amounts of fertilizers (Xu et al., 2015).

East and Southeast Asia
Most areas in other East and Southeast Asian countries such as Japan, Thailand, Vietnam and Malaysia have lower NH3 concentrations (typically <5 µg m−3) for both model and measurements. Relative differences between model and measurement are generally small for the majority of sampling sites, and where they are large it is a consequence of expressing a difference relative to a small measured concentration. For example, the largest relative difference of 420%, which is in Vietnam, applies to a very small measured NH3 concentration of 0.83 µg m−3.
The modelled annual NH3 concentrations at the NNDMN locations in China are slightly higher than the measurements (NMB = 0.29, Table 2), with 62% of the sites having positive model minus measurement differences. The sampling site with the largest positive difference is Zhumadian, where modelled NH3 exceeds the measurement by 16.9 µg m−3 (98% relative to measurement). The largest negative difference (−13.0 µg m−3, −82% relative to measurement) is for the Wuwei site. The large concentration differences reflect the much larger NH3 concentrations in China. By contrast, there is no significant difference between model and measurement in most Southeast Asian countries. The average difference (mean bias) of annual NH3 concentrations across all locations in the EANET network is 0.29 µg m−3, which is a factor of ten smaller than the mean bias of 2.90 µg m−3 for the NNDMN network. Figure 6 and Table 2 also present the statistical relationships between modelled and measured annual average concentrations in China for NO2, NH4+, HNO3 and NO3−. Both NH3 and NO2 display strong linear relationships, while the secondary species NH4+ and NO3− show poorer correlations. The poorest agreement is for HNO3 (Table 2). However, modelled HNO3 concentrations agree much better with measurements in EANET and other networks (shown later), which suggests differences in measurement data among networks. Artefact-free measurement of HNO3 is a known challenge (Tang et al., 2018b; Cheng et al., 2012; Sickles et al., 1999). The biases between model and NNDMN measurement are quite small for most species except for HNO3. The overall annual average NH3 concentrations are 13.0 and 10.1 µg m−3 for model and measurement respectively. The annual modelled network average NO2 concentration of 28.6 µg m−3 is only 22% greater than the measured network average NO2 of 23.5 µg m−3. The modelled and measured network average annual mean NH4+ concentrations are equal at 8.1 µg m−3.
The proportions of modelled and measured data that are within a factor of 2 are 75% for NH3, 83% for NO2, 78% for NH4+, and 71% for NO3−; the Fac2 for HNO3 is, however, only 21%.
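The evaluation statistics used above (mean bias, NMB, NME, Fac2 and R) can be sketched as follows. The definitions are the forms commonly used in model evaluation and are an assumption here, since the formulas themselves are not given in this section; the function name and sample values are hypothetical. The quoted NMB of 0.29 for NH3 in China is consistent with this definition, since (13.0 − 10.1)/10.1 ≈ 0.29.

```python
import math

def evaluation_stats(model, obs):
    """Paired model-measurement statistics (commonly used definitions, assumed here)."""
    if len(model) != len(obs) or not obs:
        raise ValueError("model and obs must be non-empty and of equal length")
    n = len(obs)
    pairs = list(zip(model, obs))
    sum_obs = sum(obs)
    bias_sum = sum(m - o for m, o in pairs)
    # Fraction of pairs with model within a factor of 2 of the measurement
    fac2 = sum(1 for m, o in pairs if o > 0 and 0.5 <= m / o <= 2.0) / n
    # Pearson correlation coefficient
    mean_m = sum(model) / n
    mean_o = sum_obs / n
    cov = sum((m - mean_m) * (o - mean_o) for m, o in pairs)
    var_m = sum((m - mean_m) ** 2 for m in model)
    var_o = sum((o - mean_o) ** 2 for o in obs)
    return {
        "MB": bias_sum / n,                                   # mean bias
        "NMB": bias_sum / sum_obs,                            # normalised mean bias
        "NME": sum(abs(m - o) for m, o in pairs) / sum_obs,   # normalised mean error
        "Fac2": fac2,
        "R": cov / math.sqrt(var_m * var_o),
    }

# Hypothetical annual mean concentrations at four sites (same units for both lists)
stats = evaluation_stats([2.0, 4.0, 9.0, 1.0], [1.0, 4.0, 6.0, 3.0])
```

With these sample values three of the four sites fall within a factor of 2, giving Fac2 = 0.75, while the positive NMB indicates a net model overestimate.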

Europe
The annual-mean NH3 concentration map for Europe (Fig. 7) shows that the highest NH3 concentrations (>8 µg m−3) are in the Netherlands, Germany and Italy. Concentrations in northern Europe, such as Scandinavia, are smaller (<2 µg m−3), which is consistent with less anthropogenic activity and colder temperatures in this region. The model simulations of large NH3 concentrations in the Po Plain in northern Italy arise from the large NH3 emissions associated with intensive farming of pigs, cattle and poultry (Carozzi et al., 2013; Skjøth et al., 2011). In the UK, NH3 concentrations generally display a decreasing trend from south to north for both model and measurement, although Northern Ireland is a relatively high NH3 region as well.

is at the Brompton site in England, which also has the highest observed NH3 concentrations for the UK. However, it is important to note that the UK NAMN is a high spatial density NH3 monitoring network, with many sites deliberately located near local emission sources of NH3 (Tang et al., 2018a), which the global model grid-average cannot capture. The linear relationships between model and measurement for 2015 annual average NH3, NO2, SO2, NH4+, NO3− and SO42− concentrations in Europe are shown in Fig. 8 and a summary of the statistical comparisons is shown in Table 3. A few UK NAMN sites are part of the European EMEP/CCC network. Where a model grid contains multiple measurement sites, the average of the measured values is used.

There is a clear linear correlation between model and measurement for both primary and secondary species (Fig. 8).
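The averaging of multiple measurement sites that fall within one model grid cell, as described above, can be sketched as below. The grid alignment (1° cells with edges on whole degrees, longitudes in [−180, 180)), the function names and the site values are all assumptions for illustration.

```python
import math
from collections import defaultdict

def grid_index(lat, lon, res=1.0):
    """(row, col) of the res-degree cell containing a point.

    Assumes cell edges on multiples of res, lat in [-90, 90) and lon in [-180, 180).
    """
    return (math.floor((lat + 90.0) / res), math.floor((lon + 180.0) / res))

def average_by_cell(sites, res=1.0):
    """Average the measured values of all sites sharing a model grid cell."""
    cells = defaultdict(list)
    for lat, lon, value in sites:
        cells[grid_index(lat, lon, res)].append(value)
    return {cell: sum(vals) / len(vals) for cell, vals in cells.items()}

# Hypothetical sites: (latitude, longitude, annual mean concentration)
sites = [
    (52.2, 5.3, 8.0),   # shares a cell with the next site
    (52.7, 5.9, 6.0),
    (60.1, 10.4, 1.0),  # alone in its cell
]
cell_means = average_by_cell(sites)
```

Each grid cell then contributes a single measured value to the paired model-measurement statistics, so densely monitored cells are not over-weighted relative to cells with one site.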
Correlation is highest for NO3− (RT = 0.80), followed by NO2 and NH4+ (RT = 0.71), and weakest for NH3 (RT = 0.51). However, the NH3 data appear to be distributed into two groups, one characterised by positive model bias mainly associated with EMEP/CCC network locations, and one characterised by negative model bias mainly associated with the UK network. The former may be a result of overestimation of NH3 in the emission inventory; the latter may be caused by UK measurement locations adjacent to agricultural NH3 sources (Tang et al., 2018a). The model-measurement comparisons of other gaseous species (NO2, SO2 and HNO3) all show better correlations (R = 0.60–0.71) and smaller differences (NME 0.50–0.70) in comparison with NH3.
The modelled concentrations of secondary components, NH4+, NO3− and SO42−, all match well with the spatial variations of measurements, with RT varying from 0.69 to 0.80 (Fig. 8). All three components show higher modelled than measured concentrations, to varying degrees. The network-averaged NH4+ concentrations are 1.11 and 0.56 µg m−3 for model and measurement respectively. For NO3−, the modelled average concentration is 2.18 µg m−3, which is around twice the measurement mean. In comparison with NH4+ and NO3−, SO42− shows a smaller NMB (0.32) and a larger Fac2 fraction (64%).
In conclusion, across Europe the model exhibits a good performance in simulating annual average concentrations and spatial variations of major inorganic air pollutants, but with an overestimation of secondary NH4+, NO3− and SO42−. The overall agreement between model outputs and ambient measurements in the European networks is as good as that in the EANET network.

Table 4 provides the summary of statistical comparison metrics. The number of monitoring locations is greater than for the networks in East Asia, Southeast Asia, and Europe.
The correlations between modelled and measured annual average NH3, NO2 and HNO3 concentrations in North America (RT = 0.59-0.72) are similar to those in Europe and Southeast Asia, but the correlation for SO2 is poor (RT = 0.27). The reason for the poorer correlation between modelled and measured SO2 is unknown, but there are several possible causes: the emission inventory for SO2 in North America may be too low, or some sampling sites may be located close to SO2 point sources, whereas grid-averaged model values are much lower. For the other three gaseous species, the biases between model and measurement are within reasonable ranges. The network-averaged modelled NH3 concentration is 1.76 µg m−3, which is close to the measured average concentration of 1.28 µg m−3. For HNO3, 78% of model data are within a factor of 2 of the measurements, and the overall average concentrations are 0.51 µg m−3 and 0.39 µg m−3 for model and measurement, respectively (Table 4). In contrast to NH3 and HNO3, the modelled annual NO2 concentrations are generally smaller than the measurements, leading to a negative NMB of -0.39.
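The statistics quoted in this section (R, NMB, NME and Fac2) can be illustrated with a minimal Python sketch. The definitions below follow common model-evaluation conventions and are an assumption here, since the formal definitions are not given in this section; the function name `evaluation_metrics` and the example values are hypothetical and are not data from the study.

```python
import numpy as np

def evaluation_metrics(model, obs):
    """Site-paired evaluation statistics (assumed conventional definitions).

    R    : Pearson correlation coefficient
    NMB  : normalised mean bias,  sum(M - O) / sum(O)
    NME  : normalised mean error, sum(|M - O|) / sum(O)
    Fac2 : fraction of pairs with 0.5 <= M / O <= 2
    """
    model = np.asarray(model, dtype=float)
    obs = np.asarray(obs, dtype=float)
    # Keep only pairs where both values exist and the observation is positive
    valid = np.isfinite(model) & np.isfinite(obs) & (obs > 0)
    m, o = model[valid], obs[valid]
    diff = m - o
    ratio = m / o
    return {
        "R": float(np.corrcoef(m, o)[0, 1]),
        "NMB": float(diff.sum() / o.sum()),
        "NME": float(np.abs(diff).sum() / o.sum()),
        "Fac2": float(np.mean((ratio >= 0.5) & (ratio <= 2.0))),
    }

# Hypothetical annual means (µg m-3) at four sites, for illustration only
modelled = [1.0, 2.5, 3.0, 9.0]
measured = [2.0, 2.0, 4.0, 4.0]
stats = evaluation_metrics(modelled, measured)
print(stats)
```

With these definitions a positive NMB indicates that the model is on average higher than the measurements, while NME and Fac2 describe the scatter of individual site pairs independently of the sign of the bias.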

United States and Canada
Clear linear relationships are observed between modelled and measured annual average concentrations for all three secondary pollutants (Fig. 10, Table 4), among which SO42− has the highest correlation coefficient (0.86), the largest Fac2 (87%) and the smallest NMB and NME. This reflects the excellent capability of the model to capture the spatial variation of SIA constituents. In terms of absolute concentrations, modelled concentrations are on average higher than measured, to varying degrees, for NH4+, NO3−, and SO42−, as is the case in Europe. This may be due to the gas-to-particle conversion process being too fast in the model or the sinks of these secondary species being too small. The network-averaged NH4+ concentrations are 1.

Comparison of temporal variation of modelled concentrations with measurements
The NNDMN, EANET, NAMN and EMEP/CCC monitoring networks also provide higher-temporal-resolution data, which allow a comparative assessment of monthly variations in model simulations (Fig. 11). As well as by model-imposed temporal variations in emissions, the NH3 concentrations are also driven by meteorological variations; in particular, warmer temperatures favour partitioning of reduced N to gaseous NH3. Missing measurement data for certain months and sites mean that the number of comparisons varies from one month to another.
Table 5) and low in winter (mean: 6.54 µg m−3). The seasonal pattern in the model simulations is slightly different, with dual peaks of NH3 concentrations in March and August, but the modelled seasonal averages for spring and summer (14.6 µg m−3 and 14.8 µg m−3, respectively) are similar to the summer measurements. As in the measurements, the modelled NH3 concentration is lowest in winter (9.09 µg m−3).
For the EANET, both modelled and measured NH3 median concentrations show a less clear seasonal trend than in other networks, which might be due to the distribution of monitoring sites. A large number of sites in Southeast Asia are located in the tropics, where the climate is characterised by a small temperature range and substantial rainfall, which leads to a very small range of fluctuations in NH3 concentrations. The monthly averages indicate that measurements peak in April and October and are at a minimum in March and August, while the model has higher concentrations in March, April, August and October, and lower concentrations in January and February. However, the fluctuation in the all-site monthly averages is small, ranging from 1.21 µg m−3 to 3.21 µg m−3 for the model and from 1.77 µg m−3 to 2.30 µg m−3 for the measurements. The variation in monthly medians is even smaller.
For the UK NAMN, both mean and median concentrations (Fig. 11) show that model and measurement exhibit higher NH3 concentrations in spring and summer, and lower concentrations in winter. One small difference is in the timing of the NH3 concentration maximum: the highest measured NH3 concentrations are in spring, whereas modelled concentrations have a maximum in summer. The differences between all-site monthly mean and median concentrations, and between the maximum and minimum values, are much larger in the measurements than in the model, indicating a broad sub-grid variability that cannot be captured by the global model because the spatial averaging process smooths out these highly localised concentration gradients.
For the European EMEP/CCC network, the model is in excellent agreement with the measurements in respect of temporal pattern, despite its higher absolute concentrations. Both model and measurement show a continuous period of higher NH3 concentrations from spring to summer and lower NH3 concentrations in autumn and winter.
Similar model-measurement monthly comparisons for NH4+ in 2015 are presented in the Supplementary Material (Fig. S6).
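The all-site monthly means and medians discussed above, and the varying number of site comparisons per month caused by missing data, can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `monthly_site_statistics`, the array layout (sites by months, with NaN for missing values) and the example numbers are hypothetical and do not reproduce the processing used in the study.

```python
import numpy as np

def monthly_site_statistics(conc):
    """All-site mean, median and valid-site count for each month.

    conc: array of shape (n_sites, n_months); NaN marks a missing value.
    """
    conc = np.asarray(conc, dtype=float)
    n_valid = np.sum(np.isfinite(conc), axis=0)   # sites reporting per month
    mean = np.nanmean(conc, axis=0)               # ignores missing values
    median = np.nanmedian(conc, axis=0)
    return mean, median, n_valid

nan = float("nan")
# Hypothetical NH3 concentrations (µg m-3): 4 sites x 3 months.
# The last site mimics a monitor placed next to a strong local source.
conc = [
    [1.0, 2.0, nan],
    [1.5, 2.5, 3.0],
    [2.0, nan, 4.0],
    [11.5, 15.0, 20.0],
]
mean, median, n_valid = monthly_site_statistics(conc)
print(mean, median, n_valid)
```

In this toy example the single source-influenced site pulls the all-site mean well above the median, which is the kind of mean-median gap that the text attributes to sub-grid variability in the UK NAMN measurements.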

Comparisons of modelled precipitation and wet deposition with measurements
The evaluations of model performance for precipitation and wet deposition are based on the four monitoring networks (China, East Asia, Europe and the United States) that report both precipitation and precipitation concentration measurements for 2015.
The total annual wet deposition (WDEP) is calculated as

$$\mathrm{WDEP} = \bar{C} \times \sum_i P_i, \qquad \bar{C} = \frac{\sum_i C_i P_i}{\sum_i P_i},$$

where $\bar{C}$ (also referred to here as Prec Conc) is the precipitation-weighted annual average concentration, and $C_i$ is the concentration, and $P_i$ is the depth, of each individual precipitation event i in the year. Prec Amount, $\sum_i P_i$, is the total precipitation depth for the year. When $C_i$ (and $\bar{C}$) are expressed in units of mg L−1, and $P_i$ in mm, then WDEP has units of mg m−2.
Figure 12 shows, for each location in each of the four networks, the comparisons between modelled and measured annual precipitation, precipitation-weighted annual average concentration of reduced N (in the form of NH4+) and annual total wet deposition of reduced N in 2015. Table 6
less extent the precipitation concentration. Across the four networks, agreement between modelled and measured wet deposition of reduced N is best for the US NTN, with R = 0.75 and Fac2 = 81%.
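The wet deposition calculation defined at the start of this section can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `wet_deposition` and the event values are hypothetical, and the symbols follow the definitions in the text (event concentration in mg L−1, event depth in mm, so that 1 mm of precipitation over 1 m2 corresponds to 1 L).

```python
def wet_deposition(event_conc_mg_per_l, event_depth_mm):
    """Annual wet deposition from per-event precipitation data.

    Returns (prec_conc, prec_amount, wdep):
      prec_conc   : precipitation-weighted mean concentration (mg L-1)
      prec_amount : total precipitation depth (mm)
      wdep        : total wet deposition (mg m-2)
    """
    if len(event_conc_mg_per_l) != len(event_depth_mm):
        raise ValueError("concentration and depth series differ in length")
    prec_amount = sum(event_depth_mm)
    if prec_amount <= 0:
        raise ValueError("total precipitation depth must be positive")
    # Weight each event concentration by its precipitation depth
    weighted_sum = sum(c * p for c, p in zip(event_conc_mg_per_l, event_depth_mm))
    prec_conc = weighted_sum / prec_amount
    # mg L-1 multiplied by mm (equivalent to L m-2) gives mg m-2
    wdep = prec_conc * prec_amount
    return prec_conc, prec_amount, wdep

# Three hypothetical precipitation events, for illustration only
conc = [0.5, 1.0, 0.2]      # mg L-1
depth = [10.0, 5.0, 20.0]   # mm
prec_conc, prec_amount, wdep = wet_deposition(conc, depth)
print(prec_conc, prec_amount, wdep)
```

Because the annual concentration is weighted by precipitation depth, a bias in either the modelled precipitation amount or the modelled precipitation concentration propagates directly into the wet deposition total, which is why the three quantities are evaluated side by side.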
The comparison for global wet deposition of total oxidized N (in the form of NO3−) exhibits similar results and is presented in the Supplementary Material (Fig. S7 and Table S1). The modelled precipitation-weighted concentrations of NO3− agree relatively well with measurements in the EANET, EMEP/CCC and US NTN networks, with R ranging from 0.69 to 0.80, while the comparisons in NNDMN show a poorer linear correlation between model and measurement (R = 0.40). In terms of biases, the model tends to simulate higher NO3− concentrations in precipitation in the EANET (NMB = 0.52) and US NTN (NMB = 1.04) networks but underestimates them in NNDMN (NMB = -0.37). In general, the greatest model-measurement agreement for oxidized N wet deposition is found in US NTN, followed by EMEP/CCC and EANET, and the least in NNDMN, which again suggests systematic differences between monitoring networks rather than issues with the modelling of atmospheric chemistry and meteorology.
On the whole, the modelled reduced and oxidized N show similar linear relationships with measurements in precipitation and wet deposition in all regions, which further supports the use of the WRF and EMEP MSC-W modelling system to investigate Nr processes globally.
The work presented here is motivated by the use of the EMEP MSC-W-WRF model for global-scale analyses of atmospheric nitrogen and SIA chemistry, fluxes and budget, particularly of species that contain reduced N (i.e. gaseous NH3 and particulate NH4+). The model evaluation, conducted both spatially and temporally, is based on the available data in 2010 and 2015 from 9 monitoring networks that span the range of ambient measurements in East Asia, Southeast Asia, Europe, and North America.
Table 7 summarises the global comparison between model and surface measurements in 2015. The correlation coefficients (R) between modelled and measured concentrations of most species (i.e. NH3, NO2, NH4+, NO3− and SO42−) are all greater than 0.78, except for HNO3 and SO2. The wet deposition of reduced N shows a stronger linear correlation (R = 0.78) than that of oxidized N (R = 0.64). For reduced N species, the evaluation shows that the model overestimates NH3 and NH4+ worldwide, with NMB values of 31% and 37%, respectively. For oxidized N species, the NMB values for NO2 and NO3− are 23% and 61%, whereas HNO3 is underestimated by 34%. Slightly higher concentrations are also simulated by the model worldwide for both SO2 and SO42−, with NMB values of 10% and 21%, respectively. For wet deposition, the model has smaller values on average for reduced N (NMB = -29%) but larger values for oxidized N (NMB = 26%). Given the intrinsic discrepancies between a local site measurement and a global-scale chemistry model grid, these comparisons are good and are comparable with evaluation statistics determined for models of similar resolution (Hauglustaine et al., 2014; Bellouin et al., 2011; Pringle et al., 2010; Xu and Penner, 2012).
Both model and measurement have uncertainties that constrain the extent to which statistical analyses between modelled and measured data can be used to assess a model's performance. A reliable evaluation of a model also requires high-quality measurements. For instance, sampling and chemical analysis procedures such as instrument calibration, the choice of sampling filters or tubes, and the storage, extraction and chemical speciation of air samples all have different uncertainties that propagate to the final measured variable. In particular, this study and the above-mentioned global modelling studies all show difficulties in representing surface NO3− and NH4+ concentrations, which are currently overestimated by around a factor of 2 in Europe and North America.
Such positive biases between modelled and measured NO3− and NH4+ are speculated to be partially associated with negative sampling artifacts in measurements, as evaporation of NH4NO3 from sampling filters has been reported to cause losses of up to 50% in summer conditions (Hauglustaine et al., 2014; Vecchi et al., 2009; Yu et al., 2005). Further work is required to better characterize and quantify the uncertainty of individual NO3− and NH4+ measurements.
In general, the relative measurement uncertainty increases markedly as concentration decreases (Pernigotti et al., 2013). The EMEP/CCC data report for 2015 estimates a combined sampling and chemical analysis uncertainty range of 15-25% (Hjellbrekke, 2017), while detailed uncertainty information for the other monitoring networks is not publicly available.
Similarly, different inputs, configurations and computing processors also influence the model output, and quantifying such influences is rather complicated (Kong et al., 2020). The choice of emission input is a good example (Aleksankina et al., 2019). The compilation of an emission inventory is based partially on reported measurement data and partially on expert estimation, which leads to uncertainty in emission magnitudes and temporal profiles (EMEP/EEA, 2019; Hilde Fagerli, 2017; Klimont et al., 2017; Wiedinmyer et al., 2011; Zheng et al., 2012). The completeness and consistency of submitted emission data also differ significantly across countries. As discussed in Section 3.1, the two global emission inventories used in this work, HTAP and ECLIPSEE, show large localised discrepancies in NH3, NO2 and SO2 emissions in certain world regions, which are presumably ascribable to the inclusion or exclusion of a particular local point source in the compilation process. The influence of these discrepancies on model-simulated surface concentrations differs between primary and secondary components and varies from one region to another, although such highly localised influences are diminished during the spatial averaging process. It is therefore important to acknowledge that the performance of any model is subject to the quality of the model input data, which include not only emissions but also meteorology and other model parameters. Moreover, no model can be guaranteed to be error-free, just as observations are unlikely to be error-free. These potential model errors are often not discussed or acknowledged in the atmospheric modelling community.
Aside from intrinsic uncertainties in model and/or measurement values, the model and measurement may also not agree in respect of the averaging time periods and the diameters of the sampled particles.
A certain number of measurements may be missing from a time series due to unpredictable instrument failure and/or because the measurement averaging period does not match the model averaging period. The sampling time and the particle size range sampled vary from one monitoring network to another, and from species to species. For example, in Canada, NH4+ concentrations within PM2.5 are measured, while the particle size cut-off for the DELTA system used in the UK and China is around 4.5 µm (Tang et al., 2018a; Tang et al., 2018b; Xu et al., 2019). The modelled NH4+, SO42−, and fine NO3− are all in PM2.5. Another example is that in the US and Canada gaseous species such as NO2 and SO2 are monitored continuously throughout the year, so the corresponding annual average concentrations are calculated in the same way as in the model, whilst aerosol components such as NO3− and NH4+ are measured once per 6 days (or once per week).
of that grid volume and at a specific height above the ground, which may often not reflect the average concentration for the grid. Indeed, there are particular monitoring sites where measurements are strongly affected by local sources. The UK NAMN is a good example, in which quite a few sites are purposely set near agricultural sources and therefore yield higher NH3 concentrations than the model grid-average predictions. The US EPA also has many monitors set up next to roads with heavy traffic, which hence observe much higher SO2 levels. The representativeness of an urban (or rural) site for the air in the corresponding model grid will therefore depend on the relative size of that specific urban (or rural) area within that model grid.
The intention here is to provide an overview of how the EMEP-WRF model-measurement agreement varies among different monitoring networks and among different chemical species, for evaluation of a chemistry transport model in a global context.
In general, the model shows better linear correlations with surface concentration measurements in East Asia (mean R = 0.73 over 7 species), Europe (mean R = 0.67 over 7 species) and North America (mean R = 0.63 over 7 species) than in China (mean R = 0.35 over 5 species). More specifically, comparisons in China show that the model performs better in computing concentrations of primary pollutants (i.e. NH3 and NO2) than of secondary species (i.e. NH4+ and NO3−), while the model evaluation statistics in East Asia, Europe and North America show almost equally good results over all species. This implies potential discrepancies in the measurements or emissions in China rather than general issues with meteorological and atmospheric chemistry modelling. The values of the statistical metrics in this work are as good as those of other global model evaluation studies. A global model aerosol simulation study (Hauglustaine et al., 2014) reported that the R of global model results (LMDz-INCA global chemistry-aerosol-climate model, 1.9° latitude × 3.75° longitude resolution) versus measurements in 2006 for surface concentrations of SO42−, NH4+ and NO3− ranged from 0.43 to 0.58 in Europe and from 0.54 to 0.77 in North America, which is similar to the results presented here. The AeroCom phase III global nitrate experiment, which includes 9 models, reported slightly lower R ranges than here for annual NO3− in 2008: 0.081-0.735 in North America, 0.393-0.585 in Europe, and 0.226-0.429 in Southeast Asia (Bian et al., 2017); the agreement between model and observation for gas tracers in that study was even lower than here.
This work has used the EMEP MSC-W v4.34 model coupled with WRF v3.9.1.1. As discussed above, model-measurement comparison statistics will vary between global models to different extents.
However, the broad discussion of fundamental differences between localised measurements and grid-volume-averaged model output, unmatched temporal coverage, relatively higher uncertainties in emissions, and intrinsic limitations of measurement is generalizable, as ACTMs and other climate models are constructed similarly. Allowance for these inherent model-measurement discrepancies and uncertainties yields significantly less stringent requirements on acceptable model evaluation statistics than might initially be expected. Urban dispersion models (Denby et al., 2020; Hood et al., 2018) with higher resolutions are better able to represent point sources and concentration gradients but are constrained even more by the accuracy of localised emission inventories and boundary conditions, and are therefore only configured for an individual urban area. Global-scale model simulation as presented here, in spite of the acknowledged limitations of coarser spatial resolution, has the advantage of generating self-consistent chemistry fields and is suited to investigating contemporary and potential future global reactive nitrogen and SIA atmospheric chemistry and their regional variations.

Conclusions
This model versus measurement study is motivated by the first application of a global version of the EMEP MSC-W model with WRF meteorology (1° × 1° horizontal resolution) to study global reactive N and S chemistry and deposition. A comprehensive spatial and temporal comparison of model output against 9 monitoring networks from 4 world regions (East Asia, Southeast Asia, Europe and North America) has been undertaken, with a focus on the atmospheric concentrations and
In general, capturing correlation is more important than bias given the intrinsic discrepancies and uncertainties between the modelled and measured variables. In this work the model shows better linear correlations with measurement networks in Southeast Asia (mean R for 7 species = 0.73), Europe (mean R = 0.67) and North America (mean R = 0.63) than in China (mean R = 0.35 over 5 species), which implies potential discrepancies with some measurements and emissions rather than issues with modelling meteorological and atmospheric chemistry processes. Model-measurement bias varies from one species to another in different networks. NH4+ and NO3− are the species overestimated the most by the model in Europe and North America, but less so in the East Asia and Southeast Asia networks, suggesting that the model production of the two species might be too fast and/or the chemical and physical losses might be too slow in the former two regions. In terms of overall statistics, the model performs best in simulating SO42− concentrations in North America among the various species in all networks.
Both model and measurement exhibit higher NH3 concentrations in spring and summer, and lower concentrations in winter.
The greatest agreement in temporal profile between model and measurement is found in Europe. The fluctuation of monthly average NH3 concentrations in Southeast Asia throughout 2015 is fairly small for both model and measurement, and the temporal trend is therefore less clear. Small differences appear regarding the specific peak concentration months in China and the UK. Measurements in China show the highest monthly concentration in July, while the model simulates two peaks, in March and August. The highest NH3 concentrations in the UK network are in spring, whereas the modelled concentrations peak in summer.
Such disagreements again reflect the likelihood that the major driver of model discrepancies is the inaccuracy of temporal profiles of emissions rather than the simulation of atmospheric chemistry and physics.
The evaluation of wet deposition shows that the model is capable of simulating the spatial variation of annual precipitation correctly in all four world regions (R range 0.65-0.78) despite a 13-34% underestimation. Given that the spatial and temporal averaging smooths out highly localised effects of precipitation events, such model-measurement discrepancy is reasonable. In respect of the precipitation-weighted concentrations, high linear correlations between measured and modelled NH4+ and NO3− concentrations are observed in Southeast Asia, Europe and North America but not in China, which may again suggest systematic differences among the measurements rather than in the model. In general, the model shows the greatest consistency of annual total wet deposition with measurements in North America (R: 0.75 and 0.81 for reduced and oxidised N respectively; similarly,

Author contribution
MH, DS and MV conceptualised and supervised the study. MV and PW contributed to model development and set-up and provided modelling support. MV provided computing resources. YG contributed to study design and undertook all model simulations, compilation of measurement datasets, formal data analyses, visualisation of the results and data curation, with discussion and refinement by all authors. The original draft of the paper was written by YG with editing by MH. All authors provided review comments and approval of the final version.

Competing interests
The authors declare that they have no conflict of interest.