Towards an online-coupled chemistry-climate model: evaluation of trace gases and aerosols in COSMO-ART

. The online-coupled, regional chemistry transport


Introduction
Aerosols affect climate through changes in the radiation budget (direct effect), the subsequent changes in atmospheric stratification (semi-direct effect, Haywood and Boucher, 2000) and through changes in cloud development and lifetime due to the differences in available cloud condensation/ice nuclei (indirect effects, Lohmann and Feichter, 2005). Aerosols also constitute a health concern if they are small enough to traverse the human respiratory tract (Laden et al., 2006; Dockery et al., 1996). Once in the lungs their toxicity depends on size (Donaldson et al., 2000) and chemical composition (Aktories et al., 2009; Hoek et al., 2002). Within the climate system, their influence on the radiation budget depends on their optical properties, and how they affect clouds is a function of size and hygroscopicity. Size, chemical composition, and optical properties are therefore indispensable parameters that need to be well represented in any study that aims to quantify aerosol effects accurately.
Up to now, climate modeling studies including aerosols often lack a comprehensive description of aerosol characteristics, due to the high computational demand of such a complex effort. Approaches range from simple bulk mass aerosol schemes with only externally mixed aerosols, up to multi-component, size-resolving aerosol modules including explicit aging of aerosols and interactions with radiation and clouds. Often these modules lack parts (or all) of the interaction between gas-and aerosol-phase. Nucleation of ammonium-sulfate particles is represented in most models, and also the condensation of organics onto particles is included in some. Nitrates, which can represent up to 50 % of ambient aerosol mass in polluted regions (Putaud et al., 2004), were missing for example in all but two models participating in the Fourth Assessment Report of the Intergovernmental Panel on Climate Change (IPCC, Meehl et al., 2007). This was probably due to the lack of the necessary, but computationally expensive, gas-phase chemistry leading to nitrate formation.
Current efforts try to reconcile an accurate representation of all aerosol components with the ability to model climatic timescales. To reach this goal it is necessary to couple climate and air quality models. One such modeling system, which focuses on the regional scale, combines the numerical weather prediction model of the Consortium for Small Scale Modeling (COSMO, Baldauf et al., 2011) with an extension for Aerosols and Reactive Trace gases: COSMO-ART (Vogel et al., 2009). It is based on state-of-the-art components for the description of meteorology, chemistry and aerosols and features an integrated approach to couple them. Such an "online" coupling allows for consistent treatment of all components by the same parameterizations (e.g. advection, diffusion, convection) and avoids unnecessary interpolation steps. Additionally, simulation of feedbacks between chemistry, aerosols and meteorology becomes possible. Grell and Baklanov (2011) showed the importance of this approach and its benefits compared to traditional "offline" models, and Zhang (2008) gave a comprehensive overview of the available modeling systems. In its composition, COSMO-ART is very similar to the Weather Research and Forecasting model (WRF) extended by chemistry and aerosols: WRF/Chem. Grell et al. (2005) presented a comprehensive evaluation for this modeling system. Most of the components of COSMO-ART are well known and tested. However, their interplay and integration into the modeling system lack a thorough evaluation.
In this work we analyse COSMO-ART regarding its ability to represent ambient concentrations of gaseous and particulate matter constituents over Europe under different meteorological conditions. Through a detailed analysis of aerosol size distributions and chemical composition we set the basis for subsequent analyses of aerosol-climate interactions in COSMO-ART. We have collected an extensive evaluation dataset of satellite-derived NO 2 and aerosol optical depth (AOD), long-term station measurements for gas-phase tracers, bulk aerosol mass and optical properties, as well as aerosol mass spectrometer (AMS) measurements of aerosol chemical composition and measurements of aerosol size distribution. The comprehensive datasets of aerosol characteristics have been created during recent field campaigns of the European integrated Project on aerosol cloud climate air quality interactions (EUCAARI, Kulmala et al., 2009), during intensive measurement campaigns of the European Monitoring and Evaluation Programme (EMEP, http://www.emep.int) and in coordinated measurements of the European Supersites for Atmospheric Aerosol Research (EUSAAR, http://www.eusaar.net) and the German Ultrafine Aerosol Network (GUAN, Birmili et al., 2009).
Our simulations employ full gas-phase chemistry and aerosol dynamics. The spatial and temporal resolution of the input data (meteorology, anthropogenic emissions) and of the model setup is at the top end of what is currently possible. While the modeling system is still too expensive to be used for climate simulations, the results of our evaluation can be seen as a benchmark for the degree of accuracy in simulating gas and aerosol characteristics that can be expected from future fully-coupled regional chemistry-climate models. They also identify model deficiencies that would need to be remedied before such simulations can be made.
We begin with a description of the system, its setup and the measurement datasets used in evaluation. The second chapter describes the findings of our evaluation against the different datasets and discusses the results. The last chapter provides a more in-depth discussion of simulated aerosol characteristics. We conclude with implications for future studies and give directions for further developments of the modeling system.

Modeling system
COSMO-ART is a regional chemistry transport model, online-coupled to the COSMO regional numerical weather prediction and climate model (Baldauf et al., 2011). COSMO is operationally used for numerical weather prediction (NWP) purposes by several European national meteorological services and research institutes. In its climate version (Rockel et al., 2008) it has been used in several studies of regional climate impact assessment (e.g. Jaeger and Seneviratne, 2010;Suklitsch et al., 2008;Hohenegger et al., 2008) and participated in the IPCC fourth assessment report modeling ensemble (Christensen et al., 2007). The extension for Aerosols and Reactive Trace gases (ART) contains a modified version of the Regional Acid Deposition Model, Version 2 (RADM2) gas-phase chemistry mechanism (Stockwell et al., 1990). It has been extended by a more sophisticated isoprene scheme of Geiger et al. (2003) for a better description of biogenic volatile organic compounds (VOC), but does not include recent findings regarding formation of secondary organic aerosols and OH recycling due to isoprene chemistry (e.g. Paulot et al., 2009). Aerosols are represented by the modal aerosol module MADE (Modal Aerosol Dynamics Model for Europe, Ackermann et al., 1998), improved by explicit treatment of soot aging through condensation of inorganic salts (Riemer et al., 2003) and additional modes for mineral dust (Stanelle et al., 2010) and sea salt. Nucleation of new particles is formulated according to Kerminen and Wexler (1994) allowing for binary homogeneous nucleation of sulfuric acid. The condensation of vapours from biogenic and anthropogenic VOCs is parametrized with the Secondary Organic Aerosol Model (SORGAM) of Schell et al. (2001). This is still a commonly used module, although Fast et al. (2009) showed that this scheme underpredicts SOA concentrations by up to a factor of 10 in very polluted regions. 
Biogenic VOC emission fluxes, considering isoprene, α-pinene, other monoterpenes and a class of unidentified compounds, are calculated online with a Guenther-type model presented in Vogel et al. (1995), using land use data from the Global Land Cover 2000 (GLC2000) dataset (Bartholomé and Belward, 2005). Sea salt emissions follow Lundgren (2006), and mineral dust is parameterized as described in Vogel et al. (2006). Dry deposition is modeled by a resistance approach (Baer and Nester, 1992). Washout of aerosols is included by a parameterization of Rinke (2008). Wet removal of gases and aqueous-phase chemistry are currently not considered. COSMO-ART is fully online-coupled, and currently allows for feedbacks of aerosols on radiation (direct/semi-direct effects). Cloud feedbacks (indirect effects) have been included in a research version (Bangert et al., 2011) but were not used in this work. A complete description of the modeling system can be found in Vogel et al. (2009) and references therein. In our study, COSMO-ART based on COSMO version 4.17 is used.
For meteorology we used initial and boundary conditions from the European Centre for Medium-Range Weather Forecasts (ECMWF) Integrated Forecast System (IFS) model, with an update frequency of 3 h. For runs on climatic timescales boundary data could e.g. be provided by the ECHAM-HAM (Stier et al., 2005) or (for past episodes) by ERA-40 (Uppala et al., 2005)/ERA-Interim (Simmons et al., 2007) reanalyses, which are all based on the IFS and would therefore deliver comparable meteorology. Boundary data for gas-phase species, including most of the lumped NMVOC compounds, were provided through simulations of the Model for Ozone and Related chemical Tracers (MOZART) driven by meteorological data from the National Center for Environmental Prediction (NCEP) presented in Emmons et al. (2010), with an update frequency of 6 h. No boundary data for aerosol components were available from MOZART or other models that matched our aerosol mechanism. Therefore, we took the output of a previous (otherwise identical) simulation of COSMO-ART and chose one point in the Northern Atlantic (8.7° W, 47.4° N, see Fig. 1). We averaged the simulated aerosol characteristics over the complete simulation period, and used this vertical column as lateral boundary conditions for all aerosol variables. While this gives more realistic aerosol concentrations at the boundaries, the total inflow will still be underestimated. In this work we will show that simulated particulate matter concentrations are often underestimated, which will also be the case for boundary conditions based on such a simulation.
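The construction of the aerosol boundary conditions described above can be illustrated with a minimal sketch. It assumes the model output is available as an array with dimensions (time, level, y, x); the function names and the synthetic data are hypothetical and are not part of COSMO-ART or INT2COSMO-ART.

```python
import numpy as np

def mean_boundary_profile(conc, i, j):
    """Time-average a (time, level, y, x) field at grid point (j, i).

    Returns a 1-D vertical profile with one value per model level.
    """
    conc = np.asarray(conc, dtype=float)
    return conc[:, :, j, i].mean(axis=0)

def fill_lateral_boundary(profile, n_boundary_points):
    """Replicate one vertical profile along all lateral boundary points.

    Returns an array of shape (level, n_boundary_points).
    """
    profile = np.asarray(profile, dtype=float)
    return np.repeat(profile[:, None], n_boundary_points, axis=1)

# Synthetic example: 4 output times, 3 levels, 2 x 2 grid points.
conc = np.arange(4 * 3 * 2 * 2, dtype=float).reshape(4, 3, 2, 2)
profile = mean_boundary_profile(conc, i=1, j=0)
boundary = fill_lateral_boundary(profile, n_boundary_points=5)
```

Because a single time-mean column is used everywhere, the boundary values carry no temporal or spatial variability, which is consistent with the caveat that the total inflow remains underestimated.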
The emission inventory for Europe developed by TNO (Netherlands) within the Monitoring Atmospheric Composition and Climate (MACC) project (TNO/MACC, Kuenen et al., 2011; Denier van der Gon et al., 2010) provides anthropogenic emissions. This is a follow-up and improvement of the earlier TNO-GEMS emission database (Visschedijk et al., 2007). Therein, emissions from 10 different SNAP (Selected Nomenclature for sources of Air Pollution) source categories are represented by a spatial pattern of annual emission totals for the years 2003-2007, and statistical time functions for species, country and source category dependent monthly, weekly and daily cycles. Our speciation of non-methane volatile organic compounds (NMVOC) mass totals is done using composition information from Passant (2002) and a translation matrix to RADM2 (J. Keller, PSI, Switzerland, personal communication, 2009). Aerosol emissions are provided as mass totals of particulate matter below 10 µm (PM 10 ) and below 2.5 µm (PM 2.5 ) in diameter. We distribute them onto the different MADE modes following Elleman and Covert (2010), with a disaggregation into chemical components using the split given in Table 1. With its spatial resolution of approx. 8 km (0.125° × 0.0625°), its description of the time evolution of emissions and its comprehensive set of emitted species, this dataset is one of the most detailed emission inventories currently available for Europe. Preparation of all input datasets for COSMO-ART is done using INT2COSMO-ART (Appendix A).
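To make the temporal disaggregation concrete, the following sketch shows one common way of combining an annual emission total with monthly, weekly and daily time functions. It assumes multiplicative factors that are each normalized to a mean of 1 over their cycle; the function name and the factor values are hypothetical, and the actual treatment in INT2COSMO-ART may differ.

```python
HOURS_PER_YEAR = 8760.0  # non-leap year, assumed for illustration

def hourly_emission(annual_total, f_month, f_weekday, f_hour):
    """Emission rate for one specific hour, in annual-total units per hour.

    Each factor scales the flat annual mean rate and is assumed to average
    to 1 over its own cycle (12 months, 7 weekdays, 24 hours).
    """
    return annual_total / HOURS_PER_YEAR * f_month * f_weekday * f_hour

# Hypothetical factors for a weekday rush hour in a winter month.
rate = hourly_emission(8760.0, f_month=1.2, f_weekday=1.1, f_hour=1.5)
flat = hourly_emission(8760.0, f_month=1.0, f_weekday=1.0, f_hour=1.0)
```

With all factors equal to 1 the function returns the flat annual mean rate, so the time functions only redistribute the annual total in time.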
Our modeling domain (Fig. 1) covers the greater European region, with a horizontal resolution of 0.17° and a grid of 200 × 190 points. Vertically, the model is discretized into 40 terrain-following hybrid sigma levels, with the lowest level at 10 m (layer thickness: 20 m) and ranging up to approx. 24 000 m (20 hPa). A Runge-Kutta time integration scheme is employed with time steps of 40 s. Tracers are advected horizontally via a semi-Lagrangian method conserving mass over the total domain ("globally mass-conserving"). The overall model configuration closely follows the current operational setup of COSMO-EU of the German Meteorological Service (DWD).
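As a quick plausibility check of the grid spacing, the angular resolution of 0.17° can be converted to a distance. The sketch below assumes a spherical Earth with a mean radius of 6371 km and measures the spacing along a great circle, as is appropriate near the equator of a rotated latitude-longitude grid; the function name is hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean radius, spherical-Earth assumption

def grid_spacing_km(delta_deg, lat_deg=0.0):
    """Return (dx, dy) in km for an angular spacing at a given latitude.

    lat_deg is the latitude in the grid's own coordinate system; dx shrinks
    with cos(latitude), dy does not.
    """
    dy = math.radians(delta_deg) * EARTH_RADIUS_KM
    dx = dy * math.cos(math.radians(lat_deg))
    return dx, dy

dx, dy = grid_spacing_km(0.17)
```

Under these assumptions the spacing evaluates to roughly 18.9 km, in line with the value of approx. 19 km quoted for the domain center.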

Measurement data
Meteorological parameters have been taken from the operational surface synoptic observations (SYNOP) network, providing measurements for temperature, dew point temperature, wind speed and direction at or in the vicinity of most measurement points of chemical composition. In the EMEP programme a number of stations throughout Europe report quality-controlled, long-term measurements of gaseous precursor substances and aerosol variables. AIRBASE (European AIR quality dataBASE, http://airbase.eionet.europa.eu/) provides measurements at a much larger number of stations, but with heterogeneous quality and mostly at rather polluted locations not representative for the model grid size of 0.17° (approx. 19 km at model domain center). While AIRBASE, in its recently published version 5, provides data up to the end of 2009, EMEP data were only available until 2008. As one of our simulation periods is in 2009, we settled on the following method to provide a homogeneous dataset of measurements for gas-phase species and aerosol mass for all periods: we retrieved data from AIRBASE, but restricted the stations used to those which also report to EMEP. As discrepancies between modelled and measured values might be related to the type and location of a measurement station, we have additionally disaggregated the selected stations into categories based on the representativeness study done by Henne et al. (2010), which includes a more comprehensive analysis of the surroundings of each station. Therein, stations are classified regarding their pollution burden and usability in a model evaluation. We have used the "alternative classification" described in the Supplement S3 in Henne et al. (2010), which gives classes ranging from very clean stations ("rural/remote"), via stations with very variable pollution levels ("rural/coastal") and stations representative for a larger area ("rural"), up to stations with a strong influence of large urban areas in their vicinity ("suburban/urban").
Most EMEP stations are found in the "rural" and "rural/coastal" classes, and are seen as the most representative when evaluating model results.
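The station selection described above amounts to an intersection of two station lists followed by a grouping by class. A minimal sketch, using invented station identifiers and a hypothetical function name (neither AIRBASE nor EMEP data formats are reproduced here):

```python
from collections import defaultdict

def select_stations(airbase_ids, emep_ids, station_class):
    """Keep AIRBASE stations that also report to EMEP, grouped by class.

    station_class maps a station identifier to its class label; stations
    without a label are collected under "unclassified".
    """
    common = set(airbase_ids) & set(emep_ids)
    groups = defaultdict(list)
    for sid in sorted(common):
        groups[station_class.get(sid, "unclassified")].append(sid)
    return dict(groups)

# Invented identifiers, for illustration only.
airbase = ["ST01", "ST02", "ST03", "ST04"]
emep = ["ST02", "ST03", "ST05"]
classes = {"ST02": "rural", "ST03": "rural/coastal"}
selected = select_stations(airbase, emep, classes)
```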
The Aerosol Robotic Network (AERONET) (Holben et al., 1998) provides measurements of aerosol optical depth (AOD) for analysis of the optical properties. Aerosol mass spectrometer (AMS) measurements give quantitative measurements of the chemical composition of submicron nonrefractory aerosol mass (NR-PM 1 ) with high temporal resolution. AMS data collected at several sites throughout Europe during measurement campaigns of the EMEP/EUCAARI project in October 2008 and March 2009 were used, as well as from an EMEP intensive campaign in June 2006. No evaluation of elemental carbon has been made, as the different measurement techniques used make even inter-station comparison difficult (Andreae and Gelencsér, 2006), and devising a homogenized dataset was out of scope for this work. Homogenized measurements of aerosol size distribution from scanning mobility particle sizer (SMPS) and differential mobility particle sizer (DMPS) instruments were provided in Asmi et al. (2011) as a result of the EUSAAR project and data from the GUAN network (Birmili et al., 2009), with 24 measurement sites in Europe. Figure 1 shows the locations of ground-based stations used in our evaluation.

Table 1. Contributions (in % mass) to PM 2.5 emissions as used in the TNO/GEMS emission inventory. Sodium (Na) is not used directly in the simulations, but added to the "other primary" category, representing the remaining, non-carbonaceous primary PM 2.5 part (including e.g. minerals, metal oxides, product emissions). Sulfate contributions have been calculated assuming 2 % of total emitted SO 2 mass (IIASA RAINS emissions for 2000) is H 2 SO 4 for all SNAP categories except SNAP 1 and 3. There, measured compositions of coal fly ash (as dominant contributor to source category) as reported by Lipsky et al. (2002) and Senior et al. (2000) are used as basis. OC depicts organic carbon, a ratio of 1.3 has been used to convert OC to organic aerosol (OA).
Finally, satellite-derived datasets provide a vertically integrated view on model performance. In our analysis, tropospheric columns of NO 2 from the Ozone Monitoring Instrument (OMI) were used for gas-phase comparison. The NO 2 columns are based on the Empa OMI NO 2 retrieval (EOMINO), which includes several improvements compared to operational products, in particular a better representation of topography and surface reflectance using high-resolution data sets (Zhou et al., 2009, 2010). To estimate the accuracy of the spatial distribution of simulated aerosol loadings, aerosol optical depth (AOD) retrieved from the Moderate Resolution Imaging Spectroradiometer (MODIS) (Levy et al., 2007, MOD04 L2 product) was used.

Investigation periods
The selection of the investigation periods was driven by two goals: to evaluate model performance under typical weather conditions and in all seasons, and to have AMS measurement data available for comparison. Apart from the campaign measurement data, AIRBASE and satellite data were available for all simulations. The following periods were chosen:

"Winter case": 23 January-11 February 2006
A stable high pressure system with very low surface temperatures was present over Europe from 23 January onwards, with only minor disturbances on 5-7 February. Over Switzerland and Eastern Europe, this resulted in an episode with strong temperature inversions and exceptionally high particulate matter (PM) concentrations. The Swiss legislative limit for daily mean PM 10 (particulate matter below 10 µm in diameter) of 50 µg m −3 was exceeded every day between 27 January and 5 February at several measurement stations. This episode represents a typical winter situation where high pollution levels are building up through strong inversions and local emissions are the strongest contributors to pollution levels (Holst et al., 2008).

"Summer case": 10-29 June 2006
This episode was characterized by dry, sunny and warm conditions due to a stable high pressure system from 10-24 June, and a transient low pressure system with embedded thunderstorms from 25 to 29 June. Such a situation is associated with strong photochemistry and high O 3 levels, representing a typical "summer smog" episode. AMS instruments were deployed in Payerne (CH), Harwell and Auchencorth (UK) during this period in the context of an EMEP intensive measurement campaign. We used data from Payerne and Auchencorth in our analysis.

"Autumn case": 1-20 October 2008
A low pressure system over Scandinavia brought polar airmasses towards Europe at the beginning of the month. From 5-20 October generally mild and sunny conditions prevailed. On 16 October a low pressure disturbance passed, bringing rain to Central Europe. Frequent disturbances by mesoscale systems gradually change a summertime atmosphere towards a wintertime one in this simulation. During this period, an EMEP/EUCAARI measurement campaign took place, from which we received AMS data for Payerne (CH), Melpitz (DE), Vavihill (SE), Hyytiälä (FI) and K-Puszta (HU). EUSAAR size distribution data were available for this period.

"Spring case": 1-20 March 2009
A low pressure system originating over the North Atlantic brought cold weather on 1 and 2 March. It was followed by spring-like conditions from 13-18 March, and a cold surge from NE on 20 March. We regard this situation as typical of spring, with first warm days including the initial onset of BVOC emissions, interrupted by "cleansing" periods with clouds, precipitation and strong mesoscale forcing. Another EMEP/EUCAARI campaign took place during this period, from which we present data from AMS instruments deployed in Payerne (CH), Melpitz (DE), Vavihill (SE), Hyytiälä (FI), Cabauw (NL), Helsinki (FI), Barcelona (ES), and Montseny (ES). EUSAAR size distribution data were available for this period.

Evaluation
The following section describes the results of our evaluation efforts, starting with meteorology, then trace gases and finally aerosol characteristics. Each section is accompanied by a figure or table summarizing the results for the species discussed. Section 4 then further elaborates on the results for aerosol characteristics.

Meteorology
The meteorological core of COSMO-ART is identical to that of the NWP model COSMO, whose performance is continuously verified by several European weather services and, in more detail, within field campaigns (e.g. Barthlott et al., 2011). Meteorological evaluation has therefore been limited in this work to surface parameters. In all periods, the comparison of simulated temperature, dew point temperature, wind direction and wind speed shows very good agreement with SYNOP measurement data both in terms of temporal variability and average values (Fig. 2). Sometimes the (diurnal) variability is underestimated by the simulations (not shown), which is not unexpected for such coarse grid simulations due to the averaging onto a 0.17° grid box (e.g. Schlünzen and Katzfey, 2003; Heinemann and Kerschgens, 2005). The means of temperature, wind speed and direction are well reproduced (Table 2). Relative humidity is also realistically represented, except for the summer 2006 period, where the model shows a negative bias (Fig. 2). This negative bias might be related to an unrealistic initialization of soil moisture. Further investigation is needed to remedy this deficiency. IFS analysis data were used to initialize and force the model at the lateral boundaries. Within the model domain COSMO runs freely, creating its own dynamics. This is not the best possible setup. Continuous assimilation of observations, as is done for operational analyses (e.g. nudging), or a reinitialization of meteorology after one or two days could further improve meteorology. However, a simple comparison against several SYNOP stations revealed no significant loss in accuracy over the whole integration period, suggesting that the lateral forcing provides a sufficiently strong constraint for the meteorology within the model domain.
Some of the underestimated (diurnal) variability found would likely be improved at increased resolution.
The modest deficiencies found, such as an underestimated diurnal variability, are well known to NWP modellers and represent problems such models are currently faced with in general (e.g. Schlünzen and Katzfey, 2003; Heinemann and Kerschgens, 2005). Biases in simulated mean wind speed, for example, are below 5 % at nearly all stations in all periods (Table 2), and temperatures show essentially no bias. Overall, meteorology is well represented and these findings set the basis for a successful air quality simulation. They also highlight one of the key benefits of this modeling system: its direct coupling to an operational weather prediction model.

Mean concentrations
We have calculated the distribution of median pollutant concentrations at all stations in the model domain over each simulation period. Shown in Fig. 3 are the distributions of O 3 , NO 2 , NO, SO 2 , PM 10 and PM 2.5 for different station classes. They are presented as boxplots of the distribution of measured and modelled median values during each season (afternoon values of hours 12:00-18:00 local time) and allow us to evaluate the accuracy of, and potential biases in, our simulations. Table 2 gives a summary of the mean biases found. O 3 is the measure air quality models often have been "tuned" for. COSMO-ART is no different from other models in its ability to represent this quantity very well. A small but consistent underestimation is visible, but seasonal differences are well captured. The largest (negative) biases are found in winter 2006, while autumn 2008 matches measurements best (Table 2). Overall biases in the median never exceed 10 ppbv and are often below 5 ppbv. Variability within the distributions is comparable with observations. Overall, a correlation of 0.7 (r) with hourly station values shows that the performance of our O 3 simulations is in the same range as results from simulations with comparable modeling systems like WRF/Chem in Grell et al. (2005).
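The statistics used in this comparison (afternoon medians per station, mean bias and correlation with hourly values) can be sketched as follows. This is an illustration under stated assumptions rather than the evaluation code used in this work: the afternoon window is taken as hours 12 to 17 inclusive, missing values are encoded as NaN, and all function names are hypothetical.

```python
import numpy as np

def afternoon_median(values, hours, start=12, end=18):
    """Median of hourly values whose local hour lies in [start, end)."""
    values = np.asarray(values, dtype=float)
    hours = np.asarray(hours)
    mask = (hours >= start) & (hours < end)
    return float(np.nanmedian(values[mask]))

def mean_bias(model, obs):
    """Mean of (model - observation), ignoring missing pairs."""
    diff = np.asarray(model, dtype=float) - np.asarray(obs, dtype=float)
    return float(np.nanmean(diff))

def pearson_r(model, obs):
    """Pearson correlation of hourly values, ignoring missing pairs."""
    model = np.asarray(model, dtype=float)
    obs = np.asarray(obs, dtype=float)
    ok = ~(np.isnan(model) | np.isnan(obs))
    return float(np.corrcoef(model[ok], obs[ok])[0, 1])

# Synthetic one-day series: the "model" is 2 units below the "observation".
hours = np.arange(24)
obs = hours.astype(float)
model = obs - 2.0
obs_median = afternoon_median(obs, hours)
bias = mean_bias(model, obs)
r = pearson_r(model, obs)
```

A constant offset, as in this synthetic case, yields a non-zero bias but a perfect correlation, which is why both measures are reported side by side.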
The O 3 precursors NO and NO 2 measured within the AIRBASE network show a much larger variability than O 3 itself. The differences between rural and rural/remote stations in concentrations of NO and NO 2 are well reproduced by the model. Spring 2009, summer 2006 and autumn 2008 concentrations are in a similar range, while values more than twice the median of the other seasons were measured during the high pollution episode of winter 2006. The model reproduces this finding very well. NO 2 concentrations vary strongly between station types and season, which the model also represents. However, a comparably strong underestimation is found in summer 2006. Steinbacher et al. (2007) and Dunlea et al. (2007) showed that the often used molybdenum converter based NO 2 measurements are biased high due to the additional conversion of other oxidized nitrogen compounds. This will influence the comparison especially during this period, which is characterized by the high oxidative capacity of the atmosphere due to warm, sunny conditions. "Rural/coastal" stations show an overestimation throughout all simulation periods (Table 2), a first indication that shipping emissions might be overestimated.
SO 2 levels are generally overestimated, again especially at coastal stations. Only during the summer 2006 period do "rural" stations compare well to modelled results. The increase in SO 2 concentrations during the polluted winter 2006 episode is reproduced, though exaggerated. We argue that the missing parameterization in COSMO-ART for wet scavenging of gases and the associated aqueous-phase oxidation of SO 2 to particulate sulfate can explain a large part of this SO 2 overestimation. A possible overestimation of SO 2 emissions in the TNO/MACC inventory can also contribute to the observed mismatch. Uncertainties in emission inventories for SO 2 have been shown to be generally large (de Meij et al., 2006), and even more so for their strongest contributor, international shipping (Endresen et al., 2005), consistent with the stronger overestimation at coastal stations. However, no other species shows a similar overestimation (over land) in our simulations.
Very few measurements were available for NH 3 (3 stations in the Netherlands). At those points, NH 3 levels are on average well represented, but show large variability throughout the simulation period (not shown).
NMVOCs, the components missing to assess the tropospheric chemistry as a whole, could not be thoroughly evaluated due to a lack of long-term, European-wide measurements. A preliminary comparison with total NMVOC measured at Duebendorf (CH) showed good agreement (not shown), which gave confidence that our NMVOC levels are in the correct range, but we could not assess the spatial distribution.

Spatial distribution
Maps of mean afternoon (hours 12:00-18:00 UTC) concentrations over the whole simulation period were produced, overlaid with point indicators of the same mean concentrations at each measurement station (Fig. 4 for summer 2006 and in the Supplement for the other periods).
The spatial distribution of O 3 and NO x concentrations corresponds with observed values. Only minor differences are found, for example a large inter-station variability of measured O 3 in Eastern Europe which is not seen in the model, and an underestimation of O 3 concentrations over the Iberian Peninsula during the spring 2009 period. NO x values show no region with exceptional biases over land. Striking, however, are the high modelled values of NO x , but also of SO 2 , over water, along shipping routes in the Mediterranean Sea and the English Channel. The general overestimation of SO 2 concentrations found in the evaluation of the mean quantities is clearly visible throughout Europe for the autumn 2008 period, but less so in the other periods. Modelled SO 2 concentrations at coastal stations in NE Spain are consistently too high, again pointing towards high shipping emission contributions. Apart from that no distinct spatial pattern of overestimation could be found.

Diurnal cycles
The representation of the diurnal cycle of atmospheric constituents was evaluated by means of ensemble plots. The ensemble consisted of all stations which had measurement data for the compound of interest, disaggregated by the classification of Henne et al. (2010). The distribution of concentrations was then calculated for each hour of day, over the whole simulation period. The median and the range covering 70 and 90 % of all stations are shown in Fig. 5 (see Supplement for plots of the other periods).
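One possible way to build such an ensemble is sketched below. It rests on assumptions that the text does not spell out: each station is first averaged over all days for a given hour, and the ranges covering 70 and 90 % of stations are taken as the central 15th to 85th and 5th to 95th percentiles. The function name and the synthetic data are hypothetical.

```python
import numpy as np

PERCENTILES = [5, 15, 50, 85, 95]  # assumed central 90 % and 70 % ranges

def diurnal_ensemble(conc, hour_of_day):
    """Percentiles across stations of the mean diurnal cycle.

    conc: array (time, station) of hourly concentrations, NaN if missing.
    hour_of_day: array (time,) with integer hours 0-23.
    Returns an array (24, 5) holding PERCENTILES for each hour.
    """
    conc = np.asarray(conc, dtype=float)
    hour_of_day = np.asarray(hour_of_day)
    out = np.full((24, len(PERCENTILES)), np.nan)
    for h in range(24):
        sel = conc[hour_of_day == h]
        if sel.size == 0:
            continue  # no data for this hour
        station_mean = np.nanmean(sel, axis=0)  # one value per station
        out[h] = np.nanpercentile(station_mean, PERCENTILES)
    return out

# Synthetic data: 2 days, 5 stations, value = hour + 10 * station index.
hour_of_day = np.tile(np.arange(24), 2)
conc = hour_of_day[:, None] + 10.0 * np.arange(5)[None, :]
ensemble = diurnal_ensemble(conc, hour_of_day)
```

Applying the same function to measured and modelled series at the same stations gives directly comparable medians and ranges for each hour of the day.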
The simulated daily cycle of O 3 is accurate throughout most seasons and station types. The slight underestimation of mean O 3 concentrations found is visible as a shift of the diurnal cycle to lower values. Only in the autumn 2008 period the modelled diurnal amplitude is noticeably smaller than the measured one.
Simulated NO 2 diurnal cycles also correspond well with observations in most cases. Important aspects like the peaks during morning and evening hours ("rush-hour") visible in the spring 2009 and autumn 2008 periods are reproduced. NO 2 levels during nighttime are overestimated in spring 2009 for rural stations, and in autumn 2008 for rural and rural/remote stations. This overestimation at night could be a consequence of the fact that in reality the station is away from emission sources of NO 2 , though in the model NO 2 is emitted directly into the grid box the station is located in. In spring 2009 (rural stations) and summer 2006 (rural and rural/remote), an exaggerated diurnal amplitude leads to underestimations of NO 2 concentrations during daytime. Here again, the positive measurement bias will have an influence on our comparison with high levels of oxidized nitrogen compounds such as peroxyacetylnitrates (PAN) and HNO 3 in the afternoon, leading to positive biases in the measured NO 2 concentrations (Steinbacher et al., 2007;Dunlea et al., 2007). Simulated inter-station-type variability is comparable with measurements.
Nitric oxide compares well to observations during daytime, but is underestimated at night. The relatively high measured concentrations at nighttime could be an indication for local sources affecting the measurement sites since NO x is mostly emitted in the form of NO and then rapidly converted to NO 2 by reaction with ozone. This interpretation is supported by the comparatively high NO:NO 2 ratios of the measurements. The model, conversely, shows very low NO values as expected for truly remote sites (Carroll et al., 1992;Brown et al., 2004). Overall the diurnal cycle with low values during nighttime, a distinct peak during morning hours and a slow reduction towards evening is captured accurately in all simulated periods.
Only 3 measurement points were available to investigate the simulation quality of NH 3 , and all were located in the (highly NH 3 loaded) Netherlands, making this comparison relatively uncertain. While NH 3 mean concentrations were comparable to measurements, the diurnal cycles were not (both not shown). The measured cycles were very variable throughout seasons and stations, and we see a clear deficiency of the modeling system to account for this variability. The main sources of NH 3 emissions are agricultural activities, especially livestock and manure. NH 3 concentrations are mostly dominated by local emissions. It is known that the diurnal cycle of NH 3 emissions strongly depends on the emission source (Reidy et al., 2009). Ellis et al. (2011) showed that bi-directional fluxes between the atmosphere and land surfaces might be needed to accurately simulate NH 3 (and associated aerosol) levels. All this makes modeling such emissions a major challenge which is currently not accurately addressed in most models, as emission inventories based on spatially distributed emission totals and associated, statistically averaged time functions cannot capture such process-based emissions.

Satellite observations
For comparison with OMI satellite information, vertical tropospheric columns (VTCs) of NO 2 were calculated from model output for the hour of the satellite overpass (13:30 local time, approx. 12:30 UTC over Europe). The height of the troposphere was assumed to be fixed over all simulations at 10 km geometric height; the exact choice has little influence on the NO 2 columns. The comparison was made only where OMI data were available at each overpass and the conditions were nearly cloud-free (cloud radiance fraction reported by OMI retrieval <50 %, corresponding to approx. <20 % cloud coverage). The arithmetic mean over each simulation period was calculated and the results are shown in Fig. 6. The aggregated mean biases for all grid points, land points and sea points can be found in Table 2. We compared the model simulated NO 2 columns directly with the respective EOMINO columns without taking into account the averaging kernels, which would remove the dependency of the result on the a priori NO 2 profiles used in the EOMINO retrieval. Not accounting for the averaging kernels might introduce biases of the order of 30 % with EOMINO columns tending to be too high over remote locations and too low over polluted areas (Russell et al., 2011), while differences averaged over Europe are likely to be small (Huijnen et al., 2010).
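The column calculation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the post-processing code used for this study: the function name, the argument layout (one value per model layer, ordered from the surface upwards) and the use of the ideal gas law to obtain the air number density are all assumptions made for the example.

```python
AVOGADRO = 6.02214076e23  # molecules per mole
R_GAS = 8.314462618       # J mol-1 K-1


def no2_column(vmr, pressure_pa, temperature_k, z_bottom_m, z_top_m,
               z_trop_m=10_000.0):
    """Return the NO2 column below z_trop_m in molecules cm-2.

    Hypothetical helper: every argument except z_trop_m is a list with one
    value per model layer, ordered from the lowest layer upwards.
    vmr is the NO2 volume mixing ratio (mol/mol).
    """
    column_m2 = 0.0
    for x, p, t, zb, zt in zip(vmr, pressure_pa, temperature_k,
                               z_bottom_m, z_top_m):
        if zb >= z_trop_m:
            break  # layer lies entirely above the fixed tropopause height
        # Clip the layer that straddles the 10 km limit.
        thickness = min(zt, z_trop_m) - zb
        # Ideal gas law: air number density in molecules m-3.
        air_density = p * AVOGADRO / (R_GAS * t)
        column_m2 += x * air_density * thickness
    return column_m2 / 1.0e4  # convert m-2 to cm-2


# Example: a single 1 km thick surface layer containing 1 ppb NO2.
print(no2_column([1e-9], [101325.0], [288.15], [0.0], [1000.0]))
```

Averaging such columns over all overpasses that pass the cloud filter would then give the period means compared with the satellite data.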
The spatial distribution and magnitude of the observed NO 2 columns are in good agreement with our modeling results. Highly polluted regions over the Netherlands and southern United Kingdom, as well as the Po Valley (Italy) are accurately captured. Plumes of large urban agglomerations (Paris, Madrid, Berlin, Warsaw) are comparable in extent and magnitude. Cleaner regions such as southern France are also reproduced. Notable differences are mostly found in polluted coastal areas, especially in the Mediterranean Sea, where the model tends to overestimate NO 2 concentrations over water, particularly in the autumn 2008 and spring 2009 period. This overestimation is also visible in the mean over all grid points over sea in Table 2. Emission estimates for ship traffic are known to have large error margins both in magnitude (Corbett and Koehler, 2003) and spatial allocation. From the magnitude of the error and the spatial correlation with main shipping routes an overestimation of ship emissions by the inventory used is likely. This would also explain the consistent overestimation of SO 2 concentrations at coastal stations in NE Spain. Seasonal differences are captured for spring, summer and autumn; only the model results for the winter 2006 period overestimate NO 2 columns notably in Northern and Eastern Europe.

Aerosol characteristics
All comparisons of measured and modelled particulate matter were made as rigorously as possible. For PM 10 and PM 2.5 bulk mass and NR−PM 1 AMS measurements, the modelled log-normal distribution functions were integrated over the respective size ranges, and size cut functions were employed to simulate the size-dependent transmission efficiency that is typically found in the measurement instruments used. See Appendix B for a description of the transmission functions used. For the AMS the modelled quantities were additionally converted to vacuum aerodynamic diameter (DeCarlo et al., 2004). No transmission functions were applied to number size distribution measurements; the modelled values are derived from integration over the exact intervals given: 30 to 50 nm, 50 to 500 nm, 100 to 500 nm and 250 to 500 nm, respectively.
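For the number size ranges, where no transmission function is applied, the integration of a log-normal mode over a fixed diameter interval has a closed form in terms of the error function. The following sketch illustrates this. The function name and the mode parameters in the example are hypothetical, and the mass integration with the transmission functions of Appendix B is not reproduced here.

```python
import math


def number_in_range(modes, d_low_nm, d_high_nm):
    """Number concentration between two diameters, summed over all modes.

    Hypothetical helper. modes is a list of tuples
    (total number concentration, median diameter in nm,
     geometric standard deviation).
    """
    total = 0.0
    for n_total, d_median_nm, sigma_g in modes:
        scale = math.sqrt(2.0) * math.log(sigma_g)
        # Cumulative fraction of a log-normal mode below diameter d is
        # 0.5 * (1 + erf(ln(d / d_median) / (sqrt(2) * ln(sigma_g)))).
        upper = math.erf(math.log(d_high_nm / d_median_nm) / scale)
        lower = math.erf(math.log(d_low_nm / d_median_nm) / scale)
        total += 0.5 * n_total * (upper - lower)
    return total


# Example with two made-up modes (values chosen for illustration only).
example_modes = [(5000.0, 30.0, 1.7), (1000.0, 150.0, 2.0)]
for low, high in [(30, 50), (50, 500), (100, 500), (250, 500)]:
    print(low, high, number_in_range(example_modes, low, high))
```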

Bulk mass
Continuous bulk aerosol mass measurements are the least available within the measurement dataset, making the ensemble of stations for comparison very small (max. 8 stations). When looking at PM 10 concentrations (Fig. 3 (Table 2).
"Rural"-type stations are deemed the most representative for such a model evaluation, and they show (except in autumn 2008) an underestimation typical for many regional models (see e.g. Stern et al., 2008), probably due to missing sources (e.g. resuspension, secondary organics, local mineral dust sources, missing aqueous-phase conversion of SO 2 to SO 2− 4 ). Stations of type "rural/coastal", in contrast, have a tendency towards more positive biases, which is explained by the high amounts of sea salt aerosols found at these stations in the modeling results. The overestimation could also be an artefact of the limited model resolution: coastal stations may be located in grid cells partly covered by sea where sea salt aerosols are therefore emitted directly. Further investigations, e.g. comparisons with filter samples, are needed to assess if the amount of sea salt from the parameterization in COSMO-ART is realistic. The very high PM 10 concentrations in winter 2006 are not accurately represented in the model. There is in fact no visible increase in PM 10 concentrations in the model results compared to the other seasons at all.
The diurnal cycles for PM 10 show that simulated concentrations are often in the same order as the measured values, both in variability and evolution in time, although overall the simulated values are mostly too low. Winter 2006, the period with very high PM levels, has no observable diurnal cycle. In spring 2009 and summer 2006, the diurnal cycles at rural stations show a PM 10 maximum during night and a minimum at noon, which is reproduced by the model, although shifted to lower values. The diurnal cycle for rural stations in autumn 2008 is characterized by high but constant PM 10 levels during nighttime and a drop in concentrations during the day. The model reproduces this finding to a certain degree, although the amplitude of the drop is underestimated.
Only 7 stations, from 3 different categories, had measurements for PM 2.5 for our simulation periods. From this uncertain data basis we see equally large disagreements as have been found for PM 10 . McKeen et al. (2007) compared PM 2.5 measurements with several air quality forecast models in North America and concluded that, while most of the models are able to accurately represent daily average PM 2.5 concentrations, there are substantial inconsistencies in representing the diurnal cycle. Most models show a negative bias and exaggerate the diurnal variability, something we can observe also for the (single) rural PM 2.5 station in our comparison (Fig. 5).
The errors are in a similar range as found in other model simulations. Vautard et al. (2007) showed similar performance problems in simulating PM 10 in Europe. Stern et al. (2008) saw better agreement with measurements (i.e. less underestimation) for PM 2.5 simulations than for PM 10 , which we could not confirm with the dataset mentioned above.

Aerosol optical depth
For comparison with MODIS AOD data, a similar procedure was employed as for OMI NO 2 vertical tropospheric columns, only using grid points for which satellite data were available and which were cloud-free also in the model. The whole vertical column in the model was used in the calculation of aerosol optical depth with the method described in Vogel et al. (2009). All aerosol categories (internally and externally mixed Aitken and accumulation modes, soot, mineral dust and sea salt modes) contribute to calculated AOD. Then, the median was calculated over the whole simulation period. We chose the median instead of the mean to be more robust against outliers. Figure 7 presents the results. Furthermore, as for the comparison with OMI NO 2 VTCs, aggregated biases have been calculated and can be found in Table 2.
For all AERONET stations in the model domain, timelines of AOD at 550 nm were calculated from model output and compared against measured values. AERONET data were interpolated (if no direct measurement at 550 nm was available) linearly in log-log space. In case MODIS data were available also this information was added to the plots. The results for selected stations are shown in Fig. 8, plots for the remaining stations can be found in the Supplement, and Table 2 shows the mean biases for these comparisons.
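The linear interpolation in log-log space used to obtain AOD at 550 nm from neighbouring AERONET channels can be sketched as follows. This is a minimal illustration rather than the code used for the comparison; the function name, the choice of the 500 and 675 nm channels, and the AOD values in the example are assumptions.

```python
import math


def aod_loglog(wl_target_nm, wl1_nm, aod1, wl2_nm, aod2):
    """Interpolate AOD to wl_target_nm linearly in log(AOD) vs log(wavelength).

    Hypothetical helper; both AOD values must be positive.
    """
    slope = (math.log(aod2) - math.log(aod1)) / (
        math.log(wl2_nm) - math.log(wl1_nm))
    log_aod = math.log(aod1) + slope * (
        math.log(wl_target_nm) - math.log(wl1_nm))
    return math.exp(log_aod)


# Example: made-up AOD values at 500 nm and 675 nm, interpolated to 550 nm.
print(aod_loglog(550.0, 500.0, 0.30, 675.0, 0.20))
```

Interpolating in log-log space is exact whenever AOD follows a power law in wavelength between the two channels, which is why it is usually preferred over a linear interpolation in wavelength.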
The comparison against these two independent sets of AOD measurements leaves a mixed picture: compared with MODIS, the model shows consistently lower values than derived from the satellite. We can capture regions with continuously high AOD values like the Po valley (northern Italy) or Saharan dust events such as the one in the summer 2006 period over the western Mediterranean Sea. The magnitude of the dust event is underestimated, which might be explained by the fact that modelled "dust" is only created within the region of the model domain which covers only a small part of the Sahara. At the boundaries only very low dust concentrations are prescribed due to the way aerosol boundaries are treated (see Sect. 2.1). Contribution of sea salt to AOD is visible over the Atlantic ocean, but the absolute values are much lower than MODIS derived values, except for winter 2006. Some very polluted regions in south-eastern Europe are captured in location and magnitude (e.g. in Northern Croatia/Southern Hungary), while several other "hot-spots" visible from the satellite (e.g. Eastern UK coast) are missed. AERONET data has been cloud-screened by the data provider. MODIS data is also cloud screened. Modelled values are masked if simulated total cloud cover was above 25 %.
Comparison with AERONET station data reveals additional details. Although the absolute levels are often too low, which is consistent with our comparison with MODIS data, the temporal evolution is often well represented and most high AOD events visible in station data are also observed in our simulations. Differences between MODIS and AERONET derived AOD on the other hand are at several occasions as big as the differences between model and AERONET, and non-negligible on average (up to 10 % compared to up to 60 % difference between model and measurements, see Table 2). We suggest that the water in the aerosol (both simulated and in reality) will play a major role in the differences found. Both MODIS and AERONET data are "cloud-screened", i.e. data points contaminated by clouds were removed, as they give erroneously high AOD values. Capturing the onset of a cloud is difficult, so some increase in AOD due to aerosol water might be left in the dataset. These effects are visible within the satellite data shown in Fig. 7 (e.g. over Germany in autumn 2008 or west of Ireland in spring 2009) near regions with missing (cloud-screened) pixels. Also in several AERONET stations the sudden steep increase of AOD just before measurements are filtered (for clouds) can be found. While we tried to remove this error by using median values instead of the arithmetic mean to calculate the MODIS-model comparison, we probably could not exclude all of those situations. As the effect is non-linear and acts towards very high AOD values, this will probably bias AOD results. Secondly, differences in simulated and real aerosol chemical composition will also have an effect on AOD. The next section addresses a comparison of aerosol chemical composition. A clear negative bias in absolute AOD is seen in our model when compared with two independent measurement datasets, which appears to be consistent with the too low simulated PM 10 and PM 2.5 levels.
Fair correlation of the evolution in time is visible from the AERONET comparison. Performance of our AOD simulations is well in range of results for comparable modeling systems (e.g. Zhang et al., 2010;Aan de Brugh et al., 2011). We argue that both missing aerosol mass at the lateral boundaries and inaccuracies of simulated aerosols within the domain contribute to the underestimated AOD. Especially for aerosol components from natural sources (Saharan dust) the missing lateral contribution could be substantial. Although we tried to remedy this by using averaged profiles from a previous run, we could not represent the absolute mass contributions correctly, especially for those categories. The impact of the missing pathway to form sulfate in clouds and the known too small yield of SOA in the SORGAM model are additional sources of error that impact the overall accuracy of the comparison.

Chemical composition
Aerosol chemical composition was evaluated by comparison with AMS data. In summer 2006, AMS measurements were available at Payerne (CH) and Bush (UK) (Lanz et al., 2010). Several AMS instruments were deployed during the 2008 (autumn) and 2009 (spring) periods at stations throughout Europe. Timelines of the composition of NR−PM 1 are presented for both measurement and simulation at these stations. Shown in Figs. 9, 10a, and 10b are the timelines for the autumn 2008 and spring 2009 periods. The comparison for summer 2006 (3 stations) can be found in the Supplement. In the figures, colors typically used in the AMS community are used to represent each species: ammonium (NH 4 ) in orange, sulfate (SO 4 ) in red, and nitrate (NO 3 ) in blue. Organic aerosols (OA) are represented as shades of green. Charges are omitted intentionally for the AMS in the figure legends, as contributions from organosulfates and organonitrates, which are not ions, are also included (Farmer et al., 2010). In case of modelled values, a distinction can be made between anthropogenic primary organics (aPOA), secondary organics from anthropogenic (aSOA) and biogenic (bSOA) sources. Table 2 presents the mean biases for each species over all stations in each season.
At all stations the time evolution of NR−PM 1 is represented well by our simulations, sometimes however for the wrong reasons due to a mismatch in chemical composition. Single events with higher aerosol concentrations (e.g. in Vavihill, 2008, Fig. 9) correspond in time and magnitude with the observations in most cases. Several model deficiencies can also be seen throughout the comparison, namely an overestimation of nitrate components and an underestimation of sulfate and, sometimes, organic mass. In the following we will briefly discuss the result for each station.
In Switzerland, measurements at Payerne were available for three periods. The time evolution of total aerosol mass corresponds best in spring 2009, and worst in the summer 2006 period. The weak correlation in summer 2006 is mostly due to a severe underestimation of OA, especially during [...] is also reported by the model. OA are, however, underestimated, and nitrate aerosols overestimated. Here also, modelled aerosol nitrate tends to persist in the aerosol phase during daytime, which is not found in the observed values.
Melpitz in Germany differs from Payerne in a generally higher sulfate content. Otherwise those stations report similar aerosol composition. Striking is the stronger overestimation of nitrate aerosols at Melpitz, compared to Payerne, in both periods. Simulated sulfate is in the same range as in Payerne, and therefore even more strongly underestimated. The concentrations of organics are lower in Melpitz, and simulated values are comparable here.
The third station with more than one period of measurements is Vavihill (SE). Generally low aerosol concentrations alternate with isolated peaks in aerosol mass with high contents of inorganic secondary components. This burst pattern is captured in our simulations, and also the timing fits mostly well. Especially in spring 2009 the model lacks, though, the OA mass necessary to fit the measurements. While ammonia levels are comparable in autumn 2008, they are above measured [...]
Table caption (beginning lost in extraction): [...] Asmi et al. (2011) for number size distributions during the autumn 2008 simulation. N 30to50 : 30 to 50 nm, N 50 : above 50 nm, N 100 : above 100 nm, N 250 : above 250 nm. Note that the N 250 parameter has a larger uncertainty than the others due to very low sampling rates.
All other stations only report data for one period. In autumn 2008, measurements of aerosol chemical composition were also available for K-Puszta, Hungary. The station reported high aerosol concentrations with levels up to 30 µg m −3 total mass.
While the model represents the buildup of aerosols towards the middle of the observation period, the overall mass is underestimated. Too high nitrate levels are simulated. Organics and ammonium match observations better, but sulfate tends to be underestimated also at this location. Four more stations reported data during spring 2009: Cabauw (NL), Helsinki (FI), Barcelona (ES) and Montseny (ES). Cabauw (NL) has lower concentrations than e.g. Payerne or Melpitz, and a big gap in measurements during the first half of our simulations. There is some resemblance in the peaks of aerosol mass during the second half of the simulation between model and station values. Nitrates are overestimated while ammonium and sulfate are too low. Organics are well captured. Helsinki (FI), an urban background station, is, like Hyytiälä (FI), characterized by a strong contribution from sulfate. The simulated total aerosol loadings are comparable to the observed concentrations but do not match in composition. We can tentatively explain this difference by looking beyond the border of the model domain: both stations are in the vicinity of large sources of SO 2 on the Kola peninsula in Russia (Tuovinen et al., 1993) which are still found to be underestimated in current emission inventories (Prank et al., 2010). Additionally, due to the setup of aerosol boundary conditions in our modeling system, we very likely underestimate direct sulfate inflow in this region. In Barcelona (ES), also an urban background location, a very variable time series is reported, with the highest absolute concentrations of all stations used in this analysis. Several peaks of aerosol concentration each day are common, containing relatively high sulfate levels compared to other stations. The model produces a similar variability, although it overestimates nitrate. Sulfate levels are comparable at this site with a large influence from shipping.
OA concentrations are, in contrast to most other stations, overestimated at Barcelona and Helsinki. The largest contributor to simulated total organic mass at these stations is primary emitted organics. Statistical analysis of the organic fraction (positive matrix factorization (PMF), Paatero and Tapper, 1994) indicates that organics in urban stations are comprised of similar amounts of SOA and POA, while in the model it is almost exclusively POA. This points towards a strong underestimation of secondary organics in polluted regions as it has been found already by Fast et al. (2009). Finally, the AMS in Montseny (ES) measured a time-series with several periods with increased aerosol loadings, and during the first third a period where almost no aerosols were found due to an episode of strong Atlantic advection. The model captures this period well. Total organics are comparable throughout the simulation period, although a PMF analysis gives about 5 % mass contribution from urban primary organics (Minguillón et al., 2011), instead of about 30 % as given in the model. Nitrates are too high, and lacking the diurnal cycle visible in the measurement. Simulated sulfate is below measurements.

Number concentrations and size distributions
The dataset compiled by Asmi et al. (2011) provides a homogenized overview of the statistical characteristics of aerosol size distributions in Europe during the years 2008 and 2009. We evaluate different particle dry size separated subsets of the number concentrations, following Asmi et al. (2011). The numbers of particles from 50 (N 50 ) and 100 (N 100 ) nm up to 500 nm have been chosen as proxies to study climate effects. Health concerns are related to very small particles, which are assessed by comparing number concentrations of particles between 30 and 50 nm (N 30to50 ). This concentration can also serve as an indicator of new particle formation and emissions from combustion processes. Finally, the number of particles with diameters between 250 and 500 nm (N 250 ) is given to show the contribution of larger particles to total aerosol number concentrations. We have calculated the corresponding model values by integrating the aerosol modes over the respective intervals. Data were available in up to hourly resolution, so a direct comparison could be made between modelled and measured values. Table 3 shows the resulting comparison for the autumn 2008 period, Table 4 for spring 2009. Table 2 gives a summary overview of the mean biases over all stations.
We also studied the histograms (occurrence distribution) of logarithms of the number concentrations in the particle size ranges (not shown). The analysis was done in logarithmic concentration space as most of the aerosol number concentrations are log-normally distributed (Asmi et al., 2011). It shows the model's ability to produce similar distributions of number concentrations as measured and provides a more detailed way to analyze the differences. We also performed a Mann-Whitney U-test (Higgins, 2004) on the modelled and measured concentration distributions to see with what p-value they could be considered to be from the same distribution with similar mean and distribution shape.
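A two-sided Mann-Whitney U-test of this kind can be sketched with the standard library alone. The implementation below is an illustration and not the code used for this study: it assumes the normal approximation with a tie correction (appropriate for the large hourly samples considered here, not for very small samples), and the function name and the sample values are hypothetical.

```python
import math


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U-test (normal approximation, tie-corrected).

    Hypothetical helper. Returns (U statistic of sample x, p-value).
    Raises ZeroDivisionError if all pooled values are identical.
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    # Pool both samples, remembering which sample each value came from.
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    rank_sum_x = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        # Tied values occupy ranks i+1 .. j and all get the average rank.
        avg_rank = 0.5 * (i + 1 + j)
        t = j - i
        tie_term += t ** 3 - t
        rank_sum_x += avg_rank * sum(1 for k in range(i, j)
                                     if pooled[k][1] == 0)
        i = j
    u_x = rank_sum_x - n1 * (n1 + 1) / 2.0
    mean_u = n1 * n2 / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (u_x - mean_u) / math.sqrt(var_u)
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided
    return u_x, p_value


# Example with made-up number concentrations (cm-3), compared in log space.
modelled = [math.log10(v) for v in [1200.0, 1500.0, 900.0, 2000.0, 1100.0]]
measured = [math.log10(v) for v in [1000.0, 1400.0, 950.0, 1800.0, 1300.0]]
print(mann_whitney_u(modelled, measured))
```

Because the test is based on ranks only, taking logarithms of the concentrations does not change its result; the logarithmic space matters for the histograms, not for the test itself.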
The histograms of number concentrations show that the agreement is better in the larger size ranges (N 100 and N 250 ) in comparison to concentrations in the N 30to50 size range. The model seems to overestimate the number concentrations in the smaller size ranges by a factor of two to five, especially in Harwell (UK), Ispra (IT) and the two Swedish stations (Aspvreten and Vavihill). This overestimation could be explained by a relatively low fraction of new particle formation in the modelled environment. COSMO-ART uses the nucleation parametrization from Kerminen and Wexler (1994), which does not generally produce the observed amounts of nucleated particles in the European boundary layer. Thus the overestimation could be due to a disproportionate amount of emitted sulphur being considered as primary Aitken particles, which have a much longer lifetime in the atmosphere compared to newly nucleated particles in these regions. For the larger particle sizes (N 100 and N 250 ), the model-measurement comparison is more successful. At Central European stations the modelled and measured concentration distributions are generally of similar shape and median, which is well demonstrated by p-values ranging from 0.31 to 0.66 in the U-test for Kosetice and Melpitz. The overall shapes of the concentration histograms are generally similar in all the stations, although some discrepancies in lower-concentration regions are visible. The agreement is generally poorer in lower-concentration regions of Northern Europe, but also in Cabauw (NL) and Harwell (UK) N 250 concentrations, where the model overestimated the concentrations by a factor of 2 in 2008.
A second dataset available from Asmi et al. (2011) is seasonal statistics of aerosol number size distributions. We have calculated a distribution function as mean over all modelled values in each simulation period, and compared it against the measured distribution statistics of the corresponding season (Fig. 11 for autumn 2008; plots for spring 2009 can be found in the Supplement). Note that there is no exact match between the time periods covered by the measurements and the simulations (3 weeks out of the 3 months). Overall, the modelled size distributions are at several stations close to the observed ones. At most stations, simulated size distributions were within the central 67 % percentiles of the values reported by Asmi et al. (2011) when comparing the 20 to 200 nm size range, for which the instruments were reported to compare the best. Concerning the shape of the size distributions, stations with the best match between model and measurements were Melpitz (DE), Waldhof (DE) and Kosetice (CZ), with only very small deviations in both years. Aerosol number size distributions at the rather polluted sites Ispra (IT) and K-Puszta (HU) show distribution functions with comparable peak values but opposite skewness. While model values lean towards smaller diameters, measurements have their peak in number concentration at much larger aerosol diameters. For Ispra (IT) this is probably due to the influence of the Milan urban agglomeration. Due to the coarse horizontal resolution, fresh emissions (with smaller diameter) contribute much more to aerosol composition at Ispra in the model than in reality, where the aerosol had more time to age. This ageing would shift the size distribution towards larger diameters via coagulation as observed in the measured distributions. A similar explanation might hold for K-Puszta (HU), which is located near Budapest, the capital of Hungary.
Cloud processing of aerosols is missing in COSMO-ART and might be responsible in general for a bias towards small peak diameters. Cabauw (NL) and Vavihill (SE) show comparable shape but model and measurements disagree in number concentration. Both Birkenes (NO) and Harwell (UK) show a tendency towards a bimodal size distribution, which is captured by the model in 2008, but missed in the 2009 case. Finally, Mace Head (IE) shows a large variability in number concentrations, caused by the station's setting at the coast in western Ireland, which represents mostly clean maritime air masses occasionally interrupted by continental influences. It shows acceptable agreement in terms of total number concentrations, but no clear agreement in size distribution.
In general the model has an acceptable representation of the variability of number concentrations between stations (Table 3)

Sulfate
This aerosol species is virtually always underestimated. Several factors contribute to this error: besides some minor direct emissions of sulfate particles, most of the aerosol sulfate is secondary, created from oxidation of SO 2 in the gas phase and within the aqueous phase in cloud droplets. Studies have shown that the amount of sulfate produced in clouds is substantial and even dominating (Walcek and Taylor, 1986;Rasch et al., 2000). COSMO-ART currently lacks a parameterization for this pathway. Therefore, especially during periods with cloudy conditions, the underestimation of SO 2− 4 is likely explained by this missing process. The missing conversion of SO 2 to SO 2− 4 is also consistent with too high levels of SO 2 in our model. Aksoyoglu et al. (2011) simulated the summer 2006 period with another modeling system including in-cloud oxidation of SO 2 and found better agreement.
Whether this can be attributed to in-cloud sulfur oxidation is unclear. In addition to the oxidation issue it was shown that the regional (e.g. Wagstrom and Pandis, 2011) and even intercontinental (e.g. Liu and Mauzerall, 2007) contributions to sulfate aerosol mass are higher than for other aerosol categories like nitrate. Inflow of aerosol concentrations at the lateral boundaries is realized by a smooth transition to values from a given profile or a coarser grid model, which is called relaxation. While we do relax our model at the lateral boundaries against data from a global chemistry transport model (CTM) for gas-phase species, we could not provide similar boundary conditions for aerosol species. Instead we relax against a mean profile from a previous run (which is also low in sulfate). Therefore, only very little long-range transport of sulfate is simulated (approx. 0.2 to 0.4 µg m −3 surface concentration), contributing to this underestimation. A sensitivity study with strongly increased lateral sulfate showed a noticeable but insufficient increase of sulfate at the grid boxes of the AMS measurement stations. Finally, oceanic emissions of dimethyl sulfide (DMS) have also been shown to contribute to aerosol SO 2− 4 levels (Gondwe et al., 2003). A parameterization has recently been included (Lundgren, 2010) in COSMO-ART but was not yet used in our studies. Sensitivity studies showed, though, that sulfate originating from maritime DMS emissions has no substantial influence over continental regions, which again indicates the importance of cloud processing of SO 2 . In-cloud oxidation of SO 2 to sulfate will be included via a comprehensive wet scavenging and aqueous-phase chemistry scheme, currently under development at Empa.
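The relaxation mentioned above can be illustrated with a one-dimensional sketch. The linear weight profile, the width of the relaxation zone and the function name are assumptions made for this example; the weights actually used in COSMO-ART are not specified here. The example values are chosen to show how a boundary profile that is low in sulfate pulls the concentrations down near the domain edge.

```python
def relax_to_boundary(model_values, boundary_value, zone_width):
    """Blend model values towards a prescribed boundary value.

    Hypothetical helper for a single row of grid cells, where index 0 is
    the outermost cell at the lateral boundary. The weight decreases
    linearly (an assumption) from 1 at the boundary to 0 at the inner
    edge of the relaxation zone.
    """
    relaxed = list(model_values)
    for k in range(min(zone_width, len(relaxed))):
        weight = 1.0 - k / zone_width
        relaxed[k] = (1.0 - weight) * relaxed[k] + weight * boundary_value
    return relaxed


# Example: interior sulfate of 2.0 ug m-3, boundary profile of 0.3 ug m-3.
print(relax_to_boundary([2.0, 2.0, 2.0, 2.0, 2.0], 0.3, 4))
```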

Organics
Organic aerosol contributions are often underestimated as well. This is a well-known problem of current CTMs (Volkamer et al., 2006;Hodzic et al., 2009;Hallquist et al., 2009), in our case caused by the use of an older parameterization of the conversion of condensable organic vapours to secondary organic aerosols (SOA) (Schell et al., 2001), based on the two-product method by Odum et al. (1996).
Our total OA underestimations are substantial and reach factors of 2. Underestimations for SOA alone by a factor of 10 or more were summarized by Volkamer et al. (2006) and Hodzic et al. (2010) for multiple polluted regions on 3 continents using SOA modules similar to ours. Compared to the current state of knowledge our SOA parameterization has too low yields, and is lacking the description of semi-volatile and intermediate volatility species as implemented in e.g. the volatility basis set approach (Donahue et al., 2006;Murphy et al., 2011). The particular SOA module used in this work (MADE/SORGAM) has been shown to underpredict SOA formation by about a factor of 10 in the Mexico City region. Thus it is very likely that a strong underprediction of pollution-related SOA is compensated by an overprediction of anthropogenic POA, to result in a lower underprediction of total OA. The comparisons with the AMS deployed in Barcelona/Helsinki further support this hypothesis: these two stations were located within an urban area. There, the model overestimates total organics, and attributes the large majority of the mass to primary organics, while the measurements show a major fraction of secondary organics, as is typical of most urban areas (Jimenez et al., 2009). Furthermore, emissions from forest fires were not included in our simulations, although it is known that they can be a major OA contributor (e.g. Aiken et al., 2010). Emissions of biogenic SOA precursors, and the effectiveness of the conversion pathways, are still in discussion and will also contribute to the discrepancies found. Finally, domestic wood burning has been shown to release substantial amounts of OA in wintertime, but also these emissions were not included. Work is currently underway to integrate all these recent developments in SOA and emissions modeling in COSMO-ART.

Nitrate
The most substantial bias found in our simulations is an overestimation of nitrate aerosol components. This is not a new phenomenon and seen also in other model evaluations (e.g. Stern et al., 2008). Accurately modeling this species is challenging (Dentener and Crutzen, 1994), as it represents the result of a dynamic, coupled system between gas-and aerosol-phase, depending on the amount of gas-phase precursors, temperature, relative humidity and aerosol composition (cf. Chapter 9 in Seinfeld and Pandis, 2006). We have tested several hypotheses to understand this deficiency in our model. We could exclude an erroneous nighttime or daytime chemistry (e.g. providing too much HNO 3 ) and emission sources (too high levels of NO x ). Evaluation against nitrate totals (daily averages of gas-phase HNO 3 + particulate NO − 3 from impregnated filter packs at the station Payerne (CH)) showed some high bias, but the overestimation is much smaller than for nitrate alone. Three hypotheses seem likely: the lack of sulfate, missing wet deposition of HNO 3 and inaccuracies in the model's ability to reproduce relative humidity and temperature well enough.
In experiments, available ammonia is first neutralized by sulfuric acid and only if no more sulfuric acid is available, nitric acid serves as a replacement to form NH 4 NO 3 aerosols (Seinfeld and Pandis, 2006). Suppose now that ammonia is limited, then the mass of nitrate found in the aerosol depends also on the amount of sulfate substantially. As our simulations currently underestimate sulfate, the higher amount of available NH 3 will combine to "excess" NH 4 NO 3 . As mentioned earlier, we could not assess the accuracy of the modeling system regarding NH 3 concentrations due to the lack of measurements. If ammonia is overestimated this will strongly influence this system as well.
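The dependence of aerosol nitrate on sulfate under ammonia-limited conditions can be made concrete with a purely stoichiometric sketch. This is a strong simplification and not the thermodynamic treatment used in the model: it assumes that sulfate is fully neutralized to (NH 4 ) 2 SO 4 (two NH 3 per sulfate), that all remaining ammonia pairs with nitric acid, and it ignores the temperature and humidity dependent equilibrium, so the result is an upper limit. The function name and all numbers are hypothetical.

```python
def nitrate_upper_limit(total_nh3, total_hno3, sulfate):
    """Upper limit of ammonium nitrate formation (all inputs in nmol m-3).

    Hypothetical helper, stoichiometry only: sulfate consumes ammonia
    first (2 NH3 per SO4), and only the remaining "free" ammonia can
    combine with nitric acid to form NH4NO3.
    """
    free_nh3 = max(0.0, total_nh3 - 2.0 * sulfate)
    return min(free_nh3, total_hno3)


# Same ammonia and nitric acid, but two different sulfate levels.
realistic = nitrate_upper_limit(total_nh3=10.0, total_hno3=8.0, sulfate=4.0)
too_low_sulfate = nitrate_upper_limit(total_nh3=10.0, total_hno3=8.0, sulfate=1.0)
print(realistic, too_low_sulfate)
```

In this made-up example, lowering sulfate from 4 to 1 nmol m −3 frees enough ammonia for all available nitric acid to be taken up, which is the mechanism behind the "excess" NH 4 NO 3 described above.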
Secondly, due to the missing wet scavenging of gases, HNO₃ is not removed from the atmosphere as effectively as in reality. As HNO₃ is very hydrophilic, it transfers easily to the aqueous phase and is therefore efficiently scavenged. The missing removal could lead to the observed excessive levels of total nitrate, which are then subject to gas-aerosol partitioning.
Thirdly, the gas-aerosol partitioning of nitrate has a strong temperature dependence. If our model cannot represent daily temperature maxima and minima to a high degree of accuracy, this will lead to errors in the partitioning. It is also known that the phase state (solid or liquid) is a strongly non-linear function of aerosol chemical composition and relative humidity (see Figs. 2, 5 and 7 in Nenes et al., 1998), which in turn changes the partitioning between gas and aerosol phase. As the lifetime of gas-phase HNO₃ is much shorter than that of particulate NH₄NO₃, too strong partitioning to the aerosol phase lets too much total HNO₃ + nitrate survive in the atmosphere, which contributes to overpredictions of nitrate at later times.
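The temperature sensitivity of the partitioning can be sketched for the simplest case of dry, solid NH₄NO₃ in a sulfate-free system. The sketch assumes the temperature-dependent dissociation constant Kp (in ppb², at 1 atm) in the form commonly quoted from Seinfeld and Pandis (2006); the coefficients should be checked against that source before any quantitative use. The function names and input mixing ratios are hypothetical, and the calculation is not the equilibrium module used in COSMO-ART.

```python
import math


def kp_nh4no3_solid(temperature_k):
    """Dissociation constant of solid NH4NO3 in ppb^2 (assumed textbook form)."""
    return math.exp(
        84.6 - 24220.0 / temperature_k - 6.1 * math.log(temperature_k / 298.0)
    )


def particulate_nitrate_ppb(total_nh3_ppb, total_no3_ppb, temperature_k):
    """Solid NH4NO3 (ppb equivalent) in a dry, sulfate-free system.

    Solves (A - x) * (N - x) = Kp for the physically meaningful (smaller) root x,
    where A and N are total ammonia and total nitrate.
    """
    kp = kp_nh4no3_solid(temperature_k)
    if total_nh3_ppb * total_no3_ppb <= kp:
        return 0.0  # gas phase is undersaturated, no solid forms
    s = total_nh3_ppb + total_no3_ppb
    disc = s * s - 4.0 * (total_nh3_ppb * total_no3_ppb - kp)
    return 0.5 * (s - math.sqrt(disc))


# Hypothetical totals of 5 ppb each, evaluated at temperatures a few kelvin apart.
for t in (285.0, 288.0, 308.0):
    print(t, particulate_nitrate_ppb(5.0, 5.0, t))
```

Under these assumptions Kp changes by roughly an order of magnitude over 10 K, so a temperature bias of a few kelvin can shift a noticeable share of total nitrate between the gas and aerosol phase.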
In summary: firstly, if the model underestimates sulfate, more ammonia is available to form nitrate aerosols, which leads to an overestimation. Secondly, if not enough HNO₃ is removed, more total nitrate is available for partitioning. Thirdly, even small differences between modelled temperature and relative humidity and the conditions at the instrument could change the nitrate gas/aerosol partitioning. Finally, if NH₃ concentrations are themselves overestimated, even more NH₃ is available to neutralize NO₃⁻, and even more ammonium nitrate is formed. We think these processes together explain a large part of our overestimation of nitrate aerosols. Other models are better able to simulate mean nitrate concentrations, for example the Comprehensive Air quality Model with extensions (CAMx)/particulate matter CAMx (PMCAMx) (e.g. Aksoyoglu et al., 2011; Andreani-Aksoyoglu et al., 2007) or WRF/Chem (e.g. Li et al., 2010), but once the diurnal cycle of the gas/aerosol partitioning of nitrate is examined, these modeling systems also exhibit problems.
We have seen that the model has substantial deficiencies in accurately describing nitrate aerosols and that this is a general problem shared by other, comparable model systems. One straightforward way to improve the situation will be to increase the horizontal resolution of the simulations to better represent the variability in temperature and relative humidity. Better knowledge of NH₃ emissions and concentrations is also needed. The impact of implementing a comprehensive wet-phase chemistry scheme will be investigated in a future study.

Number size distributions and concentrations
The evaluation showed a small but consistent high bias of modelled number concentrations in both periods. As can be seen from the comparison with the number size distributions, this is often caused by overestimated particle numbers at smaller diameters, pointing towards either too many particles emitted in the Aitken mode, overestimated nucleation rates, or underestimated coagulation. We consider it likely that the nucleation scheme of Kerminen and Wexler (1994) used here contributes to these differences. No explicit nucleation mode exists in COSMO-ART, hence secondarily generated sulfate particles are transferred directly into the Aitken mode. A fixed factor is applied to reduce the number concentration, accounting for particles lost through coagulation during growth from freshly nucleated clusters to Aitken-mode-sized particles. If the number of existing particles (e.g. in strongly polluted regions) does not match the assumptions made for this conversion factor, formation rates of Aitken-mode-sized particles through nucleation of SO₄²⁻ are under- or overestimated. The distribution of emitted particle mass between accumulation and Aitken mode follows a recent publication by Elleman and Covert (2010), who used a similar aerosol module. They did not find high variability in emission diameters among source categories in North American simulations, which allowed a very simple description of the size distribution of emitted particles (a split based on total emitted mass that is invariant in time and across emission source categories). However, their study did not consider modern new particle formation parametrizations. As a result, they could have counted at least some of the particle precursors as primary emissions. A study by Spracklen et al. (2010) over Europe came to a different conclusion, indicating substantial variability in emitted number size distributions and a high importance of an adequate representation of new particle formation.
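How a fixed mass split between modes translates into emitted particle numbers can be sketched with the standard lognormal relation between mass and number. The modal diameters, geometric standard deviations, particle density and the 10 %/90 % split below are hypothetical placeholders, not the values used in COSMO-ART or by Elleman and Covert (2010).

```python
import math


def number_from_mass(mass_kg_m3, median_diameter_m, sigma_g, density_kg_m3):
    """Number concentration (m^-3) of a lognormal mode carrying a given mass.

    Uses the third moment of a lognormal number distribution:
    mean particle volume = pi/6 * Dg^3 * exp(4.5 * ln(sigma_g)^2),
    with Dg the number median diameter.
    """
    mean_volume = (
        (math.pi / 6.0)
        * median_diameter_m ** 3
        * math.exp(4.5 * math.log(sigma_g) ** 2)
    )
    return mass_kg_m3 / (density_kg_m3 * mean_volume)


# Hypothetical example: 1 microgram per m^3 of emitted mass, 10 % assigned to
# the Aitken mode and 90 % to the accumulation mode.
emitted_mass = 1.0e-9  # kg m^-3
n_aitken = number_from_mass(0.1 * emitted_mass, 30.0e-9, 1.7, 1500.0)
n_accum = number_from_mass(0.9 * emitted_mass, 150.0e-9, 2.0, 1500.0)
print(n_aitken, n_accum)
```

Because number scales with the inverse cube of the modal diameter, even a small mass fraction placed in the Aitken mode dominates the emitted number, which is why the assumed split and emission diameters matter for the small-particle bias.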
Even though COSMO-ART does not consider some of the more recent findings on new particle formation or nucleation (e.g. Kerminen et al., 2010), the overall ability of the model to reproduce the measured size distributions was adequate. We consider this evidence that, with properly derived emission factors, the overall transformation from emissions to CCN-sized particles can be captured to some extent using mostly primary-emission-based methodologies. However, the agreement between modelled and measured values was in general poorer in the N₃₀₋₅₀ range than in larger size ranges, especially in more remote areas, suggesting a need for a better mechanism to account for the differences between primary and secondary formation. We did not assess, however, the relative contributions of nucleation (and condensation) versus primary emissions (and condensation) to number concentrations at smaller diameters, so these findings will need further study. Finally, the missing description of cloud processing will influence number size distributions and would likely shift the distribution to larger diameters. This could explain part of the overestimation found for small particles.
We conclude for the number size comparison that the approach used is comparable with methods currently taken by other modeling groups, but that there is considerable uncertainty that needs to be better understood for future simulations. The overall acceptable agreement between modelled and measured N₁₀₀ and N₂₅₀ concentrations suggests that the pool of CCN-sized particles simulated in these periods is generally well captured in comparison with measurements.
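For a modal aerosol scheme, size-integrated number concentrations such as those compared above can be derived analytically from each lognormal mode. The sketch below assumes that N₁₀₀ and N₂₅₀ denote the number of particles larger than 100 nm and 250 nm, and N₃₀₋₅₀ the number between 30 and 50 nm; these definitions are not restated in this section, and the mode parameters are hypothetical.

```python
import math


def number_above(diameter_nm, total_number, median_diameter_nm, sigma_g):
    """Number of particles larger than diameter_nm in one lognormal mode."""
    z = math.log(diameter_nm / median_diameter_nm) / (
        math.sqrt(2.0) * math.log(sigma_g)
    )
    return 0.5 * total_number * math.erfc(z)


# Hypothetical Aitken and accumulation modes: (number in cm^-3, Dg in nm, sigma_g).
modes = [(3000.0, 40.0, 1.7), (800.0, 150.0, 2.0)]

n100 = sum(number_above(100.0, n, dg, sg) for n, dg, sg in modes)
n250 = sum(number_above(250.0, n, dg, sg) for n, dg, sg in modes)
n30_50 = sum(
    number_above(30.0, n, dg, sg) - number_above(50.0, n, dg, sg)
    for n, dg, sg in modes
)
print(n100, n250, n30_50)
```

With such a mapping, modelled modal parameters and measured size distributions can be compared in identical diameter ranges.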

Conclusions
Our goal has been to thoroughly evaluate the online-coupled, regional-scale chemistry-transport model COSMO-ART for its ability to simulate trace gas concentrations and aerosol characteristics. The evaluation dataset we have collected allows for a comprehensive assessment of model performance at the surface throughout Europe. Comparison with only recently available measurements of aerosol chemical composition (AMS) and aerosol size distribution data was particularly valuable. Our work did not include an evaluation of vertical profiles and upper-air variables of chemical and meteorological parameters against aircraft or radiosonde measurements.
Surface meteorological conditions are very well simulated in all periods investigated without any need for tuning. However, there is room left for improvement through data assimilation and nudging. Results for gas-phase tracer and bulk aerosol mass concentrations are encouraging, even for rather difficult periods like winter 2006. Both temporal and spatial distributions of O₃ and NOₓ are in good agreement with observations. The lack of a coordinated, Europe-wide measurement network for NH₃ and NMVOC impairs our ability to fully evaluate gas-phase chemistry, and the absence of a homogenized elemental carbon dataset hinders evaluation of this aerosol component. From the more advanced datasets, aerosol chemical composition and size distributions, we can conclude that the modeling system is able to represent those quantities with an acceptable degree of accuracy, although nitrate aerosols tend to be overestimated and sulfate underestimated. Not only is the temporal evolution of aerosol mass correctly reproduced, including distinct peaks seen on several occasions and at several places, but the chemical composition is also quite comparable to reality, though some deficiencies have been found. In addition, we could show that the modeling system is able to represent these quantities in an acceptably size-resolved manner, which is indispensable for correct quantification of climate and health effects. Some deficiencies have been identified in the model system. Most of them will be addressed in the near future by already ongoing developments.
These are:
- wet scavenging for gases and wet-phase chemistry/parameterization of in-cloud oxidation of SO₂ to SO₄²⁻;
- update of the representation of secondary organic aerosol components;
- realistic lateral boundary conditions for aerosol species;
- representation of number size distribution and concentrations in primary emissions of aerosol particles;
- inclusion of forest fire emissions.
Some of the discrepancies found are more likely related to the simulation setup than to the model system itself. An increase in horizontal resolution will be key to addressing those issues. Continuous assimilation of meteorological measurement data is another method that will likely result in improvements.
The coupling to a meteorological core that is actively used and developed for both short-term weather forecasting and climate simulations is regarded as a key benefit. We conclude that the model is suitable for air-quality assessments and that the framework is set to evaluate the accuracy of aerosol-climate interactions. Only once our evaluation results are known can more complex studies, e.g. of climate impacts, be conducted reliably.

advice and help she gave us via email. The work of the AMS measurement group from Manchester has been funded by ACCENT and the UK Natural Environment Research Council (NERC). The Colorado AMS group was supported by NSF ATM-0919189 and NOAA NA08OAR4310565. The AMS measurements at Melpitz were supported by the Umweltbundesamt (UBA) grants no. 351 01 031 and no. 351 01 038, and UFOPLAN contract 3703 43 200.
Edited by: A. Stenke