How realistic are air quality hindcasts driven by forcings from climate model simulations?


Abstract. Predicting how European air quality could evolve over the next decades in the context of a changing climate requires the use of climate models to produce results that can be averaged in a climatologically and statistically sound manner. This is a very different approach from the one generally used for air quality hindcasts of the present period, in which analysed meteorological fields represent each specific date and hour. Differences arise both from the fact that a climate model run is a pure model output, with no influence from observations (which are useful to correct for a range of errors), and from the fact that, in a "climate" set-up, simulations for a given day, month or even season cannot be related to any specific period of time (they can only be interpreted in a climatological sense). Hence, although an air quality model can be thoroughly validated in a "realistic" set-up using analysed meteorological fields, the question remains of how far its outputs can be interpreted in a "climate" set-up. To address this question, we focus on Europe and on the current decade, using three 5-yr simulations performed with the multiscale chemistry-transport model MOCAGE, with meteorological forcings taken either from operational meteorological analyses or from climate simulations. We investigate how statistical skill indicators compare in the different simulations, also discriminating the effects of meteorology on atmospheric fields (winds, temperature, humidity, pressure, etc.) and on the dependent emissions and deposition processes (volatile organic compound emissions, deposition velocities, etc.). Our results show in particular how differing boundary layer heights and deposition velocities affect the horizontal and vertical distributions of species. When the model is driven by operational analyses, the simulation accurately reproduces the observed values of O3, NOx, SO2 and, with some bias that can be explained by the set-up, PM10. We study how the simulations driven by climate forcings differ, both due to the realism of the forcings (no assimilated data and lower resolution) and due to the lack of representation of the actual chronology of events. We conclude that indicators such as mean bias, mean normalized bias, RMSE and standard deviations can be used to interpret the results with some confidence, as can health-related indicators such as the number of days of exceedance of regulatory thresholds. These metrics are thus considered suitable for the interpretation of simulations of the future evolution of European air quality.

Introduction
The issues of climate change and air quality are intertwined: anthropogenic emissions contribute to climate change, and the evolution of the climate, through changes in meteorological parameters (temperature, precipitation), impacts the concentrations and distributions of pollutants in the atmosphere. In the lower troposphere, ozone (O3) is a pollutant that affects human health (WHO, 2004; Schlink et al., 2006) and causes damage to crops (Fuhrer and Booker, 2003) and ecosystems. O3 is a secondary pollutant; its principal precursors are carbon monoxide (CO), volatile organic compounds (VOCs) and nitrogen oxides (NOx) emitted by both natural (biogenic) and anthropogenic (transport, industries) sources, and its production depends upon meteorological conditions such as temperature and precipitation. During summer, conditions of high temperature and low precipitation favor oxidant accumulation, and surface concentrations of O3 reach high values (Guicherit and van Dop, 1977; Sillman, 2000) that have the potential to exceed air quality standards. These conditions also favor the production of secondary pollutants such as sulphate, nitrate and organic aerosols, which can contribute to the high levels of particulate matter (PM) during summertime. Nevertheless, the frequency and intensity of pollution episodes vary considerably from year to year depending on the weather; as an example, the summer 2003 heat wave in Europe has been associated with exceptionally high O3 concentrations (Langner et al., 2005; Vautard et al., 2005, 2007; Guerova and Jones, 2007; Solberg et al., 2008). In fall and winter, stagnant conditions also enhance the levels of primary pollutants (SO2, NOx) in the atmosphere and thus the concentrations of PM10 (particles with an aerodynamic diameter smaller than 10 µm), another pollutant of concern connected to air quality and health problems.
The interactions between climate change and air quality have already been extensively studied. At the global scale, studies (Prather et al., 2003; Dentener, 2006) have for instance evaluated the effects of changing emissions and climate on surface O3 concentrations under an A2 scenario (IPCC AR4). Dentener (2006) showed that global mean surface O3 may increase by about 4.3 ± 2.2 ppbv by the year 2030 and that the area of global natural ecosystems exposed to critical nitrogen deposition may increase by up to 25 % by this time. Regional models centered over the continental United States have been used to examine future US air quality under climate change alone, independently of the evolution of emissions in North America and elsewhere (Hogrefe et al., 2004; Knowlton et al., 2004; Dawson et al., 2009). Hogrefe et al. (2004) concluded that the average daily summertime maximum 8-h O3 concentrations will increase by 2.7 ppbv and 4.2 ppbv for summers in the 2020s and 2050s, respectively. A set of regional modeling studies has similarly focused on the European region to isolate the impacts of climate change (Langner et al., 2005; Meleux et al., 2007; Giorgi and Meleux, 2007; Carvalho et al., 2010; Andersson et al., 2010; Katragkou et al., 2011; Huszar et al., 2011; Juda-Rezler et al., 2012). More specifically, Zlatev (2007) and Langner et al. (2005) presented the impacts of climate change on air quality over Europe with constant emissions and showed an increase in photochemical production in future climate scenarios. Meleux et al. (2007) isolated the impacts of European summer climate change on the increase in O3 levels by using the same emissions and global chemical boundary conditions for the present-day and future periods. Katragkou et al. (2011) investigated the sensitivity of surface ozone to the future climates of the 2040s and 2090s by studying changes in meteorological parameters under an A1B scenario. Andersson et al. (2010) suggested changes in surface ozone of between −4 and 13 ppbv on average from 1961-1990 to 2071-2100, based on the A2 scenario, and highlighted the role of surface deposition processes. Carvalho et al. (2010) concluded that PM10 levels will be impacted by climate change depending on the month and region, with a maximum increase reaching 30 µg m−3 in September over Portugal. Szopa et al. (2006) investigated the impacts of local anthropogenic emission changes and background O3 changes. They estimated that the O3 concentration in July may increase by up to 5 ppbv across Europe by 2030. According to all these findings, the expectation of a warmer climate in the next decade may well affect air quality directly despite regulations to reduce the emission of pollutants (European Commission, 2008).
In most "climate" studies, air quality modeling systems are chemistry-transport models (CTMs) that rely on global or regional climate models to provide the meteorological forcings for future periods. The purpose of this paper is to assess how realistic air quality hindcasts are when driven by forcings from climate models for the current period over Europe, in comparison with a reference obtained using analysed meteorological forcings (analyses) instead, which include meteorological observations and are thus very realistic and specific to each single date. The results are evaluated using a range of statistical tools and air quality indicators. In our study, we have used the chemistry-transport model MOCAGE of Météo-France (Peuch et al., 1999) for three multi-year simulations covering the present time (2004-2008) over a European domain. Comparisons between the simulations and the AirBase observations allow us to infer how a range of statistical indicators are affected when using different types of forcings. This work provides guidance on how far to interpret air quality hindcasts relying on climate model outputs, which is essential for the study of future air quality.
In this paper, Sect. 2 describes the modeling approaches and the numerical experiment design. We also discuss the statistical indicators and the representativeness of the measurement stations used for this study. Section 3 compares simulations in order to evaluate separately how emissions and meteorological changes affect the distributions of pollutants over Europe. Finally, the experiments run with the analyses and the climate forcings are compared against observations in Sect. 4. The statistical indicators that produce similar results for the two experiments will be the most useful ones to consider when examining future trends.

Model set-up and experimental design
The model used in this study is the three-dimensional multiscale chemistry-transport model MOCAGE (Modèle de Chimie Atmosphérique à Grande Echelle), which simulates the interactions between the dynamical, physical and chemical processes in the troposphere and stratosphere (Peuch et al., 1999). The configuration allows for the representation of both long-range transport of pollutants and regional impacts of pollutants on air quality. This model is used for operational air quality forecasting in France (http://www.prevair.org, Honoré et al., 2008) and in the context of the GMES atmospheric monitoring service (Hollingsworth et al., 2008), and has been evaluated during several campaigns; see for instance Dufour et al. (2004) and Bousserez et al. (2007).
MOCAGE uses a semi-Lagrangian advection scheme (Williamson and Rasch, 1989) to transport chemical species. In the vertical, the configuration has 47 hybrid levels from the surface up to 5 hPa, with a resolution of about 150 m in the lower troposphere increasing to 800 m in the upper troposphere. Turbulent diffusion is parameterized with the scheme of Louis (1979) and convective processes with the scheme of Bechtold et al. (2001). The chemical scheme used in this study is RACMOBUS, a combination of the stratospheric scheme REPROBUS (Lefèvre et al., 1994) and the tropospheric scheme RACM (Stockwell et al., 1997). Overall, this chemical scheme includes 119 individual species with 94 prognostic variables and 377 chemical reactions. In our study, a sulfur cycle has been implemented; the oxidation reactions in the gaseous and aqueous phases lead to the formation of sulphate aerosols as in Ménégoz et al. (2012). These reaction mechanisms are provided in the Supplement. MOCAGE simulates the evolution of five types of aerosols: black carbon, sea salt, desert dust, anthropogenic primary particulate matter and sulphates. Each aerosol compound is distributed over 6 size bins (Martet et al., 2009): between 0.1 µm and 100 µm for dust aerosols, 0.001 µm and 10 µm for black carbon, 0.03 µm and 20 µm for sea salt, 0.005 µm and 10 µm for anthropogenic particulate matter and 0.01 µm and 20 µm for sulphates. Nitrate and organic aerosols are not taken into account in this study. A negative bias on total PM is thus expected by design.
The model uses two-way nested domains, with a 2° × 2° horizontal grid over the globe and a 0.2° × 0.2° horizontal grid over Europe (30° N-70° N; 15° W-35° E). Three-hourly forcings are used for meteorology in this study, taken either from operational analyses from Météo-France (ARPEGE, Courtier et al., 1991) or from climate simulations obtained with ARPEGE-Climate (version 5.1, Déqué et al., 1994) for the present decade. ARPEGE uses a stretched T798 spectral grid (resolution of around 15 km over Europe and 60 km in the Pacific), while ARPEGE-Climate operates on a T63 triangular truncation, equivalent to a resolution of about 2.8°. Anthropogenic forcings of ARPEGE-Climate (GHG, aerosols) refer to the climatology of the present time. For the present simulation, ARPEGE-Climate is driven by prescribed observed SSTs (sea surface temperatures); for the future simulations, the SSTs are taken from ocean-atmosphere coupled simulations under the RCP8.5 scenario. Meteorological forcings are interpolated horizontally onto the two MOCAGE domains.
For the anthropogenic emissions, we use the inventory (Visschedijk and Denier van der Gon, 2005; Pouliot et al., 2012) developed for the Global and regional Earth-system Monitoring using Satellite and in-situ data (GEMS) project (Hollingsworth et al., 2008). This inventory has a high spatial resolution compatible with our model and a temporal resolution of 1 h. It is representative of the year 2003. We chose not to modify emissions for each specific year, as this would be meaningless for runs driven by climate forcings. Biogenic emissions of isoprene and monoterpene are calculated offline with the MEGANv2.04 model (Guenther et al., 2006). Two types of input files are required: the land cover variables (leaf area index, plant functional type and emission factors) and the weather data. The land cover variables are available at a spatial resolution of ∼5 km (150 s longitude × 150 s latitude). The meteorological fields (temperature, solar radiation) are provided either by ARPEGE analyses or by ARPEGE-Climate simulations.
The numerical experiment design is as follows (Table 1). Three five-year simulations of the current climate were performed using different meteorological forcings and surface processes. The ANALY simulation acts as the reference. INT relies upon meteorological forcings from a climate model, while surface exchanges (weather-dependent emissions and deposition velocities) are the same as for ANALY. CLIM is driven by climate forcings, and surface processes are computed with the meteorological conditions of ARPEGE-Climate. The summer heat wave of 2003 was uncharacteristic of the current climate, and studies have shown a similar pattern of heat waves under future climate conditions (Meleux et al., 2007; Solberg et al., 2008). The climatological forcings from ARPEGE-Climate are representative of the current decade and do not reproduce such extremely hot or cold events. For this reason, the year 2003 is not considered in the statistical comparisons, and we chose the 2004-2008 period for our simulations. On the one hand, 5 yr is a short time to represent the meteorological variability over Europe. On the other hand, we require that emissions do not evolve too much during the period, and 5 yr is certainly at the limit for such an assumption. The choice of 5 yr is thus clearly a compromise.
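The logic of the three experiments can be summarized in a few lines of Python. This is only an illustrative sketch of the design described above: the class, field and function names are hypothetical and are not part of MOCAGE or of Table 1. It shows which input differs between each pair of simulations, and hence which effect each comparison isolates.

```python
from dataclasses import dataclass

ANALYSES = "ARPEGE analyses"
CLIMATE = "ARPEGE-Climate"


@dataclass(frozen=True)
class Experiment:
    """Inputs of one simulation (hypothetical field names)."""
    name: str
    meteo_forcing: str      # meteorology driving transport, chemistry and mixing
    surface_exchanges: str  # meteorology behind weather-dependent emissions and deposition


EXPERIMENTS = {
    "ANALY": Experiment("ANALY", ANALYSES, ANALYSES),  # reference
    "INT": Experiment("INT", CLIMATE, ANALYSES),       # surface exchanges as in ANALY
    "CLIM": Experiment("CLIM", CLIMATE, CLIMATE),
}


def isolated_factors(first, second):
    """Return the set of inputs that differ between two experiments."""
    a, b = EXPERIMENTS[first], EXPERIMENTS[second]
    factors = set()
    if a.meteo_forcing != b.meteo_forcing:
        factors.add("meteo_forcing")
    if a.surface_exchanges != b.surface_exchanges:
        factors.add("surface_exchanges")
    return factors


if __name__ == "__main__":
    for pair in (("ANALY", "INT"), ("INT", "CLIM"), ("ANALY", "CLIM")):
        print(pair, sorted(isolated_factors(*pair)))
```

Comparing ANALY with INT changes only the meteorological forcing, comparing INT with CLIM changes only the surface exchanges, and comparing ANALY with CLIM changes both.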

Statistical indicators
In Europe, air quality thresholds defining acceptable levels of O3 (European Commission, 2002), NO2 and SO2 (European Commission, 1999, 2001) and PM2.5 and PM10 (European Commission, 1999, 2001, 2008) have been established in order to protect and inform populations, as described in Table 2. The impact of these air quality policies, in Europe and elsewhere in the world, can be evaluated using numerical air quality modeling. As an example, in Europe, the CAFE (Clean Air for Europe) program has been set up to assess the impacts of these policies on pollutant levels (Cuvelier et al., 2007). Air quality modeling systems are needed in order to forecast air quality, to understand the dynamics of air pollution and to develop regulations to reduce emissions. A variety of metrics have been used over the years to evaluate the performance of air quality models (US-EPA, 1984, 1991; Chang and Hanna, 2004; Boylan and Russell, 2006). Mean bias (MB), mean normalized bias (MNB), root mean square error (RMSE) and correlation coefficient (CORR) are common statistical parameters used by the modeling community. Furthermore, the mean normalized bias error (MNBE) and the mean normalized gross error (MNGE), which normalize the bias and error of each model-observation pair by the observation, are also useful parameters. For the evaluation of particulate matter concentrations, Boylan and Russell (2006) suggested considering the mean fractional bias (MFB) and the mean fractional error (MFE) instead of MNBE and MNGE. They proposed that the model performance criteria are met when both MFE ≤ 75 % and MFB ≤ ±60 %, and that the model performance goal is met when both MFE ≤ 50 % and MFB ≤ ±30 %. The US-EPA suggested several performance criteria for simulated O3, such as MNBE ≤ ±15 % and MNGE ≤ 35 % (US-EPA, 1991), while the EC proposes a modeling quality objective given as a relative uncertainty (%): 50 % and 30 % for PM10/PM2.5/O3 and NO2/SO2 annual averages, respectively (European Commission, 2008).
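The normalized and fractional metrics above, together with the Boylan and Russell (2006) thresholds, can be sketched as follows. This is a minimal illustration that assumes the commonly used forms of these metrics (expressed in percent) and paired, strictly positive model and observed values; the function names are hypothetical and the handling of missing data is left out.

```python
import numpy as np


def _pairs(model, obs):
    """Convert paired model/observed values to float arrays."""
    return np.asarray(model, dtype=float), np.asarray(obs, dtype=float)


def mnbe(model, obs):
    """Mean normalized bias error (%), normalized by the observations."""
    m, o = _pairs(model, obs)
    return 100.0 * np.mean((m - o) / o)


def mnge(model, obs):
    """Mean normalized gross error (%), normalized by the observations."""
    m, o = _pairs(model, obs)
    return 100.0 * np.mean(np.abs(m - o) / o)


def mfb(model, obs):
    """Mean fractional bias (%), normalized by the model-observation mean."""
    m, o = _pairs(model, obs)
    return 100.0 * np.mean(2.0 * (m - o) / (m + o))


def mfe(model, obs):
    """Mean fractional error (%), normalized by the model-observation mean."""
    m, o = _pairs(model, obs)
    return 100.0 * np.mean(2.0 * np.abs(m - o) / (m + o))


def pm_performance(model, obs):
    """Classify PM skill with the thresholds quoted in the text."""
    bias, error = mfb(model, obs), mfe(model, obs)
    if error <= 50.0 and abs(bias) <= 30.0:
        return "goal"
    if error <= 75.0 and abs(bias) <= 60.0:
        return "criteria"
    return "not met"
```

The fractional forms are bounded (MFB between −200 % and +200 %), which makes them less sensitive than MNBE and MNGE to very low observed concentrations.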
The model-to-data statistics MB, MNB, RMSE, correlation coefficient and sigma ratio are selected for the present study. The definitions of these metrics are given in Table 3. We also considered the mean diurnal cycle and the time series. Mean diurnal cycles are obtained by averaging the concentrations for each hour of the day over all available days, while time series are based on daily means. Seasonal mean statistics are computed, with seasons corresponding to summer (June, July, August and September) and winter (December, January, February and March). We chose to study these two seasons, which are of particular interest for air pollution, whereas autumn and spring are rather transitional seasons. As summarized in Table 4, metrics are calculated for hourly values and daily averages for NOx, SO2 and PM10, while hourly values and daily maximum 8-h average concentrations (M×8h) are used for O3, as the M×8h is one of the most important parameters to consider for this species.
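As an illustration of the selected statistics, the sketch below computes MB, MNB, RMSE, the correlation coefficient and the sigma ratio for paired series, and derives a daily maximum 8-h average from hourly values. It assumes the usual textbook definitions rather than reproducing Table 3, and the 8-h windows are, for simplicity, restricted to lie within each calendar day; the function names and this windowing convention are assumptions made for the example.

```python
import numpy as np


def skill_scores(model, obs):
    """Return MB, MNB, RMSE, CORR and sigma ratio for paired series."""
    model = np.asarray(model, dtype=float)
    obs = np.asarray(obs, dtype=float)
    diff = model - obs
    return {
        "MB": diff.mean(),                        # mean bias
        "MNB": (diff / obs).mean(),               # mean normalized bias (fraction)
        "RMSE": np.sqrt((diff ** 2).mean()),      # root mean square error
        "CORR": np.corrcoef(model, obs)[0, 1],    # Pearson correlation
        "SIGMA_RATIO": model.std() / obs.std(),   # modeled / observed variability
    }


def daily_max_8h(hourly):
    """Daily maximum of the 8-h running mean.

    `hourly` holds complete days of hourly values (length multiple of 24).
    Simplification: only windows fully inside each day are considered.
    """
    days = np.asarray(hourly, dtype=float).reshape(-1, 24)
    kernel = np.ones(8) / 8.0
    return np.array(
        [np.convolve(day, kernel, mode="valid").max() for day in days]
    )


if __name__ == "__main__":
    scores = skill_scores([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])
    print(scores)
    print(daily_max_8h([40.0] * 10 + [80.0] * 8 + [40.0] * 6))
```

A sigma ratio below 1 indicates that the model underestimates the observed variability, independently of any bias in the mean.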

Observations and representativeness
In order to evaluate the performance of MOCAGE and to be in a position to investigate and put into context the differences between the simulations, we used AirBase (Version 5) measurement data. The AirBase metadata describe both the site area (urban, suburban or rural) and the site type (traffic, industrial or background). Given the spatial resolution of our model (0.2° × 0.2°), not all the reporting sites are representative enough. Joly and Peuch (2012) proposed an objective classification of the AirBase sites based on past measurements in order to overcome issues of lack of homogeneity and erroneous information in the metadata. This classification allows for the selection of monitoring sites that are representative at the spatial resolution of our model. Across 10 classes, the least polluted stations (class 1) are distinguished from the most polluted sites (class 10). The robustness of this approach comes from the classification being pollutant-specific, taking into account that transport, chemistry and lifetime are specific to each pollutant.
In order to highlight the effect of site representativeness, we have compared the summertime (JJAS) average diurnal cycles for classes 1-2 (1 and 2), 1-5, 1-10 and 6 [...] improvement in accounting for representativeness. For NOx, which are short-lived species, it is necessary to restrict the sample of sites to classes 1-2 in order to focus on sites that are representative enough for the model grid size. Due to transport effects, 5 classes (1-5) can be used to evaluate simulations for longer-lived species such as O3, which provides a larger geographical basis. The spatial distribution of PM10 (not shown) behaves in the same way as that of O3, and the same conclusion applies. In conclusion, the performance of MOCAGE will be assessed by comparing simulations against observations at sites of classes 1-5 for O3 and PM10, and of classes 1-2 for NOx and SO2 (not shown, but same behavior as NOx). The number of sites finally taken into account for each pollutant and country is summarized in Table 5.
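The site selection rule stated above can be expressed as a simple filter. The sketch below is only illustrative: the record layout (dictionaries with "id", "pollutant" and "class" keys) is a hypothetical stand-in for the AirBase metadata combined with the Joly and Peuch (2012) classes, not their actual format.

```python
# Highest class retained for each pollutant, as stated in the text.
MAX_CLASS = {"O3": 5, "PM10": 5, "NOx": 2, "SO2": 2}


def representative_sites(sites, pollutant):
    """Keep the sites of `pollutant` whose class is within the retained range."""
    limit = MAX_CLASS[pollutant]
    return [
        site for site in sites
        if site["pollutant"] == pollutant and 1 <= site["class"] <= limit
    ]


if __name__ == "__main__":
    # Hypothetical example records.
    sites = [
        {"id": "A", "pollutant": "O3", "class": 4},
        {"id": "B", "pollutant": "O3", "class": 7},
        {"id": "C", "pollutant": "NOx", "class": 2},
        {"id": "D", "pollutant": "NOx", "class": 4},
    ]
    print([s["id"] for s in representative_sites(sites, "O3")])
    print([s["id"] for s in representative_sites(sites, "NOx")])
```

Because the classification is pollutant-specific, the same station can be retained for one species and rejected for another.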

Results
In the following section, we discuss MOCAGE's capability to simulate realistic air quality hindcasts when driven by forcings from climate modeling for the current period. We evaluate statistical tools and air quality indices and compare how they evolve with different sets of forcings. Two main parts can be distinguished. First, in Sect. 3.1, comparisons between ANALY and INT, as well as between INT and CLIM, allow us to detect the effects due to the meteorological forcings and to changes in surface exchange fluxes, respectively. As described previously, ARPEGE analyses and ARPEGE-Climate fields differ in horizontal resolution. Surface exchanges (emissions that depend upon meteorology, as well as surface deposition) have been computed with two sets of meteorological conditions (ARPEGE and ARPEGE-Climate). The differences concern the emissions of biogenic volatile organic compounds, desert dust and sea salt, as well as the deposition velocities, which depend on meteorology. Section 3.2 presents statistical skill scores of ANALY and CLIM against AirBase data (for "representative sites" only).

Impact of changes in meteorological fields on European air pollution levels
Figure 2 shows the mean differences in surface temperature, precipitation, humidity and planetary boundary layer (PBL) height for the JJAS period between the ANALY and INT runs. The purpose is to evaluate briefly how the climate meteorological forcings differ from the analyses. For the comparisons, we have thus spatially averaged the analysed fields to a horizontal resolution similar to that of the climate run. Focusing on the temperature first, similar structures are found over Europe, with a gradient increasing from the northern to the Mediterranean areas. However, over the northeastern part of the domain, the temperatures simulated by ANALY are locally significantly higher, by up to 4-5 °C [...]
In Fig. 3, we represent the average surface concentrations of O3 (a), isoprene (b), NOx (c), SO2 (d) and sulphate aerosols (e) for summertime 2004-2008 in ANALY and INT. Here, the changes in pollutant distributions are only due to differences in the meteorological conditions, as emissions and deposition velocities are identical. The spatial pattern of mean O3 concentrations is similar in the two simulations over the European domain. The highest concentrations are found in Southern Europe, over the Mediterranean Sea (50-60 ppbv), caused by intense photochemical production of O3 (EEA, 2005; Vautard et al., 2005). Meteorological fields such as temperature influence the production of O3 (Meleux et al., 2007; Hedegaard et al., 2008). The fields of O3 differences present some similarities to the temperature differences (Fig. 2). As noted previously, over Spain, Africa and Northern Europe, ANALY gives higher temperatures (by up to 4-5 °C) and higher O3 concentrations (+6-8 ppbv). The highest positive temperature differences seen over Europe coincide with the highest positive O3 differences. Other studies have shown that, among all the meteorological parameters, temperature is the one with the greatest impact on ozone (Dawson et al., 2007). Nevertheless, as explained in Katragkou et al. (2010), other variables such as differences in solar radiation, zonal and meridional winds and changes in atmospheric stability also impact ozone concentrations. Similar temperature and O3 spatial patterns, with the opposite sign, are observed over the Mediterranean basin (Italy, Greece).
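The spatial averaging of the analysed fields to a coarser resolution, used at the start of this section for the meteorological comparison, can be illustrated with a simple block average. This is a sketch under strong assumptions, not the procedure actually applied in the study: it works on a regular latitude-longitude array, uses an unweighted mean (no area weighting by latitude), and the coarsening factor is a hypothetical parameter chosen by the user.

```python
import numpy as np


def block_average(field, factor):
    """Average a 2-D field over non-overlapping factor x factor blocks.

    Rows and columns that do not fill a complete block are discarded.
    """
    field = np.asarray(field, dtype=float)
    ny, nx = field.shape
    ny_c, nx_c = ny // factor, nx // factor
    trimmed = field[: ny_c * factor, : nx_c * factor]
    # Split each axis into (coarse index, index within block), then
    # average over the two within-block axes.
    blocks = trimmed.reshape(ny_c, factor, nx_c, factor)
    return blocks.mean(axis=(1, 3))


if __name__ == "__main__":
    fine = np.arange(16, dtype=float).reshape(4, 4)
    print(block_average(fine, 2))
```

Comparing fields at a common resolution avoids attributing small-scale structure, which the coarser run cannot represent, to a genuine difference between the two forcings.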
High concentrations of isoprene, in the range of 2.5-3 ppbv, are simulated over North Africa and Greece in the INT simulation. The biogenic emissions used to drive ANALY and INT are the same, so the differences in simulated isoprene cannot be explained by the isoprene emissions or the temperature fields. A longitudinal cross-section (not shown) at a latitude of 36° N (across Africa) displays higher isoprene concentrations at the surface from 0 to 10° in INT. The accumulation of isoprene near the surface can mainly be explained by the boundary layer, which is less well mixed in INT than in ANALY.
The simulated distributions of summertime average NOx concentrations (NO + NO2) show levels of around 8-12 ppbv in the Netherlands, Belgium, central and eastern England and the industrial Po Valley (Fig. 3). In ANALY and INT, higher concentrations of NOx are also found over major shipping routes (North Sea, Gibraltar) and near emission sources. Tropospheric columns of NOx (not shown) are identical in ANALY and INT; the differences seen over Europe at the surface are mainly explained by differing boundary layer mixing in the two experiments.
Simulated SO2 concentrations over Europe display their highest levels over Spain, Eastern Europe (Poland, Romania, Greece) as well as Belgium and the UK. Over northern Spain (Fig. 3), the concentrations reach more than 8 ppbv (due to large plant sources in coastal Spain in the emissions dataset). Similar geographical distributions and ground levels of SO2 are found in ANALY and INT. Once emitted into the atmosphere, SO2 leads to the formation of sulphate aerosols. Over the northeastern part of the domain, the levels of sulphate are higher in INT than in ANALY. Figure 2 shows higher precipitation and humidity in ANALY than in INT over this area. These differences imply enhanced transformation of SO2 into sulphate aerosols but also increased wet deposition; these two opposing effects can explain the differences seen in sulphate concentrations between the two simulations. Figure 4 [...] Tropospheric columns of sulphate (Fig. 4) indeed indicate very similar quantities in the two simulations.
To sum up, the meteorological forcings (temperature, humidity, horizontal and vertical winds) differ between ANALY and INT. These differences lead to changes in the simulated vertical and horizontal distributions of pollutants. For all pollutants, primary as well as secondary, differences are primarily due to differing PBL mixing heights in the two simulations.

Impact of changes in surface exchanges on European air pollution levels
Comparisons between INT and CLIM indicate the contribution of surface processes to the changes in pollutant levels. The differences between these simulations are related to the biogenic emissions of isoprene and terpene, the desert dust and sea salt emissions, as well as the deposition velocities. In Fig. 5a, the spatial distributions of summer isoprene [...] (Guenther et al., 1993, 1995, 2006). In accordance with the temperature field differences (Fig. 2), more isoprene is emitted in INT than in CLIM over central Spain, Central Europe, Scandinavia and the northeastern part of Africa. The changes in isoprene emissions induce corresponding changes in the geographical pattern of isoprene concentrations. Elevated concentrations of isoprene (> 1 ppbv) are found over Central Europe, Greece and North Africa in INT compared to CLIM (Fig. 6b). The mean deposition fluxes (µg m−2 s−1) of isoprene (Fig. 5b) show smaller differences between INT and CLIM.
Over Central Europe, O3 deposition velocities are higher by up to 0.2-0.3 cm s−1 in INT than in CLIM (Fig. 7). Average nighttime and daytime velocities have been calculated for both INT and CLIM; daytime is taken to be from 08:00 to 16:00 UTC and nighttime from 20:00 to 04:00 UTC. For INT, daytime and nighttime mean deposition velocities reach 0.57 cm s−1 and 0.24 cm s−1, respectively, over land (0.06 cm s−1 and 0.05 cm s−1 over sea). For CLIM, they reach 0.54 cm s−1 and 0.24 cm s−1 over land (0.05 cm s−1 and 0.04 cm s−1 over sea). Over land, similar deposition velocities are thus calculated in INT and CLIM. Higher velocities are found during the day, as O3 deposition velocity is known to have a strong diurnal cycle due to the increase in surface resistance at night. The mean deposition fluxes (µg m−2 s−1) of O3, NOx, SO2 and sulphate have been computed for the summertime period (Fig. 8). In Fig. 6, the changes in concentrations between INT and CLIM follow the changes in deposition fluxes and velocities. Where higher deposition fluxes are seen in INT than in CLIM (parts of Spain, England, Italy and the north and west of France), higher concentrations of ozone are simulated in CLIM. Conversely, over other areas (mainly in Central Europe), the mean deposition flux is higher in CLIM than in INT, leading to higher concentrations in INT than in CLIM.
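The daytime and nighttime averaging of deposition velocities described above can be sketched as follows. The example assumes hourly values with known UTC hours and treats both windows as inclusive of their end hours (08 to 16 UTC for daytime, 20 to 04 UTC for nighttime); the inclusive bounds and the function name are assumptions made for the illustration.

```python
import numpy as np


def day_night_means(values, hours_utc):
    """Return (daytime mean, nighttime mean) of hourly values.

    Daytime: 08:00-16:00 UTC; nighttime: 20:00-04:00 UTC (bounds inclusive).
    Hours outside both windows are ignored.
    """
    values = np.asarray(values, dtype=float)
    hours = np.asarray(hours_utc) % 24
    day = (hours >= 8) & (hours <= 16)
    night = (hours >= 20) | (hours <= 4)  # window wraps around midnight
    return values[day].mean(), values[night].mean()


if __name__ == "__main__":
    # Synthetic two-day series: higher velocities during the day.
    hours = np.arange(48) % 24
    velocity = np.where((hours >= 8) & (hours <= 16), 0.6, 0.2)
    print(day_night_means(velocity, hours))
```

Leaving the transition hours (05 to 07 and 17 to 19 UTC) out of both windows keeps the two averages clearly separated between the daytime and nighttime regimes.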
In contrast with O3, smaller differences in deposition fluxes are found for NOx, SO2 and sulphate; nevertheless, these differences lead to differing concentrations between INT and CLIM. In the case of NOx, the higher concentrations simulated in CLIM over the northern area (e.g. England, Belgium) are related to lower deposition fluxes at the surface. In addition, SO2 concentrations are higher by 1-1.5 ppbv in CLIM over northern Spain and Belgium (Fig. 6c); these changes are related to the SO2 deposition fluxes (Fig. 8c). Very similar distributions and levels of sulphate aerosol are found in INT and CLIM (Fig. 6e) across Europe.
In summary, the comparisons between ANALY and CLIM shown in Fig. 9 reveal the contribution of both meteorological and flux changes to simulated air pollutants. The differences linked to the meteorological parameters or surface processes are pollutant-dependent. Depending on the species considered, the differences can be driven mainly by the meteorological fields or by the emission inventories. The meteorological and surface process effects can also compensate for each other. Over the whole domain, the changes in sulphate concentrations between ANALY and CLIM are mostly determined by chemical, physical and dynamical processes linked to the meteorological fields (humidity, precipitation). The major changes in isoprene concentrations (Spain, North Africa, Greece) are attributed both to changes in atmospheric circulation and stability (ANALY vs. INT) and to differences in surface emissions and deposition (INT vs. CLIM). For the short-lived species NOx and SO2, the larger changes are localized near the high-emission spots. In the case of SO2, the differences between ANALY and CLIM over Europe are explained both by the changes in deposition fluxes and by the meteorological fields. The O3 concentration differences between the two simulations are partly related to the changes in meteorological fields (such as temperature) but are principally due to the changes in deposition velocities.

Statistical results: ANALY and CLIM against AirBase
First, in Sect. [...] the impacts of chronology of pollution events on the skill scores. [...] 7 show better performance in terms of correlation in winter (CORR(NOx, H) = 0.42; CORR(NOx, DM) = 0.55) than in summer (CORR(NOx, H) = 0.29; CORR(NOx, DM) = 0.43). During winter, chemical processes that lead to O3 production are less dominant compared to transport, which could explain such differences (Bessagnet et al., 2004).

Model interannual variability
Time series of monthly mean concentrations of SO 2 from the AirBase stations and the model simulations are represented in Fig. 10c. Results show that the SO 2 concentrations are overestimated for both ANALY and CLIM. Overall, a good agreement is observed for the anomalies between ANALY and the observations in terms of amplitude. From the year 2006, we notice a decrease in the observed SO 2 concentrations due to the regulations reducing anthropogenic emissions. Indeed, emissions have been reduced in the sector of power and heat generation with the emission abatement strategies in some European countries during the period 2000-2010 (EEA, 2007). In our simulations, we kept the same emissions inventory, representative of the year 2003.
According to the time series of anomalies, and as described in Table 8, the biases calculated in ANALY and CLIM are of the same order of magnitude (MBSO 2 DM = 0.37 µg m −3 for ANALY and MBSO 2 DM = 0.45 µg m −3 for CLIM).
Although the simulation ANALY presents a persistent negative bias (Table 9), it has the capability to reproduce the dynamics of PM 10 for each year. The underestimation of PM 10 can be explained principally by the lack of secondary particulate and nitrate aerosols in our representation of PM 10 . During summer, when photochemistry favors the formation of these particulates, the biases between simulated and observed concentrations become greater (MBPM 10 DM = −11.9 µg m −3 , Fig. 10 and Table 9). In the case of CLIM, the time series of anomalies displays the capability of the model to reproduce the particulate matter events (CORRPM 10 DM = 0.39), although the model hardly reproduces their amplitude.
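The anomalies discussed in this section are obtained by removing the mean annual cycle from a monthly time series. The following is a minimal sketch of that operation, assuming a series that covers a whole number of years; the function name and the synthetic input values are hypothetical and are not taken from the study.

```python
import numpy as np

def monthly_anomalies(monthly_means, months_per_year=12):
    """Remove the multi-year mean annual cycle from a monthly series."""
    series = np.asarray(monthly_means, dtype=float)
    if series.size % months_per_year != 0:
        raise ValueError("series must cover a whole number of years")
    # Rows are years, columns are calendar months.
    by_year = series.reshape(-1, months_per_year)
    # Mean annual cycle: average of each calendar month over all years.
    annual_cycle = by_year.mean(axis=0)
    return (by_year - annual_cycle).ravel()

# Synthetic 2-yr example: the same seasonal cycle, shifted by +1 in year 2.
cycle = np.array([5, 5, 4, 3, 2, 2, 1, 1, 2, 3, 4, 5], dtype=float)
series = np.concatenate([cycle, cycle + 1.0])
anomalies = monthly_anomalies(series)
print(anomalies[:12])  # all -0.5: year 1 lies below the mean cycle
print(anomalies[12:])  # all +0.5: year 2 lies above the mean cycle
```

By construction the anomalies sum to zero for each calendar month, so a constant model bias does not appear in them; only departures from the mean seasonal cycle remain.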

Statistical results
The statistics of the model are spatially displayed in Fig. 11; we illustrate the mean biases for O 3 daily M×8h values, as well as for NO x and SO 2 daily mean concentrations across the representative European stations. The scores are averaged for the summer season (JJAS). Regarding the results of simulation ANALY in the case of O 3 , two distinct spatial regimes can be distinguished in the figures: low positive biases are shown over Germany, while negative biases are found in Spain and Italy. The correlations (not shown) are also higher in the northern part of Europe, notably in Germany (0.6-0.8), while in Southern Europe (Italy and Spain) the performance of the model is rather low (Pay et al., 2010). Comparisons between the statistical metrics of ANALY and CLIM indicate comparable biases over Europe for the daily M×8h O 3 . The spatial distribution and the amplitude of the negative and positive biases are mostly similar, except for the stations in Germany and France, which display higher positive (30-40 µg m −3 ) and negative biases in CLIM than in ANALY, respectively. As shown in  To examine whether the model is able to simulate the variability of O 3 concentrations, we used the sigma ratio, which is the standard deviation of the modeled time series divided by the standard deviation of the observed time series (Table 3). The sigma ratio values, averaged for the period 2004-2008, are summarized in Table 6. For both simulations, the model underestimates the observed variability of hourly and daily M×8h values, except for the winter season. In addition, two health-related parameters are considered: SOMO35 and the number of exceedance days. SOMO35 corresponds to the yearly sum of the excess of the daily maximum 8-h running average concentrations over 35 ppb (Amann et al., 2005). It is used as an indicator of O 3 health impact and is recommended by the World Health Organization (WHO). The values of SOMO35 are summarized in Table 6. When averaged over all of the European stations considered, the observed seasonal levels reach 2221 µg m −3 d and 496 µg m −3 d in summer and winter, respectively (4118 µg m −3 d for the whole year). Both simulations capture the levels of SOMO35 and the seasonal variation (van Loon et al., 2007). According to ANALY, about 40 % of the SOMO35 is produced during summer and 20 % during winter. Over the summer period, the number of days with O 3 exceeding the 120 µg m −3 threshold for the daily maximum 8-h average concentration is underestimated in ANALY (n = 5.7 days) and fairly well estimated in CLIM (n = 12 days), in comparison to the observations (n = 15.4 days) from the European stations. The mean ozone concentrations above the threshold of 120 µg m −3 for the M×8h simulated by ANALY are mostly in line with the observations or within the interannual variability. Higher values are reached in CLIM than in ANALY, as shown for the French and Italian stations (Fig. 12a). Figure 12b shows the percentiles of the daily O 3 maximum simulated by the ANALY and CLIM simulations. The interval between the 20th and 70th percentiles displays similar values for both simulations. The occurrence of extreme values (maxima) is underestimated by the model for both simulations. As seen previously, above the threshold of 180 µg m −3 , CLIM simulates a higher number of occurrences than the observations. Overall, however, these figures show that MOCAGE driven by climate model outputs as forcings is able to simulate realistic ozone concentrations over Europe.
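The two health-related indicators used above can be illustrated with a short sketch. It rests on assumptions that are not stated in the text: the 35 ppb cut-off is taken as 70 µg m −3 (a commonly used conversion for O 3 ), the 8-h running mean is restricted to windows lying within a single day, and all function names and sample values are hypothetical.

```python
import numpy as np

def daily_max_8h(hourly):
    """Daily maximum of the 8-h running mean from 24 hourly values.
    Simplification: only windows lying entirely within the day are used."""
    hourly = np.asarray(hourly, dtype=float)
    running = np.convolve(hourly, np.ones(8) / 8.0, mode="valid")
    return running.max()

def somo35(daily_mx8h, cutoff=70.0):
    """Sum of the excess of daily max 8-h means over the cut-off.
    cutoff=70 µg m-3 is an assumed equivalent of 35 ppb."""
    m = np.asarray(daily_mx8h, dtype=float)
    return np.clip(m - cutoff, 0.0, None).sum()

def exceedance_days(daily_mx8h, threshold=120.0):
    """Number of days with the daily max 8-h mean above the threshold."""
    return int((np.asarray(daily_mx8h, dtype=float) > threshold).sum())

# Hypothetical daily max 8-h values (µg m-3) for five days.
mx8h = [60.0, 80.0, 130.0, 70.0, 125.0]
print(somo35(mx8h))           # 125.0 = 10 + 60 + 0 + 55
print(exceedance_days(mx8h))  # 2
```

Because SOMO35 only accumulates the part above the cut-off, it is sensitive to the upper tail of the distribution rather than to the timing of individual episodes, which is why it remains interpretable when the chronology of events is not reproduced.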
As shown in Table 5, 354 stations were used to provide NO x measurements throughout Europe. Considering the spatial distribution of mean biases for daily mean NO x , statistical results show a satisfactory seasonal mean bias for ANALY (MBNO x DM = −2.4 µg m −3 during summer) without a spatial pattern between north and south. Similar geographical distributions and values of mean bias are displayed for CLIM (Fig. 11), while the summer mean bias reaches −2.1 µg m −3 (Table 7). The spatial distribution of the correlation coefficients shows a large variability between stations; while northern stations display high correlations (0.6 < r < 0.8), low correlations are observed in Southern Europe (r < 0.4). The performances of the model are reduced with CLIM for all seasons, notably during winter (Fig. 7) when the concentrations of NO x are overestimated, as shown in Fig. 10. Thus, in winter, MNBNO x H and MNBNO x DM reach 55.1 % and 54.3 %, respectively; also, RMSENO x H and RMSENO x DM reach 29.2 µg m −3 and 25.7 µg m −3 , respectively. In summer, the MNB values for hourly and daily mean are near the uncertainty proposed by EC and US-EPA for the ANALY simulation. Overall, the annual and seasonal daily mean statistics present better performances in comparison with the hourly values.
For SO 2 , in the statistical results of ANALY, low correlations are mainly concentrated in Southern Europe (coefficients under 0.2), as in Spain, while some northern stations display high correlations (r > 0.7). Averaged over all European stations, the summertime mean correlation of daily mean SO 2 reaches 0.3. Considering the mean bias for summer (Fig. 11), low biases are depicted across all the stations (MBSO 2 DM = −0.07 µg m −3 for ANALY and CLIM). Nevertheless, the stations located in Poland display high positive biases (> 2 µg m −3 ). The uncertainties of the emissions inventory in Eastern Europe may contribute to the higher bias observed. In some stations in Spain, higher bias is also observed, due in part to the emission inventory. The regulation of SO 2 emissions in Spain has led to an emission reduction of 50 % (Spain Environment Ministry, 2011). As shown in Table 8, the highest correlations are obtained in winter (CORRSO 2 H = 0.27; CORRSO 2 DM = 0.40), but the concentrations are overestimated, leading to a high value of MNB close to the EC criteria.
As seen in Fig. 10, the model presents a systematic negative bias for the simulated concentrations of PM 10 . For the ANALY simulation, the correlation coefficient for the annual daily mean is 0.39, while it reaches 0.2 and 0.48 for the summer and winter seasons, respectively (Table 9). The spatial distributions of mean bias and correlations (not shown here) present a homogeneous pattern over Europe. The annual MFE (MFEPM 10 DM = 75.1 %) and MFB (MFBPM 10 DM = −59.5 %) calculated for ANALY do not meet the performance criteria or the performance goal proposed by Boylan and Russell (2006). The performance of the model is better during winter, when the MBPM 10 DM is about −4.6 µg m −3 (against −11.9 µg m −3 in summer) and the mean correlation reaches 0.48 (against 0.2 for the summer). The underestimation of PM 10 can be explained by the lack of secondary particulate and nitrate aerosols in our representation of PM 10 . Differences between the seasons are linked to chemical processes, dominant in summer, which favor the formation of these particulates and increase the bias between simulated concentrations (lacking these chemical processes) and observations.
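As an illustration of the fractional metrics quoted above, the sketch below computes MFB and MFE under the usual definitions, 2(M − O)/(M + O) averaged over all pairs and expressed in percent. The definitions of Table 3 are not reproduced in this section, so these formulas are an assumption, as are the function names and sample values. The thresholds passed to the hypothetical helper within_limits (goal: |MFB| ≤ 30 % and MFE ≤ 50 %; criteria: |MFB| ≤ 60 % and MFE ≤ 75 %) are the values generally attributed to Boylan and Russell (2006) and should be checked against that reference.

```python
import numpy as np

def mean_fractional_bias(model, obs):
    """MFB in percent, assuming the definition mean(2 (M - O) / (M + O))."""
    m = np.asarray(model, dtype=float)
    o = np.asarray(obs, dtype=float)
    return 100.0 * np.mean(2.0 * (m - o) / (m + o))

def mean_fractional_error(model, obs):
    """MFE in percent, assuming the definition mean(2 |M - O| / (M + O))."""
    m = np.asarray(model, dtype=float)
    o = np.asarray(obs, dtype=float)
    return 100.0 * np.mean(2.0 * np.abs(m - o) / (m + o))

def within_limits(mfb, mfe, max_abs_mfb, max_mfe):
    """True if both metrics fall inside the given (assumed) limits."""
    return abs(mfb) <= max_abs_mfb and mfe <= max_mfe

# Hypothetical pair of daily means (µg m-3): one underestimate, one exact.
model = [10.0, 20.0]
obs = [30.0, 20.0]
print(mean_fractional_bias(model, obs))   # -50.0
print(mean_fractional_error(model, obs))  # 50.0

# Annual ANALY values quoted in the text, tested against the assumed limits.
meets_goal = within_limits(-59.5, 75.1, max_abs_mfb=30.0, max_mfe=50.0)
meets_criteria = within_limits(-59.5, 75.1, max_abs_mfb=60.0, max_mfe=75.0)
print(meets_goal, meets_criteria)  # False False
```

Under these assumed limits, the quoted MFE of 75.1 % is what places the annual PM 10 score just outside the criteria, while the MFB of −59.5 % lies marginally within them.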
Several air quality models operated in Europe have been evaluated either individually or in comparison to other models in the literature. In the following discussion, a quick comparison between other regional air quality models (Hass et al., 2003; van Loon et al., 2004, 2007) and MOCAGE is carried out in order to situate our model within the community. For this purpose, we used studies similar to our simulation ANALY, which covered a long time scale of 1 yr over the European domain on a regional scale with horizontal resolutions similar to MOCAGE. Also, these models were evaluated against ground observations at rural sites from AirBase and EMEP. Concerning O 3 daily M×8h, satisfactory performances are displayed with MOCAGE in terms of annual MNB values: 0.04 % versus −1 to 10 %; correlations: 0.7 versus 0.69-0.84 (van Loon et al., 2007); and RMSE: 21.3 µg m −3 versus 18.1-25.5 µg m −3 (van Loon et al., 2004). Values for summer and winter daily M×8h O 3 are also comparable to those of other models, as for the correlations (0.63 versus 0.61-0.77 for summer; 0.65 versus 0.45-0.62 for winter) and the MNB (−9.44 % versus −5 to 8 % for summer; 9.8 % versus −20 to 15 % for winter) according to the study of van Loon et al. (2007). The MOCAGE performances for NO x can be compared with the performance of NO 2 in other models. The annual correlation of daily mean NO x obtained in this study reaches 0.61, compared to 0.03-0.52 (Hass et al., 2003; van Loon et al., 2004), and the RMSE value is around 9.8 µg m −3 versus 8.5-13.9 µg m −3 (Hass et al., 2003; van Loon et al., 2004). As for O 3 , the MOCAGE results for SO 2 show good performances in comparison with the other studies. The annual daily mean correlation is among the higher values (0.36 versus 0.24-0.49). The calculated RMSE reaches 3.2 µg m −3 against 2.7-10.9 µg m −3 for the other models (Hass et al., 2003; van Loon et al., 2004). For PM 10 , statistical results are in the same range as for other studies; nevertheless, the annual daily mean correlation is rather low and reaches 0.39 compared to 0.38-0.55; the annual RMSE is 15.1 µg m −3 (versus 12.4-16.6 µg m −3 ).
To summarize, MOCAGE performs well according to the comparisons between ANALY and AirBase observations, as discussed previously. The statistical scores of O 3 , NO x and SO 2 display satisfactory performances compared to other studies, while our representation of PM 10 exhibits poorer results, which are expected by design. Comparisons between the simulations ANALY and CLIM have shown that the geographical distributions of mean biases are quite similar for each pollutant considered. In this section, the model-to-observation comparisons were based on a common approach, which consists of comparing each year of the simulation with the matching measured values from the AirBase database. In CLIM, the meteorological forcings are representative of the current decade; there is no particular match in the sequence of years, and the representativeness of the skill scores can be assessed by permutations of all years. By doing the same for ANALY, the comparisons will allow us to determine which statistical tools are useful to consider for future studies.

Impacts of chronology of pollution events
We evaluated each year of the simulations against the 5 yr of measurements, giving 25 model-to-date pairs of statistics for both ANALY and CLIM. Over the 2004-2008 period, we calculated every possible permutation in which no year of measurements is used more than once. We also filtered out the cases in which one or more simulated years coincide with the same years of data. These conditions leave 44 realizations (the permutations of 5 yr in which no simulated year is paired with its own year of measurements), which provide a large statistical basis.
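The counts of 25 pairs and 44 realizations can be checked by direct enumeration. The sketch below is only an illustration of the selection rule described above (each year of measurements used once, no simulated year paired with its own year of data); the variable names are hypothetical.

```python
from itertools import permutations

years = (2004, 2005, 2006, 2007, 2008)

# All (simulated year, measured year) combinations: 5 x 5 = 25 pairs.
pairs = [(sim, obs) for sim in years for obs in years]

# Each permutation assigns one distinct measured year to every simulated year.
# Keep only those in which no simulated year meets its own year of data.
realizations = [
    perm for perm in permutations(years)
    if all(obs != sim for sim, obs in zip(years, perm))
]

print(len(pairs), len(realizations))  # 25 44
```

Of the 120 possible orderings of 5 yr, 76 contain at least one matching year and are discarded, which leaves the 44 realizations used for the statistics.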
In order to give a concise statistical summary, we used Taylor diagrams, which indicate how well observed and simulated patterns match each other in terms of correlation and normalized standard deviation (NSD) (Taylor, 2001). The correlation coefficient R gives a measure of the co-variance of simulated and observed values. The NSD gives a measure of the amplitude of the variance in modeled values versus observed values. An NSD lower than 1 means that the temporal standard deviation of the simulated values is lower than that of the observations. Figure 13 shows the normalized Taylor plots that summarize the ANALY and CLIM permutations. The statistics are computed for the summertime daily M×8h O 3 concentrations (a) and daily averages of NO x (b) and SO 2 (c). For each plot, "ANALY" refers to the reference model-observation analyses (black cross) whereas "ANALY-p" (blue symbols) and "CLIM-p" (red symbols) refer to the permutation cases. Considering the results of ozone first, comparisons between ANALY and ANALY-p confirm that the permutations of ANALY-p no longer reproduce the observed day-to-day variability, as shown by the correlations. As summarized in Table 10, for ANALY-p, the median value of the correlations reaches 0.04 (against 0.63 for ANALY).
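The two quantities plotted on a normalized Taylor diagram can be sketched as follows. The example also evaluates the normalized centred RMS difference, which Taylor (2001) relates to R and NSD through E'² = 1 + NSD² − 2 NSD R and which corresponds to the distance from the reference point on the diagram. The function name and the synthetic series are hypothetical.

```python
import numpy as np

def taylor_statistics(model, obs):
    """Return (R, NSD, normalized centred RMS difference) for two series."""
    m = np.asarray(model, dtype=float)
    o = np.asarray(obs, dtype=float)
    r = np.corrcoef(m, o)[0, 1]   # correlation coefficient R
    nsd = m.std() / o.std()       # normalized standard deviation
    # Distance to the reference point (R = 1, NSD = 1) on the diagram.
    crmsd = np.sqrt(max(1.0 + nsd**2 - 2.0 * nsd * r, 0.0))
    return r, nsd, crmsd

# Synthetic case: perfectly correlated series with half the observed amplitude.
obs = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
model = 0.5 * obs + 10.0
r, nsd, crmsd = taylor_statistics(model, obs)
print(round(r, 6), round(nsd, 6), round(crmsd, 6))  # 1.0 0.5 0.5
```

The constant offset of 10 in the synthetic model series does not affect any of the three values: the diagram summarizes phase and amplitude of the variability, while the mean bias has to be reported separately.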
The median RMSE of ANALY-p are not as good as the median RMSE for ANALY for both the hourly (RMSEO 3 H = 31 µg m −3 for ANALY-p and 21.4 µg m −3 for ANALY) and daily M×8h (RMSEO 3 MAX = 30.9 µg m −3 for ANALY-p and 24.9 µg m −3 for ANALY) values of O 3 . Nevertheless, very similar values of MNBO 3 MAX and MNBO 3 H in the ranges of −9 % and −4.5 %, respectively, are obtained. The standard deviation values (σ ) are similar in ANALY and ANALY-p and range around 0.66, meaning that the model underestimates the daily M×8h ozone variability. The same conclusions can be extended to the daily averages of NO x , SO 2 (Table 11) and PM 10 (Table 12) concentrations. ANALY and ANALY-p only differ by the correlations while the MB, RMSE and variances are quite similar. For the daily mean NO x (Fig. 13), the NSD are close to the reference, meaning that the amplitude of the simulated NO x agrees with the observations. To sum up, these results point out that, for all the species, the correlations calculated for ANALY-p are weaker than for ANALY due to the permutations. The RMSE values increase by 19 %, 11 % and 8 % for the hourly values of O 3 , NO x and SO 2 , respectively, from ANALY to ANALY-p. However, for the metrics MB, MNB and σ , similar values are computed.
For O 3 (Fig. 13), low and similar correlations of daily M×8h levels are calculated for both ANALY-p and CLIM-p. As shown in Table 10, the correlations between the observed and simulated hourly values (0.33 for ANALY-p and 0.34 for CLIM-p) are higher than for the daily M×8h values. The daily variability of ozone, characterized by higher levels during the afternoon and lower values during nighttime hours, is still captured with the permutations. The RMSE values show lower performances for CLIM-p compared to ANALY-p for both daily M×8h (+10 %) and hourly O 3 (+13 %) levels. The σ values indicate the tendency of ANALY-p to underestimate ozone variance in summer. The CLIM-p simulations show better σ values (0.9 and 0.98 versus 0.67 and 0.75 for ANALY-p), which is unexpected and cannot be interpreted as greater performance. The MNB of M×8h ozone shows a tendency of model underestimation, as the median reaches −9.1 % in ANALY-p and −6.4 % in CLIM-p. For the hourly and daily average NO x and SO 2 concentrations, the simulations ANALY-p and CLIM-p are now well correlated (Table 11). During summer, ANALY-p and CLIM-p underestimate the daily mean and hourly NO x concentrations. From ANALY-p to CLIM-p, the median MNBNO x DM and MNBNO x H change by about 14 % while the RMSENO x DM and RMSENO x H change by about 7 %. Concerning the amplitude of the NO x variances (Fig. 13), a satisfactory agreement is observed; σ reaches 0.98 in ANALY-p and 1.07 in CLIM-p for daily mean NO x (Table 11). For the SO 2 results, the variance is overestimated by the model for both ANALY-p and CLIM-p. For the daily mean of PM 10 concentrations, the amplitudes of the variances are in line for the ANALY-p and CLIM-p simulations. The simulations underestimate the variability of the PM 10 (σ PM 10 DM = 0.49 for ANALY-p and σ PM 10 DM = 0.59 for CLIM-p). MBPM 10 DM reaches similar values for ANALY-p and CLIM-p. MFB and MFE metrics do not reach the performance criterion.
To summarize, the use of permutations has made the comparisons between the simulations suitable for discussion of the statistical results. Comparisons between the reference case "ANALY" and "ANALY-p" corroborate the incorrect phasing between the measurements and simulations when the day-to-day variability is not reproduced. Finally, the results allow us to conclude that statistical metrics such as variances, MB, MNB and RMSE give robust and sound information when climate forcings are used to drive the model.

Conclusions
This paper has investigated how different the hindcasts of an air quality modeling system are when using two different types of meteorological forcings: meteorological analyses or fields from a climate model. This is groundwork needed to qualify and properly interpret statistical conclusions that can be drawn from simulations of air quality in a future climate.
The comparisons between three 5-yr experiments allow us to quantify the relative importance of changes in surface fields and upper air meteorology. We find that both elements contribute to changes in O 3 concentrations. Differences in sulphate aerosols and in isoprene (as a proxy for biogenic volatile organic compounds) are mainly related to changes in meteorology and mixing, while it is the contrary for SO 2 and NO x , which essentially depend upon changes in surface fluxes.
The skill of the reference simulation (analysed forcings) to reproduce European surface observations is in the range of previous reference evaluation studies (Hass et al., 2003; van Loon et al., 2004, 2007). As expected, the simulation based upon forcings from a climate model is not as skillful at reproducing observations, since it cannot follow day-to-day variations by design. Nevertheless, the geographical distributions of the mean biases are similar in the two simulations. The comparisons of SOMO35 and the distribution of O 3 maximum percentiles support the capability of MOCAGE, driven by meteorological fields from a climate model, to simulate realistic European ozone levels.
The objective of this work was to determine useful statistical metrics that can be used for models driven by climate model meteorological parameters. For O 3 , we show that simulations using either analyses or climate model forcings follow the same tendency: hourly and M×8h concentrations are slightly underestimated and the biases and RMSE are in the same range of values. Similar conclusions hold for the daily averaged and hourly values of NO x and SO 2 , as well as daily PM 10 . The amplitude of variance is accurately reproduced when the model is driven by climate fields. Finally, as for the standard deviation, statistical results of MB, MNB and RMSE can be interpreted with some degree of confidence.
sources. Several studies have shown how O 3 photochemistry depends
Published by Copernicus Publications on behalf of the European Geosciences Union.
G. Lacressonnière et al.: Air quality hindcasts driven by forcings from climate model

Fig. 2. From top to bottom: average summertime surface temperature (°C), precipitation (mm d −1 ), humidity (g kg −1 ) and planetary boundary layer height (m) for the summer period (JJAS) of ANALY and INT. Differences between ANALY and INT are shown on the right.

Fig. 3. From top to bottom: simulated average JJAS surface O 3 , isoprene, NO x , SO 2 and sulphate fields for ANALY and INT. Differences between ANALY and INT are shown on the right. Units are in µg m −3 .

Fig. 5. (a) Emissions of isoprene for the summertime period, averaged for 2004-2008 in the INT (left) and CLIM (middle) simulations. (b) Deposition flux (µg m −2 s −1 ) of isoprene, averaged for the summertime of INT and CLIM simulations. Differences between INT and CLIM are shown on the right.

Fig. 6. Differences in simulated average surface O 3 (a), isoprene (b), NO x (c), SO 2 (d) and sulphate fields between INT and CLIM for the summertime (JJAS). Species are in units of µg m −3 .
44 represents a latitudinal cross-section of sulphate at the longitude of 30° E, averaged for the summertime period of 2004-2008 (JJAS). From 60° N to 70° N, the vertical extent of the sulphate distribution is lower in INT than in ANALY. The difference in sulphate near the surface is due to differing PBL mixing properties in the two simulations.

Fig. 7. O 3 deposition velocity averaged for the summertime period simulated by INT (left) and CLIM (middle). Differences between INT and CLIM are shown on the right.

Fig. 8. From top to bottom: (a) deposition flux of O 3 , (b) deposition flux of NO x , (c) deposition flux of SO 2 and (d) deposition flux of sulphate. Deposition fluxes are in µg m −2 s −1 and averaged for the summertime period of INT and CLIM simulations.

Figure 10 represents the time series of the model (ANALY in black line; CLIM in gray line) and monthly measured AirBase data (red line) as an average of the daily mean O 3 (a), NO x (b), SO 2 (c) and particulate matter PM 10 (d) from 2004 to 2008 across the European domain. If we subtract the observed and simulated annual cycle averaged for the period 2004-2008 from these time series, positive and negative anomalies remain. The meteorology of ANALY is expected to follow the day-to-day variability in a more

Fig. 10. (1) Simulated (ANALY: black lines; CLIM: gray lines) and measured at the AirBase stations (red lines) time series of monthly mean concentrations of O 3 (a), NO x (b), SO 2 (c) and PM 10 (d). The time series are plotted from 1 January 2004 to 31 December 2008 and averaged over the European domain. Concentrations are in µg m −3 . (2) Anomalies calculated by subtracting the average annual series from the time series in (1).

Fig. 12. (a) Summertime average ozone concentrations (over the stations available) above the threshold of 120 µg m −3 over 6 European countries. FR is France, ES is Spain, DE is Germany, GB is England, IT is Italy, and PL is Poland. The standard deviation measuring the interannual variability is represented by the vertical bars. (b) Distribution of daily O 3 maximum percentiles for AirBase measurements (red) and MOCAGE simulations (ANALY in black, CLIM in gray).

Fig. 13. Taylor plots of the comparison between modelled and observed M×8h O 3 concentrations, daily mean NO x and daily mean SO 2 . The radial distance from the origin corresponds to NSD and the azimuthal position corresponds to R.

Table 3. Definition of the metrics used in the evaluation of the MOCAGE model performance. M refers to the model, O refers to the observations.

Table 4. Metrics considered in the evaluation of O 3 , NO x , SO 2 and PM 10 concentrations.
* SOMO35: annual sum of excess of daily maximum 8-h mean ozone over the cut-off of 35 ppb.

Table 5. Number of representative sites available by countries and species. Countries are, from left to right, France (FR), Spain (ES), England (GB), Germany (DE), Italy (IT) and Poland (PL).

Table 7. Same as Table 6 for hourly value and daily mean NO x concentrations.

Table 8. Same as Table 6 for hourly value and daily mean SO 2 concentrations.

Table 10. Seasonal JJAS statistics obtained with the permutations ANALY-p and CLIM-p. Statistics are the median values of the permutations. The calculated statistics are mean bias (µg m −3 ), mean normalized bias (%), correlation coefficient, root mean square error (µg m −3 ) and sigma ratio. Statistics are computed for the O 3 hourly value and daily M×8h concentrations.

Table 11. Same as Table 10 for hourly values and daily average of NO x and SO 2 .

Table 12. Seasonal JJAS statistics obtained with the permutations ANALY-p and CLIM-p. Statistics represent the median values of the permutations. The calculated statistics are mean bias (µg m −3 ), mean fractional bias (%), mean fractional error (%), correlation coefficient, root mean square error (µg m −3 ) and sigma ratio. Statistics are computed for the PM 10 daily mean concentrations.
ANALY due to the permutations. The RMSE values increase by 19 %, 11 % and 8 % for the hourly values of O 3 , NO x and SO 2 , respectively, from ANALY to ANALY-p. However, for the metrics MB, MNB and σ , similar values are computed. than