Introduction

GMD

Geoscientific Model Development

GMD

Geosci. Model Dev.

1991-9603

Copernicus Publications

Göttingen, Germany

10.5194/gmd-10-1199-2017

An alternative way to evaluate chemistry-transport model variability

Menut

Laurent

menut@lmd.polytechnique.fr

https://orcid.org/0000-0001-9776-0812

Mailler

Sylvain

Bessagnet

Bertrand

Siour

Guillaume

Colette

Augustin

Couvidat

Florian

Meleux

Frédérik

1Laboratoire de Météorologie Dynamique, Ecole Polytechnique, IPSL Research University, Ecole Normale Supérieure, Université Paris-Saclay, Sorbonne Universités, UPMC Univ Paris 06, CNRS, Route de Saclay, 91128 Palaiseau, France 2INERIS, National Institute for Industrial Environment and Risks, Parc Technologique ALATA, 60550 Verneuil-en-Halatte, France 3Laboratoire Inter-Universitaire des Systèmes Atmosphériques, UMR CNRS 7583, Université Paris Est Créteil et Université Paris Diderot, Institut Pierre Simon Laplace, Créteil, France

Laurent Menut (menut@lmd.polytechnique.fr)

17March2017

10 3 11991208 17June2016 24June2016 18February2017 23February2017

This work is licensed under a Creative Commons Attribution 3.0 Unported License. To view a copy of this license, visit http://creativecommons.org/licenses/by/3.0/

This article is available from https://gmd.copernicus.org/articles/10/1199/2017/gmd-10-1199-2017.html

The full text article is available as a PDF file from https://gmd.copernicus.org/articles/10/1199/2017/gmd-10-1199-2017.pdf

A simple and complementary model evaluation technique for regional chemistry transport is discussed. The methodology is based on the concept that we can learn about model performance by comparing the simulation results with observational data available for time periods other than the period originally targeted. First, the statistical indicators selected in this study (spatial and temporal correlations) are computed for a given time period, using colocated observation and simulation data in time and space. Second, the same indicators are used to calculate scores for several other years while conserving the spatial locations and Julian days of the year. The difference between the results provides useful insights on the model capability to reproduce the observed day-to-day and spatial variability. In order to synthesize the large amount of results, a new indicator is proposed, designed to compare several error statistics between all the years of validation and to quantify whether the period and area being studied were well captured by the model for the correct reasons.

Introduction

Chemistry-transport models (CTMs) aim at simulating the atmospheric composition where humans and the environment can be affected by air pollution. Air pollution results from the presence of chemical compounds emitted into the atmosphere due to anthropogenic activities and natural sources (biogenic emissions from vegetation, soil erosion, sea salt, volcanic activity and wildfires). CTMs are used to represent the dynamical and chemical processes that drive spatial and temporal features of the atmospheric composition.

To estimate the quality of CTMs, model output results are usually compared with available observations. These comparisons have been performed for as long as the models have existed; they are crucial for quantifying the ability of models to reproduce particular events or a general behavior. The quantification of the model quality is performed in every research work. It depends on the case being studied, the modeled variables and the spatial and temporal resolutions. The comparison between observations and model outputs is a complex task and has to take into account numerous factors such as the spatial representativeness of the monitoring stations . For many years, the best approach to evaluate a model's results has been discussed and, in the field of atmospheric composition, numerous methods were proposed. It is not possible to give an exhaustive list of all validation studies and we present some examples here.

and proposed the use of error statistics like correlation, bias and root mean squared error (RMSE) in the specific framework of air quality, i.e., the atmospheric composition when criteria pollutant concentrations exceed predefined limit values. also proposed an evaluation framework dedicated to air quality model performance and explained that there is not “a single best evaluation methodology” and how important it is to use as many evaluation criteria as possible to really understand model results well. Later, and in order to ensure the use of systematic procedures in the evaluation process, dedicated tools were developed for the model evaluation. For example, and proposed complex statistical modules to extract all possible information related to the capability of a model to reproduce an observed event. In parallel, some studies were dedicated to revisit the way to evaluate models such as , dedicated to air quality in a policy framework. In this study, the authors proposed the “target diagram” to have the bias and the RMSE on the same plot. Complementary to the definition of performance indicators to be used, used these indicators to compile photochemical model performance for a large set of data over several years of simulation. This kind of evaluation may also be done in dedicated projects, such as the recent AQMEII (Air Quality Model Evaluation International Initiative), comparing chemistry-transport models running both in Europe and North America ; the EURODELTA project ; and the EMEP (European Monitoring and Evaluation Programme) context in the framework of the United Nations Convention on Long-range Transboundary Air Pollution . Using comparisons between observations and model outputs, some studies proposed methodologies to decompose the statistical scores in order to estimate the main source of errors . Finally, other studies also use observations to adjust the result by implementing methods to unbias simulation without changing the model, as in for ozone over the United States. The common point of all these studies is that they are always using, as best as possible, the observations corresponding in time and location to the model grid cell.

In the present study, a simple method is proposed to add information about the model performance with a focus on its spatial and temporal variability. To reach this objective, we propose to use observations corresponding to the modeled period and geographical domain but also to use observations for the same domain but other periods. In this way, we want to extract the information about the model variability and to answer the following question: is the performance of the model satisfactory because the model is accurate or just because the model is able to reproduce a situation which is recurrent from year to year? The issue to be solved and the tools developed are presented in Sect. . The new methodology with the presentation of the indicator developed for this study are presented in Sect. . The results and discussions to point out the drivers of model errors are presented in Sects. and for the new indicator.

Methodology

In the present study, a simple method is developed to improve the evaluation of model variability and to identify the processes responsible for discrepancies of model outputs vs. observations. The methodology is general and could be applied to all types of models. In this study, the methodology is presented for the specific case of the regional atmospheric composition modeling: a topic mixing meteorology and chemistry, with a high spatial and temporal variability, thus having a good potential to test the relevance of our methodology.

Regional chemistry-transport modeling

In chemistry-transport modeling, several processes are involved, some of them directly influencing the others. When studying both meteorological and chemical variables, the dependencies between all variables are helpful to better interpret the model results.

The boundary conditions prescribe the concentrations of chemical species which may enter the simulation domain. Usually for large domains, they are issued from global models as monthly climatologies. They correspond to averaged values suitable to characterize the background concentrations of long-lived species such as ozone, carbon monoxide and mineral dust. Anthropogenic emissions are prescribed from databases and the influence of meteorology is limited in the model. Vegetation, fire and mineral dust emissions depend both on land-use data and meteorology. These emissions are not measurable; it is almost impossible to directly assess their quality.

The meteorological variables influence transport and mixing processes, with a direct effect on gas and aerosol plume locations and their vertical distribution. Cloudiness and temperature impact the photolysis efficiency; the boundary layer height impacts the surface mixing of pollutants; rainfall impacts the wet deposition. Moreover, meteorology also has an impact on emissions: wind variability is the prevalent driver for dust emissions, and it has also a major impact on wildfire emissions. Both temperature and solar irradiance influence the magnitude of biogenic emissions from vegetation. The spatial variability of land-use data also has a strong impact on all these natural emissions.

The chemistry-transport model is a numerical integration tool of all forcings and processes. The chemical mechanism handles the life cycle of chemical species (production and loss) when the deposition processes are the only net sink of species. In the model, the spatial (horizontal and vertical) and temporal resolutions are also prescribed, directly impacting the simulation representativeness and thus the quality of the modeled air pollutant concentrations when they are compared to available observations.

The studied case

The study focuses on the summer 2013 period (1 May to 31 August) over the European Mediterranean region. This period is called the “reference period” in this paper. This case has already been modeled (using the same models, WRF and CHIMERE) and the results were discussed in . The same simulation is used in this study; all parameters are identical.

List of measurement data used for the statistical comparison with the model results. All data used are issued from surface stations representative of their own environment. Originally provided hourly or 3-hourly, they are used as daily averages in this work. The abbreviation “ad.” is used to indicate dimensionless units.

Variable Network Spatial Vertical Temporal Unit coverage coverage frequency O3, NO2 EBAS/EMEP Europe Surface Hourly ppb PM2.5, PM10 EBAS/EMEP Europe Surface Hourly µg m-3 AOD, Ångström AERONET Global Column Hourly ad.

T2 m

BADC Global Surface Tri-hourly ∘C

U10 m

BADC Global Surface Tri-hourly m s-1 Precipitation BADC Global Surface Daily mm day-1

The observational data come from different sources depending on the variables (see Table ). In this region, where the monitoring networks are dense enough, comparisons are performed with observations from surface stations that provide hourly O3, NO2 surface concentrations for gases and PM2.5 and PM10 (particulate matter with mean mass median diameter lower than 2.5 and 10 µm, respectively) for particles. Complementary to surface concentration data, evaluated using the EBAS database , the meteorology is also evaluated for 2 m temperature (T2 m), 10 m wind speed (U10 m) and precipitation rates (in mm day-1) from the BADC (British Atmospheric Data Centre). In order to quantify the transport of aerosols in dense plumes aloft, observations from the AERONET (AErosol RObotic NETwork) program are used for the aerosol optical depth (AOD) and the Ångström exponent. In this study, all variables are used as the daily mean (except for precipitation corresponding to daily cumulated values) in order to (i) have homogeneous scores between the variables and (ii) be able to separate the systematic and the day-to-day variabilities. The use of an hourly time frequency was ruled out to avoid a too strong weight of the diurnal cycle in the temporal variability.

Proposed methodology

As discussed in the introduction, many statistical indicators (SIs) exist to quantify the model ability to simulate observed pollution events. The correlations (temporal and spatial), the RMSE, its normalized expression nRMSE and the bias (the difference between observations and modeled values) are widely used in regional air pollution modeling. The correlations are able to split the relative contributions of systematic meteorology or source-related variability and day-to-day variability. The RMSE and the bias are a direct quantification of the model error.

Principle of the multi-year variability indicator (Imv) calculation, using one modeled year and several years of observations. SI stands for “statistical indicator” and is related to spatial and temporal correlation.

The main goal of this study is to separate the contributions due to systematic and sporadic events. The systematic events correspond to yearly phenomena, while the sporadic events correspond to the events observed during one year but not the others. In addition, complementary to the model variability quantification, the model error is also important to estimate. The key points of this study are to (i) study the model variability which is statistically represented by the correlations and (ii) add complementary information on the model errors, which could be represented here by the RMSE (or the nRMSE).

First, as presented in Fig. , the SIs are calculated between observation data and model outputs for the simulation year (i.e., the reference year). Second, the SIs are calculated between the observation data for other years and the model output for the reference year. Logically, the scores calculated for the reference year for observations and model outputs would give the better results. By examining the difference with the scores calculated for other years (with the observations only), we expect to conclude whether the model is able to catch the observed variability for the correct reasons. Using this approach, the goal is to give complementary information to those usually obtained when using only SIs calculated for a single year (the studied year).

We apply this methodology for the simulation of the year 2013 and using observation data for years ranging from 2008 to 2013. In order to give some synthetic answers, the different SI scores are aggregated into a single indicator called Imv and presented in detail in the next section. Of course, it seems awkward to evaluate a model day by day with observational data from another year. For a given station, at a given day of the reference year, air concentrations will be affected by a different local meteorology, emissions and long-range transport of chemical species. However, we can consider that to take the same date for another year is strictly the same as randomly choosing a date in the same season. This trivial method can emphasize how a model is affected by large-scale patterns and long-term temporal cycles.

Calculation of correlations and nRMSE

In this study, we focus on three statistical indicators: the spatial correlation, the temporal correlation and the normalized RMSE. For these three indicators, it is important that, for all years of validation, the same list of stations with valid measurements is used.

The correlation used in this study is Pearson's correlation. Each correlation provides specific information on the quality of the simulation. The temporal correlation, noted Rt, is estimated station by station using daily averaged data in order to have homogeneous comparisons between all variables. This correlation is directly related to the variability from day to day for each station. Ot,i and Mt,i represent the observed and modeled values, respectively, at time t for the station i, for a total of T days and I stations. The mean time-averaged value Xi‾ is Xi‾=1T∑t=1TXt,i. The temporal correlation Rt,i for each station i is calculated as Rt,i=∑t=1T(Mt,i-Mi‾)(Ot,i-Oi‾)∑t=1T(Mt,i-Mi‾)2∑t=1T(Ot,i-Oi‾)2. The mean temporal correlation, Rt, used in this study is thus Rt=1I∑i=1IRt,i, with I the total number of stations. The spatial correlation, noted Rs, uses the same formula type except it is calculated from the temporal averaged values of observations and model for each location where observations are available. A good correlation shows that the model correctly locates the largest horizontal gradients as known sources and long-range transport plumes. The spatiotemporal averaged value is estimated as X‾‾=1I∑i=1IXi‾, and the spatial correlation is thus expressed as Rs=∑i=1I(Mi‾-M‾‾)(Oi‾-O‾‾)∑i=1I(Mi‾-M‾‾)2∑i=1I(Oi‾-O‾‾)2. The normalized RMSE is expressed as nRMSE=1T1I∑t=1T∑i=1IOt,i-Mt,iOt,i2 for all stations i and all times t.

Definition of the <inline-formula><mml:math id="M43" display="inline"><mml:mrow><mml:msub><mml:mi>I</mml:mi><mml:mtext>mv</mml:mtext></mml:msub></mml:mrow></mml:math></inline-formula> indicator

For the specific purpose of the model variability (and not the model error), we define an indicator, Imv, dedicated to express in one value the results obtained with the temporal and spatial correlations. The goal of this indicator is to quantify how the correlation between measurement data (for different years) and model outputs (for the reference year) evolves from one year to another. This indicator does not replace the usual statistical indicators but aims at providing complementary information about the variability between years.

We first define the differences, D, between all years as D=1N-1∑i=1N-1|si-sN|, with sN the score of the indicator for the reference year being modeled and si the score of the indicator computed using observations corresponding to other meteorological years (from 1 to N-1 if there are N-1 other available years for the observations).

We now aim to develop a simple indicator, called Imv, which is a combination of the statistical indicator for the reference year and the differences between years. This Imv corresponds, in fact, to the SI itself weighted by the differences between the SI scores of all years. We expect that Imv follows the following rules:

Imv has the same evolution as the studied SI. If the correlation increases, Imv also increases.

Imv is bounded between 0 and 1, like the correlation. This enables us to compare the results for different variables (with different metrics).

In the case of a high correlation value found for the studied year, the obtained sN value is close to 1. This value may be lower for the following reasons:

If the differences between the other years are low (D tends to 0), it means that the model is correct for the studied year, but possibly because it reproduces a recurrent phenomena. In this case, we want Imv to decrease and tend to 0.

If the differences between the other years are high (D tends to 1), it means the model gives good results for the studied year, but it is not because it simulates a systematic event. In this case, we want Imv to remain close to the indicator value. With sN≈1 and Imv≈1, we can conclude that the model is very good for the studied year and this is not due to a recurrent process.

In the case of a low correlation value, and whatever the magnitude of differences between years, the model is not correct. Imv must be low, as it is the indicator value.

These constraints allow us to define an indicator having this kind of formulation: Imv=sN1-exp⁡(-Ds)4.

This means that Imv always has, as a maximum, the value of the indicator itself. The power of 4 is here defined to have a specific shape for Imv, respecting the rules presented below. Finally, this expression gives an indicator variability presented in Fig. . Considering the state of the art of chemistry-transport modeling, the model is considered accurate, having an acceptable variability for Imv>0.4: this means that the correlation is at least 0.5 and the differences are also at least greater than 0.5.

Scheme of the Imv values as a function of the studied year correlation values and the multi-year differences D.

Finally, this indicator is not calculated for nRMSE and bias. Two reasons explain this choice: the first reason is that, contrarily to correlations, RMSE and bias are not bounded between 0 and 1. This leads to indicator values possibly varying a lot between several years and thus being difficult to compare between years. The second reason is that the goal of the indicators is to extract a message from the model variability of the studied year compared to the other years. In this case, the correlations constitute a statistical indicator which is more appropriate for this evaluation.

Time series of statistical indicators

The calculations of differences are performed for the correlations and the nRMSE. These values are calculated for all variables described in Table for the years 2008 to 2013. For each year, it is noted that only the May to August period is considered. Results are presented as time series in Fig. and discussed in the following sections. Note also that some values discussed in these sections are also reported in the synthetic Table .

Multi-year scores for T2 m, u10 m, the precipitation rate, aerosol optical depth (AOD), the Ångström exponent (ANG), surface concentrations of O3, NO2, PM2.5, PM10, ammonium, sulfate and nitrate. The correlations and the nRMSE are calculated between the observations (2008–2013) and the model results (2013). The spatial correlation, Rs, is in black; the temporal correlation, Rt, in blue; the nRMSE in red.

Meteorological variables

The meteorological variables are T2 m, u10 m and the precipitation rate. The values of the statistical scores are provided, year by year, in Fig. . As an example, the same values are reported for T2 m in Table .

Scores for T2 m. The correlations and nRMSE are calculated between the observations (2008–2013) and the model results (2013).

Year

nRMSE 2008 0.58 0.34 0.31 2009 0.57 0.36 0.32 2010 0.61 0.30 0.34 2011 0.62 0.25 0.32 2012 0.61 0.37 0.32 2013 0.60 0.91 0.22

0.02 0.59 0.10

T2 m is a meteorological variable, constraining processes both for meteorology and chemistry. Its diurnal cycle is strong, as well as its latitudinal variability (for large model domains), often ensuring a good spatial correlation. In general, this variable is the least uncertain of all modeled meteorological parameters. The spatial correlation is good for all years, ranging from 0.57 (2009) to 0.62 (2011). For the studied year (2013), the score is 0.60, slightly lower than for 2011. Even if the correlation for the selected year is good, it is not significantly better than for the other year, with D=0.02. This means that the model reproduces fairly well a spatial pattern that is observed every year. Indeed, the simulation domain is large and the temperature has a latitudinal variability larger than between each measurement station. The temporal correlation ranges from 0.25 to 0.91 (2013). The variability of nRMSE is lower than for the correlations, with values ranging from 0.22 (2013) to 0.34 (2010). The lowest value is found for 2013, highlighting the fact that the model error is the lowest for the reference year. The model is thus performing well in capturing the day-to-day variability for T2 m for the correct reasons.

From Fig. , the calculation of u10 m also gives satisfactory results with Rt=0.60. The spatial correlation, Rs=0.09, is poor and very variable from one year to another. As for T2 m, we also have an effect of the model resolution and the representativeness of the variable.

Scores for the precipitation are correct, with a very good spatial correlation that is always exceeding 0.6. As for the temperature, the latitudinal effect plays a major role in the variability. Both the spatial and temporal correlations increase significantly for the reference year. The nRMSE is not on the plot, with the values being larger than 1.2. The model is biased in absolute values and overestimates the amount of daily precipitation. However, the day-to-day variability is correct and such variability is the most important feature for atmospheric composition modeling (the lower atmosphere is scavenged when a precipitation occurs, whatever its value).

For the meteorological variables, these scores showed that the meteorological forcing is well captured, and always better for the year being considered compared to other years.

Optical properties

The optical properties are directly linked to the atmospheric composition of aerosol and may be quantified using the AOD and the Ångström exponent (ANG).

For the AOD, the spatial correlation is very good for 2013, with Rs=0.97, but it is as good or better for other years. This means that we model a rather recurring phenomenon: every year, the same stations are, on average, exposed to aerosol plumes. The temporal correlation is lower with Rt=0.45 but much better than for other years. This indicates that the model partly reproduces the observed temporal variability but the events are changing from one year to another and the model captures these changes well. In the studied region, the AOD is sensitive to desert dust outbreaks in summer. This means that large-scale systems are driving the aerosol plumes; they are spatially recurrent and temporally better captured for the year being considered than for other years.

For the ANG, the spatial correlation is very good, with Rs=0.91, but also persistent in time. The temporal correlation is much better for 2013 than for other years. This is probably due to a size distribution that is not necessarily well simulated from one day to another (shown by AOD and explained in ) but the relative contributions of fine and coarse aerosol atmospheric load are fairly reproduced. This feature highlights the high sensitivity of the AOD calculation to the modeled aerosol size distribution, although the overall mass emitted and transported is realistic.

Globally, the AOD and ANG reflect the model's ability to retrieve the long-range transport of long-lived aerosols, which depends on several processes (emissions, transport and deposition). These scores show that the model is able to retrieve these yearly recurrent plumes but the model size distribution of particles clearly requires improvement.

Surface concentrations

For the surface concentrations of gaseous and aerosol species, the variability is much more related to local effects. As an example, the detailed values of the statistical indicators and the differences between years are extensively presented for NO2.

Scores for NO2. The correlations and nRMSE are calculated between the observations (2008–2013) and the model results (2013).

Year

nRMSE 2008 0.44 0.00 1.56 2009 0.42

-0.04

1.76 2010 0.66

-0.04

1.82 2011 0.79

-0.03

2.07 2012 0.76 0.04 2.84 2013 0.88 0.22 1.76

0.27 0.23 0.33

NO2 is both primary and secondary in origin. Mostly emitted in urbanized areas, the diurnal cycle of this species is well constrained. Depending on meteorological conditions, its lifetime may vary significantly from hours to days. Modeling this species with CTMs is challenging because several uncertainties are acting at the same time, including the spatial representativeness of the model cell. The scores show whether the sources are properly located and whether the photochemistry and transport processes have been well simulated. In general, at coarse model resolution, the model results for this species are worse than for ozone. The spatial correlation gives a score of Rs=0.88 for 2013. This corresponds to the best correlation compared to the other years. The anthropogenic emissions are strongly related to industrial activities and road traffic, and since these activity sectors are fixed in space, the good spatial correlation is more due to anthropogenic sources that vary in space, such as biogenic and vegetation fires. The temporal correlation is low for 2013, Rt=0.22, but is closer to 0 for other years and therefore significantly better for the reference year compared to the others. These two correlation values show that the model certainly captures the right location of emission sources (low variability of Rs). The nRMSE is large and shows that the concentrations are overestimated by the model. However, this overestimation appears for all years and can be due to the representativeness of the surface measurements compared to the size of model cells.

The spatial correlation is good for O3, NO2 and PM10, with Rs=0.69, 0.88 and 0.81, respectively. For PM2.5, this correlation is low, with Rs=0.16. The PM10 shows that the largest particles are well modeled over the whole domain, and this was also the conclusion for the AOD and ANG. The low score for PM2.5 indicates that for the aerosol distribution the fine mode is not as well modeled as the coarse mode. This is confirmed by the scores of the aerosol inorganic species, ammonium, sulfate and nitrate, which contribute to a large part of the fine fraction of particles. Except for sulfate (with Rs=0.51), the spatial correlations are 0.15 for nitrate and 0.20 for ammonium. Thus, the fine part of the aerosol is not well modeled mainly due to a deficiency in the modeling of nitrates.

The temporal correlations have a completely different behavior than the spatial correlations. The values are generally low, from Rt=0.09 for nitrate to Rt=0.32 for O3. Surprisingly, the PM10 concentrations display a good spatial correlation but a poor temporal correlation. This is due to the long lifetime in the atmosphere of nonreactive species such as mineral dust: plumes are correctly modeled over large areas but the day-to-day variability needs improvement. Another point is the good spatial correlation for NO2 but its low temporal correlation with Rt=0.22. In this case, this means we have a correctly spatialized anthropogenic emission inventory (mainly for NO2 sources), but difficulties to model the day-to-day chemistry still exist.

For the surface concentrations, we can conclude that O3, NO2 and PM10 concentrations are spatially well modeled, and this is not due to a recurrent behavior. For particles, the problem is more related to the fine mode, where PM2.5 concentrations are not well located. This modeling problem is highlighted by the low correlations and Imv values for the inorganic species. For the temporal correlations, the scores are always lower than for the spatial correlation but also always higher for the reference year than for the other years.

Results of the Imv scores for the spatial and temporal correlations. For each model variable, its value is represented using the correlation on the x axis and the difference between the studied year and the others on the y axis. The colors represent the Imv values.

Estimation of the <inline-formula><mml:math id="M133" display="inline"><mml:mrow><mml:msub><mml:mi>I</mml:mi><mml:mtext>mv</mml:mtext></mml:msub></mml:mrow></mml:math></inline-formula> indicator for all variables

To summarize the results obtained for each statistical indicator and the values of differences between all years, we apply the Imv formulation. This enables us to have one value for each SI (Rs and Rt) and each variable. Results are presented in Table and are also displayed on single plots in Fig. .

The Imv values for all variables: the meteorology with T2 m, u10 m and precipitation rate; the vertically integrated column of aerosols with the aerosol optical depth (AOD) and the Ångström exponent (ANG); the surface concentrations of all aerosols in terms of size distribution with PM2.5 and PM10; and the inorganic species with Dp<10 µm. Values of Imv above 0.4 are in bold. Units of the variables are detailed in Table .

Variable

Value

Imv

Value

Imv

T2 m

0.60 0.02 0.04 0.91 0.59 0.82

u10 m

0.09 0.23 0.05 0.59 0.56 0.53 Precip. 0.89 0.20 0.49 0.08 0.07 0.02 AOD 0.97 0.02 0.09 0.45 0.34 0.33 ANG 0.91 0.04 0.14 0.59 0.44 0.49 O3 0.69 0.13 0.29 0.32 0.27 0.21 NO2 0.88 0.27 0.58 0.22 0.23 0.13 PM2.5 0.16 0.15 0.07 0.27 0.32 0.20 PM10 0.81 0.10 0.27 0.17 0.14 0.07 Ammonium 0.20 0.13 0.08 0.21 0.20 0.12 Sulfate 0.51 0.21 0.29 0.31 0.34 0.23 Nitrate 0.15 0.51 0.13 0.09 0.08 0.03

In Table , the Imv values larger than 0.4 are highlighted. This threshold is clearly subjective but mentioned here to better highlight the variables being well modeled and with a correct variability from one year to another. As discussed in detail, the best scores are obtained for the meteorological variables and are better for the temporal variability than for the spatial variability.

In Fig. , the x axis represents the correlation (spatial or temporal) and the y axis represents the differences between all years D. For each studied variable, their values are reported on the figure, where the colors represent the values of Imv. The interpretation of these results follows the quality criteria presented in the academic schematic of Fig. . This presentation shows an important spread for the spatial correlation results. If the relative differences D range from 0 to 0.6, the correlations range from 0.09 (for the 10 m wind speed) to 0.97 (for AOD). The common point is that there is no variable with differences above 0.5. This means that, spatially, the studied problem shows systematic patterns from year to year. The low values of correlations show that some variables are systematically poorly estimated. This means that some meteorological structures (for u10 m) or emission sources (contributing to the PM2.5 surface concentrations) are systematically mislocated.

The representation of temporal correlations shows a specific linear pattern. The largest correlation values are positively correlated with differences. This temporal correlation represents the day-to-day variability at each location. This means that the studied problem is based on high day-to-day variability without similar consecutive days (in this case, one would have high correlations but low differences). This illustrates the fact that the studied problem is primarily an issue of sporadic events and the model is able to correctly find this variability from one day to another.

Conclusions

At first glance, using a different year than the simulated one for the day-to-day evaluation seems awkward. However, we can learn more about the performance of chemistry-transport models than by using a single year for the usual statistical indicators. Of course, this approach will never replace a strict evaluation of a pollution case analysis using time series, vertical profiles and usual error statistics. However, it offers a very fast and integrated vision of the strengths and weaknesses of a model with very little calculation. This methodology can also be deployed in intercomparison exercises.

To answer the questions presented in the introduction, for this particular model and simulated period, the following conclusions can be drawn. The model always simulates the studied year better than any other meteorological year and it is able to reproduce the day-to-day variability for high concentrations of pollutants.

The spatial correlation is good for 2 m temperature and precipitation rate but not for wind speed: this highlights the fact that the modeled domain is large and the resolution is not optimized for small-scale processes. The spatial correlation is also very good for the long-range transport of particles, as demonstrated with Rs=0.97 and 0.90 for AOD and ANG. However, since this feature occurs every year, this leads to low Imv values. This means that, for a large domain, the main spatial patterns of particle concentrations are recurrent and well modeled. The chemical species that are best modeled are either species with a long atmospheric lifetime (PM10) or species spatially well constrained on the domain (such as NO2, mainly due to anthropogenic emissions). For particles, the results depend on the size distribution: the coarse particles are better simulated than the fine ones.

The conclusions are different for the temporal correlation. The scores are calculated using daily observations and modeled outputs. Thus, these scores reflect the ability of the model to retrieve the day-to-day variability. As for the spatial correlation, scores are good for the meteorological variables. For the aerosol, and mainly for the long-lived species (such as mineral dust), the temporal correlation is also correct as the Imv values: Imv=0.33 and 0.49 for AOD and ANG, respectively. However, for the short-lived species, the temporal correlation and the Imv values are low. This means that improvements are required in priority for the day-to-day variability compared to the locations of emissions. This may probably be due to the atmospheric transport, with the spatial variability of 10 m wind speed being poorly simulated. However, overall, the temporal correlation is better for the studied year than for the others, showing that the problem is highly variable from year to year, but the model is able to capture the evolution of atmospheric composition.

This study presents a methodology using existing data and models; all required information is already included in this article.

The authors declare that they have no conflict of interest.

Acknowledgements

This study is partly funded by the French Ministry of Ecology. The authors thank the British Atmospheric Data Centre, which is part of the NERC National Centre for Atmospheric Science (NCAS), for making the meteorological data available; the EMEP network for providing atmospheric composition measurements; and the investigators and staff who maintained and provided the AERONET data. Edited by: S. Bekki Reviewed by: two anonymous referees

References Appel et al.(2011)

Appel, K. W., Gilliam, R. C., Davis, N., Zubrow, A., and Howard, S. C.: Overview of the atmospheric model evaluation tool (AMET) v1.1 for evaluating meteorological and air quality models, Environ. Modell. Softw., 26, 434–443, 10.1016/j.envsoft.2010.09.007, 2011.

Baldridge and Cox(1986)

Baldridge, K. and Cox, W.: Evaluating air quality model performance, Environ. Softw., 1, 182–187, 10.1016/0266-9838(86)90023-7, 1986.

Bessagnet et al.(2016)

Bessagnet, B., Pirovano, G., Mircea, M., Cuvelier, C., Aulinger, A., Calori, G., Ciarelli, G., Manders, A., Stern, R., Tsyro, S., García Vivanco, M., Thunis, P., Pay, M.-T., Colette, A., Couvidat, F., Meleux, F., Rouïl, L., Ung, A., Aksoyoglu, S., Baldasano, J. M., Bieser, J., Briganti, G., Cappelletti, A., D'Isidoro, M., Finardi, S., Kranenburg, R., Silibello, C., Carnevale, C., Aas, W., Dupont, J.-C., Fagerli, H., Gonzalez, L., Menut, L., Prévôt, A. S. H., Roberts, P., and White, L.: Presentation of the EURODELTA III intercomparison exercise – evaluation of the chemistry transport models' performance on criteria pollutants and joint analysis with meteorology, Atmos. Chem. Phys., 16, 12667–12701, 10.5194/acp-16-12667-2016, 2016.

Campbell et al.(2015)

Campbell, P., Zhang, Y., Yahya, K., Wang, K., Hogrefe, C., Pouliot, G., Knote, C., Hodzic, A., Jose, R. S., Perez, J. L., Guerrero, P. J., Baro, R., and Makar, P.: A multi-model assessment for the 2006 and 2010 simulations under the Air Quality Model Evaluation International Initiative (AQMEII) phase 2 over North America: Part I. Indicators of the sensitivity of O3 and PM2.5 formation regimes, Atmos. Environ., 115, 569–586, 10.1016/j.atmosenv.2014.12.026, 2015.

Chang and Hanna(2004)

Chang, J. and Hanna, S.: Air quality model performance evaluation, Meteorol. Atmos. Phys., 87, 167–196, 10.1007/s00703-003-0070-7, 2004.

Cox and Tikvart(1990)

Cox, W. M. and Tikvart, J. A.: A statistical procedure for determining the best performing air quality simulation model, Atmos. Environ. A-Gen., 24, 2387–2395, 10.1016/0960-1686(90)90331-G, 1990.

Galmarini et al.(2012)

Galmarini, S., Bianconi, R., Appel, W., Solazzo, E., Mosca, S., Grossi, P., Moran, M., Schere, K., and Rao, S.: {ENSEMBLE} and AMET: Two systems and approaches to a harmonized, simplified and efficient facility for air quality models development and evaluation, Atmos. Environ., 53, 51–59, 10.1016/j.atmosenv.2011.08.076, 2012.

Menut et al.(2015)

Menut, L., Mailler, S., Siour, G., Bessagnet, B., Turquety, S., Rea, G., Briant, R., Mallet, M., Sciare, J., Formenti, P., and Meleux, F.: Ozone and aerosol tropospheric concentrations variability analyzed using the ADRIMED measurements and the WRF and CHIMERE models, Atmos. Chem. Phys., 15, 6159–6182, 10.5194/acp-15-6159-2015, 2015.

Menut et al.(2016)Menut, Siour, Mailler, Couvidat, and Bessagnet

Menut, L., Siour, G., Mailler, S., Couvidat, F., and Bessagnet, B.: Observations and regional modeling of aerosol optical properties, speciation and size distribution over Northern Africa and western Europe, Atmos. Chem. Phys., 16, 12961–12982, 10.5194/acp-16-12961-2016, 2016.

Porter et al.(2015)

Porter, P. S., Rao, S. T., Hogrefe, C., Gego, E., and Mathur, R.: Methods for reducing biases and errors in regional photochemical model outputs for use in emission reduction and exposure assessments, Atmos. Environ., 112, 178–188, 10.1016/j.atmosenv.2015.04.039, 2015.

Prank et al.(2016)

Prank, M., Sofiev, M., Tsyro, S., Hendriks, C., Semeena, V., Vazhappilly Francis, X., Butler, T., Denier van der Gon, H., Friedrich, R., Hendricks, J., Kong, X., Lawrence, M., Righi, M., Samaras, Z., Sausen, R., Kukkonen, J., and Sokhi, R.: Evaluation of the performance of four chemical transport models in predicting the aerosol chemical composition in Europe in 2005, Atmos. Chem. Phys., 16, 6041–6070, 10.5194/acp-16-6041-2016, 2016.

Simon et al.(2012)

Simon, H., Baker, K., and Phillips, S.: Compilation and interpretation of photochemical model performance statistics published between 2006 and 2012, Atmos. Environ., 61, 124–139, 10.1016/j.atmosenv.2012.07.012, 2012.

Solazzo and Galmarini(2015)

Solazzo, E. and Galmarini, S.: Comparing apples with apples: Using spatially distributed time series of monitoring data for model evaluation, Atmos. Environ., 112, 234–245, 10.1016/j.atmosenv.2015.04.037, 2015.

Solazzo and Galmarini(2016)

Solazzo, E. and Galmarini, S.: Error apportionment for atmospheric chemistry-transport models – a new approach to model evaluation, Atmos. Chem. Phys., 16, 6263–6283, 10.5194/acp-16-6263-2016, 2016.

Thunis et al.(2012)

Thunis, P., Pederzoli, A., and Pernigotti, D.: Performance criteria to evaluate air quality modeling applications, Atmos. Environ., 59, 476–482, 10.1016/j.atmosenv.2012.05.043, 2012.

Tørseth et al.(2012)

Tørseth, K., Aas, W., Breivik, K., Fjæraa, A. M., Fiebig, M., Hjellbrekke, A. G., Lund Myhre, C., Solberg, S., and Yttri, K. E.: Introduction to the European Monitoring and Evaluation Programme (EMEP) and observed atmospheric composition change during 1972–2009, Atmos. Chem. Phys., 12, 5447–5481, 10.5194/acp-12-5447-2012, 2012.

Valari and Menut(2008)

Valari, M. and Menut, L.: Does increase in air quality models resolution bring surface ozone concentrations closer to reality?, J. Atmos. Ocean. Tech., 25, 1955–1968, 10.1175/2008JTECHA1123.1, 2008.

Vautard et al.(2012)

Vautard, R., Moran, M. D., Solazzo, E., Gilliam, R. C., Matthias, V., Bianconi, R., Chemel, C., Ferreira, J., Geyer, B., Hansen, A. B., Jericevic, A., Prank, M., Segers, A., Silver, J. D., Werhahn, J., Wolke, R., Rao, S., and Galmarini, S.: Evaluation of the meteorological forcing used for the Air Quality Model Evaluation International Initiative (AQMEII) air quality simulations, Atmos. Environ., 53, 15–37, 10.1016/j.atmosenv.2011.10.065, 2012.

</app></app-group></back> </article>