An alternative way to evaluate chemistry-transport model variability
Laurent Menut, Sylvain Mailler, Bertrand Bessagnet, Guillaume Siour, Augustin Colette, Florian Couvidat and Frédérik Meleux
Laboratoire de Météorologie Dynamique, Ecole Polytechnique, IPSL Research University, Ecole Normale Supérieure, Université Paris-Saclay, Sorbonne Universités, UPMC Univ Paris 06, CNRS, Route de Saclay, 91128 Palaiseau, France
INERIS, National Institute for Industrial Environment and Risks, Parc Technologique ALATA, 60550 Verneuil-en-Halatte, France
Laboratoire Inter-Universitaire des Systèmes Atmosphériques, UMR CNRS 7583, Université Paris Est Créteil et Université Paris Diderot, Institut Pierre Simon Laplace, Créteil, France
Correspondence: Laurent Menut (menut@lmd.polytechnique.fr), https://orcid.org/0000-0001-9776-0812
Geoscientific Model Development (Geosci. Model Dev.), 10, 1199–1208, 2017, ISSN 1991-9603, Copernicus Publications, Göttingen, Germany
DOI: 10.5194/gmd-10-1199-2017
Received: 17 June 2016; Discussion started: 24 June 2016; Revised: 18 February 2017; Accepted: 23 February 2017; Published: 17 March 2017
This work is licensed under a Creative Commons Attribution 3.0 Unported License. To view a copy of this license, visit http://creativecommons.org/licenses/by/3.0/
This article is available from https://gmd.copernicus.org/articles/10/1199/2017/gmd-10-1199-2017.html
The full text article is available as a PDF file from https://gmd.copernicus.org/articles/10/1199/2017/gmd-10-1199-2017.pdf
A simple and complementary model evaluation technique for regional
chemistry transport is discussed. The methodology is based on the concept
that we can learn about model performance by comparing the simulation results
with observational data available for time periods other than the period
originally targeted. First, the statistical indicators selected in this study
(spatial and temporal correlations) are computed for a given time period,
using colocated observation and simulation data in time and space. Second,
the same indicators are used to calculate scores for several other years
while conserving the spatial locations and Julian days of the year. The difference
between the results provides useful insights on the model capability to
reproduce the observed day-to-day and spatial variability. In order to
synthesize the large amount of results, a new indicator is proposed, designed
to compare several error statistics between all the years of validation
and to quantify whether the period and area being studied were well captured by the
model for the correct reasons.
Introduction
Chemistry-transport models (CTMs) aim at simulating the atmospheric
composition where humans and the environment can be affected by air
pollution. Air pollution results from the presence of chemical compounds
emitted into the atmosphere due to anthropogenic activities and natural
sources (biogenic emissions from vegetation, soil erosion, sea salt,
volcanic activity and wildfires). CTMs are used to represent the dynamical
and chemical processes that drive spatial and temporal features of the
atmospheric composition.
To estimate the quality of CTMs, model output results are usually compared
with available observations. These comparisons have been performed for as long as the models
have existed; they are crucial for quantifying the ability of models to reproduce
particular events or a general behavior. The quantification of the model
quality is performed in every research work. It depends on the case being
studied, the modeled variables and the spatial and temporal resolutions. The
comparison between observations and model outputs is a complex task and has
to take into account numerous factors such as the spatial
representativeness of the monitoring stations.
For many years, the best approach to evaluate a model's results has been
discussed and, in the field of atmospheric composition, numerous methods have
been proposed. It is not possible to give an exhaustive list of all validation
studies, so only some examples are presented here.
Several studies proposed the use of error
statistics such as correlation, bias and root mean squared error (RMSE) in the
specific framework of air quality, i.e., the atmospheric composition
when criteria pollutant concentrations exceed predefined limit values.
Another study proposed an evaluation framework dedicated to air
quality model performance and explained that there is not “a single best
evaluation methodology” and that it is important to use as many evaluation
criteria as possible to understand model results well. Later, in
order to ensure the use of systematic procedures in the evaluation process,
dedicated tools were developed for model evaluation. For example,
complex statistical modules were proposed
to extract all possible information related to the capability of a
model to reproduce an observed event. In parallel, some studies
revisited the way models are evaluated, notably for
air quality in a policy framework; one of them proposed
the “target diagram” to display the bias and the RMSE on the same plot.
Complementary to the definition of performance indicators,
these indicators have been used to compile photochemical model
performance for a large set of data over several years of simulation. This
kind of evaluation may also be done in dedicated projects, such as the recent
AQMEII (Air Quality Model Evaluation International Initiative), comparing
chemistry-transport models running both in Europe and North America;
the EURODELTA project; and the EMEP (European Monitoring and Evaluation
Programme) context in the framework of the United Nations Convention on Long-range
Transboundary Air Pollution. Using comparisons between
observations and model outputs, some studies proposed methodologies to
decompose the statistical scores in order to estimate the main source of
errors. Finally, other studies also use observations to
adjust the results by implementing methods to unbias simulations without
changing the model, as has been done for ozone over the United
States. The common point of all these studies is that they always use,
as closely as possible, the observations corresponding in time and location to
the model grid cell.
In the present study, a simple method is proposed to add information about
the model performance with a focus on its spatial and temporal variability.
To reach this objective, we propose to use observations corresponding to the
modeled period and geographical domain but also to use observations for the
same domain but other periods. In this way, we want to extract the
information about the model variability and to answer the following question:
is the performance of the model satisfactory because the model is accurate or
just because the model is able to reproduce a situation which is recurrent
from year to year? The issue to be solved and the tools developed are
presented in the “Methodology” section. The new methodology and
the indicator developed for this study are presented in
the “Proposed methodology” section. The results and the discussion of the
drivers of model errors are presented in the sections “Time series of
statistical indicators” and “Estimation of the Imv indicator for all
variables”.
Methodology
In the present study, a simple method is developed to improve the evaluation
of model variability and to identify the processes responsible for
discrepancies of model outputs vs. observations. The methodology is
general and could be applied to all types of models. In this study, the
methodology is presented for the specific case of the regional atmospheric
composition modeling: a topic mixing meteorology and chemistry, with a high
spatial and temporal variability, thus having a good potential to test the
relevance of our methodology.
Regional chemistry-transport modeling
In chemistry-transport modeling, several processes are involved, some of
them directly influencing the others. When studying both meteorological and
chemical variables, the dependencies between all variables are helpful to
better interpret the model results.
The boundary conditions prescribe the concentrations of chemical species
which may enter the simulation domain. Usually for large domains, they are
issued from global models as monthly climatologies. They correspond to
averaged values suitable to characterize the background concentrations of
long-lived species such as ozone, carbon monoxide and mineral dust.
Anthropogenic emissions are prescribed from databases and the influence of
meteorology is limited in the model. Vegetation, fire and mineral dust
emissions depend both on land-use data and meteorology. These emissions are
not measurable; it is almost impossible to directly assess their quality.
The meteorological variables influence transport and mixing processes, with a
direct effect on gas and aerosol plume locations and their vertical
distribution. Cloudiness and temperature impact the photolysis efficiency;
the boundary layer height impacts the surface mixing of pollutants; rainfall
impacts the wet deposition. Moreover, meteorology also has an impact on
emissions: wind variability is the prevalent driver for dust emissions, and
it has also a major impact on wildfire emissions. Both temperature and solar
irradiance influence the magnitude of biogenic emissions from vegetation. The
spatial variability of land-use data also has a strong impact on all these
natural emissions.
The chemistry-transport model is a numerical integration tool of all forcings
and processes. The chemical mechanism handles the life cycle of chemical
species (production and loss), whereas the deposition processes are the only net
sink of species. In the model, the spatial (horizontal and vertical) and
temporal resolutions are also prescribed, directly impacting the simulation
representativeness and thus the quality of the modeled air pollutant
concentrations when they are compared to available observations.
The studied case
The study focuses on the summer 2013 period (1 May to 31 August) over the
European Mediterranean region. This period is called the “reference period” in this
paper. This case has already been modeled (using the same models, WRF and
CHIMERE) and the results were discussed in a previous study. The same
simulation is used in this study; all parameters are identical.
Table caption: List of measurement data used for the statistical comparison
with the model results. All data used are issued from surface stations
representative of their own environment. Originally provided hourly or
3-hourly, they are used as daily averages in this work. The abbreviation “ad.”
is used to indicate dimensionless units.
The observational data come from different sources depending on the
variables (see the table of measurement data). In this region, where the
monitoring networks are
dense enough, comparisons are performed with observations from surface
stations that provide hourly O3 and NO2 surface concentrations for
gases and PM2.5 and PM10 (particulate matter with mean mass median
diameter lower than 2.5 and 10 µm, respectively) for particles.
Complementary to surface concentration data, evaluated using the EBAS
database, the meteorology is also evaluated for 2 m
temperature (T2 m), 10 m wind speed (u10 m) and
precipitation rates (in mm day⁻¹) from the BADC (British Atmospheric
Data Centre). In order to quantify the transport of aerosols in dense plumes
aloft, observations from the AERONET (AErosol RObotic NETwork) program are used
for the aerosol optical depth (AOD) and the Ångström exponent. In this study, all
variables are used as daily means (except for precipitation, which corresponds to
daily cumulated values) in order to (i) have homogeneous scores between the
variables and (ii) be able to separate the systematic and the day-to-day
variabilities. The use of an hourly time frequency was ruled out to avoid
giving too much weight to the diurnal cycle in the temporal variability.
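The daily aggregation described above can be sketched as follows. This is only an illustration under stated assumptions: the function name `daily_values`, the use of `None` for missing samples and the rule that a day with no valid sample stays missing are hypothetical choices, not the processing chain actually used for this study.

```python
def daily_values(series, steps_per_day=24, cumulate=False):
    """Aggregate a regular sub-daily series into one value per day.

    series        : list of floats, with None for missing samples (assumption)
    steps_per_day : 24 for hourly data, 8 for 3-hourly data
    cumulate      : False -> daily mean (concentrations, T2 m, u10 m, AOD)
                    True  -> daily sum (precipitation)
    """
    days = []
    # Walk through complete days only; a trailing partial day is ignored.
    for start in range(0, len(series) - steps_per_day + 1, steps_per_day):
        valid = [v for v in series[start:start + steps_per_day] if v is not None]
        if not valid:
            days.append(None)          # no valid sample: the day stays missing
        elif cumulate:
            days.append(sum(valid))
        else:
            days.append(sum(valid) / len(valid))
    return days


# Hourly concentrations over two days -> two daily means
hourly = [1.0] * 24 + [3.0] * 24
print(daily_values(hourly))

# 3-hourly precipitation over one day -> one daily cumulated value
print(daily_values([0, 0, 0, 0, 2, 2, 2, 2], steps_per_day=8, cumulate=True))
```

Working with one value per day removes the diurnal cycle from the series, so the temporal correlation computed afterwards reflects the day-to-day variability only.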
Proposed methodology
As discussed in the introduction, many statistical indicators (SIs) exist to
quantify the model ability to simulate observed pollution events. The
correlations (temporal and spatial), the RMSE, its
normalized expression nRMSE and the bias (the difference between
observations and modeled values) are widely used in regional air pollution
modeling. The correlations are able to split the relative contributions of
systematic meteorology or source-related variability and day-to-day
variability. The RMSE and the bias are a direct quantification of the model
error.
Figure caption: Principle of the multi-year variability indicator (Imv)
calculation, using one modeled year and several years of observations. SI
stands for “statistical indicator” and is related to spatial and temporal
correlation.
The main goal of this study is to separate the contributions due to
systematic and sporadic events. The systematic events correspond to yearly
phenomena, while the sporadic events correspond to the events observed during one
year but not the others. In addition, complementary to the model variability
quantification, the model error is also important to estimate. The key points
of this study are to (i) study the model variability which is statistically
represented by the correlations and (ii) add complementary information on
the model errors, which could be represented here by the RMSE (or the nRMSE).
First, as presented in the schematic of the Imv calculation, the SIs are
calculated between
observation data and model outputs for the simulation year (i.e., the
reference year). Second, the SIs are calculated between the observation data
for other years and the model output for the reference year. Logically, the
scores calculated with observations and model outputs both taken from the
reference year should give the best results. By examining the difference with
the scores calculated for
other years (changing the observations only), we expect to conclude whether the model
is able to catch the observed variability for the correct reasons. Using
this approach, the goal is to provide information complementary to that usually
obtained when using only SIs calculated for a single year (the studied year).
We apply this methodology for the simulation of the year 2013 and using
observation data for years ranging from 2008 to 2013. In order to give some
synthetic answers, the different SI scores are aggregated into a single
indicator called Imv and presented in detail in the next section.
Of course, it seems awkward to evaluate a model day by day with
observational data from another year. For a given station, on a given day of
the reference year, air concentrations will be affected by different local
meteorology, emissions and long-range transport of chemical species. However,
taking the same date in another year can be considered equivalent to
randomly choosing a date in the same season. This simple method
can emphasize how a model is affected by large-scale patterns and long-term
temporal cycles.
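The two-step procedure (score the reference year, then score the same model output against observations from other years) can be sketched as below. The names `pearson` and `multi_year_scores` and the toy numbers are hypothetical. The sketch assumes that each observed series is already aligned with the model series on the same days of the year and at the same station, and it uses a single-station temporal correlation as the statistical indicator for brevity.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equally long sequences."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def multi_year_scores(model_ref, obs_by_year, indicator=pearson):
    """Score ONE modeled year against the observations of EACH year.

    model_ref   : modeled daily values for the reference year
    obs_by_year : dict {year: observed daily values on the same days of year}
    """
    return {year: indicator(model_ref, obs)
            for year, obs in sorted(obs_by_year.items())}


# Toy data (not from the paper): five days, three observation years
model_2013 = [1.0, 2.0, 3.0, 4.0, 5.0]
observations = {
    2011: [3.0, 1.0, 4.0, 2.0, 5.0],
    2012: [2.0, 2.5, 1.0, 3.0, 2.0],
    2013: [1.1, 2.0, 2.9, 4.2, 5.0],
}
scores = multi_year_scores(model_2013, observations)
for year, score in scores.items():
    print(year, round(score, 2))
```

In this toy case the score for the reference year is the highest, which is the behavior expected when the model captures the events of that specific year rather than a pattern that recurs every year.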
Calculation of correlations and nRMSE
In this study, we focus on three statistical indicators: the spatial
correlation, the temporal correlation and the normalized RMSE. For these
three indicators, it is important that, for all years of validation, the same
list of stations with valid measurements is used.
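One simple way to enforce a common station list is to intersect the sets of stations that have valid data in each year. The sketch below is an assumption about how this could be done; the function name `common_stations` and the station identifiers are hypothetical.

```python
def common_stations(valid_by_year):
    """Return the stations that have valid measurements in every year.

    valid_by_year : dict {year: iterable of station identifiers}
    """
    station_sets = [set(stations) for stations in valid_by_year.values()]
    if not station_sets:
        return []
    return sorted(set.intersection(*station_sets))


# Hypothetical station identifiers
valid = {
    2011: ["ST01", "ST02", "ST03"],
    2012: ["ST02", "ST03", "ST04"],
    2013: ["ST01", "ST02", "ST03", "ST04"],
}
print(common_stations(valid))
```

Scores computed on this common list differ between years only because the observations differ, not because the set of stations changed.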
The correlation used in this study is Pearson's correlation. Each
correlation provides specific information on the quality of the simulation.
The temporal correlation, noted $R_t$, is estimated station by station
using daily averaged data in order to have homogeneous comparisons between
all variables. This correlation is directly related to the variability from
day to day for each station. $O_{t,i}$ and $M_{t,i}$ represent the observed
and modeled values, respectively, at time $t$ for the station $i$, for a
total of $T$ days and $I$ stations. The mean time-averaged value
$\overline{X_i}$ is
$$\overline{X_i} = \frac{1}{T}\sum_{t=1}^{T} X_{t,i}.$$
The temporal correlation $R_{t,i}$ for each station $i$ is calculated as
$$R_{t,i} = \frac{\sum_{t=1}^{T}\left(M_{t,i}-\overline{M_i}\right)\left(O_{t,i}-\overline{O_i}\right)}{\sqrt{\sum_{t=1}^{T}\left(M_{t,i}-\overline{M_i}\right)^2\,\sum_{t=1}^{T}\left(O_{t,i}-\overline{O_i}\right)^2}}.$$
The mean temporal correlation, $R_t$, used in this study is thus
$$R_t = \frac{1}{I}\sum_{i=1}^{I} R_{t,i},$$
with $I$ the total number of stations. The spatial correlation, noted
$R_s$, uses the same type of formula except that it is calculated from the
temporally averaged values of observations and model for each location
where observations are available. A good correlation shows that the model
correctly locates the largest horizontal gradients, such as known sources and
long-range transport plumes. The spatiotemporal averaged value is estimated as
$$\overline{\overline{X}} = \frac{1}{I}\sum_{i=1}^{I} \overline{X_i},$$
and the spatial correlation is thus expressed as
$$R_s = \frac{\sum_{i=1}^{I}\left(\overline{M_i}-\overline{\overline{M}}\right)\left(\overline{O_i}-\overline{\overline{O}}\right)}{\sqrt{\sum_{i=1}^{I}\left(\overline{M_i}-\overline{\overline{M}}\right)^2\,\sum_{i=1}^{I}\left(\overline{O_i}-\overline{\overline{O}}\right)^2}}.$$
The normalized RMSE is expressed as
$$\mathrm{nRMSE} = \sqrt{\frac{1}{T}\,\frac{1}{I}\sum_{t=1}^{T}\sum_{i=1}^{I}\left(\frac{O_{t,i}-M_{t,i}}{O_{t,i}}\right)^2}$$
for all stations $i$ and all times $t$.
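The three indicators defined above can be sketched in a few lines. This is a minimal illustration, not the code used for the study. It assumes complete data with no missing values, non-zero observations (required by the normalization), a nested-list layout `values[t][i]` with `t` the day and `i` the station, and the usual root-mean-square form for the nRMSE. All function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equally long sequences."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def temporal_correlation(obs, mod):
    """Mean over stations of the per-station correlation in time."""
    n_sta = len(obs[0])
    per_station = [
        pearson([row[i] for row in mod], [row[i] for row in obs])
        for i in range(n_sta)
    ]
    return sum(per_station) / n_sta


def spatial_correlation(obs, mod):
    """Correlation across stations of the time-averaged values."""
    n_t, n_sta = len(obs), len(obs[0])
    obs_mean = [sum(row[i] for row in obs) / n_t for i in range(n_sta)]
    mod_mean = [sum(row[i] for row in mod) / n_t for i in range(n_sta)]
    return pearson(mod_mean, obs_mean)


def nrmse(obs, mod):
    """Root mean square of the relative error over all days and stations."""
    n_t, n_sta = len(obs), len(obs[0])
    total = sum(
        ((o - m) / o) ** 2
        for row_o, row_m in zip(obs, mod)
        for o, m in zip(row_o, row_m)
    )
    return math.sqrt(total / (n_t * n_sta))


# Toy data (not from the paper): 3 days, 2 stations
obs = [[1.0, 2.0], [2.0, 4.0], [3.0, 9.0]]
mod = [[2.0, 1.0], [4.0, 2.0], [6.0, 3.0]]
print(temporal_correlation(obs, mod))
print(spatial_correlation(obs, mod))
print(nrmse(obs, mod))
```

The toy data show why the two correlations carry different information: the model follows the day-to-day evolution at both stations (high temporal correlation) while ranking the two stations in the wrong order on average (negative spatial correlation).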
Definition of the Imv indicator
For the specific purpose of the model variability (and not the model error),
we define an indicator, Imv, dedicated to express in one value the
results obtained with the temporal and spatial correlations. The goal of this
indicator is to quantify how the correlation between measurement data (for
different years) and model outputs (for the reference year) evolves from one
year to another. This indicator does not replace the usual statistical
indicators but aims at providing complementary information about the
variability between years.
We first define the differences, $D$, between all years as
$$D = \frac{1}{N-1}\sum_{i=1}^{N-1}\left|s_i - s_N\right|,$$
with $s_N$ the score of the indicator for the reference year being modeled
and $s_i$ the score of the indicator computed using observations
corresponding to other meteorological years ($i$ from 1 to $N-1$ if there are
$N-1$ other available years for the observations).
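The difference D is simply the mean absolute deviation of the other-year scores from the reference-year score. A minimal sketch follows; the function name `mean_abs_difference` and the numerical values are hypothetical and only illustrate the arithmetic.

```python
def mean_abs_difference(scores_other_years, score_reference):
    """D = mean of |s_i - s_N| over the N-1 non-reference years."""
    if not scores_other_years:
        raise ValueError("at least one non-reference year is required")
    total = sum(abs(s - score_reference) for s in scores_other_years)
    return total / len(scores_other_years)


# Illustrative scores only (not values from the paper)
s_ref = 0.60                                # reference year
s_others = [0.50, 0.70, 0.55, 0.65, 0.60]   # five other years
print(round(mean_abs_difference(s_others, s_ref), 3))
```

Because absolute values are used, D does not say whether the other years score higher or lower than the reference year. It only measures how much the score changes when the observation year changes.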
We now aim to develop a simple indicator, called Imv, which is a
combination of the statistical indicator for the reference year and the
differences between years. This Imv corresponds, in fact, to the SI
itself weighted by the differences between the SI scores of all years. We
expect that Imv follows the following rules:
Imv has the same evolution as the studied SI. If the
correlation increases, Imv also increases.
Imv is bounded between 0 and 1, like the correlation. This
enables us to compare the results for different variables (with different metrics).
In the case of a high correlation value found for the studied year, the obtained sN
value is close to 1. The value of Imv then depends on the differences between years:
If the differences between the other years are low (D tends to 0),
it means that the model is correct for the studied year, but possibly because
it reproduces a recurrent phenomenon. In this case, we want Imv to
decrease and tend to 0.
If the differences between the other years are high (D tends to 1),
it means the model gives good results for the studied year, but it is not
because it simulates a systematic event. In this case, we want
Imv to remain close to the indicator value. With sN≈1 and
Imv≈1, we can conclude that the model is very good for the
studied year and this is not due to a recurrent process.
In the case of a low correlation value, and whatever the magnitude of differences
between years, the model is not correct. Imv must be low, as it is the
indicator value.
These constraints allow us to define an indicator having this kind of formulation:
Imv=sN1-exp(-Ds)4.
This means that Imv always has, as a maximum, the value of the
indicator itself. The power of 4 is here defined to have a specific shape for
Imv, respecting the rules presented above. This expression
gives the indicator behavior presented in the schematic of Imv values.
Considering the
state of the art of chemistry-transport modeling, the model is considered
accurate, having an acceptable variability, for Imv>0.4: this
means that the correlation is at least 0.5 and the differences are also at
least 0.5.
Figure caption: Scheme of the Imv values as a function of the studied year
correlation values and the multi-year differences D.
Finally, this indicator is not calculated for nRMSE and bias. Two reasons
explain this choice. The first reason is that, contrary to correlations, RMSE
and bias are not
bounded between 0 and 1. This leads to indicator values possibly varying a
lot between several years and thus being difficult to compare between years.
The second reason is that the goal of the indicators is to extract a message from the model
variability of the studied year compared to the other years. In this case,
the correlations constitute a statistical indicator which is more
appropriate for this evaluation.
Time series of statistical indicators
The calculations of differences are performed for the correlations and the
nRMSE. These values are calculated for all variables described in
the table of measurement data for the years 2008 to 2013. For each year,
only the May to August period is considered. Results are
presented as time series in the figure of multi-year scores and discussed in the
following sections. Some values discussed in these sections
are also reported in the summary table.
Figure caption: Multi-year scores for T2 m, u10 m, the
precipitation rate, aerosol optical depth (AOD), the Ångström exponent (ANG),
surface concentrations of O3, NO2, PM2.5, PM10, ammonium,
sulfate and nitrate. The correlations and the nRMSE are calculated between
the observations (2008–2013) and the model results (2013). The spatial
correlation, Rs, is in black; the temporal correlation, Rt, in
blue; the nRMSE in red.
Meteorological variables
The meteorological variables are T2 m, u10 m and the
precipitation rate. The values of the statistical scores are provided, year
by year, in the figure of multi-year scores. As an example, the same values are
reported for T2 m in a dedicated table.
Table caption: Scores for T2 m. The correlations and nRMSE are
calculated between the observations (2008–2013) and the model results
(2013).
T2 m is a meteorological variable, constraining processes both for
meteorology and chemistry. Its diurnal cycle is strong, as well as its
latitudinal variability (for large model domains), often ensuring a good
spatial correlation. In general, this variable is the least uncertain of all
modeled meteorological parameters. The spatial correlation is good for all
years, ranging from 0.57 (2009) to 0.62 (2011). For the studied year (2013),
the score is 0.60, slightly lower than for 2011. Even if the correlation for
the selected year is good, it is not significantly better than for the other
years, with D=0.02. This means that the model reproduces fairly well a
spatial pattern that is observed every year. Indeed, the simulation domain is
large and the latitudinal variability of temperature is larger than the
variability between individual measurement stations. The temporal correlation
ranges from 0.25 to 0.91
(2013). The variability of nRMSE is lower than for the correlations, with
values ranging from 0.22 (2013) to 0.34 (2010). The lowest value is found for
2013, highlighting the fact that the model error is the lowest for the
reference year. The model is thus performing well in capturing the day-to-day
variability for T2 m for the correct reasons.
In the figure of multi-year scores, the calculation of u10 m also
gives satisfactory results for the temporal correlation, with Rt=0.60. The
spatial correlation,
Rs=0.09, is poor and very variable from one year to another. As for
T2 m, we also have an effect of the model resolution and the
representativeness of the variable.
Scores for the precipitation are correct, with a very good spatial
correlation that is always exceeding 0.6. As for the temperature, the latitudinal
effect plays a major role in the variability. Both the spatial and temporal
correlations increase significantly for the reference year. The nRMSE is not
shown on the plot because its values exceed 1.2. The model is biased in
absolute values and overestimates the amount of daily precipitation. However, the
day-to-day variability is correct and such variability is the most important
feature for atmospheric composition modeling (the lower atmosphere is
scavenged when precipitation occurs, whatever its amount).
For the meteorological variables, these scores showed that the meteorological
forcing is well captured, and always better for the year being considered
compared to other years.
Optical properties
The optical properties are directly linked to the atmospheric composition of
aerosol and may be quantified using the AOD and the
Ångström exponent (ANG).
For the AOD, the spatial correlation is very good for 2013, with Rs=0.97, but
it is as good or better for other years. This means that we model a rather
recurring phenomenon: every year, the same stations are, on average, exposed to
aerosol plumes. The temporal correlation is lower with Rt=0.45 but much
better than for other years. This indicates that the model partly reproduces
the observed temporal variability but the events are changing from one year
to another and the model captures these changes well. In the studied region,
the AOD is sensitive to desert dust outbreaks in summer. This means that
large-scale systems are driving the aerosol plumes; they are spatially
recurrent and temporally better captured for the year being considered than
for other years.
For the ANG, the spatial correlation is very good, with Rs=0.91, but also
persistent in time. The temporal correlation is much better for 2013 than
for other years. This is probably due to a size distribution that is not
necessarily well simulated from one day to another (as shown by the AOD),
whereas the relative contributions of fine and
coarse aerosol atmospheric load are fairly well reproduced. This feature
highlights the high sensitivity of the AOD calculation to the modeled
aerosol size distribution, although the overall mass emitted and transported
is realistic.
Globally, the AOD and ANG reflect the model's ability to retrieve the long-range
transport of long-lived aerosols, which depends on several processes
(emissions, transport and deposition). These scores show that the model is
able to retrieve these yearly recurrent plumes but the model size
distribution of particles clearly requires improvement.
Surface concentrations
For the surface concentrations of gaseous and aerosol species, the
variability is much more related to local effects. As an example, the
detailed values of the statistical indicators and the differences between
years are extensively presented for NO2.
Table caption: Scores for NO2. The correlations and nRMSE are calculated
between the observations (2008–2013) and the model results (2013).
NO2 is both primary and secondary in origin. Mostly emitted in urbanized
areas, the diurnal cycle of this species is well constrained. Depending on
meteorological conditions, its lifetime may vary significantly from hours to
days. Modeling this species with CTMs is challenging because several
uncertainties are acting at the same time, including the spatial
representativeness of the model cell. The scores show whether the sources are
properly located and whether the photochemistry and transport processes have been
well simulated. In general, at coarse model resolution, the model results for
this species are worse than for ozone. The spatial correlation gives a score
of Rs=0.88 for 2013. This corresponds to the best correlation compared
to the other years. The anthropogenic emissions are strongly related to
industrial activities and road traffic, and since these activity sectors are
fixed in space, the better spatial correlation for the reference year is more
likely due to natural sources that vary in space, such as biogenic emissions
and vegetation fires. The
temporal correlation is low for 2013, Rt=0.22, but is closer to 0 for
other years and therefore significantly better for the reference year compared
to the others. These two correlation values show that the model certainly
captures the right location of emission sources (low variability of Rs).
The nRMSE is large and shows that the concentrations are overestimated by the
model. However, this overestimation appears for all years and can be due to the
representativeness of the surface measurements compared to the size of model
cells.
The spatial correlation is good for O3, NO2 and PM10, with
Rs=0.69, 0.88 and 0.81, respectively. For PM2.5, this correlation is
low, with Rs=0.16. The PM10 shows that the largest particles are
well modeled over the whole domain, and this was also the conclusion for the
AOD and ANG. The low score for PM2.5 indicates that for the aerosol
distribution the fine mode is not as well modeled as the coarse mode. This
is confirmed by the scores of the aerosol inorganic species, ammonium,
sulfate and nitrate, which contribute to a large part of the fine
fraction of particles. Except for sulfate (with Rs=0.51), the spatial
correlations are low: 0.15 for nitrate and 0.20 for ammonium. Thus, the fine part
of the aerosol is not well modeled, mainly due to a deficiency in the
modeling of nitrates.
The temporal correlations have a completely different behavior than the
spatial correlations. The values are generally low, from Rt=0.09 for
nitrate to Rt=0.32 for O3. Surprisingly, the PM10
concentrations display a good spatial correlation but a poor temporal
correlation. This is due to the long lifetime in the atmosphere of
nonreactive species such as mineral dust: plumes are correctly modeled over
large areas but the day-to-day variability needs improvement. Another point
is the good spatial correlation for NO2 but its low temporal correlation
with Rt=0.22. In this case, this means we have a correctly spatialized
anthropogenic emission inventory (mainly for NO2 sources), but
difficulties to model the day-to-day chemistry still exist.
For the surface concentrations, we can conclude that O3, NO2 and
PM10 concentrations are spatially well modeled, and this is not due to a
recurrent behavior. For particles, the problem is more related to the fine
mode, where PM2.5 concentrations are not well located. This modeling
problem is highlighted by the low correlations and Imv values for
the inorganic species. For the temporal correlations, the scores are always
lower than for the spatial correlation but also always higher for the
reference year than for the other years.
Figure caption: Results of the Imv scores for the spatial and temporal
correlations. For each model variable, its value is represented using the
correlation on the x axis and the difference between the studied year and
the others on the y axis. The colors represent the Imv values.
Estimation of the Imv indicator for all variables
To summarize the results obtained for each statistical indicator and the
values of differences between all years, we apply the Imv
formulation. This enables us to have one value for each SI (Rs and
Rt) and each variable. Results are presented in the table of Imv values and
are also displayed on single plots in the figure of Imv scores.
Table caption: The Imv values for all variables: the meteorology with
T2 m, u10 m and precipitation rate; the vertically
integrated column of aerosols with the aerosol optical depth (AOD) and the
Ångström exponent (ANG); the surface concentrations of all aerosols in
terms of size distribution with PM2.5 and PM10; and the inorganic
species with Dp<10 µm. Values of Imv above 0.4
are in bold. Units of the variables are detailed in the table of measurement data.
In the table of Imv values, the values larger than 0.4 are highlighted.
This threshold is clearly subjective but mentioned here to better highlight
the variables being well modeled and with a correct variability from one year
to another. As discussed in detail, the best scores are obtained for the
meteorological variables and are better for the temporal variability than
for the spatial variability.
In the figure of Imv scores, the x axis represents the correlation (spatial
or temporal) and the y axis represents the differences between all years, D.
For each studied variable, the values are reported on the figure, where the
colors represent the values of Imv. The interpretation of these
results follows the quality criteria presented in the idealized schematic of
Imv values. This presentation shows an important spread for the spatial
correlation results. While the relative differences D range from 0 to 0.6, the
correlations range from 0.09 (for the 10 m wind speed) to 0.97 (for AOD).
The common point is that there is no variable with differences above 0.5.
This means that, spatially, the studied problem shows systematic patterns
from year to year. The low values of correlations show that some variables
are systematically poorly estimated. This means that some meteorological
structures (for u10 m) or emission sources (contributing to the
PM2.5 surface concentrations) are systematically mislocated.
The representation of temporal correlations shows a specific linear pattern:
the largest correlation values are associated with the largest differences.
This temporal correlation represents the day-to-day variability at each
location. This means that the studied problem is characterized by high day-to-day
variability, without similar consecutive days (if consecutive days were similar, one would have
high correlations but low differences). This illustrates the fact that the
studied problem is primarily an issue of sporadic events and that the model is
able to correctly capture this variability from one day to another.
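The distinction between the two indicators can be sketched with a small synthetic example. The sketch assumes that Rt is the average over stations of the Pearson correlation between daily observed and modeled series, and that Rs is the Pearson correlation between the time-averaged observed and modeled values at each station; the definitions given earlier in the article take precedence. The function names and the data are hypothetical. The modeled values are built so that the day-to-day variability is perfect at each station while the spatial pattern is inverted, which mimics systematically mislocated structures.

```python
import numpy as np

def temporal_correlation(obs, mod):
    """Mean over stations of the Pearson correlation between daily series.

    obs, mod: arrays of shape (n_days, n_stations).
    """
    per_station = [np.corrcoef(obs[:, i], mod[:, i])[0, 1] for i in range(obs.shape[1])]
    return float(np.mean(per_station))

def spatial_correlation(obs, mod):
    """Pearson correlation between the time-mean values at each station."""
    return float(np.corrcoef(obs.mean(axis=0), mod.mean(axis=0))[0, 1])

# Synthetic observations: 4 days (rows) at 3 stations (columns).
obs = np.array([[1.0, 2.0, 3.0],
                [2.0, 3.0, 5.0],
                [3.0, 5.0, 4.0],
                [4.0, 4.0, 6.0]])

# Synthetic model: same daily variations, but a station-dependent bias
# that reverses the spatial gradient of the time-mean values.
mod = obs + np.array([10.0, 0.0, -10.0])

rt = temporal_correlation(obs, mod)  # close to +1
rs = spatial_correlation(obs, mod)   # close to -1
print(round(rt, 3), round(rs, 3))
```

In this constructed case the temporal score is high and the spatial score is poor, which is the kind of contrast the two indicators are meant to separate.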
Conclusions
At first glance, using a different year than the simulated one for the day-to-day
evaluation seems awkward. However, this approach tells us more about the
performance of chemistry-transport models than using a single year for the
usual statistical indicators. Of course, it will never replace a
rigorous evaluation of a pollution case using time series, vertical
profiles and usual error statistics. However, it offers a very fast and
integrated view of the strengths and weaknesses of a model with very little
calculation. This methodology can also be deployed in intercomparison
exercises.
To answer the questions presented in the introduction, for this
particular model and simulated period, the following conclusions can be
drawn. The model always simulates the studied year better than any other
meteorological year and it is able to reproduce the day-to-day variability
for high concentrations of pollutants.
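The first of these conclusions corresponds to a simple check: score the same simulation against observations from several years, taken at the same locations and on the same days of the year, and verify that the studied year obtains the best score. The sketch below illustrates this with synthetic data only. The year labels, array sizes, noise level and the function name `mean_temporal_correlation` are assumptions made for the example, and the score used is an average per-station Pearson correlation rather than the full Imv formulation.

```python
import numpy as np

def mean_temporal_correlation(obs, mod):
    """Average over stations of the Pearson correlation between daily series."""
    per_station = [np.corrcoef(obs[:, i], mod[:, i])[0, 1] for i in range(obs.shape[1])]
    return float(np.mean(per_station))

rng = np.random.default_rng(0)
n_days, n_stations = 60, 5

# Synthetic model output for the studied period (days x stations).
model = rng.normal(size=(n_days, n_stations))

# Synthetic observations: the studied year follows the model closely,
# the other years are independent draws on the same days and stations.
observations = {
    "studied_year": model + 0.1 * rng.normal(size=(n_days, n_stations)),
    "other_year_1": rng.normal(size=(n_days, n_stations)),
    "other_year_2": rng.normal(size=(n_days, n_stations)),
}

# Same indicator, same model output, one score per observation year.
scores = {year: mean_temporal_correlation(obs, model) for year, obs in observations.items()}
best_year = max(scores, key=scores.get)
print(best_year)
```

With real data, each entry of `observations` would hold the measurements of one year colocated with the simulation in space and by day of year.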
The spatial correlation is good for 2 m temperature and precipitation rate
but not for wind speed: this highlights the fact that the modeled domain is
large and the resolution is not optimized for small-scale processes. The spatial
correlation is also very good for the long-range transport of particles, as
demonstrated by Rs=0.97 and 0.90 for AOD and ANG, respectively. However, since this
feature recurs every year, it leads to low Imv values. This means
that, for a large domain, the main spatial patterns of particle concentrations
are recurrent and well modeled. The chemical species that are best modeled
are either species with a long atmospheric lifetime (PM10) or species
spatially well constrained on the domain (such as NO2, mainly due to
anthropogenic emissions). For particles, the results depend on the size
distribution: the coarse particles are better simulated than the fine ones.
The conclusions are different for the temporal correlation. The scores are
calculated using daily observations and modeled outputs. Thus, these scores
reflect the ability of the model to retrieve the day-to-day variability. As
for the spatial correlation, scores are good for the meteorological
variables. For the aerosols, and mainly for the long-lived species (such as
mineral dust), the temporal correlation is also correct, as are the Imv
values: Imv=0.33 and 0.49 for AOD and ANG, respectively. However, for the
short-lived species, the temporal correlation and the Imv values are
low. This means that improvements should focus first on the day-to-day
variability rather than on the locations of emissions. This is probably due
to the atmospheric transport, since the spatial variability of 10 m wind speed
is poorly simulated. However, overall, the temporal correlation is better for
the studied year than for the others, showing that the problem is highly
variable from year to year, but the model is able to capture the evolution of
atmospheric composition.
This study presents a methodology using existing data
and models; all required information is already included in this article.
The authors declare that they have no conflict of
interest.
Acknowledgements
This study is partly funded by the French Ministry of Ecology.
The authors thank the British Atmospheric Data Centre, which is part of the NERC
National Centre for Atmospheric Science (NCAS), for making the
meteorological data available; the EMEP network for providing atmospheric composition
measurements; and the investigators and staff who maintained and provided the
AERONET data.
Edited by: S. Bekki
Reviewed by: two anonymous referees