Quantitative evaluation of ozone and selected climate parameters in a set of EMAC simulations

Four simulations with the ECHAM/MESSy Atmospheric Chemistry (EMAC) model have been evaluated with the Earth System Model Validation Tool (ESMValTool) to identify differences in simulated ozone and selected climate parameters that resulted from (i) different setups of the EMAC model (nudged vs. free-running) and (ii) different boundary conditions (emissions, sea surface temperatures (SSTs) and sea ice concentrations (SICs)). To assess the relative performance of the simulations, quantitative performance metrics are calculated consistently for the climate parameters and ozone. This is important for the interpretation of the evaluation results since biases in climate can affect biases in chemistry and vice versa. The observational data sets used for the evaluation include ozonesonde and aircraft data, meteorological reanalyses and satellite measurements. The results from a previous EMAC evaluation of a model simulation with nudging towards realistic meteorology in the troposphere have been compared to new simulations with different model setups and updated emission data sets in free-running time slice and nudged quasi chemistry-transport model (QCTM) mode. The latter two configurations are particularly important for chemistry-climate projections and for the quantification of individual sources (e.g., the transport sector) that lead to small chemical perturbations of the climate system, respectively. With the exception of some specific features which are detailed in this study, no large differences that could be related to the different setups (nudged vs. free-running) of the EMAC simulations were found, which offers the possibility to evaluate and improve the overall model with the help of shorter nudged simulations.
The main difference between the two setups is a better representation of the tropospheric and stratospheric temperature in the nudged simulations, which also better reproduce stratospheric water vapor concentrations, due to the improved simulation of the temperature in the tropical tropopause layer. Ozone and ozone precursor concentrations, on the other hand, are very similar in the different model setups, if similar boundary conditions are used. Different boundary conditions, however, lead to relevant differences in the four simulations. Biases which are common to all simulations are the underestimation of the ozone hole and the overestimation of tropospheric column ozone, the latter being significantly reduced when lower lightning emissions of nitrogen oxides are used. To further investigate possible other reasons for such bias, two sensitivity simulations with an updated scavenging routine and the addition of a newly proposed HNO3-forming channel of the HO2+NO reaction were performed. The update in the scavenging routine resulted in a slightly better representation of ozone compared to the reference simulation. The introduction of the new HNO3-forming channel significantly reduces the overestimation of tropospheric ozone. Therefore, including the new reaction rate could potentially be important for a realistic simulation of tropospheric ozone, although laboratory experiments and other model studies need to confirm this hypothesis and some modifications to the rate, which has a strong dependence on water vapor, might also still be needed.
Published by Copernicus Publications on behalf of the European Geosciences Union.
734 M. Righi et al.: Quantitative evaluation of ozone and selected climate parameters in EMAC


1 Introduction
A correct representation of tropospheric and stratospheric ozone is crucial for reproducing past trends in climate variables (e.g., temperature) as well as for providing reliable projections of the chemistry-climate system in the 21st century. Tropospheric ozone burden has increased by around 30 % between 1850 and 2010 to a level of ∼ 340 Tg (Young et al., 2013), leading to a global mean radiative forcing (RF) of ∼ 0.4 W m−2 (Stevenson et al., 2013). This increase is particularly strong in the Northern Hemisphere (NH) midlatitudes, due to the increased anthropogenic emissions. In the future, tropospheric ozone is projected to change, depending on the emission scenario and in particular the evolution of the ozone precursors nitrogen oxides (NOx = NO + NO2), carbon monoxide (CO), methane (CH4), and non-methane hydrocarbons (NMHCs). For example, Cionni et al. (2011) found that trends in tropospheric column ozone contribute substantially to total column ozone trends in the 21st century in the four Representative Concentration Pathways (RCP; Moss et al., 2010), mainly because of the difference in methane concentrations and stratospheric input of ozone, which result in a 10 DU (∼ 109 Tg) increase compared to 2000 in RCP8.5 (Eyring et al., 2013a). On the other hand, stratospheric ozone has been subject to a major perturbation since the late 1970s due to anthropogenic emissions of ozone-depleting substances (ODSs), now successfully controlled under the Montreal Protocol and its amendments and adjustments (WMO, 2011). The ozone hole has been identified as the primary driver of changes in Southern Hemisphere (SH) summertime high-latitude surface climate over the past few decades (Thompson and Solomon, 2002, 2005; Thompson et al., 2005). Due to the projected disappearance of the ozone hole during the 21st century, a deceleration of the poleward side of the jet (a decrease in the southern annular mode) is expected (Perlwitz et al., 2008; Son et al., 2008, 2010; SPARC-CCMVal, 2010). In addition, the projected strengthening of the Brewer-Dobson circulation could result in a decrease in tropical ozone and an increase in extratropical ozone in the lower stratosphere, with impacts on RF (Butchart et al., 2006, 2010; Eyring et al., 2007; Shepherd, 2008; SPARC-CCMVal, 2010). Chemistry-climate models (CCMs) or more generally earth system models (ESMs) with interactive chemistry simulate tropospheric and stratospheric ozone as well as the underlying key processes.
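The correspondence between a 10 DU column change and a burden of roughly 109 Tg quoted above follows from the definition of the Dobson unit. A minimal sketch of that arithmetic is given below, assuming a column that is uniform over the whole globe; the function name and the rounded constants are chosen here for illustration and are not taken from the paper.

```python
# Convert a globally uniform ozone column (Dobson units) to a mass burden (Tg).
# All constants are rounded reference values assumed for this illustration.
DU_MOLECULES_PER_M2 = 2.6867e20   # molecules per square metre in 1 DU
EARTH_SURFACE_M2 = 5.101e14       # approximate surface area of the Earth
AVOGADRO = 6.02214076e23          # molecules per mole
OZONE_MOLAR_MASS_G = 47.998       # grams per mole of O3
GRAMS_PER_TG = 1.0e12


def column_du_to_burden_tg(column_du):
    """Return the ozone mass (Tg) of a uniform global column given in DU."""
    molecules = column_du * DU_MOLECULES_PER_M2 * EARTH_SURFACE_M2
    grams = molecules / AVOGADRO * OZONE_MOLAR_MASS_G
    return grams / GRAMS_PER_TG


if __name__ == "__main__":
    # Burden corresponding to the 10 DU increase mentioned in the text.
    print(round(column_du_to_burden_tg(10.0), 1))
```

Under these assumptions 1 DU corresponds to about 10.9 Tg of ozone, so the present-day tropospheric burden of ∼ 340 Tg would be equivalent to a mean tropospheric column of roughly 31 DU.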
Here, we evaluate simulations performed with the ECHAM/MESSy Atmospheric Chemistry (EMAC) model, which is a numerical chemistry and climate simulation system that includes submodels describing tropospheric and middle atmosphere processes and their interaction with oceans, land and human influences (Jöckel et al., 2006). The focus of this study is to assess strengths and weaknesses in the representation of ozone in different setups of the EMAC model, to answer the question whether shorter nudged simulations can be used to evaluate the free-running version of the model, and to detect general biases in EMAC. We compare the conclusions from a previous evaluation of a model simulation in nudged mode that uses a Newtonian relaxation technique in the troposphere (Jöckel et al., 2006; Pozzer et al., 2007) to new simulations with different model setups and emissions data sets in free-running time slice and nudged quasi chemistry-transport model (QCTM; Deckert et al., 2011) mode. The model is driven by prescribed input parameters such as sea surface temperatures (SSTs) and sea ice concentrations (SICs), concentrations of long-lived greenhouse gases, and emissions from anthropogenic sources, biomass burning and natural processes (e.g., volcanic eruptions and lightning). The evaluation of tropospheric ozone is focused on ozone itself and its precursors (NOx, CO and NMHCs). Additionally, an evaluation of basic climate parameters (temperature, winds, geopotential height, specific humidity, and radiation) is performed to assess the different setups of EMAC simulations against each other.
This paper is organized as follows: the model and model simulations are described in Sects. 2 and 3, respectively. An overview of the evaluation diagnostics and performance metrics is given in Sect. 4, together with a short description of the ESMValTool. The observational data used for the model evaluation are described in Sect. 5. The results of the evaluation are presented and discussed in Sect. 6. Section 7 closes with a summary.
[...] tropospheric and stratospheric chemistry. The chemical mechanism is integrated in the entire model domain, i.e., consistently from the surface to the stratosphere. It is important to highlight that no arbitrary or artificial intermediate boundary conditions (for instance at the tropopause or between layers) are prescribed. Chemical species are advected according to the algorithm of Lin and Rood (1996), which is part of ECHAM5. The chemical mechanism in the model setup used here consists of gas phase reactions (including ozone tropospheric chemistry, non-methane hydrocarbons up to isoprene and stratospheric chemistry for bromine and chlorine), photolysis reactions and heterogeneous reactions, involving more than 100 species overall. Additional heterogeneous, acid-base and aqueous-phase reactions are included in the submodel SCAV (Tost et al., 2006a). Interactive aerosols are not included in the current setup and are prescribed according to a climatology by Tanre et al. (1994). The convection processes are simulated following the Tiedtke (1989) scheme with the Nordeng (1994) closure, as in ECHAM5 (Roeckner et al., 2006). The radiation calculations take into account prognostic cloud cover, cloud water, cloud ice (from the CLOUD submodel) and prognostic specific humidity. Forcings from radiatively active gases (CO2, CH4, O3, N2O, CFCl3 and CF2Cl2) are computed from the corresponding prognostic tracers within the RAD4ALL submodel (RAD in MESSy2). Therefore, these constituents are consistently used for the coupling between chemistry and dynamics in both directions via radiative forcing and tracer transport.

Model simulation setups
The four EMAC simulations discussed in this study have the same resolution but differ from each other in their setup. Two nudged, transient simulations (EVAL2 and QCTM) driven by the same meteorology (including SSTs) and emission inventories are compared to two free-running time slice simulations (ACCMIP and TS2000). As a reference, we use the nudged experiment described in Jöckel et al. (2010), which is an update using version 2.41 of MESSy of the S2 experiment discussed by Jöckel et al. (2006) and Pozzer et al. (2007). The setup of this simulation (hereafter referred to as EVAL2) is described in Sect. 3.1. The other three simulations are performed using MESSy version 1.10. A second nudged experiment (hereafter called QCTM) is run using the so-called QCTM mode, developed by Deckert et al. (2011), and is described in Sect. 3.2. Two additional simulations in time slice mode under 2000 conditions are carried out: TS2000, using observed climatological SSTs and SICs, and ACCMIP, using simulated climatological SSTs and SICs (Sects. 3.3 and 3.4, respectively). The basic features of these four simulations are summarized in Table 2.
In the following, the specific features that characterize each EMAC simulation are briefly summarized (see also Table S1 in the Supplement). A more detailed description of the general model setup which applies to all the experiments is provided in the Supplement (Sect. S1). The four simulations were conducted as part of various projects. The specific requirements of each project (e.g., ACCMIP) motivated the different configurations that were applied.

Simulation in nudged mode: EVAL2
This simulation has been previously evaluated by Jöckel et al. (2010). It covers 12 years (1998-2009), with the first year used for spin-up and not considered in the model analysis. Boundary conditions are, as much as possible, taken from observations. It is performed in nudged mode towards observed meteorology, namely to the operational analysis data from the European Centre for Medium-Range Weather Forecasts (ECMWF), through the Newtonian relaxation of four prognostic model variables: temperature, divergence, vorticity and the logarithm of surface pressure (van Aalst et al., 2004). SSTs are prescribed from ECMWF operational analysis data as well. The nudging is applied in the spectral representation, which is well adapted to atmospheric wave phenomena and the spherical geometry. It is important to note that we do not nudge the wave zero (i.e., the global mean) but only wave patterns. With the exception of the logarithm of the surface pressure, the nudging in this method is applied only in the free troposphere, so that stratospheric dynamics is calculated freely and inconsistencies between the boundary layer representation of ECMWF and ECHAM5 models are avoided. The nudging (relaxation e-folding time in parentheses) of temperature (12 h), surface pressure (12 h), divergence (48 h), and vorticity (6 h) is applied between model levels 63 (∼ 97 hPa) and 84 (∼ 706 hPa), with reduced values between levels 63 and 71 (∼ 204 hPa), as for the S2 model simulation in Jöckel et al. (2006). Transition zones (intermediate stepwise reduced nudging coefficients) are applied between levels 58 (∼ 62 hPa) and 62 (∼ 89 hPa), between 65 (∼ 116 hPa) and 70 (∼ 185 hPa), and between 85 (∼ 775 hPa) and 87 (∼ 909 hPa). The nudging is not applied throughout the whole model domain, since previous EMAC studies (Jöckel et al., 2006; Lelieveld et al., 2007) showed that a better representation of the stratosphere can be achieved if the nudging is applied up to 100 hPa and not further above. Moreover, we forced the quasi-biennial oscillation (QBO) externally by relaxation (nudging) of the stratospheric equatorial eastward wind to observed equatorial eastward wind profiles (Giorgetta and Bengtsson, 1999).
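The Newtonian relaxation described above can be illustrated with a short sketch. It shows only the relaxation term, dX/dt = -w (X - X_ref) / tau, for a single scalar value, using the e-folding times quoted in the text. The function name, the `weight` argument (a hypothetical stand-in for the reduced coefficients in the transition zones) and the exact exponential update are assumptions made for this illustration; EMAC applies the nudging in spectral space as part of the full model tendencies, which is not reproduced here.

```python
import math

# Relaxation e-folding times in hours, as quoted in the text for EVAL2.
E_FOLDING_HOURS = {
    "temperature": 12.0,
    "log_surface_pressure": 12.0,
    "divergence": 48.0,
    "vorticity": 6.0,
}


def nudge(value, reference, tau_hours, dt_hours, weight=1.0):
    """Relax `value` towards `reference` over one time step.

    Integrates dX/dt = -weight * (X - X_ref) / tau exactly over dt_hours.
    `weight` (between 0 and 1) is a hypothetical scaling used here to mimic
    reduced nudging coefficients; weight=0 leaves the value untouched.
    """
    if tau_hours <= 0.0:
        raise ValueError("tau_hours must be positive")
    decay = math.exp(-weight * dt_hours / tau_hours)
    return reference + (value - reference) * decay


if __name__ == "__main__":
    # A 2 K temperature departure after 12 h of full-strength nudging.
    departure = nudge(2.0, 0.0, E_FOLDING_HOURS["temperature"], 12.0)
    print(departure)
```

With full weight, a departure from the reference decays by a factor of 1/e after one e-folding time, so vorticity (6 h) is constrained much more tightly than divergence (48 h).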
As this experiment is designed to (approximately) reproduce the meteorology and the atmospheric composition of the individual years, transient (i.e., varying year by year) emission data are used where available. For anthropogenic non-traffic emissions, we use the CMIP5 emission inventory of Lamarque et al. (2010) for the year 2000, which provides fluxes on a 0.5° × 0.5° grid. We used this source also for shipping emissions and rescaled the emissions using the scaling factors from Eyring et al. (2010) in order to get a transient set. For the road traffic sector we use the QUANTIFY data set for the year 2000 (Hoor et al., 2009), which has a spatial resolution of 1° × 1°. The aviation emissions come from Schmitt and Brunner (1997) and are available for the period 1960-2009, distributed on a 3.7° × 3.7° grid. Biomass burning emissions are taken from the GFED v3.1 inventory (van der Werf et al., 2010). These emissions are gridded with a resolution of 0.5° × 0.5°, for the period 1997-2009. For NH3 we also use the EDGAR3.2FT database (van Aardenne et al., 2005). Emission totals for all species in each sector are summarized in Table S2 and compared to the other setups.

Simulation in nudged QCTM mode
The QCTM simulation covers a period of 10 years (1998-2007, 1 year spin-up) and is based on a setup for EMAC (Deckert et al., 2011) in which chemical effects are decoupled from the dynamics (i.e., any feedback from chemistry on dynamics is realized via climatologies of the relevant trace gases instead of on-line coupling). This configuration is particularly useful when analyzing the effect of small chemical perturbations (like the addition of a specific emission source, e.g., shipping) on the climate system. Investigating such effects is usually hampered by the internal variability of the model, which induces very low signal-to-noise ratios and makes extracting a significant signal extremely hard. In the QCTM mode, the meteorological differences between different experiments are eliminated and the signal-to-noise ratio can be significantly increased, thus enabling the study of small perturbations even with a limited number of simulated years.
The QCTM mode is realized by driving the radiation with external climatological fields for the radiatively active gases (CO2, CH4, O3, N2O and chlorofluorocarbons). Furthermore, chemical water vapor tendencies are only affected by offline methane oxidation, and offline mixing ratios of nitric acid are used to calculate the repartitioning and sedimentation in polar stratospheric clouds.
Like EVAL2, this simulation was carried out to approximate meteorology and atmospheric composition for individual years; it is therefore performed in nudged mode and uses transient emissions. We use the same nudging coefficients as for EVAL2. The emission setup is also identical to EVAL2, with the exception of aviation emissions, which were taken from QUANTIFY (Hoor et al., 2009), resulting however in a similar globally-integrated amount of emitted NOx. In addition, while using the same lightning NOx parametrization, the resulting total emission was tuned to a lower value in this experiment (see Table S2).

Simulation in free-running mode: TS2000
In contrast to the nudged simulations (EVAL2 and QCTM), the TS2000 simulation is a time slice experiment, performed in free-running mode over a period of 10 years under 2000 conditions. The boundary conditions are similar except that emissions and SSTs are climatological mean data sets representing 2000 conditions, instead of transient data sets. The distributions of SSTs and SICs are prescribed using the HadISST1 data set from the Met Office Hadley Centre (Rayner et al., 2003), containing monthly global fields on a 1° × 1° grid and regridded to the model T42 resolution. Here we use a 10-year climatology from 1995 to 2004. The emission setup is similar to the QCTM experiment, but it considers only the year 2000 and uses the CMIP5 data set instead of GFED and QUANTIFY for the biomass burning and the land transport sector, respectively, and instead of EDGAR for the NH3 emissions.
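A 10-year monthly climatology of the kind used here for the SSTs and SICs can be sketched as follows. This is a minimal illustration for a single grid cell, assuming the input is a sequence of (year, month, value) records; the function name and data layout are hypothetical and do not reflect how the HadISST1 files or the EMAC preprocessing are actually organized.

```python
from collections import defaultdict


def monthly_climatology(records, first_year, last_year):
    """Average monthly values per calendar month over an inclusive year range.

    `records` is an iterable of (year, month, value) tuples for one grid cell,
    with month running from 1 to 12. Returns a list of 12 means, January first.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for year, month, value in records:
        if first_year <= year <= last_year:
            sums[month] += value
            counts[month] += 1
    missing = [m for m in range(1, 13) if counts[m] == 0]
    if missing:
        raise ValueError(f"no data for months {missing}")
    return [sums[m] / counts[m] for m in range(1, 13)]


if __name__ == "__main__":
    # Synthetic values for illustration only: a seasonal cycle plus a trend.
    data = [(y, m, float(m) + 0.1 * (y - 1995))
            for y in range(1990, 2010) for m in range(1, 13)]
    print(monthly_climatology(data, 1995, 2004))
```

Years outside the requested range are ignored, so the same record can be reused to build climatologies for different base periods.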

Simulation in free-running mode: ACCMIP
This time slice simulation was performed in support of the Atmospheric Chemistry and Climate Model Intercomparison Project (ACCMIP; Lamarque et al., 2013). The simulation is identical to the TS2000 setup, except that slightly different emission inventories were used (see Table S1), in order to conform to the project requirements. This time slice simulation is only one out of the ACCMIP series of experiments, covering the period 1850 to 2100. The corresponding EMAC simulations are evaluated and analyzed in a variety of ACCMIP papers (Fiore et al., 2012; Naik et al., 2013; Silva et al., 2013; Stevenson et al., 2013; Voulgarakis et al., 2013; Young et al., 2013). To allow a consistent use of SSTs/SICs that cover the full period without discontinuities, simulated SSTs/SICs from a long-term climate model simulation were prescribed instead of using observations as in TS2000. Monthly mean SSTs and SICs are prescribed as a 10-year climatological mean around the base year 2000 using the historical CMIP5 experiment carried out with the Centro Euro-Mediterraneo sui Cambiamenti Climatici (CMCC) climate model, which is based on ECHAM5, like EMAC. A comparison of the CMCC SSTs to the climatology from the HadISST data for the same period shows significant differences (up to ∼ 2 K) over large areas of the ocean (Fig. S1). Note that, because the period is too short, this is not an evaluation of the CMCC SSTs/SICs but rather just documents the differences between the two data sets that are prescribed in the TS2000 and ACCMIP simulations.

Diagnostics, performance metrics and evaluation tool
In order to quantitatively assess and compare the ability of the different EMAC simulations in representing key features of observed climate and chemical composition, basic statistical measures are calculated in addition to the diagnostic plots that provide more detailed insights. For each diagnostic, the root-mean-square difference (RMSD), the overall mean bias, and the Taylor diagram are presented. The RMSD and bias metrics are calculated considering the space-time field (latitude, longitude plus annual cycle) where available, or only the annual cycle otherwise.
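The quantities summarized in a Taylor diagram can be computed with a few lines of code. The sketch below uses the standard definitions (pattern correlation, ratio of standard deviations, and centered RMS difference) for two equally long series without weighting. The function name and the unweighted form are assumptions made for this illustration and are not the ESMValTool implementation.

```python
import math


def taylor_statistics(model, obs):
    """Return correlation, standard deviations and centered RMS difference."""
    if len(model) != len(obs) or len(model) < 2:
        raise ValueError("need two series of equal length (at least 2 values)")
    n = len(model)
    mean_m = sum(model) / n
    mean_o = sum(obs) / n
    anom_m = [m - mean_m for m in model]
    anom_o = [o - mean_o for o in obs]
    std_m = math.sqrt(sum(a * a for a in anom_m) / n)
    std_o = math.sqrt(sum(a * a for a in anom_o) / n)
    if std_m == 0.0 or std_o == 0.0:
        raise ValueError("correlation is undefined for a constant series")
    corr = sum(a * b for a, b in zip(anom_m, anom_o)) / (n * std_m * std_o)
    crmsd = math.sqrt(sum((a - b) ** 2 for a, b in zip(anom_m, anom_o)) / n)
    return {
        "correlation": corr,
        "std_model": std_m,
        "std_obs": std_o,
        "std_ratio": std_m / std_o,
        "centered_rmsd": crmsd,
    }


if __name__ == "__main__":
    print(taylor_statistics([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0]))
```

The three statistics are linked by E'^2 = s_m^2 + s_o^2 - 2 s_m s_o R, which is the law-of-cosines relation that allows them to be shown in a single two-dimensional diagram. The mean bias is removed in the centered difference, which is why the bias is reported separately.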
Following Gleckler et al. (2008), the RMSD and overall mean bias in the annual cycle of different mean climate parameters at a particular pressure level are calculated within four different domains (global, tropics, NH extratropics and SH extratropics). The results of such quantitative evaluation are presented as portrait diagrams, where the RMSD gives positive values only (due to squaring), whereas the overall mean bias is sensitive to the sign of the deviation, being positive (negative) when the model overestimates (underestimates) the observations. To compare the relative performance of the simulations, the RMSD and bias are normalized by dividing through their multi-model average (see Appendices A1 and A2 for details).
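The two metrics and their normalization can be sketched as follows. This is a simplified illustration: the exact formulas are those given in Appendices A1 and A2 of the paper, which are not reproduced here. The function names, the optional `weights` argument (for instance area weights) and the plain division by the ensemble mean are assumptions made for this example.

```python
import math


def _check(model, obs, weights):
    """Validate the inputs and return the weights as a list."""
    w = [1.0] * len(model) if weights is None else list(weights)
    if not (len(model) == len(obs) == len(w)) or not model:
        raise ValueError("model, obs and weights must be non-empty and equally long")
    if sum(w) <= 0.0:
        raise ValueError("weights must sum to a positive value")
    return w


def rmsd(model, obs, weights=None):
    """Weighted root-mean-square difference; never negative."""
    w = _check(model, obs, weights)
    total = sum(wi * (m - o) ** 2 for wi, m, o in zip(w, model, obs))
    return math.sqrt(total / sum(w))


def mean_bias(model, obs, weights=None):
    """Weighted mean of model minus obs; positive when the model is too high."""
    w = _check(model, obs, weights)
    return sum(wi * (m - o) for wi, m, o in zip(w, model, obs)) / sum(w)


def normalize_by_ensemble_mean(metric_by_simulation):
    """Divide each simulation's metric by the average over all simulations."""
    values = list(metric_by_simulation.values())
    mean = sum(values) / len(values)
    if mean == 0.0:
        raise ValueError("ensemble mean is zero; normalization is undefined")
    return {name: value / mean for name, value in metric_by_simulation.items()}


if __name__ == "__main__":
    # Placeholder numbers for illustration only, not results from the paper.
    print(normalize_by_ensemble_mean({"sim_a": 1.0, "sim_b": 3.0}))
```

After normalization, a value below 1 for the RMSD indicates a simulation that performs better than the ensemble average for that diagnostic, and a value above 1 a simulation that performs worse.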
All diagnostics and performance metrics shown in this paper have been implemented into the Earth System Model Validation Tool (ESMValTool). This ensures that the analysis presented in this paper can be applied to other EMAC simulations and other ESMs in a routine manner. The ESMValTool was originally based on the previously-developed CCMVal Diagnostic Tool for chemistry-climate models (Gettelman et al., 2012), but has significantly changed since then, both with respect to its structure and scientific focus.
The ESMValTool is designed to work on model output formatted according to the Climate Model Output Rewriter (CMOR) tables metadata (see http://www2-pcmdi.llnl.gov/cmor). This metadata defines, for example, standard names for variables, units, coordinate names and values, etc. A reformatting routine is implemented in the ESMValTool that converts the original EMAC model output to the format required for the tool. Applying this reformatting routine to new EMAC simulations is straightforward, so that new simulations can be easily compared to the results shown here. The reformatting routine can also serve as an example for converting the output of other ESMs.
The ESMValTool is developed as an international community tool by multiple institutions with the goal to enhance routine benchmarking and evaluation of ESMs. The priority of the effort so far has been to target specific scientific themes focusing on selected essential climate variables (ECVs), tropical variability (e.g., monsoon), Southern Ocean, continental dry bias and soil hydrology-climate interactions, carbon dioxide (CO2), aerosols and ozone, but the package is being developed in such a way that additional analyses can be easily added. In this way the standard for model evaluation can be built up over time.

Observational data for model evaluation
A variety of different observations are used for the model evaluation. For most variables, we choose a reference and an alternative data set in order to estimate differences and uncertainties in observations. A summary of the main diagnostics applied in this study is given in Table 3, along with the variables, observations, the short names and period/domain for the performance metrics and corresponding references.

Temperature, winds, geopotential height and specific humidity
For global temperature, winds, geopotential height and specific humidity, meteorological reanalyses are the best available reference data.Reanalysis projects provide spatially complete and coherent records of atmospheric variables.
Given the improvement of models, input data and assimilation methods, reanalyses have significantly improved in reliability, cover longer time-periods and have increased in spatial and temporal resolution (Dee et al., 2011).
We use two different reanalysis data sets (ERA-Interim and NCEP/NCAR, see below) for the comparisons to simulated temperature, winds, geopotential height and specific humidity. The differences between the climatologies derived from these fields are an indicator of the uncertainties in the meteorological analyses. The ERA-Interim reanalysis is produced by the ECMWF and covers the period from 1979 to present (Dee et al., 2011). All observations used in the reanalysis undergo quality control, selection steps (e.g., to sort out duplicate reports or data that are known to have large errors) and bias corrections (Dee et al., 2011). We therefore consider ERA-Interim as the main reference data set for meteorological fields in this work and analyse the period 1996-2005.
In addition, the NCEP/NCAR reanalysis is applied, which covers the period from 1948 to present (Kalnay et al., 1996). Over the reanalysis period, developments in the observation system took place, particularly when satellite observations became available in the 1970s. Consistently with ERA-Interim, we analyse the period 1996-2005.
For specific humidity, we follow Gleckler et al. (2008) and use observations from the Atmospheric Infrared Sounder (AIRS) experiment (Aumann et al., 2003) [...]
Vertical and meridional profiles of climatological zonal mean water vapor volume mixing ratios are compared to measurements taken by the HALogen Occultation Experiment (HALOE) on board the Upper Atmosphere Research Satellite (UARS), launched in 1991 (Russell et al., 1993). Model climatologies are formed for the period 1991-2002 (Grooß and Russell III, 2005). HALOE data for H2O range from about 11 to 65 km altitude and cover 80° S to 80° N in latitude within one year. For all measured species, the accuracy of the HALOE retrievals decreases near the tropopause (Brühl et al., 1996; Harries et al., 1996; Park et al., 1996; Russell et al., 1996) and sparse coverage of the polar regions increases the uncertainty in the HALOE climatologies there.

Radiation
For evaluating radiation fluxes, our primary data set is taken from the Surface Radiation Budget project (SRB; GEWEXnews, 2011) and the alternative data set is taken from the Clouds and the Earth's Radiant Energy System (CERES; Wielicki et al., 1996) experiment. The SRB data set in its current version (3.0) covers the period from July 1983 to December 2007. Here we consider the time range 1995-2005. The data set provides surface and top of the atmosphere (ToA) long-wave and short-wave fluxes derived from a variety of satellite-observed parameters, like cloud parameters, ozone fields and reanalysis meteorology (GEWEXnews, 2011). The CERES experiment products include information about solar and long-wave radiation for the surface and ToA between 2001 and 2012.

Total column ozone
For the evaluation of total column ozone, we use the NIWA combined total column ozone data set over the period 1998-2010 as the reference data set (Bodeker et al., 2005) and the data set GOME-type total ozone - essential climate variable (GTO-ECV), combining data from the satellite sensors GOME, SCIAMACHY and GOME-2, as the alternative for the same period (Loyola and Coldewey-Egbers, 2012; Loyola et al., 2009). The NIWA data set is an assimilated database that combines TOMS (Total ozone mapping spectrometer), GOME and SBUV (Solar backscatter ultra-violet radiometer) data. In order to obtain a global homogeneous data set, ground-based data from the Dobson spectrophotometer network are used, removing differences between the individual input data or filling existing gaps.

Tropospheric ozone
For the evaluation of tropospheric column ozone we use a global climatology based on the Aura ozone monitoring instrument (OMI) and microwave limb sounder (MLS) ozone measurements for the period 2005-2012 (Ziemke et al., 2006, 2011). The MLS/OMI gridded ozone climatology data are made available to the scientific community via the NASA Goddard Space Flight Center ozone and air quality web-page (http://ozoneaq.gsfc.nasa.gov/).
For the comparison of ozone vertical profiles in the troposphere, we use a recently updated global climatology by Tilmes et al. (2012), based on ozone soundings over the last 15 years and focusing on the troposphere and the lower stratosphere. This is an important extension to the Logan (1999) climatology, since it covers the more recent years included in the simulated period of the experiments evaluated here. Vertical ozone profiles for 41 stations around the globe have been compiled and averaged for the years 1980-2009. The climatology provides information about the median and the width of the ozone probability distribution function, as well as the interannual variability of ozone between 1995 and 2009, in pressure- and tropopause-referenced altitudes. In addition to single stations, regional aggregates are included, combining stations with similar ozone characteristics. We use these regional aggregates for model evaluation and focus on the 1995-2009 time period, corresponding to the simulated period of our experiments.
In addition, we use ozone data from a collection of aircraft campaigns (Emmons et al., 2000). These data are particularly valuable because they include additional species, measured at the same location and time as ozone, allowing a more detailed analysis of ozone precursor species. These data are provided as global distributions and vertical profiles and were validated against ozonesondes and measurements on board commercial aircraft. The ozone data cover only selected regions of the Earth and time periods vary for each region. The use of aircraft data for model evaluation might have some limitations, due to the fact that model and observations are not always temporally co-located. This could imply, for example, that observations taken in the vicinity of strong emission sources (such as biomass burning) could be affected by large temporal variability and indicate large biases when compared to model simulations.

Ozone precursors
For the evaluation of ozone precursors, we use the Emmons et al. (2000) data set, which provides information about a variety of species, including CH4, CO, NOx and NMHCs.
For the evaluation of CO, we additionally use the observational data from the NOAA GLOBALVIEW data set (4th annual update, GLOBALVIEW-CO2, 2010), over the 1999-2008 period. This data set is provided by the Cooperative Atmospheric Data Integration Project for carbon monoxide which is coordinated by NOAA (National Oceanic and Atmospheric Administration), ESRL (Earth System Research Laboratory) and GMD (Global Monitoring Division). The goal of the GLOBALVIEW initiative was to get data products with a large spatial and temporal resolution to support carbon cycle modeling studies based on measurements from land-surface, ship, aircraft, and tower observations. The processing includes smoothing, interpolation and extrapolation following Masarie and Tans (1995), resulting in an extended record.
6 Results and discussion of model evaluation

Basic climate parameters
In the following subsections, we first evaluate how well the mean climate state in selected basic climate variables such as temperature, eastward and northward wind, geopotential height, specific humidity and radiation is represented in the four simulations. In the choice of the tropospheric diagnostics and performance metrics we closely follow those that were applied by Gleckler et al. (2008), with periods changed to represent 2000 conditions. Since the EVAL2 and the QCTM simulations are both nudged by meteorological reanalysis, a generally better agreement with meteorological reanalyses compared to the free-running time slice experiments (TS2000 and ACCMIP) can be expected. However, differences could still occur, in particular in regions where the nudging parameters are small, i.e., outside the main nudging interval, which is between ∼ 97 hPa and ∼ 706 hPa (see Sect. 3.1).
For the calculation of the eastward and northward wind components, a 10 % correction to the original EMAC output has been applied here, to account for a recently reported error in the output of the horizontal wind components. This error affects only the way the output is written and not the actual model performance and internal consistency (see Appendix B for more details).

Temperature
Temperature (ta) is evaluated by investigating the climatological mean annual cycle at the four selected pressure levels 850, 200, 30 and 5 hPa (Fig. 1) and the annual mean zonally averaged temperature differences between each EMAC simulation and the reference data set (ERA-Interim, Fig. 2) and the alternative data set (NCEP).
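As an illustration of the annual-cycle diagnostic described above, the following minimal sketch computes a climatological mean annual cycle and its interannual variability from a series of monthly means, and flags the months in which a simulated climatology lies within one standard deviation of the reference. The function names and the flat, January-first input layout are assumptions made for this example and do not reproduce the ESMValTool implementation.

```python
from statistics import mean, stdev


def annual_cycle(monthly_values):
    """Return (climatological mean, interannual std) per calendar month.

    monthly_values: flat list of monthly means that starts in January and
    covers a whole number of years (hypothetical input layout).
    """
    if len(monthly_values) % 12 != 0 or len(monthly_values) < 24:
        raise ValueError("need at least two complete years of monthly data")
    n_years = len(monthly_values) // 12
    clim, spread = [], []
    for month in range(12):
        # Collect the same calendar month from every year.
        samples = [monthly_values[12 * year + month] for year in range(n_years)]
        clim.append(mean(samples))
        spread.append(stdev(samples))  # sample standard deviation across years
    return clim, spread


def within_variability(model_clim, ref_clim, ref_spread):
    """Flag months where the model lies within +/- 1 sigma of the reference."""
    return [abs(m - r) <= s for m, r, s in zip(model_clim, ref_clim, ref_spread)]
```

A model climatology can then be compared month by month against the reference climatology and its spread, which mirrors the "within the interannual variability" statements used in the text.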
The annual cycle is in general well reproduced by all simulations at all levels and in all regions, with the exception of the 200 hPa level in the tropics. At 850 hPa, all EMAC simulations are in good agreement with ERA-Interim and NCEP/NCAR and lie generally within the interannual variability of the meteorological reanalyses, with the exception of ACCMIP which shows a positive bias (∼ 1 K) in the tropical NH summer months (JJA). Such overestimation can be explained by the positive bias of the tropical SSTs in the prescribed data set when compared to HadISST1 (see Fig. S1). For the ACCMIP simulation the prescribed SST data set is taken from a historical simulation with the CMCC climate model (see Sect. 3.4). Both the meteorological reanalyses and the model simulations are characterized by a very small interannual variability at this level (Fig. 1).
At 200 hPa, all EMAC simulations have a cold bias of around 5 K in all regions compared to the meteorological reanalyses and are well outside the interannual variability. This bias is particularly pronounced in the tropics in the two nudged simulations, whereas in the extratropics of both hemispheres the nudged simulations are in slightly better agreement with ERA-Interim than the free-running time slice simulations. Note that such a bias cannot be due to differences between ERA-Interim and the ECMWF data used to nudge the EVAL2 and QCTM experiments. As shown by Dee et al. (2011), the difference between the rms forecast error produced by ERA-Interim and the ECMWF forecasting system that was operational in 1989 is only about 0.2 K at 200 hPa. It is also important to recall that we did not nudge the global mean temperature but only patterns (see Sect. 3.1).
Stratospheric temperatures at 30 and 5 hPa (Fig. 1, lower rows) are within one standard deviation of ERA-Interim in the extratropics in all simulations, with the exception of the summer months in the NH. In the tropics, a cold bias of around 2 K is simulated. At 5 hPa in the tropics, ACCMIP and TS2000 show a better agreement with the observations than the other experiments. In general, temperature is much better simulated in the lower troposphere, where the simulated deviations from ERA-Interim are of similar magnitude to the differences between the two reanalysis data sets, which are small in any case and therefore suggest low uncertainties in the reference and alternative data sets. It is also interesting to note that the QCTM simulation has a global average temperature at 30 hPa that is quite different from the other simulations. Since the QCTM experiment uses prescribed ozone and water vapor for the model radiation, this might be a sign of the impacts of the interactions between chemistry and radiation.
The above-mentioned biases are also visible in the zonally averaged temperature profiles in Fig. 2. EMAC simulates the common features of the temperature distribution, characterized by high temperatures at ground levels in the tropics, by a decrease of temperature with altitude and towards the poles and by a further increase with altitude towards the tropopause, reasonably well (within ∼ 1-2 K in most parts of the simulated domain).
A warm bias can be identified in the polar SH stratosphere (50-100 hPa) in the free-running experiments and is particularly strong in TS2000. This is related to a too weak representation of the polar vortex and an underestimation of the ozone hole, which are both particularly prominent in the TS2000 simulation (see further discussion in Sect. 6.2.1). In addition to the annual mean, the seasonal mean temperatures for this simulation are shown in Fig. S2, confirming that this warm bias is mainly present in the JJA and SON seasons, coinciding with the polar vortex and the ozone hole.
All experiments are characterized by a cold bias in the extratropical lower stratosphere. This feature is common to many of the CMIP3 and CCMVal models (IPCC, 2007; SPARC-CCMVal, 2010) and is related to the wet bias (an overestimation of the water vapor concentrations) that affects all four EMAC simulations. This wet bias is shown in Fig. 3, which displays the annual cycle of water vapor in the EMAC simulations compared to HALOE data at 200 hPa in the SH extratropics. Water vapor is a greenhouse gas and therefore absorbs and emits infrared radiation. In the stratosphere, the emission of infrared radiation into space is larger than the absorption of upwelling infrared radiation from the troposphere. This causes a net cooling effect. Overall, too high concentrations of water vapor in the extratropical lower stratosphere lead to too strong infrared radiative cooling, which results in too low temperatures. This relation between the cold bias and the wet bias in the extratropical lower stratosphere has been shown in previous studies, for example in Stenke et al. (2008) for the ECHAM4.L39(DLR) E39 model. We note, however, that HALOE is believed to be biased low in these regions (see, e.g., Hegglin et al., 2013). Temperature biases are also evident above the tropopause in the tropics. This bias was already examined by Jöckel et al. (2006), who related it to a slightly too strong Brewer-Dobson circulation in the EMAC model, indicating deficiencies related to the wave forcing and adiabatic cooling/warming rates.
The temperature of the tropical tropopause layer is an important aspect of model representation since it has strong implications for the water vapor distribution in the stratosphere. The lower-stratospheric water vapor mixing ratios are generally a function of the model temperature near the tropical tropopause at 100 hPa (Gettelman et al., 2009), because low temperatures at the tropical tropopause cause condensation and dry the air, so that less water vapor enters the stratosphere. This is consistent with the behavior in the four EMAC simulations, where smaller biases in temperatures compared to ERA-Interim at 100 hPa (EVAL2 and QCTM) relate to smaller biases in water vapor at this level compared to HALOE observations (Fig. 4). The nudged simulations EVAL2 and QCTM represent the annual cycle and absolute temperature values at 100 hPa (which is the upper limit at which nudging is applied) remarkably well compared to ERA-Interim, while TS2000 and ACCMIP show about 1 to 4 K lower values and a reasonable annual cycle. Correspondingly, the water vapor at 100 hPa is close to HALOE in the nudged simulations (within the 1σ interannual variability, except in September and October) and is lower than HALOE in the two free-running simulations throughout most of the year. The phase of the annual water vapor cycle in the tropics at 100 hPa is well captured by all model simulations, but as for temperature, its amplitude is slightly lower than ERA-Interim for the free-running simulations TS2000 and ACCMIP (Fig. 4).
The relative performance of the four simulations in reproducing temperature at the four pressure levels (850, 200, 30 and 5 hPa) and in the four domains (global, tropics, NH and SH extratropics) is summarized by the portrait diagrams in Fig. 5 (root-mean-square difference) and Fig. 6 (overall mean bias). In general, nudged simulations (EVAL2 and QCTM) perform slightly better than the free-running ones (TS2000 and ACCMIP) in the lower levels, where the nudging is indeed stronger. The performance of the four experiments is nevertheless quite similar. The model performance with respect to the two meteorological reanalyses considered for the temperature (lower and upper triangles in the portrait diagrams) is comparable, although there are some noticeable differences (especially near the tropical tropopause, see Fig. 2), revealing that uncertainties exist in the reanalyses as well. The results of the Taylor diagram (Fig. 7, first row) show a good representation of the temperature by all model experiments. Most points lie above a correlation R = 0.9, indicating that the temperature pattern is very well captured, and deviations from the observational reference point (marked with REF on the x axis) are mostly small. Most points also lie very close to the dashed arc corresponding to a normalized standard deviation equal to 1, which indicates a good match of the pattern variations between models and reanalysis data. A slightly worse performance is attained by the two free-running experiments at the 200 hPa level, with correlation values around 0.7-0.8, larger deviations from the reference point and discrepancies in the normalized standard deviation values. In general, the global domain and the extratropical regions are better reproduced than the tropics. The points corresponding to NCEP agree well with ERA-Interim in terms of correlation and pattern variations, but show some slight deviations from the REF point. This again suggests the existence of uncertainties in the meteorological reanalyses, which, analogously to EMAC, are largest in the tropics.
[Figure caption fragment, portrait diagrams: Where an alternative data set is available, the diagram boxes are split in two parts, showing the model performance compared to the primary (lower triangle) and alternative (upper triangle) data set. Where no observations are available, the triangles are marked white.]
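The quantities summarized in the portrait and Taylor diagrams can be illustrated with a minimal sketch that computes the overall mean bias, the root-mean-square difference, the pattern correlation and the normalized standard deviation of a simulated field against a reference field. The function name and the optional weights argument are hypothetical, and the sketch omits any normalization across simulations that a portrait diagram may apply; it is not the ESMValTool implementation.

```python
import math


def performance_metrics(model, ref, weights=None):
    """Bias, RMSD, correlation and normalized std of model vs. reference.

    model, ref: equally long sequences of values on the same grid points.
    weights: optional non-negative weights (e.g. grid-cell areas).
    """
    n = len(model)
    if n != len(ref) or n < 2:
        raise ValueError("model and ref need the same length (>= 2)")
    if weights is None:
        weights = [1.0] * n
    wsum = sum(weights)

    def wmean(values):
        return sum(w * v for w, v in zip(weights, values)) / wsum

    m_mean, r_mean = wmean(model), wmean(ref)
    rmsd = math.sqrt(wmean([(m - r) ** 2 for m, r in zip(model, ref)]))
    m_std = math.sqrt(wmean([(m - m_mean) ** 2 for m in model]))
    r_std = math.sqrt(wmean([(r - r_mean) ** 2 for r in ref]))
    if m_std == 0.0 or r_std == 0.0:
        raise ValueError("correlation is undefined for a constant field")
    cov = wmean([(m - m_mean) * (r - r_mean) for m, r in zip(model, ref)])
    return {
        "bias": m_mean - r_mean,      # overall mean bias
        "rmsd": rmsd,                 # root-mean-square difference
        "corr": cov / (m_std * r_std),  # pattern correlation R
        "norm_std": m_std / r_std,    # radial coordinate in a Taylor diagram
    }
```

In a Taylor diagram, a simulation that matches the reference exactly would sit at the REF point, with a correlation of 1 and a normalized standard deviation of 1.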

Eastward wind
The eastward wind (ua) as simulated by EMAC is in good agreement with both reanalysis data sets at 850 hPa in the tropics and extratropics, where all simulations reproduce the annual cycle well (Fig. 8). As expected, the nudged simulations (EVAL2 and QCTM) perform better at this level.
The agreement is still good at 200 hPa, with the nudged simulations performing better than the free-running ones in the tropics. TS2000 and ACCMIP, on the other hand, slightly overestimate the eastward wind by about 3-4 m s−1 in this region. All simulations reproduce the annual cycle quite precisely at this level.
In the stratosphere, where the nudging is much weaker, all the simulations show a similar behavior, and no significant improvement is obtained from the nudged simulations with respect to the free-running ones. On the contrary, the QCTM simulation has some problems in reproducing the annual cycle in the tropics, in particular at the 5 hPa level. The other simulations reproduce the annual cycle quite well and are within the interannual variability of the observations. In the extratropics, a small negative bias is found in winter for all simulations, in particular at 30 hPa.
Figures 9 and S3 show the difference plots of the seasonal mean of the eastward wind in DJF and JJA, respectively. A generally good agreement between the EMAC simulations and ERA-Interim is found, and especially the summertime stratospheric easterlies are well represented in all simulations. Some weaknesses are found, however, in the simulation of westerlies. In DJF (Fig. 9), the subtropical jet is underestimated at about 60° S in the free-running simulations (TS2000 and ACCMIP), while the nudged simulations capture the jet. On the other hand, the nudged simulations underestimate the polar night jet in the northern polar regions. Such underestimation might be related to a weak representation of the polar vortex in the NH. The temperature profiles for DJF (not shown) for the nudged simulations indeed show a warm bias in this specific region, which might be an indication of a too weak polar vortex. In JJA (Fig. S3), the west wind jet at 60° S is underestimated by the free-running simulations throughout the entire atmosphere, while the nudged simulations underestimate westerlies in the stratosphere. The underestimation of the west wind jets in the free-running simulations is an indication of an underestimation of the polar vortex. This is also supported by the warm bias in the seasonal mean of the temperature in this region discussed in Sect. 6.1.1 and shown in Fig. S2.
The better performance of the nudged simulations with respect to the free-running simulations in the lower troposphere (850 hPa) is revealed by the portrait diagrams (Figs. 5 and 6). The eastward wind is generally underestimated in the extratropics and in the global domain (with the notable exception of ACCMIP at 200 hPa), whereas it is overestimated in the tropics, especially in the stratosphere. As found for the temperature, there are differences in the model performance with respect to the two meteorological reanalyses considered for the evaluation, which reveals potential uncertainties in the observational data sets, particularly in the tropics. These considerations are further supported by the Taylor diagram (Fig. 7), which shows an excellent representation of the eastward wind globally and in the extratropics by all model simulations. In the tropical domain, on the other hand, variations in the phase and amplitude are significantly larger.

Northward wind, geopotential height and specific humidity
Northward wind, geopotential height and specific humidity are evaluated mainly to assess whether there are some serious limitations in the representation of the mean climate by the model and are only discussed briefly.
The northward wind (va) at the four selected levels (850, 200, 30, and 5 hPa) mostly lies within the interannual variability of the ERA-Interim reanalysis, with differences between ERA-Interim and NCEP being of the same order as or larger than differences to the model simulations (Fig. S4). The annual mean zonally averaged plot (Fig. S5) shows that, in general, the major features are well reproduced by all model setups. The portrait diagrams (Figs. 5 and 6) further confirm the expected, generally better, performance of the nudged simulations compared to the free-running ones. In the overall mean bias diagram, northward winds are found to be either overestimated or underestimated depending on the considered observational data sets.
The comparison of simulated geopotential height (zg) with observations shows a generally good agreement (see Figs. S6 and S7), with relative differences of the order of a few per cent. The annual cycle is mostly captured. Differences of the same order, however, can also be found when comparing ERA-Interim with NCEP data, revealing some uncertainties in the meteorological reanalyses as well.
The annual cycle of the specific humidity (hus) is mostly captured by the EMAC simulations (Fig. S8), with the exception of the tropical domain, in particular at the 30 hPa level. Following Gleckler et al. (2008), instead of the 200 hPa level we consider 400 hPa, since this is more significant for the evaluation of specific humidity in the troposphere. In the extratropical troposphere, the annual cycle shows a clear maximum in the summer months, following the change in incoming solar radiation during the year, which affects temperature (see Fig. 1) and consequently the amount of water vapor that the air can hold. In the tropics, on the other hand, the annual cycle shows a much smaller variation with time, since in this region the change in incoming radiation during the year is much less pronounced. The nudged simulations, which are driven by ECMWF operational analysis data, are generally closer to ERA-Interim than to AIRS data, while the free-running simulations simulate monthly mean values closer to the AIRS data in the lower troposphere. The general pattern of the specific humidity profile climatology (Fig. S9) is characterized by a maximum over the equator at the surface, decreasing with latitude and altitude, and is well reproduced by all simulations.

Radiation
Climatological mean maps of outgoing long-wave clear-sky radiation at the ToA (rlutcs) are shown in Fig. S10, compared with SRB and CERES. The observational data (Fig. S10, upper row, left) display their highest values in the tropics (about 300 W m−2) and two clear minima over the poles (around 150 W m−2 at the South and 200 W m−2 at the North). The EMAC simulations capture these features, as can be seen in the difference plots (Fig. S10). Compared to SRB, variations smaller than 20 W m−2 are found everywhere on the globe, with a clear overestimation over the South polar regions (about 10 W m−2, 5-10 %), which is stronger in the free-running simulations. The other parts of the globe show a general underestimation (maximum biases of about 30 W m−2, 10-20 %) which is stronger in the ACCMIP simulation. A similar difference pattern results from the comparison between EMAC and CERES (not shown).
The outgoing long-wave all-sky radiation at the ToA (rlut) is again compared to SRB and CERES (Fig. S11). The observations show a maximum value over the tropics (250-300 W m−2) and two extended minima over the polar regions (about 150 W m−2 for the South and 200 W m−2 for the North). In general, the radiation values are lower than for clear-sky conditions (Fig. S10), as expected due to the presence of clouds. All EMAC simulations show a similar pattern of deviations compared to SRB, with the free-running experiments characterized by the largest differences (about 20-30 W m−2). Biases of about 10-20 W m−2 in the tropics were also found for the CMIP3 models when compared to ERBE data (IPCC, 2007), although some had very large deviations (up to about 50 W m−2). The larger bias in the free-running simulations could be due to the fact that cloud and convective parameters have been optimized for the free-running mode (see, e.g., Mauritsen et al., 2012) and applied also for EVAL2 and QCTM. If nudging systematically alters the cloud properties, the radiative balance will be altered as well. The two free-running experiments are indeed characterized by a similar globally averaged cloud cover (64 %) which is higher than in EVAL2 (57 %) and QCTM (60 %).
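Globally averaged quantities such as the cloud cover values quoted above require an area weighting, because grid cells on a regular latitude-longitude grid shrink towards the poles. The sketch below shows one common approach, weighting zonal means by the cosine of latitude; the function name and the assumption that the field has already been zonally averaged on a regular grid are illustrative and are not taken from the EMAC or ESMValTool code.

```python
import math


def global_mean(zonal_means, latitudes_deg):
    """Area-weighted global mean of zonally averaged values.

    zonal_means: one value per latitude band.
    latitudes_deg: band-centre latitudes in degrees (regular grid assumed).
    """
    if len(zonal_means) != len(latitudes_deg) or not zonal_means:
        raise ValueError("inputs must be non-empty and of equal length")
    # The area of a latitude band is proportional to cos(latitude).
    weights = [math.cos(math.radians(lat)) for lat in latitudes_deg]
    weighted_sum = sum(w * v for w, v in zip(weights, zonal_means))
    return weighted_sum / sum(weights)
```

An unweighted average of the same values would overstate the contribution of the polar bands.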
Another important quantity for the evaluation of the radiation budget is the reflected short-wave all-sky radiation (rsut, Fig. S12). The net short-wave radiation is primarily determined by solar incoming radiation and by the presence of clouds. The general pattern is therefore a combination of the variation of incoming solar radiation with latitude/season and of cloud cover. The EMAC simulations reproduce this pattern well. The observations show their highest values (around 120-150 W m−2) over regions of high surface albedo or significant cloud cover (deserts, snow-covered areas, Himalaya and Sahara), while the strongly absorbing ocean surface is characterized by lower values (60-80 W m−2). The comparison of EMAC simulations with SRB and CERES data shows a positive bias at mid-latitudes and in polar regions, with the highest deviations (30-40 W m−2, 10-20 %) in northern higher latitudes (Alaska, North-East Russia), which are particularly present in the EVAL2 simulation. Negative biases are found in the tropics and subtropics, up to about 20-30 W m−2 (20-30 %) in the Intertropical Convergence Zone. This pattern is consistent with the general tendency of EMAC to underestimate low cloud fraction in the tropics and to overestimate it in the extratropics in comparison with ISCCP satellite data (Räisänen and Järvinen, 2010). These results are summarized in the performance metrics plots (Figs. 5 and 6).

Ozone and ozone precursors
In this paper we focus on tropospheric ozone, and consider the stratosphere only in the context of total column ozone. Biases in tropospheric ozone found in all four EMAC simulations led to two additional simulations (ACCMIP-S1 and ACCMIP-S2) to explore related model uncertainties. These two simulations are included in the figures, but discussed separately in Sect. 6.2.5.

Total column ozone
Zonal mean total column ozone (toz) climatologies from the different EMAC simulations are compared to the NIWA assimilated data and to GTO-ECV satellite observations in Figs. 10 and S13. The well-known features of highest column ozone values in NH spring, low ozone values in the tropics, with a small seasonal cycle, a column-ozone maximum in the mid-latitudes of the SH in late winter/early spring and the ozone hole above the Antarctic are well represented by the EMAC simulations, but significant quantitative differences compared to observations do exist. The ozone hole is underestimated in all EMAC simulations, in particular in TS2000, where the ozone hole is only marginally present and underestimated by around 75-100 DU. In NH winter, EMAC simulations overestimate column ozone in the high latitudes by about 50-100 DU compared to NIWA observations, but differences of about 30-40 DU also exist between the two observational data sets, with GTO-ECV showing higher values in this region. At about 50-60° S, the mid-latitude maximum in total column ozone in autumn is produced by all EMAC simulations, but is more pronounced than in the NIWA and GTO-ECV observations: this positive bias ranges between 47 (EVAL2) and 59 DU (TS2000) compared to NIWA, and between 49 and 61 DU compared to GTO-ECV. In the tropics the EMAC simulations show good agreement with NIWA and GTO-ECV observations. The above features are also reflected in the zonal mean total ozone values for the different seasons and the annual mean (Fig. S13). The differences in the representation of the ozone hole among the four simulations are not statistically significant (to a 95 % confidence level, not shown).
Stratospheric ozone is mainly affected by emissions from long-lived species (CO2, CH4, N2O, chlorofluorocarbons, hydrochlorofluorocarbons, halons, and H2) which are prescribed from the Advanced Global Atmospheric Gases Experiment (AGAGE; Prinn et al., 2000) observations as lower boundary conditions in all four simulations. Differences in emissions affecting tropospheric ozone contribute to the differences in total column ozone between ACCMIP and the other three simulations. Despite different emissions and different dynamics, total column ozone is generally biased high in all four EMAC simulations. This is evident also in Fig. 11 (lower left panel). The reason for this bias will be investigated in follow-up studies, since this paper focuses on tropospheric ozone (see discussion in the following sections). The correlation, on the other hand, is above R = 0.8-0.9 except in the SH polar region (see Fig. 12, upper left panel), indicating that the pattern is very well captured, and deviations from the observational reference point (marked with REF on the x axis) are mostly small.

Tropospheric column ozone
The geographical pattern and annual cycle of tropospheric column ozone (toztrop) from the EMAC simulations are compared to MLS/OMI measurements on board the Aura satellite in Figs. 13 and 14, respectively. All EMAC simulations tend to overestimate tropospheric column ozone, in particular in the NH mid-latitudes, with deviations around 10-20 DU. This is evident also in the near-global mean values given at the top right of each panel in Fig. 13 and in the overall mean bias metric (Fig. 11, lower left panel). It should be noted that Ziemke et al. (2011) reported root-mean-square uncertainties of about 5 DU in local measurements of total column ozone from OMI/MLS using ozonesondes as reference. They interpreted differences of 10 DU and higher as significant, while smaller values were essentially considered at noise level. It should also be noted that the calculation of tropospheric column ozone is sensitive to the tropopause height in the observations and in the model. In the MLS/OMI data set, the vertically integrated MLS ozone profiles are subtracted from OMI total column ozone to derive the tropospheric column (Ziemke et al., 2011). The tropopause pressure separates tropospheric from stratospheric column ozone and is taken from NCEP using the WMO tropopause definition as in the EMAC simulations. Different temperatures in the EMAC simulations will shift the tropopause with respect to NCEP. If the tropopause is shifted towards too high (low) altitudes, this results in an overestimation (underestimation) of tropospheric column ozone. The tropospheric ozone column in EMAC is particularly sensitive to the tropopause definition, which could explain some of the differences between the observations and the EMAC simulations (see, e.g., Table 3 in Stevenson et al., 2013, although this refers to the changes in tropospheric column ozone with respect to preindustrial times and not to absolute values). However, the high bias of the tropospheric ozone column in EMAC-ACCMIP is also confirmed by a comparison to other ACCMIP models, using a different tropopause definition (see Table 3 in Young et al., 2013). The EMAC ACCMIP simulation has one of the highest tropospheric ozone burdens of all models in the ACCMIP-Hist2000 simulations (see Table 1 and Fig. 2a in Young et al., 2013).
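The sensitivity of tropospheric column ozone to the tropopause height can be illustrated with a minimal sketch that integrates an ozone profile from the surface up to a given tropopause pressure and converts the result to Dobson units. The function name, the input layout (levels ordered from the surface upward, mixing ratios in ppbv), the trapezoidal integration and the use of a constant gravitational acceleration and dry-air molar mass are assumptions of this example; the sketch does not reproduce the EMAC or MLS/OMI processing.

```python
AVOGADRO = 6.02214076e23        # molecules per mol
GRAVITY = 9.80665               # m s-2, assumed constant with height
MOLAR_MASS_AIR = 0.0289644      # kg mol-1, dry air
MOLECULES_PER_DU = 2.6867e20    # molecules m-2 in one Dobson unit

# Column (DU) contributed by 1 ppbv of ozone across a 1 hPa thick layer.
DU_PER_PPBV_HPA = (AVOGADRO / (GRAVITY * MOLAR_MASS_AIR)
                   * 100.0 * 1.0e-9 / MOLECULES_PER_DU)


def tropospheric_column_du(pressure_hpa, ozone_ppbv, tropopause_hpa):
    """Integrate ozone from the lowest level up to the tropopause (in DU).

    pressure_hpa: level pressures ordered from the surface upward.
    ozone_ppbv: ozone volume mixing ratios on the same levels.
    """
    column = 0.0
    for i in range(len(pressure_hpa) - 1):
        p_low, p_high = pressure_hpa[i], pressure_hpa[i + 1]
        if p_low <= tropopause_hpa:
            break  # this layer lies entirely above the tropopause
        o3_low, o3_high = ozone_ppbv[i], ozone_ppbv[i + 1]
        if p_high < tropopause_hpa:
            # Cut the layer at the tropopause, interpolating linearly in pressure.
            frac = (p_low - tropopause_hpa) / (p_low - p_high)
            o3_high = o3_low + frac * (o3_high - o3_low)
            p_high = tropopause_hpa
        column += 0.5 * (o3_low + o3_high) * (p_low - p_high)  # trapezoid rule
    return column * DU_PER_PPBV_HPA
```

For the same ozone profile, a tropopause placed at a lower pressure (higher altitude) includes more of the profile in the integral and therefore yields a larger tropospheric column, which is the effect described in the text.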
The near-global mean in EMAC EVAL2 (36.7 DU) is as high as the one in ACCMIP (36.1 DU), and tropospheric column ozone is still too high in TS2000 (33.6 DU) compared to the MLS-OMI data set (28.5 DU). However, the bias is significantly reduced in the QCTM simulation (29.6 DU). QCTM and EVAL2 are both nudged towards the same dynamics and do not differ significantly in their chemistry schemes. As noted in Sect. 3.2, the emissions setup in QCTM is identical to EVAL2 except for the aviation inventories, which however have only slight differences in the total emissions (see Table S2). The differences in tropospheric ozone therefore most likely stem from lightning emissions. While both simulations use the Price and Rind (1994) parameterization, they use different scaling factors aiming at a total value of 11.0 Tg NO yr−1 for EVAL2 and 3.8 Tg NO yr−1 for QCTM. The latter value is close to the lower limit of the estimated range from observations (4.3-17.1 Tg NO yr−1; Schumann and Huntrieser, 2007). This explains the differences in NOx between the two simulations (see also Sect. 5.2.4) and, consequently, in tropospheric ozone. For the configurations compared here, a lower NOx emission from lightning results in a better representation of tropospheric column ozone. TS2000 and ACCMIP use a different lightning parameterization (Grewe et al., 2001), resulting in about 10.7 and 12.4 Tg NO yr−1 lightning emissions, respectively. Aviation emissions of NOx, on the other hand, are quite similar among the four simulations, in the range 1.4 to 2.0 Tg NO yr−1 for EVAL2 (transient) and 1.8 Tg NO yr−1 for the others; they therefore cannot explain the differences in tropospheric column ozone.
In agreement with observations, lower values are simulated in the tropics and in the SH compared to NH mid-latitudes. However, significant differences in the pattern are simulated, with correlation values around R = 0.85 or lower (Fig. 12, bottom-left panel). The local maximum between Africa and South America, a region affected by biomass burning emissions, is reproduced in all simulations, although it is slightly underestimated by the QCTM simulation and overestimated by all others.
The annual cycle (Fig. 14) is overall well reproduced by the EMAC simulations, showing two distinct maxima during spring in the SH and during spring/summer in the NH. This seasonal increase in tropospheric column ozone is due to an increase of photochemical production and stratosphere-troposphere exchange (de Laat et al., 2005; Ziemke et al., 2006). It varies among the EMAC simulations also because of the difference in emissions. Furthermore, single-year emissions in the time slice model simulations (TS2000 and ACCMIP) compared to transient emissions in the nudged EVAL2 and QCTM simulations lead to some differences in emission totals of ozone precursors (see Table S2), with subsequent impacts on tropospheric ozone formation.

Vertical ozone profiles
Similar to Fig. 6 in Young et al. (2013), Fig. 15 compares EMAC to ozonesonde data from Tilmes et al. (2012) in three regions (tropics, NH and SH extratropics) and at three altitude levels (250, 500, and 700 hPa). The tropical region is represented by 9 stations, NH and SH extratropics by 22 and 5 stations, respectively (the geographical location of the stations is depicted in Fig. S14). The annual cycle in ozone is well reproduced by the model for all the regions and levels, except for the tropics at 500 and 250 hPa. The comparison shows that the high bias in tropospheric column ozone in the ACCMIP simulation that was identified in the previous section stems mainly from the 250 and 500 hPa levels in the tropics, whereas at 700 hPa and in the NH and SH extratropics the agreement with the ozonesonde data is good. This is similar for the EVAL2 and TS2000 simulations, but the QCTM simulation actually shows a small negative bias in the tropics at the two levels. As discussed in Sect. 6.2.2, the differences among the simulations can likely be attributed to the difference in lightning NOx emissions.
Simulated vertical profiles of ozone are also compared to in situ measurements from aircraft campaigns, which have been mapped onto a 5° × 5° grid by Emmons et al. (2000), with additional data from more recent campaigns (http://gctm.acd.ucar.edu/data). For the present analysis, a subset of campaigns as selected by Pozzer et al. (2007) was chosen. The same time of the year and the same regions as in the campaigns were sampled in the simulations. However, the actual flight tracks and measurement time of the day were not considered in sampling the simulation output. Furthermore, simulations and measurements may be from different years. Even though the sampling methodology of simulation and in situ data already implies some averaging, we do not expect exact matches between individual trace gas profiles. Nevertheless, there is a very good overall agreement, with the model results mostly within the 90 % interval of the observational data (Fig. 16). All four EMAC simulations yield similar ozone profiles over the different locations, with EVAL2 generally producing the highest ozone mixing ratios and QCTM the lowest, as in the above comparison. Ozone precursor emissions vary substantially from year to year and the time periods of the EMAC simulations and the observations are not always the same. This could explain some of the disagreement between model and observations, and indeed the three campaigns where the model performance appears to be poorer (Fiji, S-Atlantic and E-Brazil-Coast) were conducted from 1992 to 1996, about a decade before the period simulated in the EMAC simulations considered here. Furthermore, these regions are quite sensitive to biomass burning emissions, which can vary quite strongly (van der Werf et al., 2008).

Ozone precursors
Similar to ozone, simulated vertical profiles of ozone precursors are compared to in situ measurements for aircraft campaigns by Emmons et al. (2000). Campaigns closest to those used in the ozone evaluation are shown, if ozone precursor data are not available for a certain campaign. Nitrogen oxides serve as catalysts in the photochemical cycles relevant for the production and destruction of tropospheric ozone. Ozone production depends non-linearly on NOx concentrations, but higher NOx concentrations mostly result in higher ozone mixing ratios in the troposphere. At very high NOx concentrations, ozone production becomes less efficient, because it is then limited by the abundance of NMHCs (Fowler et al., 2008). The vertical profiles simulated by EMAC show a similar shape as the observational data, and lie within the 90 % observational interval in most cases (Fig. 17). The spread among the mixing ratios simulated by the four EMAC simulations is very small in the lower and middle troposphere (up to about 7 km), whereas there are larger differences (of the order of 100 pmol mol⁻¹) among the simulations in the upper troposphere, which could be related to lightning emissions, as discussed before. TS2000 and ACCMIP usually simulate the highest mixing ratios. The higher NOx emissions of EVAL2 in comparison to QCTM are consistent with the results for ozone mixing ratios, given that NOx is one of the main substances increasing ozone production via photochemical reactions in the troposphere.
The hydroxyl radical (OH) is another important species in the photochemistry of ozone, as the HOx catalytic cycle is coupled to the NOx cycle. However, the hydroxyl radical is a very short-lived species and direct observations of OH are very sensitive to local small-scale conditions, limiting the informative value of comparisons with coarse resolution simulation data. Furthermore, estimates of global mean OH concentration are not very well constrained (Lawrence et al., 2001; Gottschaldt et al., 2013; Naik et al., 2013). As an indicator of tropospheric oxidation capacity, we analyse the tropospheric lifetime of methane and methylchloroform (MCF) following the method by Lawrence et al. (2001). Methane and MCF lifetimes are calculated with respect to their reactions with OH, using the reaction rate coefficients from Atkinson (2003) and Sander et al. (2003), respectively. The results are summarized in Table 4: the four simulations show very similar values for the methane lifetime, in the range 7.9-9.1 years, and for the MCF lifetime, 4.8-5.5 years.
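The lifetime diagnostic described above can be illustrated with a minimal sketch. It assumes that the lifetime is obtained as the tropospheric burden of the tracer divided by its loss rate through reaction with OH, with an Arrhenius-type rate coefficient, and that a tropospheric mask has already been applied to the input fields. The function names, the three-cell test column and the Arrhenius parameters are illustrative assumptions and are not taken from the EMAC setup; the actual coefficients should be taken from Atkinson (2003) and Sander et al. (2003).

```python
import numpy as np

SECONDS_PER_YEAR = 365.25 * 24.0 * 3600.0


def arrhenius(a_factor, e_over_r, temperature):
    """Rate coefficient k(T) = A * exp(-(E/R) / T), in cm^3 molecule^-1 s^-1."""
    return a_factor * np.exp(-e_over_r / np.asarray(temperature, dtype=float))


def oh_lifetime_years(tracer_mass, oh_conc, temperature, a_factor, e_over_r):
    """Lifetime against OH: total burden divided by total loss rate.

    tracer_mass : tracer mass per grid cell (any consistent unit)
    oh_conc     : OH number density per grid cell (molecules cm^-3)
    temperature : temperature per grid cell (K)
    All arrays are assumed to contain tropospheric grid cells only.
    """
    tracer_mass = np.asarray(tracer_mass, dtype=float)
    k = arrhenius(a_factor, e_over_r, temperature)
    # Loss rate per cell (mass per second): k(T) * [OH] * mass
    loss_rate = k * np.asarray(oh_conc, dtype=float) * tracer_mass
    return tracer_mass.sum() / loss_rate.sum() / SECONDS_PER_YEAR


# Hypothetical three-cell column (illustrative values, not model output)
mass = np.array([3.0, 2.0, 1.0])
oh = np.array([1.2e6, 1.0e6, 0.6e6])
temp = np.array([288.0, 260.0, 225.0])

# Illustrative Arrhenius parameters for an OH reaction (assumption)
tau = oh_lifetime_years(mass, oh, temp, a_factor=1.85e-12, e_over_r=1690.0)
print(f"lifetime against OH: {tau:.1f} years")
```

Because the loss is weighted by the tracer mass in each cell, warm and OH-rich cells in the lower tropical troposphere dominate the result, which is why this diagnostic responds to changes in oxidizing capacity such as those discussed for the sensitivity simulations.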
Another indirect indicator for tropospheric oxidation capacity is carbon monoxide (CO), which is tightly coupled to OH: the reaction between CO and OH (CO + OH → CO2 + H) in the troposphere constitutes a sink of 90-95 % for CO and of about 41 % for OH (von Kuhlmann et al., 2003), with more CO generally leading to smaller OH mixing ratios in the troposphere. The annual cycle of CO mixing ratio is compared to the NOAA GLOBALVIEW observations in various locations (Fig. 18). The ACCMIP simulation always shows a higher CO mixing ratio than the other simulations, because of the large contribution from biomass burning and traffic sources (Table S2). The annual cycle is reasonably well reproduced by all model simulations at all 9 locations considered here, although a general underestimation of CO mixing ratios by the model is clearly visible. As reported by several other studies (Shindell et al., 2006, 2008; Monks et al., 2014), models tend to underestimate tropospheric CO, especially in the NH winter and spring, although the reason for this bias is not fully understood. We further evaluate CO by comparison with the vertical profiles from Emmons et al. (2000) in Fig. 19. Again, we find CO mixing ratios to be too low in all simulations, often outside the uncertainty ranges, with deviations of about 50-100 nmol mol⁻¹ in the lower troposphere. Also in this case, ACCMIP has higher total CO emissions than the other simulations. Remarkably, the differences between the simulations are negligible compared to the standard deviation of the observations. In more polluted regions like China, the model simulations deviate more from the observations, especially in the lower and middle troposphere, while they improve in the upper levels, where the effect of emissions is much smaller. A similar problem was also pointed out by Pozzer et al. (2007), who concluded that this is probably due to underestimated fossil-fuel emissions in this region. The simulations evaluated here still underestimate CO mixing ratios, although they have higher emissions from anthropogenic sources than in Pozzer et al. (2007). Even the ACCMIP simulation, which has twice as high CO emissions from traffic sources, does not perform better than the other simulations in this region. The representation of CO in South East Asia is a long-standing problem in many model simulations and will require more extensive analysis in the future. However, as mentioned in Sect. 5.4, we stress again that the use of aircraft data to evaluate model simulations for specific years might be affected by limitations due to the fact that aircraft data climatologies often cover time periods which do not correspond to those simulated by the model.
Vertical profiles of CH4 mixing ratios in six selected regions (not shown) hardly reveal any disagreement among the EMAC simulations. This is not surprising, since the CH4 lower boundary conditions are prescribed from the same observed data (AGAGE) in all the EMAC simulations.
Non-methane hydrocarbons (NMHCs) also affect ozone chemistry through a large number of complex reactions. Several species of this family (ethylene (C2H4), ethane (C2H6), propene (C3H6), propane (C3H8) and acetone (CH3COCH3)) are compared to the observational data of Emmons et al. (2000) in the Supplement (Figs. S15-S19). A reasonable agreement is found only for some NMHCs in a few locations. In general, all model simulations have problems in reproducing the NMHCs. Discrepancies between model and observations cannot always be attributed to emissions, as indicated by vertical profiles in remote regions or by model simulations with similar emission totals that lead to different results. The geographical distribution of the emissions might influence the representation of these species in the model simulations. Another issue is the speciation fraction adopted for the different NMHC compounds. Emission data sets usually provide total NMHC emissions, which then have to be speciated into individual components, consistently with the chemical mechanism of the model. Here we adopt the speciation fractions by von Kuhlmann et al. (2003). An underestimate of ethane (Fig. S16) against surface data has also been shown by Emmons et al. (2014) in the NH for several models.

Sensitivity simulations
The high bias in tropospheric column ozone identified in particular in the ACCMIP simulation motivated two additional sensitivity simulations to explore related model uncertainties. Both are identical to the ACCMIP simulation and cover the same time period (10 years under 2000 conditions), except for a code modification in the EMAC scavenging submodel SCAV (ACCMIP-S1), and an additional modification in the chemical mechanism (ACCMIP-S2).
The SCAV modification avoids the use of unrealistically high convective liquid and ice water contents for scavenging, which is expected to result in reduced uptake and less subsequent removal of nitric acid, particularly in the tropical upper troposphere/lower stratosphere (UTLS). The ACCMIP-S1 simulation serves two purposes: (1) comparing to the otherwise identical ACCMIP simulation, in order to estimate the uncertainty imposed by the reduced uptake on the results in all other simulations; (2) as a reference for the sensitivity simulation ACCMIP-S2, which is also performed with the updated scavenging code. The code modification for ACCMIP-S1 results in lower and more realistic convective cloud water and cloud ice concentrations, and consequently less scavenging of HNO3 and other species. Less scavenging of HNO3 by cloud particles means that more HNO3 is available for gas phase reactions. This essentially increases the abundance of NOx, which in most parts of the free troposphere would lead to higher ozone mixing ratios. However, less scavenging also means less redistribution of reactive nitrogen in the atmosphere. Convection is strongest in the tropics and thus the differences between ACCMIP and ACCMIP-S1 are most pronounced there (Fig. 15, top row). For the considered altitudes (250, 500 and 700 hPa), however, ACCMIP-S1 produces ozone mixing ratios similar to those of ACCMIP for most months. The global mean tropospheric ozone column slightly decreases in ACCMIP-S1 compared to ACCMIP (from 36.1 to 35.3 DU). ACCMIP-S1 performs slightly better than ACCMIP for some comparisons to observations (Figs. 11 and 12). This is also reflected by a slightly better representation of most ozone precursors, but overall the differences between ACCMIP and ACCMIP-S1 are small. Thus the effects of this update should not strongly affect the conclusions drawn from EVAL2, QCTM, TS2000 and ACCMIP.
ACCMIP-S2 is a sensitivity simulation to quantify the uncertainty imposed by a possible HNO3-forming channel of the HO2 + NO reaction (Butkovskaya et al., 2007) on the results of the other simulations in the present study. None of the other simulations included this reaction channel, which is not included in the recent JPL catalogue (Sander et al., 2011). The reaction rate coefficient of the channel is uncertain. It may depend on pressure and temperature only (Butkovskaya et al., 2007), or additionally on water vapor concentration (Butkovskaya et al., 2009). ACCMIP-S2 includes the additional dependence on water vapor concentration as described by Gottschaldt et al. (2013). This provides an upper estimate for the effects of the reaction, because water vapor enhances the HNO3-forming channel. Apart from the above modification to the chemical mechanism, ACCMIP-S2 is identical to ACCMIP-S1. The additional reaction has the biggest absolute impact on ozone in the altitude range of about 10 hPa, where atmospheric ozone mixing ratios have a maximum. Compared to a simulation without the reaction, ozone increases at around 10 hPa, and decreases throughout the troposphere (see also Gottschaldt et al., 2013). The relative impact of the reaction is largest in the lower parts of the troposphere. Both total and tropospheric column ozone decrease when the additional HNO3-forming channel is included (Figs. 13 and 14). Due to the dependency of the reaction rate coefficient on temperature and water vapor concentration, effects on tropospheric column ozone are largest in the tropics. The pronounced high ozone bias of the other simulations in this region is significantly reduced as a result. RMSD and overall mean bias decrease in ACCMIP-S2 compared to ACCMIP and ACCMIP-S1 (Fig. 11, left panels). In particular, the positive bias in total and tropospheric column ozone in ACCMIP is reduced in all regions, and becomes negative in the SH extratropics for tropospheric column ozone. The better performance of ACCMIP-S2 (and QCTM) in tropospheric column ozone is also visible in the Taylor diagram (lower left panel in Fig. 12).
The effects on ozone precursors are mainly determined by a decreased oxidizing capacity in an atmosphere with the additional HNO3-forming channel (Gottschaldt et al., 2013). Most notably this is reflected in the annual cycle of CO (Fig. 18), where all other simulations are biased low, but ACCMIP-S2 is mostly biased high. The reaction with OH is a major sink of CO in the troposphere, which leads to higher CO mixing ratios in the less oxidizing atmosphere of ACCMIP-S2. There is also a secondary effect from reduced OH on CO, as mixing ratios of CO precursors depend on the oxidizing capacity too. One of these precursors is methane, which has a ∼ 50 % longer lifetime in ACCMIP-S2 than in ACCMIP-S1 (Table 4). Compared to ACCMIP-S1, ACCMIP-S2 agrees better with the observations at the Alert and Terceira Island stations, but worse at the others (Fig. 18). The value of this inconclusive result is further limited by the fact that discrepancies between observations and simulations also reflect uncertainties in the CO emission inventories. However, we note that the effect of HO2 + NO → HNO3 on CO is bigger than the effects of different CO emissions (Table S2). Comparing to Emmons et al. (2000), ACCMIP-S2 generally performs better than ACCMIP-S1, except for NOx and CH4. Note that CH4, CO and NOx mixing ratios strongly depend on the emissions and thus also reflect uncertainties in the inventories used. Furthermore, there are other uncertainties of reaction kinetics in atmospheric ozone chemistry (Taraborrelli et al., 2012), which need to be explored in subsequent studies.
Overall, introducing the HNO3-forming channel of the HO2 + NO reaction has a stronger influence on ozone-related performance metrics than most other differences between the six simulations and significantly reduces the high bias in tropospheric column ozone. This is an indication that including this reaction channel is important for a realistic simulation of ozone, but further experimental evidence is required. At some altitudes and in some regions, in particular at lower levels in the tropics, the performance worsens, pointing to a possible modification required in the reaction rate dependence on water vapor as included in the simulation here.

Conclusions
Four present-day simulations with different setups of the ECHAM/MESSy Atmospheric Chemistry (EMAC) model have been evaluated in this study through a comprehensive comparison to observations. In particular, results from a previous EMAC evaluation of a model simulation with nudging towards realistic meteorology in the troposphere by Jöckel et al. (2010) have been compared to new simulations with different model setups and emissions data sets in free-running time slice and nudged quasi chemistry-transport model (QCTM) mode (Deckert et al., 2011). The latter two configurations are important for chemistry-climate projections and the quantification of individual sources (e.g., transport sector) that lead to small chemical perturbations of the climate system, respectively. The goal of this work was to compare the EMAC simulations to each other with a focus on how well ozone and selected climate parameters are represented in the different setups (nudged vs. free-running) and simulations with different boundary conditions (emissions, sea surface temperatures and sea ice concentrations).
The two nudged simulations (EVAL2 and QCTM) are transient and driven by the same SSTs and (transient where available) emission inventories (with the exception of aviation). The previously evaluated EVAL2 simulation that covers the time period 1999-2009 (Jöckel et al., 2010) serves as the reference simulation. In the QCTM simulation (QCTM, 1999-2007) the chemistry is decoupled from radiation and dynamics, thus omitting feedback mechanisms between these fundamental aspects of a chemistry-climate model. The setups of the free-running time slice simulations (TS2000 and ACCMIP) differ from each other in the emission inventories and the SSTs. To follow the specification of the Atmospheric Chemistry and Climate Model Intercomparison Project (ACCMIP), emissions from Lamarque et al. (2010) and simulated SSTs and SICs from the CMCC climate model are used as input parameters in the ACCMIP simulation. The boundary conditions in the TS2000 simulation are more similar to the nudged simulations, except that emissions and SSTs are climatological means instead of transient data sets. All four EMAC simulations are carried out using the same resolution (T42L90MA).
In addition to a qualitative evaluation showing figures for a variety of different selected diagnostics, a quantitative evaluation has been performed to summarize the results. In particular, the normalized root-mean-square difference (RMSD) between model simulation and observations as well as the overall mean bias have been calculated consistently for climate parameters and ozone for certain domains and height levels. Where possible, an alternative observational data set was used in addition to the reference data set to consider observational uncertainty that is introduced by differences between different instruments or meteorological reanalyses. In addition, Taylor diagrams, which are a common graphical summary to evaluate climate models, have been shown. These diagrams display the normalized standard deviation, the centered RMSD and the pattern correlation between the model simulations and the observations. The main differences due to the setup of the simulations (free-running vs. nudged) are introduced through differences in the meteorology. The evaluation of the mean state of basic climate parameters is therefore important in addition to the evaluation of ozone. This study shows that the mean state of temperature, eastward wind, northward wind, geopotential height, specific humidity, and radiation is in general well represented by the four EMAC simulations. Some differences exist in specific regions and altitudes which are related to the different setups. In particular we find a cold bias (∼ 3-7 K) in the extratropical lowermost stratosphere in the free-running simulations (TS2000 and ACCMIP). This feature is common to many of the CMIP3 and CCMVal models (IPCC, 2007; SPARC-CCMVal, 2010). This cold bias is related to the wet bias, an overestimation of water vapor in this region by around a factor of 2-10, depending on the season, leading to excessive infrared radiative cooling. The nudged simulations show the same wet bias but, due to the relaxation of the temperature towards realistic meteorology, a reduced cold bias in the extratropical lowermost stratosphere. In addition, the subtropical jet (∼ 10-15 m s⁻¹) at 60° S in DJF from the ground up to around 50 hPa is underestimated in the free-running simulations.
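As an illustration of the performance metrics summarized in this section, the following minimal sketch computes the area-weighted RMSD, the overall mean bias and the three quantities shown in a Taylor diagram for one model field and one observational field on a regular latitude-longitude grid. The function names, the cosine-latitude weighting and the synthetic test fields are assumptions made for this example and do not reproduce the ESMValTool implementation. The normalization of the RMSD used for the portrait diagrams is not specified here and is therefore left out; the Taylor quantities are normalized by the observed standard deviation, following common practice.

```python
import numpy as np


def area_weights(lat_deg, nlon):
    """Cosine-latitude weights for a regular lat-lon grid, shape (nlat, nlon)."""
    return np.cos(np.deg2rad(lat_deg))[:, None] * np.ones((1, nlon))


def performance_metrics(model, obs, weights):
    """Weighted bias, RMSD and Taylor statistics, skipping missing data."""
    valid = np.isfinite(model) & np.isfinite(obs)
    w = np.where(valid, weights, 0.0)
    w = w / w.sum()                      # weights over valid cells sum to 1
    m = np.where(valid, model, 0.0)      # replace NaN so that 0 * NaN is avoided
    o = np.where(valid, obs, 0.0)

    m_mean, o_mean = np.sum(w * m), np.sum(w * o)
    bias = m_mean - o_mean
    rmsd = np.sqrt(np.sum(w * (m - o) ** 2))

    m_anom, o_anom = m - m_mean, o - o_mean
    std_m = np.sqrt(np.sum(w * m_anom ** 2))
    std_o = np.sqrt(np.sum(w * o_anom ** 2))
    corr = np.sum(w * m_anom * o_anom) / (std_m * std_o)
    crmsd = np.sqrt(np.sum(w * (m_anom - o_anom) ** 2))

    return {
        "bias": bias, "rmsd": rmsd, "crmsd": crmsd,
        "std_model": std_m, "std_obs": std_o, "corr": corr,
        "norm_std": std_m / std_o, "norm_crmsd": crmsd / std_o,
    }


# Synthetic example fields (not observational or model data)
rng = np.random.default_rng(0)
lat = np.linspace(-87.5, 87.5, 36)
weights = area_weights(lat, 72)
obs = 280.0 + rng.normal(0.0, 10.0, (36, 72))
model = obs + 3.0 + rng.normal(0.0, 2.0, (36, 72))
obs[0, :5] = np.nan                      # grid cells without observations

stats = performance_metrics(model, obs, weights)
print({name: round(float(value), 3) for name, value in stats.items()})
```

Masking the cells without observations before normalizing the weights mirrors the approach described in the captions of Figs. 10 and 13, where the global averages ignore grid cells without available observational data. The squared RMSD returned here equals the sum of the squared bias and the squared centered RMSD, which is why the Taylor diagrams and the mean bias plots provide complementary information.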
The evaluation of tropospheric ozone and ozone precursors (NOx, CO and NMHCs) showed that the differences among the four model simulations which are related to the model setup are generally small. A common bias is the underestimation of the ozone hole. More significant differences exist due to the use of different boundary conditions (in particular emissions). For the temperature at 850 hPa in the tropics, the ACCMIP simulation shows a warm bias compared to ERA-Interim and NCEP due to the bias in the tropics in the prescribed modeled SSTs.
Tropospheric column ozone is generally overestimated compared to satellite observations, but the annual cycle of total column ozone is well represented. The high bias in tropospheric column ozone motivated two additional simulations that are identical to the ACCMIP simulation except for a code modification to avoid unrealistically high convective cloud water and ice contents for scavenging (ACCMIP-S1), and an additional modification in the chemical mechanism (ACCMIP-S2). ACCMIP-S2 includes a possible HNO3-forming channel of the HO2 + NO reaction (Butkovskaya et al., 2007), which has a stronger influence on ozone-related performance than most other differences among the six simulations and significantly reduces the high bias in tropospheric column ozone. While experimental confirmation for this additional reaction channel is still missing, our model study suggests that including it could be important for a realistic simulation of ozone, particularly in the UTLS in the tropics. At some altitudes and in some regions, in particular at lower levels in the tropics, the performance worsens, pointing to a possible modification required in the reaction rate dependence on water vapor as included in the simulation here. A similar improvement in tropospheric column ozone is also achieved by the nudged QCTM simulation, which uses lower lightning NOx emissions compared to the other experiments.
Biases in ozone precursors exist but are strongly dependent on the inventory used. For example, the evaluation of CO showed an underestimation compared to observations in all EMAC simulations, particularly in regions with anthropogenic influence. The ACCMIP simulation with its different emission inventory from Lamarque et al. (2010), which includes a factor of 2 higher CO emissions than the inventory used in the other three simulations, is generally in better agreement with the observations for CO. This stresses again the importance of accurate emission inventories for chemistry-climate modeling.
Evaluating ozone and ozone precursors with aircraft data has proven important in this and many previous studies. It would be important to update existing climatologies like those by Tilmes et al. (2012) and Emmons et al. (2000) on a regular basis with newer campaigns. In addition to comparing to climatologies of aircraft data, a more direct comparison to particular campaigns should be envisaged. However, more local measurements exhibit the problem of a mismatch of spatial and temporal scales between observations and models. Sampling the model output along the flight path during the model simulation (see for example the S4D routine in Jöckel et al., 2010) and extracting the corresponding data, as planned as part of the Chemistry-Climate Model Initiative (CCMI, Eyring et al., 2013b), would facilitate and improve this comparison.
In addition, with growing complexity of chemistry-climate and earth system models, we advocate routine evaluation of models to be facilitated by common software tools that are made available to the community. All diagnostics and performance metrics shown in this paper are now implemented in the Earth System Model Validation Tool (ESMValTool). They can be routinely reproduced and applied to new EMAC simulations or other ESMs such as those participating in CCMI (Eyring et al., 2013b) or the Coupled Model Intercomparison Project (Meehl et al., 2014).
licensed to all affiliates of institutions which are members of the MESSy Consortium. Institutions can be a member of the MESSy Consortium by signing the MESSy Memorandum of Understanding. More information can be found on the MESSy Consortium web-page (http://www.messy-interface.org).
The ESMValTool is currently under development and will be publicly released only at a later stage. A stable version of the tool can be made available upon request for development purposes. Interested users and developers are welcome to contact the lead author. For further information and updates, see the ESMValTool web-page at http://www.pa.op.dlr.de/ESMValTool.
The Supplement related to this article is available online at doi:10.5194/gmd-8-733-2015-supplement.
as our reference data set and ERA-Interim as the alternative. AIRS data are available from the middle of 2002 to the middle of 2011. The data used in this work cover the years 2003 to 2010.

Figure 2. Annual mean of zonally averaged temperature profile. The upper left plot shows ERA-Interim absolute values; all other plots show differences between the model simulations (or NCEP/NCAR) and ERA-Interim. Differences between the two fields which are not statistically significant according to the t test (95 % confidence level) are masked out in gray.

Figure 3. Annual cycle of water vapor climatology at 200 hPa averaged over the SH extratropics (20-90° S) for the EMAC simulations in comparison to HALOE data. Shaded area indicates the ±1σ interannual variability.

Figure 4. Annual cycle of temperature (top) and water vapor (bottom) climatology at 100 hPa averaged over the tropics (20° N-20° S) for the EMAC simulations, in comparison to ERA-Interim reanalysis and HALOE data, respectively. Shaded areas indicate the ±1σ interannual variability.

Figure 5. Root-mean-square difference of the chosen basic climate parameters over the global domain, the tropics, and the NH and SH extratropics (from left to right). Columns and rows of each panel represent the EMAC simulations and the given diagnostics (see Table 3), respectively. Where an alternative data set is available, the diagram boxes are split in two parts, showing the model performance compared to the primary (lower triangle) and alternative (upper triangle) data set. Where no observations are available, the triangles are marked white.

Figure 6. As in Fig. 5, for the overall mean bias.

Figure 7. Taylor diagrams of temperature (top row) and eastward wind (bottom row) over the four chosen domains (global, tropics, NH and SH extratropics, from left to right) and height-levels (850, 200, 30, and 5 hPa).

Figure 10. Total column ozone climatology for the EMAC simulations compared to the NIWA combined total column ozone database and GTO-ECV data. The values on top of each panel show the global (area-weighted) average, calculated after regridding the data to the horizontal grid of the model and ignoring the grid cells without available observational data in the GTO-ECV data.

Figure 11. Root-mean-square difference (top) and overall mean bias (bottom) for total and tropospheric column ozone (left), ozone profiles (middle) and surface CO diagnostics (right). Columns and rows of each panel represent the EMAC simulations (including the sensitivity experiments) and the given diagnostics (see Table 3), respectively. Where an alternative data set is available, the diagram boxes are split in two parts, showing the model performance compared to the primary (lower triangle) and alternative (upper triangle) data set. Where no observations are available, the triangles are marked white.

Figure 12. Taylor diagrams for total and tropospheric column ozone (left), ozone profiles (middle) and surface CO diagnostics (right).

Figure 13. Tropospheric column ozone in the EMAC simulations compared to MLS/OMI observations. The values on top of each panel show the global (area-weighted) average, calculated after regridding the data to the horizontal grid of the model and ignoring the grid cells without available observational data.

Figure 14. Annual cycle of the tropospheric column ozone climatology in the EMAC simulations compared to MLS/OMI observations. The values on top of each panel show the global (area-weighted) average, calculated after interpolating the observations on the model grid and ignoring the grid cells without available observational data.

Figure 15. Annual cycle of ozone climatology in three regions (tropics, NH and SH extratropics) at three pressure levels (250, 500 and 700 hPa) for the EMAC simulations compared with ozonesonde data by Tilmes et al. (2012). Model and observational data are grouped into four latitude bands and sampled at three pressure levels, with the models sampled at the ozonesonde locations before averaging together. The tropical region is represented by 9 stations, NH and SH extratropics by 22 and 5 stations, respectively (see Fig. S14). The shaded areas indicate the ±1σ interannual variability (for EMAC only).

Figure 16. Ozone vertical profile climatology from selected aircraft campaign observations by Emmons et al. (2000) and corresponding simulated values by the EMAC simulations. Profiles represent mean values. The EMAC simulations are averaged over the same regions and time of year as the observations, but for different years. Solid whiskers indicate ±1 standard deviation and dotted whiskers show minimum and maximum, both for the observational data.

Figure 18. Annual cycle of CO surface concentration climatology for the EMAC simulations and NOAA GLOBALVIEW data, at nine different stations worldwide. The shaded areas indicate the ±1σ interannual variability.

Table 2. Overview of the four EMAC simulations evaluated in this study. All experiments have a spin-up year at the beginning of the simulated period which is not considered in the analysis.

Table 4. Estimated methane and MCF lifetimes for the EMAC simulations.