Interpol-IAGOS: a new method for assessing long-term chemistry–climate simulations in the UTLS based on IAGOS data, and its application to the MOCAGE CCMI REF-C1SD simulation

A wide variety of observation data sets are used to assess long-term simulations provided by chemistry–climate models (CCMs) and chemistry-transport models (CTMs). However, the upper troposphere–lower stratosphere (UTLS) has hardly been assessed in these modelling exercises yet. Observations performed in the framework of IAGOS (In-service Aircraft for a Global Observing System) combine the advantages of in situ airborne measurements in the UTLS with an almost-global-scale sampling, a ∼ 20-year monitoring period and a high frequency. Even though a few model assessments have been made using the IAGOS database, none of them took advantage of the dense and high-resolution cruise data in their whole ensemble yet. The present study proposes a method to compare this large


Introduction
Chemistry-climate models (CCMs) and chemistry-transport models (CTMs) are essential tools for understanding atmospheric composition, providing information where measurements are lacking and predicting the future evolution of air composition. Assessing and reducing uncertainties in the processes controlling its past and future changes can be achieved by comparing an ensemble of simulations from different models while using the same simulation setup. Among the model intercomparison projects, the main goal of the Chemistry-Climate Model Initiative (CCMI; Eyring et al., 2013) is to reduce the uncertainties in the multi-model projections involving stratospheric ozone, tropospheric composition and climate change, and also to better understand the atmospheric processes relevant for these topics. CCMI is a common initiative from the International Global Atmospheric Chemistry (IGAC) and Stratosphere-to-troposphere Processes And their Role in Climate (SPARC) projects. It has taken over from both SPARC CCMVal (Chemistry-Climate Model Validation; SPARC, 2010) focused on the stratosphere and IGAC ACCMIP (Atmospheric Chemistry-Climate Model Intercomparison Project; Lamarque et al., 2013) dealing mainly with tropospheric composition. In this framework, a set of simulations has been designed to address its objectives. Among them, the REF-C1SD experiment aims at assessing the ability of the models to reproduce the actual atmospheric composition for the recent climate time period. For this purpose, a part of its protocol consists of nudging the meteorological fields to meteorological reanalyses based on observations, as indicated by the SD suffix (which stands for "specified dynamics"). The task for each participating model thus consisted of simulating as realistically as possible the tropospheric and stratospheric compositions in the last decades, following a common protocol.
Several studies have assessed the ability of REF-C1SD experiments, or previous similar simulations of air composition under recent climate conditions, to reproduce the mean tropospheric and/or stratospheric composition, by the use of monthly mean climatologies from observation data sets as reference, mostly from space. Froidevaux et al. (2019) based the evaluation of the REF-C1SD run from the Community Earth System Model version 1 - Whole Atmosphere Community Climate Model (CESM1-WACCM) on zonal monthly means of the stratospheric ozone column, using the Microwave Limb Sounder on the Aura satellite (Aura-MLS) and the multi-satellite data set merged in the framework of the GOZCARDS (Global OZone Chemistry And Related trace gas Data records for the Stratosphere) project. As described in Young et al. (2018), tropospheric ozone fields provided by the ACCMIP participating models have been assessed, referring to zonally averaged mixing ratios from the Tropospheric Emission Spectrometer, and tropospheric ozone column from OMI-MLS. Hu et al. (2017) also compared the OMI-MLS tropospheric ozone columns to a GEOS-Chem simulation. The observed carbon monoxide (CO) columns from the Measurement Of Pollution In The Troposphere (MOPITT) instrument served as the reference in the assessment of modelled tropospheric CO, notably from the REF-C1SD simulation generated by the Global Modeling Initiative (GMI) CTM over the period 2000-2010 (Strode et al., 2016), and from the CESM1 Community Atmosphere Model version 4 with chemistry (CAM4-Chem).
Only a few studies compared observations (in situ measurements or from space) and CCMI or similar simulations, focusing on the upper troposphere-lower stratosphere (UTLS). However, the latter is a key region regarding both the ozone (O3) radiative forcing (Riese et al., 2012) and the stratosphere-troposphere exchange (STE) that substantially influences tropospheric ozone levels (e.g. Tao et al., 2019), albeit with a high uncertainty due to their different representations in models (Stevenson et al., 2006). Smalley et al. (2017) (Williams et al., 2019). In addition to ozonesondes, aircraft measurements from different campaigns were used in the evaluation of the REF-C1SD simulations from the CESM1 CAM4-Chem model. Aircraft campaigns have already proven their usefulness in assessing models in the UTLS. Tilmes et al. (2010) built a climatology of O3 and CO in the tropics, subtropics and extratropics by gathering a wide set of aircraft campaigns from 1995 to 2008. Hegglin et al. (2010) used this and other aircraft-campaign-based data sets to assess the 18 CCMs participating in CCMVal-2 in the extratropical lower stratosphere using several diagnostics. For instance, the seasonal cycles derived at 100 and 200 hPa highlighted a relatively good reproduction of ozone behaviour in the lower stratosphere and allowed the identification of an overestimation of the transport from the tropics at 100 hPa and across the tropopause at 200 hPa. However, their conclusion also highlighted the limitations in space and time of the in situ observations, especially in the upper troposphere.
Among available observation data sets, the commercial aircraft measurements from the ongoing IAGOS (In-service Aircraft for a Global Observing System; Petzold et al., 2015, http://www.iagos.org, last access: 7 May 2021) European research infrastructure are well designed to study ozone and CO in the long term, notably in the UTLS. IAGOS observations started in August 1994 for ozone and in December 2001 for CO. They are characterized by a high spatiotemporal resolution and a wide coverage with most data gathered at cruise levels (9-12 km above sea level). Thus, the IAGOS database is suited to assess long-term simulations in this altitude range. Recently, its ozone data have been used to evaluate simulations from the CESM1 CAM4-Chem and GEOS-Chem models (Hu et al., 2017) during the periods 1995 and 2012, respectively. Tilmes et al. (2016) used the IAGOS measurements gathered in the vicinity of Narita airport (Japan) only, and the comparison made by Hu et al. (2017) only spread over 2 years, while IAGOS ozone data have been available since 1994 and covered a wide area, especially in the northern midlatitudes from western North America to East Asia. Brunner et al. (2003, 2005) combined research aircraft measurements with the first years of the IAGOS-MOZAIC database (1995-1998) to assess five CTMs and two CCMs. Gaudel et al. (2015) performed an evaluation of the MACC (Monitoring Atmospheric Composition and Climate) reanalysis over Europe during 2003-2010, using IAGOS O3 and CO measurements. However, these comparisons used frequent simulation outputs. Although the high frequency is necessary for their approach to separate accurately the air masses into different categories, it is not adapted to the assessment of monthly averaged fields used in multi-model intercomparisons.
Consequently, the IAGOS cruise data in the UTLS have been used neither as a whole ensemble nor to derive a monthly climatology for the evaluation of long-term chemistry-climate simulations. This is what we propose in the present paper.
To compare the REF-C1SD simulations against IAGOS data, interpolating the simulation outputs onto the high-resolution observations would be the most accurate way, but high-frequency outputs from multi-model intercomparisons such as CCMI are not available yet. Alternatively, the comparison could be performed after mapping the high-resolution IAGOS data onto the model grid on a monthly basis. Several gridding methods already exist for in situ measurements. Some of them consist of calculating a linear combination from the neighbouring measurement points onto each grid point (e.g. New et al., 2000). However, this requires storing the information of all the measurement locations during a whole month simultaneously. Such methods are thus convenient for measurements with regular locations such as surface stations, whereas their use on the IAGOS database would be computationally expensive. Variational methods are also widely employed (e.g. Bourassa et al., 2005) but they concern data assimilation, which is not our purpose. The present study aims at providing a new methodology designed to generate a gridded monthly data set from the IAGOS measurements, in order to evaluate REF-C1SD types of simulations. We also propose a set of relevant diagnostics for the model evaluation against IAGOS data mapped onto the model grid. These diagnostics originate from Cohen et al. (2018) who studied climatologies and trends in ozone and CO, based on the analysis of the full IAGOS data set corresponding to the cruise phase of flights. The use of such a high spatial and temporal resolution data set allows us to account for inter-regional differences that could not be highlighted with zonal means. Its projection onto a model grid suits well the constraint of working on monthly outputs from multidecadal simulations like REF-C1SD.
In order to demonstrate the interest of the new methodology and its associated diagnostics, we perform the assessment on one of the REF-C1SD simulations, that of the MOCAGE (MOdèle de Chimie Atmosphérique à Grande Echelle) CTM.
In Sect. 2, we describe briefly the IAGOS observations, the CCMI model intercomparison project, the MOCAGE CTM that we use in this study and its configuration for the REF-C1SD simulation. In Sect. 3, we present the methodology proposed to map the IAGOS data set onto the model grid on a monthly resolution, the chosen statistical metrics for models' evaluation and the different assessment diagnostics. In Sect. 4, we present a first application of this methodology on the evaluation of the MOCAGE REF-C1SD simulation. Strengths and weaknesses of the methodology and the chosen diagnostics are discussed. Conclusions are given in Sect. 5.

The IAGOS observations
The European Research Infrastructure IAGOS (http://www.iagos.org, last access: 7 May 2021) provides in situ measurements aboard several commercial aircraft. The observations used hereafter have been performed in the framework of the ongoing IAGOS-Core programme that followed the MOZAIC programme. Ozone (CO) measurements started in August 1994 (December 2001), based on a UV (IR) absorption technology, with an accuracy of 2 ppb (5 ppb), a precision of 2 % (5 %) and a time resolution of 4 s (30 s). Further information about the instruments can be found in Marenco et al. (1998) and Thouret et al. (1998) for O3 and in Nédélec et al. (2003) for CO. Nédélec et al. (2015) present a more recent evaluation of both ozone and CO instruments in the frame of IAGOS. The IAGOS observations (referring to the IAGOS-Core database hereafter) frequently sample the whole troposphere near airports, measuring vertical profiles during ascent and descent phases, and the UTLS during the cruise phases, mostly in the northern midlatitudes where most of the flight observations are gathered. In these latitudes, a recent analysis of O3 and CO climatologies and trends based on almost two decades of IAGOS cruise measurements has been performed in Cohen et al. (2018). In addition to global climatologies, the same analysis also focused on eight well-sampled regions in the UT and the LS separately. In order to generate results comparable with the latter, this study focuses on the same time period (1994-2013) and, where relevant, on the same regions.

The CCMI project and the REF-C1SD experiment
The CCMI project is a common initiative from the IGAC and SPARC programmes. CCMI phase 1 gathers a community of 18 CCMs and two CTMs, whose description is given in the review of Morgenstern et al. (2017). A series of experiments has been designed to model tropospheric and stratospheric air compositions for past, present and future climates. For each experiment, a common protocol is recommended to all participating models. Amongst the CCMI simulations, the REF-C1SD reference experiment aims at modelling as realistically as possible the day-to-day tropospheric and stratospheric compositions in a recent climate, using specified dynamics (SD). For this purpose, as described in Eyring et al. (2013), the simulations are driven by (or nudged towards) dynamical reanalysis data sets (typically ERA-Interim or MERRA) and extend from 1980 to 2010. For this long-term simulation, the 3-D output fields of species concentrations are archived as monthly means.

The MOCAGE model and the simulation setup
The MOCAGE model (Josse et al., 2004; Guth et al., 2016) is an offline global CTM. The chemical scheme couples the RACM (Regional Atmospheric Chemistry Mechanism; Stockwell et al., 1997) and the REPROBUS (REactive Processing Ruling the Ozone BUdget in the Stratosphere; Lefèvre et al., 1994) schemes, corresponding to tropospheric and stratospheric chemistry, respectively. The MOCAGE REF-C1SD simulation is run using a global domain at a 2° × 2° horizontal resolution, and 47 hybrid σ-pressure vertical levels, distributed from the surface up to ∼ 5 hPa. The simulation is driven by the meteorological fields from the ERA-Interim reanalysis. The biomass burning and anthropogenic emissions come from the Global Fire Emissions Database version 2 (GFEDv2) and MACC/CityZEN EU project (MACCity) inventories, respectively. The latter is characterized by a 10-year resolution, and a linear interpolation is applied to derive yearly emissions. The period spreads from August 1994 to December 2013, consistent with Cohen et al. (2018). The first 14 years come from the MOCAGE REF-C1SD simulation originally produced for the CCMI project. For the years out of the period covered by the experiment, the MOCAGE REF-C1SD run has been extended to 31 December 2013 using the same code and inputs as in the original MOCAGE CCMI REF-C1SD simulation.

Methodology
The objective of the proposed methodology is to make possible the comparison between the whole IAGOS database and the 3-D monthly mean volume mixing ratios from CTM and CCM simulations. Our approach consists of distributing the IAGOS observations, performed every 4 s, on a given model grid. A first application is proposed on the MOCAGE REF-C1SD run, characterized by a ∼ 200 km horizontal resolution in the midlatitudes and a ∼ 800 m vertical resolution in the UTLS. In order to account for the distance of the measurements from the centre within one given cell, we chose a first-order reverse linear interpolation, as described in Sect. 3.1 and illustrated in Fig. 1. The subsequent gridded monthly means are derived using weighted averages, as described in Sect. 3.2, and are directly comparable to the model monthly mean outputs.
In a first step, this approach is used for a statistical evaluation of the MOCAGE REF-C1SD climatologies on a hemispheric scale over the periods December 1994-November 2013 for O3 and December 2001-November 2013 for CO. The data processing used to produce the climatologies and the statistical metrics chosen are presented in Sect. 3.3. In a second step, we attempt to go further in the assessment of the MOCAGE simulation by evaluating separately the upper troposphere and the lower stratosphere. For this purpose, the discrimination between the grid points mostly representative of the UT or the LS is necessary. As in Cohen et al. (2018), this has been done with respect to Ertel potential vorticity (PV) and applied in eight northern midlatitude regions selected because of their high level of sampling by IAGOS. The methodology used is explained in Sect. 3.4.

Reverse interpolation of a given measurement point on the model grid
At a given point where IAGOS measured a mixing ratio C_obs(X) for species X, the algorithm presented here locates its position on the model grid defined by its longitude, latitude and hybrid σ-pressure coordinates. More precisely, we locate the model grid point which is the closest west and south of, and below (in altitude), the observation point and which corresponds to the ith, jth and kth grid point coordinates, respectively. As shown in Fig. 1c, a normalized scalar is then computed for each dimension (coefficients α, β, γ), increasing linearly with the distance between the measurement point and the (i, j, k) grid point. Note that the γ vertical coefficient is derived from log-pressure coordinates. Finally, a resulting 3-D weight is computed for each of the eight closest cells. By noting the variable indexes I, J and K belonging to the ensembles {i, i + 1}, {j, j + 1} and {k, k + 1}, respectively, we define the functions f_I, g_J and h_K, whose values depend on α, β and γ, respectively, such as

$$
f_I(\alpha) = \begin{cases} 1 - \alpha & \text{if } I = i \\ \alpha & \text{if } I = i + 1 \end{cases}, \qquad
g_J(\beta) = \begin{cases} 1 - \beta & \text{if } J = j \\ \beta & \text{if } J = j + 1 \end{cases}, \qquad
h_K(\gamma) = \begin{cases} 1 - \gamma & \text{if } K = k \\ \gamma & \text{if } K = k + 1 \end{cases}
$$

Figure 1 (partial caption, 2-D illustration): (c) calculation of a normalized scalar for each dimension (α and β), depending on the distance between the measurement point and the "bottom-left" grid point; (d) calculation of the weight of the four closest grid points. As indicated in the colour scale on the right, this weight ranges between 0 and 1.
The resulting weight of each of the grid points surrounding the measurement location is thus defined as the following product:

$$
w_{I,J,K} = f_I(\alpha)\, g_J(\beta)\, h_K(\gamma)
$$

In this way, as illustrated in Fig. 1d, for a given cell (I, J, K) amongst the eight closest ones, this weight decreases when the distance increases between the measurement point and the model grid point. Note that since the simulation outputs are monthly averages, we use the monthly mean surface pressure for determining the hybrid σ-pressure levels on the 47 vertical grid levels for a given model longitude/latitude. Although the surface pressure can show an important intramonthly variability, we calculated that a 30 hPa change at the surface would cause a variation weaker than 2 hPa on a given vertical grid level in the UTLS. Although caution is needed while treating low-altitude measurements, the monthly resolution on the surface pressure field thus has a negligible impact on the distribution of the IAGOS data from the cruise altitudes onto the model vertical grid.
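The reverse interpolation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used for the study: the function names (`axis_coefficient`, `reverse_interpolation_weights`) are hypothetical, the grid is assumed regular enough to be described by three 1-D coordinate lists, the level pressures are assumed to be already known for the column, and longitude wrap-around is ignored.

```python
import bisect
import math


def axis_coefficient(grid, x):
    """Locate x on an ascending 1-D grid (hypothetical helper).

    Returns the index of the grid point just below x and a normalized
    coefficient growing linearly from 0 (x on that point) to 1 (x on the
    next point).
    """
    idx = bisect.bisect_right(grid, x) - 1
    idx = max(0, min(idx, len(grid) - 2))  # keep idx + 1 inside the grid
    return idx, (x - grid[idx]) / (grid[idx + 1] - grid[idx])


def reverse_interpolation_weights(lons, lats, pressures, lon, lat, pressure):
    """Distribute one measurement over its eight surrounding grid points.

    lons and lats are ascending; pressures (hPa) are ordered from the lowest
    level (highest pressure) upwards, so that -log(p) is ascending.
    Assumption of this sketch: no longitude wrap-around.
    """
    heights = [-math.log(p) for p in pressures]  # log-pressure coordinate
    i, alpha = axis_coefficient(lons, lon)
    j, beta = axis_coefficient(lats, lat)
    k, gamma = axis_coefficient(heights, -math.log(pressure))

    weights = {}
    for di, f in ((0, 1.0 - alpha), (1, alpha)):
        for dj, g in ((0, 1.0 - beta), (1, beta)):
            for dk, h in ((0, 1.0 - gamma), (1, gamma)):
                # Product of the three 1-D weights for grid point (I, J, K)
                weights[(i + di, j + dj, k + dk)] = f * g * h
    return weights
```

By construction the eight weights sum to 1, and a measurement located exactly on a grid point gives that point a weight of 1 and the seven others a weight of 0.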

Deriving the monthly mean values from observations
The weighting coefficients defined above correspond to a single observation data point. To obtain monthly averages from the whole observation data set, the last step consists of summing up all the values measured in the vicinity of the (i, j, k) grid points for each month. Thus, for a given grid point (i, j, k), we define n as the index for the measurement performed in its vicinity during the considered month, C_obs,n(X) as the corresponding mixing ratio for the species X, w_n as the weight attributed to (i, j, k) for this measurement (Sect. 3.1), and N as the total number of measurements performed in its vicinity. The monthly value of the X mixing ratio at (i, j, k) is then derived with the equation

$$
\overline{C}_{i,j,k}(X) = \frac{\sum_{n=1}^{N} w_n \, C_{\mathrm{obs},n}(X)}{\sum_{n=1}^{N} w_n}
$$

where the denominator is equivalent to the amount of weighted measurement points performed in the (i, j, k) grid cell during the chosen month. Hereafter, we refer to it as N_eq. In the end, this method yields monthly fields of IAGOS O3 and CO mixing ratios (or any other variable measured by IAGOS, e.g. water vapour) projected on the MOCAGE grid points where IAGOS data are available. This data set is named IAGOS-DM hereafter, the suffix DM referring to the distribution on the model grid. With this method, the cruise observation data are distributed onto the MOCAGE vertical levels spanning from level 28 up to level 22 and corresponding to the ∼ 360-175 hPa interval. Note that the measurement points on the MOCAGE vertical levels below level 28 (∼ 360 hPa) are considered as corresponding to ascent or descent phases of the flights. These measurements are not processed since they are only available in small areas close to airports. Levels 27 and 28 generally correspond to these phases too but include cruise measurements above elevated lands, since hybrid σ-pressure levels tend to follow land elevation. In order to compare the observations and the model at the same locations and months, we apply a mask on the MOCAGE REF-C1SD simulation outputs that allows us to account only for the IAGOS-DM sampled grid points.
The subsequent data set is named MOCAGE-M, the letter "M" referring to the mask. Thus, IAGOS-DM and MOCAGE-M data sets are spatially consistent and can be used to make grid-point-to-grid-point comparisons on climatological timescales, as long as we assume the gridded IAGOS data to be representative of the measurement period.
The latter point has been tested on a 5-year subsample using the simulation daily outputs instead of monthly outputs. It consisted of comparing MOCAGE-M to a test product derived by calculating monthly averages from the daily outputs and applying a mask based on the IAGOS daily sampling.
The results from this test are briefly presented in Sect. 4. In order to test the advantages of the linear interpolation involving weighting factors, we also derive another product from the IAGOS database using a simplified method, i.e. by solely averaging the measurement points into the grid cells where they are located. This control product is named IAGOS-DM-noW hereafter, the -noW suffix standing for "no weighting". Since it changes the spatial sampling distribution, a new subsequent mask has to be applied to the MOCAGE grid to be consistent with the IAGOS-DM-noW product, called MOCAGE-M-noW hereafter.
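The accumulation of weighted monthly means and the masking of the model field described in this section can be illustrated as follows. This is a minimal sketch under stated assumptions: the names `accumulate_monthly_means` and `mask_model`, as well as the dictionary-based data layout keyed by grid indices, are hypothetical and chosen for readability rather than performance.

```python
from collections import defaultdict


def accumulate_monthly_means(weighted_obs):
    """Weighted monthly mean and equivalent sample size per grid point.

    weighted_obs: iterable of (cell, weight, mixing_ratio) tuples gathered
    over one month, where cell is an (i, j, k) index tuple and weight is the
    interpolation weight of that measurement for that grid point.
    """
    weighted_sum = defaultdict(float)
    n_eq = defaultdict(float)
    for cell, weight, value in weighted_obs:
        weighted_sum[cell] += weight * value
        n_eq[cell] += weight  # sum of weights, i.e. the denominator
    means = {c: weighted_sum[c] / n_eq[c] for c in n_eq if n_eq[c] > 0.0}
    return means, dict(n_eq)


def mask_model(model_field, sampled_cells):
    """Keep only the model grid points sampled by the gridded observations."""
    return {c: model_field[c] for c in sampled_cells if c in model_field}


# Usage with made-up values (ppb): two measurements near cell (0, 0, 0)
obs = [((0, 0, 0), 0.75, 100.0), ((0, 0, 0), 0.25, 60.0), ((1, 0, 0), 0.5, 80.0)]
gridded, n_eq = accumulate_monthly_means(obs)
model = {(0, 0, 0): 95.0, (1, 0, 0): 70.0, (2, 0, 0): 65.0}
masked = mask_model(model, gridded)
```

A "no weighting" control product can be emulated with the same function by attributing each measurement to a single cell with a weight of 1, which reduces the weighted mean to a plain arithmetic mean.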
Methodology for the assessment of the climatologies

Filtering conditions
For the climatological part of this study, we chose to perform a seasonal and a yearly analysis. Avoiding sampling biases where and when IAGOS-DM data (counted as N_eq) are not numerous enough requires that the seasonal sample N_eq reaches a minimum threshold to be selected (noted N_thres). We chose to set this N_thres limit depending on latitude to account for the varying grid-box area linked to the 2° × 2° grid and on the chemical tracer to account for the shorter period for CO measurements compared to O3. N_thres therefore decreases with latitude following a cosine function, similarly to the model horizontal grid cell areas. The reference threshold N_thres,ref corresponds to O3 measurements for grid-box areas during a given season, over the whole period (December 1994-November 2013 for O3 and December 2001-November 2013 for CO). It has been set to N_thres,ref = 100 as a compromise between sampling robustness and a large-enough amount of data in the IAGOS-DM sample. Accounting for the shorter CO measurement period compared to O3 (∼ 60 % of the O3 period), the same threshold applied to the CO climatologies would result in a greater proportion of filtered-out grid cells. Thus, the corresponding N_thres threshold for this species is derived by applying a factor of 0.6, leading to 60. Note that this reference threshold is defined seasonally. Therefore, the N_thres,ref used for yearly climatologies is multiplied by a factor of 4.
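The filtering rule above can be written as a small function. The sketch below is an interpretation rather than the authors' code: it assumes that the reference value applies at the Equator, where the cosine equals 1, and the names `n_thres` and `SPECIES_FACTOR` are hypothetical.

```python
import math

# Scaling for the length of the measurement period (CO covers ~60 % of the O3 period)
SPECIES_FACTOR = {"O3": 1.0, "CO": 0.6}


def n_thres(lat_deg, species, yearly=False, n_thres_ref=100.0):
    """Minimum equivalent sample size N_eq required to keep a grid cell.

    Assumption of this sketch: n_thres_ref is the seasonal O3 threshold at
    the Equator, scaled by cos(latitude) like the grid-cell area.
    """
    period_factor = 4.0 if yearly else 1.0  # four seasons in a yearly climatology
    return n_thres_ref * SPECIES_FACTOR[species] * period_factor * math.cos(math.radians(lat_deg))


def is_selected(n_eq, lat_deg, species, yearly=False):
    """True if the cell is sampled well enough to enter the climatology."""
    return n_eq >= n_thres(lat_deg, species, yearly)
```

Under these assumptions, a seasonal O3 cell at 60° N needs roughly half the equivalent sample size required at the Equator, which mirrors the ratio of the grid-cell areas.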

Statistical metrics for assessing the climatologies
Quantifying a simulation assessment requires the use of statistical parameters. This paragraph aims at defining the chosen metrics and at justifying this choice. Pearson's coefficient is a key result from linear regressions. It is used to quantify the correlation between two signals. If we call (m_i) and (o_i), with i ∈ [[1, N]], the lists of modelled and observed values, respectively, their correlation is defined as

$$
r = \frac{1}{N} \sum_{i=1}^{N} \frac{(m_i - \overline{m})(o_i - \overline{o})}{\sigma_m \, \sigma_o}
$$

where $\overline{m}$ and $\overline{o}$ are the mean values and σ_m and σ_o their respective standard deviations. Quantifying total biases and mean errors is also essential in a model assessment. However, the use of the absolute mean bias and root mean square error (RMSE) may not be relevant for climatological purposes because of a strong influence that could arise from observed outliers. In our context, another inconvenience lies in the strong vertical O3 gradient near and above the tropopause. It tends to induce a strong absolute bias with respect to the tropospheric mixing ratios, since it makes the O3 absolute mean bias and RMSE depend mainly on the highest vertical grid cells. The normalized bias metric (and associated standard error) is chosen for a better representativeness of biases for both low and high mixing ratios. The modified normalized mean bias (MNMB) and the fractional gross error (FGE) are respectively defined as

$$
\mathrm{MNMB} = \frac{2}{N} \sum_{i=1}^{N} \frac{m_i - o_i}{m_i + o_i}
$$

and

$$
\mathrm{FGE} = \frac{2}{N} \sum_{i=1}^{N} \left| \frac{m_i - o_i}{m_i + o_i} \right|
$$

The MNMB (FGE) represents precisely the spatial average based on the model relative biases (on their absolute value) shown in Figs. 3, 4 and in Appendix A.
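For readers who want to reproduce these scores, the three metrics can be computed with the standard library alone. The sketch below uses the usual textbook forms of Pearson's correlation (with population standard deviations), MNMB and FGE; the function names are hypothetical, and the inputs are assumed to be two equally long lists of strictly positive mixing ratios with non-zero variance.

```python
import math


def pearson_r(model, obs):
    """Pearson correlation between modelled and observed values."""
    n = len(model)
    mean_m = sum(model) / n
    mean_o = sum(obs) / n
    sigma_m = math.sqrt(sum((m - mean_m) ** 2 for m in model) / n)
    sigma_o = math.sqrt(sum((o - mean_o) ** 2 for o in obs) / n)
    covariance = sum((m - mean_m) * (o - mean_o) for m, o in zip(model, obs)) / n
    return covariance / (sigma_m * sigma_o)


def mnmb(model, obs):
    """Modified normalized mean bias, bounded between -2 and 2."""
    return 2.0 / len(model) * sum((m - o) / (m + o) for m, o in zip(model, obs))


def fge(model, obs):
    """Fractional gross error, bounded between 0 and 2."""
    return 2.0 / len(model) * sum(abs(m - o) / (m + o) for m, o in zip(model, obs))


# Usage with made-up mixing ratios (ppb)
model_values = [110.0, 90.0, 240.0]
observed_values = [100.0, 100.0, 260.0]
scores = (pearson_r(model_values, observed_values),
          mnmb(model_values, observed_values),
          fge(model_values, observed_values))
```

Because each term is normalized by the sum of the modelled and observed values, a 10 ppb error weighs more at tropospheric levels than in the lower stratosphere, which is the behaviour sought in the text.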

Methodology for assessing the seasonal cycles in the UT and in the LS
A second part of this assessment targets the behaviour of the model in the UT and the LS separately. The diagnostics we use for this purpose are adapted from Cohen et al. (2018), based on Thouret et al. (2006), who used the PV fields from the ECMWF operational analysis to derive the tropopause pressures. In contrast to the latter studies, we define the tropopause layer with the monthly averaged PV fields from ERA-Interim, as used in the MOCAGE REF-C1SD simulation. A given grid point is considered as belonging to the UT if its monthly PV is lower than 2 potential vorticity units (PVU) and to the LS if the PV is greater than 3 PVU. The cells at which PV ranges between 2 and 3 PVU are considered as belonging to the transition zone separating the two layers and are not selected. In order to enhance the distinction between the UT and the transition zone, the first model level below the 2 PVU threshold is also filtered out from the UT. The 2 PVU threshold is derived from a log-pressure interpolation between the grid points. We also filter out the grid boxes where this PV classification is not consistent with the mean observed O3 mixing ratio, i.e. where the monthly O3 level reaches 140 ppb in the UT and where it goes under 60 ppb in the LS. This avoids an additional bias based on errors in the dynamical field leading to unrealistic UT and LS attribution. These thresholds on O3 mixing ratio were chosen according to the O3 seasonal cycles shown in Fig. 3.7 in Cohen et al. (2018), where the upper boundary linked to the interannual standard deviation in the UT is less than 100 ppb and where the lower boundary in the LS is greater than 100 ppb. We estimated that a supplementary 40 ppb interval would limit an exaggerated filtering of grid cell monthly values. As in Cohen et al. (2018), we focus our analysis on the seasonal cycles for eight regions in the northern midlatitudes that are well sampled by IAGOS.
Their coordinates and their corresponding sampling are detailed in Table 1 in Cohen et al. (2018). Because of the 2° × 2° horizontal grid resolution in the simulation, we applied a 1° eastward or northward shift on the odd-coordinated edges. The subsequent regions defined in this paper are shown in Fig. 2. For each of them, the monthly means are calculated by averaging the gridded monthly means separately in the UT and the LS. The latter values were defined as described in Sect. 3.1 and 3.2. In Cohen et al. (2018), the regional monthly means with fewer than 300 data points were filtered out. Here, due to the loss of data caused by the monthly resolution, we lowered this minimum threshold to 150 in order to keep taking the less-sampled regions into account, such as western North America and Siberia. Still, we kept the criterion from Cohen et al. (2018) which required at least 7 days between the first and last measurements in the considered month and region, preventing the averages from being representative of transient meteorological conditions only. Following the same study, the computation of the seasonal cycles is based on the years exhibiting 7 available months or more, distributed over three seasons at least. This criterion avoids biases linked to the interseasonal differences in the sampling, thus ensuring a good representativeness of the whole year. It is important to note that the sampling threshold mentioned in this paragraph concerns each monthly average within a regional time series, contrasting with the sampling threshold we use for the (multi-)decadal average on each grid cell in the horizontal climatologies.
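The selection rules of this section can be summarized with two short functions. This is a simplified sketch under stated assumptions, not the processing chain of the study: it treats one model column at a time, assumes that PV increases monotonically with height in that column (no tropopause fold), and takes the level just below the 2 PVU surface to be the highest level with PV below 2 PVU. The names `classify_column` and `year_is_usable` are hypothetical.

```python
PV_UT_MAX = 2.0    # PVU, upper bound for the upper troposphere
PV_LS_MIN = 3.0    # PVU, lower bound for the lower stratosphere
O3_UT_MAX = 140.0  # ppb, UT cells reaching this value are discarded
O3_LS_MIN = 60.0   # ppb, LS cells below this value are discarded


def classify_column(pv_by_level, o3_by_level):
    """Label each level of a column as "UT", "LS" or None (not selected).

    Both lists are ordered from the lowest level to the highest one and hold
    monthly mean PV (PVU) and gridded observed O3 (ppb).
    """
    labels = []
    for pv in pv_by_level:
        if pv < PV_UT_MAX:
            labels.append("UT")
        elif pv > PV_LS_MIN:
            labels.append("LS")
        else:
            labels.append(None)  # transition zone between 2 and 3 PVU

    # Drop the first level below the 2 PVU surface when the column crosses it
    ut_levels = [k for k, label in enumerate(labels) if label == "UT"]
    if ut_levels and max(ut_levels) < len(labels) - 1:
        labels[max(ut_levels)] = None

    # Discard cells whose observed O3 contradicts the PV-based attribution
    for k, o3 in enumerate(o3_by_level):
        if labels[k] == "UT" and o3 >= O3_UT_MAX:
            labels[k] = None
        elif labels[k] == "LS" and o3 < O3_LS_MIN:
            labels[k] = None
    return labels


def year_is_usable(available_months):
    """True if at least 7 months are available over at least three seasons.

    Months are numbered 1-12; seasons are DJF, MAM, JJA and SON.
    """
    months = set(available_months)
    seasons = {(month % 12) // 3 for month in months}
    return len(months) >= 7 and len(seasons) >= 3
```

The regional sampling criteria (at least 150 data points and 7 days between the first and last measurements in a month) would be applied before `year_is_usable`, when deciding which months count as available.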

Monthly representativeness
A first step in the assessment of the methodology consists of testing the monthly representativeness of the IAGOS-DM mean values, in order to evaluate the temporal consistency between IAGOS-DM and MOCAGE-M. For this purpose, as mentioned at the end of Sect. 3.2, we compared MOCAGE-M to a test product derived by calculating monthly averages from the simulation daily outputs, after applying a mask based on the IAGOS daily sampling. For this test, the chosen period spreads from 2003 to 2007 inclusive, an uninterrupted measurement period for both ozone and CO. Concerning the mean 3-D distributions, a mean normalized difference between the two products has been found below 1.7 % for each season and each species. In absolute values, 10 % of the yearly mean biases are greater than 6.0 % (4.1 %) for ozone (CO), and 1 % are greater than 13.1 % (10.3 %). Seasonal mean biases are characterized by a 90th percentile generally lower than 10 %, and a 99th percentile from 14.7 % up to 22.8 % for ozone and from 13.0 % up to 18.1 % for CO. The maximum values correspond to winter and spring. Concerning the seasonal cycles, the relative difference between the two MOCAGE products was found to be almost systematically below 5 %, and amongst all the regions, its ozone values seldom reach past 10 %, with a maximum value at 15.2 %. In conclusion of this comparison, the similar results obtained between MOCAGE-M and the test product suggested that in most cases, the IAGOS-DM monthly means could be considered as representative of the month.

Horizontal climatologies
Figures 3 and 4 show the yearly mean climatologies, respectively, for O3 and CO, and the model relative biases. The latter are defined as the model bias normalized to the average between the two data sets and are provided in percentages in these figures. Level 22 is seldom reached by the IAGOS measurements, and levels 27 and 28 are sampled only in the vicinity of airports. Thus, only levels 26 up to 23 are represented in these figures. Additionally, the seasonal mean climatologies are available in Appendix A.
In Fig. 3, IAGOS-DM and MOCAGE-M show similar geographical structures. In the tropics and subtropics, the O3 amounts are close, with consistent poleward gradients. Both have maxima located above northeastern Canada. The O3 mixing ratio in the northern midlatitudes is underestimated in the model for levels 24-26, and close to the observations for level 23. The seasonal climatologies in Figs. A1-A4 show that this feature is representative of spring and fall, whereas ozone tends to be underestimated (overestimated) in all vertical grid levels in summer (winter). Note that the discontinuity over Greenland is due to its topography causing a steep elevation of the vertical grid levels.
In Fig. 4, CO also shows a good correlation between the two data sets, notably with the same maxima and minima locations. But the CO mixing ratio is generally overestimated by the model, especially over East Asia and India. In the northern midlatitudes, the seasonal climatologies in Figs. A5-A8 generally show an overestimation in winter and spring and a less-visible underestimation in summer and fall.
in the simulation. Their stronger correlations at higher levels suggest a remarkably good consistency of the modelled stratospheric composition with the observations, showing its ability to simulate stratospheric chemistry and transport. The same feature is visible with the regression fit, showing a lower bias for O3, and at highest levels. With respect to the 1 : 1 line, levels 25 and 26 are characterized by an overestimation of the lower part of the O3 distribution (< 120 ppb) and by an underestimation of the higher part, more pronounced during boreal summer according to Fig. B3. A possible reason is that the summertime tropopause altitude in these regions can be overestimated by the model, or that the vertical stability is underestimated. These biases have been largely improved with the most recent version of MOCAGE used to run CCMI phase 2 simulations. Concerning CO, the highest values (generally > 100 ppb) correspond to the strongly emitting and convective regions: South Asia, East Asia and tropical Africa. Figure 4 allows us to identify the high mixing ratios close to the 1 : 1 line at tropical African points, whereas the high mixing ratios with a positive bias were associated with both South and East Asian areas. The latter can be due to an overestimation of convection in this region and/or an overestimation in the inventory for Asian emissions. On the contrary, CO above tropical Africa shows good results, indicating a realistic combination between convection and emissions.
The method proposed in this paper to evaluate MOCAGE REF-C1SD against IAGOS data in the UTLS is meant to be applied to other chemistry-climate simulations, such as the REF-C1SD simulations from other models. Since IAGOS is mapped onto the model vertical grid, which differs from one model to another, we also plotted a synthetic regression in Fig. 6, where all the points at all levels have been gathered into a single scatterplot. This summary of the model performance concerning mean spatial distributions is part of the final products of our evaluation methodology for climatologies. From the whole ensemble of ∼ 13 000 (∼ 12 500) sampled grid points for O 3 (CO), the correlation shows a good agreement between the simulation and the observations, especially for O 3 (r = 0.95). Its regression fit is dominated by an overestimation for lower values (< 100 ppb) and an underestimation for higher values, especially between 200 and 300 ppb. Above 350 ppb, overestimated and underestimated O 3 values tend to be more evenly balanced. Table 1 gives a synthesis of the biases and associated deviations for the assessment of MOCAGE-M versus IAGOS-DM. The yearly MNMB equals −0.012 for O 3 and 0.049 for CO, demonstrating a very good estimation of these two species in the UTLS on a hemispheric scale, especially for O 3. More precisely, it shows a balance between positive and negative normalized biases. The yearly fractional gross error (FGE), corresponding to the mean absolute value of the normalized bias, is also low, with 0.150 and 0.112 for O 3 and CO, respectively. The seasonal patterns show that the metrics linked with CO biases (MNMB and FGE) generally yield values closer to 0 than those for O 3.
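The three scores discussed above can be sketched in a few lines. This is a minimal illustration, not the code used for the paper: it assumes the commonly used definitions MNMB = (2/N) Σ (m − o)/(m + o) and FGE = (2/N) Σ |m − o|/(m + o), where m and o are the modelled and observed climatological values in each sampled grid cell. The function names and the flat input arrays are hypothetical.

```python
import numpy as np

def pearson_r(model, obs):
    """Pearson correlation coefficient between modelled and observed values."""
    model = np.asarray(model, dtype=float)
    obs = np.asarray(obs, dtype=float)
    return float(np.corrcoef(model, obs)[0, 1])

def mnmb(model, obs):
    """Modified normalized mean bias (assumed definition), bounded by -2 and +2."""
    model = np.asarray(model, dtype=float)
    obs = np.asarray(obs, dtype=float)
    return float(2.0 * np.mean((model - obs) / (model + obs)))

def fge(model, obs):
    """Fractional gross error (assumed definition), bounded by 0 and 2."""
    model = np.asarray(model, dtype=float)
    obs = np.asarray(obs, dtype=float)
    return float(2.0 * np.mean(np.abs(model - obs) / (model + obs)))
```

With these definitions, opposite normalized biases cancel in the MNMB but add up in the FGE, which is why a near-zero MNMB can coexist with a larger FGE, as for the yearly ozone scores quoted above.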
The O 3 seasonal behaviour is characterized by a balance between opposite seasons: the most positive (negative) bias takes place in winter (summer).

In Table 2, the comparison between the two methods shows a better agreement between the model and the observations when we apply the interpolation with the weighting factors. The O 3 correlation with the "noW" products decreased to 0.84-0.92 compared to the 0.90-0.95 derived from our method, and the CO correlation dropped from 0.72-0.81 down to 0.63-0.72. The MNMB and the FGE show better scores for the "noW" products in each case, except the O 3 MNMB in DJF. The general improvement of normalized biases, normalized errors and spatial correlations, compared to a simplified gridding method, suggests that the use of a weighting function in our methodology can significantly enhance the model assessment.

Y. Cohen et al.: Assessing CCM/CTM simulations using IAGOS

Regional-scale analysis
In this section, we attempt to evaluate the simulation in the UT and the LS separately, focusing on the seasonal cycles.
For this, we sort both data sets between the two layers as explained in Sect. 3.4. As a first step, before comparing the simulation to the observations, we analyse the impact of the mapping method for IAGOS onto the MOCAGE grid on a monthly basis. For this purpose, two versions of the IAGOS data set are used. Hereafter, IAGOS-HR refers to the high-resolution IAGOS data synthesized in Cohen et al. (2018), where every single measurement was categorized as belonging to the UT (P_TP + 15 hPa < P < P_TP + 75 hPa), the tropopause transition layer or the LS (P < P_TP − 15 hPa), and where regional monthly means were derived by averaging all the concentrations measured above the defined region. In contrast, IAGOS-DM refers to the new product presented in this paper, i.e. the IAGOS data distributed onto the model's grid, then assigned to either the UT or the LS based on the monthly averaged PV at each model grid point. Note that IAGOS-HR seasonal cycles were computed on the original regions' coordinates, but the changes induced by the 1° difference in some of the regions are expected to be negligible, based on the geographical sensitivity tests mentioned in Cohen et al. (2018).

Table 1. Seasonal and annual metrics synthesizing the assessment of the simulated O 3 and CO climatologies by IAGOS-DM, gathering all the vertical grid levels as in Fig. 6. From left to right: Pearson's correlation coefficient (r), modified normalized mean bias (MNMB), fractional gross error (FGE) and the sample size (N cells).

The seasonal cycles of the two IAGOS products are compared in Figs. 7 and 8 for O 3 and CO, respectively. They are shown with their corresponding interannual variability (IAV), defined as a year-to-year standard deviation. For complementary information, a more exhaustive representation is given in Figs. C1 and C2 in the Appendix, showing the results with each region in a distinct panel.

In Fig. 7, both IAGOS versions show a summertime O 3 maximum in the UT and a springtime maximum in the LS. A lessened contrast between the UT and the LS is observed in IAGOS-DM. In the UT, the O 3 volume mixing ratio and its interannual variability are higher in IAGOS-DM than in IAGOS-HR for the winter and fall seasons (∼ 60 ± 20 compared to ∼ 50 ± 10 ppb), whereas they are similar in spring and summer. In this layer, the most important differences between the two versions thus take place during the lower-ozone seasons.
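The pressure-based layer definitions recalled above for IAGOS-HR can be written as a short sorting routine. The sketch below is only an illustration of these thresholds: the function names are hypothetical, and returning `None` for measurements located more than 75 hPa below the tropopause is an assumption, since the text does not say how such points are labelled.

```python
from collections import defaultdict

def classify_layer(p_hpa, p_tp_hpa):
    """Sort one measurement into 'UT', 'TTL' or 'LS' from its pressure P and
    the tropopause pressure P_TP (both in hPa)."""
    dp = p_hpa - p_tp_hpa  # positive below the tropopause (higher pressure)
    if 15.0 < dp < 75.0:
        return "UT"
    if dp < -15.0:
        return "LS"
    if -15.0 <= dp <= 15.0:
        return "TTL"  # tropopause transition layer
    return None  # assumption: points further below the UT are left unclassified

def layer_means(p_hpa, p_tp_hpa, values):
    """Average the measured values separately in each layer."""
    grouped = defaultdict(list)
    for p, p_tp, v in zip(p_hpa, p_tp_hpa, values):
        layer = classify_layer(p, p_tp)
        if layer is not None:
            grouped[layer].append(v)
    return {layer: sum(vs) / len(vs) for layer, vs in grouped.items()}
```

In IAGOS-DM the same sorting is instead driven by the monthly mean PV at each model grid point, so this per-measurement routine applies to the IAGOS-HR side of the comparison only.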
In the LS, the O 3 amounts are lower in IAGOS-DM (∼ 110-375 ppb) than in IAGOS-HR (∼ 150-450 ppb) during the whole year. Two main reasons explain the lower O 3 amounts in the LS and the higher amounts in the UT in IAGOS-DM compared to IAGOS-HR. The first is the projection of IAGOS observations with a very fine vertical resolution onto the MOCAGE vertical grid with a ∼ 800 m vertical resolution. Second, a monthly PV cannot describe the day-to-day variations of the tropopause altitude, which can be important for sorting the data points between the two layers. In other words, time averaging leads to a loss of tropopause sharpness, thus resulting in a misclassification of a non-negligible part of the individual measurements. For a given layer, it introduces a bias due to unexpected mixing with another layer.

Figure 7 also makes it possible to compare the behaviour of each region. In the LS, the differences between northern and southern regions shown in IAGOS-HR are generally also visible in IAGOS-DM. The regional behaviours discussed in Cohen et al. (2018), i.e. the low summertime O 3 mixing ratio in the northwestern North American UT and in the Middle Eastern LS, remain visible in IAGOS-DM, although the latter is substantially less pronounced. We also note high ozone values in November in the Siberian UT, seen in IAGOS-DM only. They are linked to a strong positive anomaly in November 1997 due to an upper-layer air mass that could not be distinguished from the UT, and that is only weakly balanced in the average because too few other years are available.

In Fig. 8, the CO seasonal cycles in the UT are consistent between IAGOS-HR and IAGOS-DM, with a generally low difference, a common springtime maximum and a consistent inter-regional variability: a higher CO level in the two regions on the Pacific coast (northwestern North America and northeastern Asia), higher summertime amounts in northeastern Asia, and lower CO levels in one of the two southernmost regions (the Middle East). Note that the monthly resolution of both PV and filtering leads to a lessened sampling in the UT in IAGOS-DM.
In the North Atlantic region where aircraft trajectories describe a narrow altitude range, the resulting seasonal cycles were incomplete, so we chose to exclude them from both figures. We applied the same treatment to CO in the UT above the western Mediterranean Basin and Siberia, where the level of sampling during winter and spring (not shown) is insufficient to provide complete seasonal cycles. In the LS, the CO mixing ratio is always higher in IAGOS-DM (∼ 50-95 ppb) than in IAGOS-HR (from ∼ 40 up to 65 ppb). In IAGOS-HR, a seasonal cycle is noticeable only in the Middle East and northeastern Asia, whereas it is the case for almost every region in IAGOS-DM. The influence of the troposphere is increased in IAGOS-DM, with a high peak in May for the western Mediterranean Basin, in June-July for northeastern Asia and in July for Siberia, likely related to the effects of boreal biomass burning in the latter. Thus, mapping the observations onto the model grid significantly changes the CO seasonal cycles in the LS.

As for O 3, the reason why the CO amounts in IAGOS-DM are higher in the LS and lower in the UT comes from the coarse vertical resolution in the MOCAGE grid and from the uncertainty when sorting the UT data from the LS data using a monthly mean modelled PV field. More generally, the comparison between IAGOS-HR and IAGOS-DM for O 3 and CO clearly shows that the processing applied for mapping the IAGOS high-resolution data set onto the MOCAGE coarse grid slightly modifies IAGOS characteristics. This processing, which enables a meaningful comparison between IAGOS long-term measurements and the REF-C1SD simulation, acts as a numerical filter.
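The effect of sorting with a monthly mean tropopause can be illustrated with a toy calculation. This is not the procedure of the paper: the sketch replaces the PV criterion by a simple pressure distance to the tropopause, and every number in it (a 230 hPa mean tropopause, a ±40 hPa oscillation with a 10-day period, a 250 hPa cruise level, a 15 hPa margin) is synthetic and chosen only for illustration.

```python
import numpy as np

def side_of_tropopause(p_hpa, p_tp_hpa, margin_hpa=15.0):
    """+1 = tropospheric side, -1 = stratospheric side, 0 = transition layer."""
    dp = np.asarray(p_hpa, dtype=float) - np.asarray(p_tp_hpa, dtype=float)
    return np.where(dp > margin_hpa, 1, np.where(dp < -margin_hpa, -1, 0))

days = np.arange(30)
# Synthetic daily tropopause pressure oscillating around its monthly mean
p_tp_daily = 230.0 + 40.0 * np.sin(2.0 * np.pi * days / 10.0)
p_tp_monthly = np.full(days.shape, p_tp_daily.mean())
# One synthetic measurement per day at a fixed cruise pressure
p_cruise = np.full(days.shape, 250.0)

daily_sorting = side_of_tropopause(p_cruise, p_tp_daily)
monthly_sorting = side_of_tropopause(p_cruise, p_tp_monthly)

# Fraction of days on which the two sortings disagree
mismatch = float(np.mean(daily_sorting != monthly_sorting))
```

In this synthetic month, every measurement is tropospheric according to the monthly mean tropopause, while the daily tropopause places part of them in the transition layer or on the stratospheric side. Averaging all of them into a single layer is the kind of mixing that the text describes as a numerical filter.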
It is important to note that the seasonal cycles in IAGOS-DM generally show values ranging between the MOCAGE-M and the IAGOS-HR cycles, as do the yearly means in Table 3, especially in the LS where the mean O 3 bias drops from 84 ppb with IAGOS-HR down to 19 ppb with IAGOS-DM. The correlation in time also tends to be enhanced by the use of the IAGOS-DM product. This confirms that the representation derived from IAGOS-HR cannot be reached by a model with the typical REF-C1SD resolution, especially for CO in the LS, but some of the main characteristics mentioned above can still be used as criteria. Last, the comparison synthesized in Table 4 also shows a better consistency between model and observations when our method is applied, mainly in terms of biases in the LS. No significant change is observed in the UT.
We now assess the MOCAGE-M seasonal cycles by comparing them to IAGOS-DM. As a complement to Fig. 7, statistical results are given in Table 3. Note that averages calculated over all represented regions have been computed only to synthesize the assessment and to provide a quantification that confirms some features seen in the figures. As they are similar to zonal averages, they are not meant to have a geophysical meaning. A qualitative summary is also provided in Table 5. In the UT, MOCAGE-M shows a springtime maximum and higher O 3 concentrations (from ∼ 120 ppb up to 130 ppb), instead of the observed summertime maximum (the season in which O 3 values range between ∼ 80 and 110 ppb). Given also that the simulated O 3 levels are particularly high in the northernmost regions (western North America and Siberia), where the stratosphere at the cruise levels is richer in O 3, it is likely that the stratospheric influence on the UT is overestimated in the simulation. The inter-regional averages shown in Table 3 confirm the significant difference between the two data sets in the UT, both in the O 3 mixing ratio (97 ± 5 ppb in MOCAGE-M compared to 72 ± 9 ppb in IAGOS-DM) and in the seasonality (r = 0.35). In the LS, the simulation reproduces the cycles well, including the seasonality (r = 0.84 as shown in Table 3), the magnitude, the amounts of ozone (203 ± 23 ppb compared to 222 ± 36 ppb from IAGOS-DM) and the inter-regional differences. The latter are characterized in both data sets by lower ozone levels in the two southernmost regions (western Mediterranean Basin and the Middle East) and higher ozone levels in the two northernmost regions (western North America and Siberia). Without the noisy signal characterizing western North America and the western Mediterranean Basin in IAGOS-DM, the interval representing the springtime interannual variabilities spreads from ∼ 200 ppb up to ∼ 400 ppb in both data sets, showing another feature well reproduced by the model.
On a yearly basis, however, according to Table 3, the model tends to underestimate the ozone IAV on average by a factor of 1.6.
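The mean seasonal cycle and its interannual variability, defined in this paper as a year-to-year standard deviation, can be derived from a table of monthly means as sketched below. The array layout (one row per year, one column per calendar month, NaN for unsampled months), the function name and the use of the sample standard deviation (`ddof=1`) are assumptions made for this illustration.

```python
import numpy as np

def seasonal_cycle_and_iav(monthly):
    """monthly: array of shape (n_years, 12), NaN where a month is unsampled.
    Returns the mean seasonal cycle and the year-to-year standard deviation,
    each as an array of 12 values."""
    monthly = np.asarray(monthly, dtype=float)
    cycle = np.nanmean(monthly, axis=0)
    iav = np.nanstd(monthly, axis=0, ddof=1)  # assumption: sample standard deviation
    return cycle, iav

# Synthetic two-year example (values in ppb, not taken from the paper)
example = np.array([[100.0] * 12, [120.0] * 12])
example_cycle, example_iav = seasonal_cycle_and_iav(example)
```

Applying the same function to the observed and the modelled monthly means gives two IAV series whose ratio quantifies how much of the observed year-to-year variability the simulation reproduces.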
The modelled CO seasonal cycles (Fig. 8) in the UT show similarities with the observations (IAGOS-DM), including the higher concentrations in the two Pacific coast regions (western North America and northeastern Asia), the strong summertime concentrations in northeastern Asia and also comparable mixing ratios between the model and the IAGOS-DM observations in most regions, as confirmed by Table 3. However, the simulation overestimates the CO mixing ratios in the two Pacific coast regions, and the seasonal maxima generally take place during late winter-early spring in the simulation, earlier than the observed middle-of-spring maxima. The seasonal minima are in phase with the observations. In the LS, the seasonal cycles' magnitude is underestimated by the simulation but the overall bias remains relatively low, with a 73 ± 5 ppb average for MOCAGE-M compared to 69 ± 9 ppb for IAGOS-DM. In most regions, MOCAGE-M shows seasonal cycles in the LS in phase with the UT, thus contrasting with the observations and making the correlation drop from 0.64 in the UT to 0.31 in the LS. This suggests that the model simulation is affected in the LS by transport from the troposphere during springtime. Consistently with observations, MOCAGE-M shows a summertime maximum in northeastern Asia exclusively. Although part of this feature may originate from the positive bias in the UT, the fact that it only concerns the summer season, in contrast to the UT, suggests that summertime convection also plays a non-negligible role.

Summary and conclusions
We developed a methodology that makes the IAGOS database ready to assess long-term chemistry-climate model simulations for recent decades, and particularly the REF-C1SD experiment produced in the framework of the CCMI phase 1 project. The current paper describes this methodology and its application to a chosen simulation (the REF-C1SD simulation from MOCAGE-CTM), assessing modelled ozone and carbon monoxide monthly fields during August 1994-December 2013 and December 2001-December 2013, respectively.
The first step consists of generating a gridded monthly IAGOS data set (IAGOS-DM), first by a reverse interpolation with linear distance weighting onto the chosen model grid on a monthly basis and then by deriving weighted monthly means in each grid cell. The second step consists of deriving seasonal and annual climatologies for the well-sampled vertical grid levels, then deriving statistical scores for the simulation assessment. In the case of the REF-C1SD simulation from MOCAGE, the yearly mean spatial distribution is well reproduced by the model, especially for O 3 and especially at the highest sampled levels. This suggests a particularly good representation of the main stratospheric processes that affect O 3 in the UTLS. The extreme mean CO mixing ratios observed above the strongly emitting and convective regions in the tropics and subtropics are also visible in the simulation, with a very low bias above tropical Africa and a significant positive bias above South and East Asia. Overall, the annual O 3 normalized mean bias is very low (MNMB = −0.012). The seasonal biases are somewhat higher, more so in winter and summer (|MNMB| = 0.144-0.169) than in spring and fall (|MNMB| = 0.027-0.033), with quasi-opposite values within each pair of opposite seasons. The yearly bias in CO is positive (MNMB = 0.049), with the highest values in winter and spring, and particularly low values in summer and fall. The statistical metrics were applied to each vertical grid level separately in order to locate strengths and weaknesses of the model, but also to all UTLS grid cells for the purpose of a bulk comparison that could be repeated on other model simulations.
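The first step summarized above, i.e. distributing each observation onto the neighbouring grid points and then taking weighted means, can be sketched along a single coordinate. This is a simplified one-dimensional illustration under stated assumptions, not the implementation used to build IAGOS-DM, which works on the three-dimensional model grid. The function name, the linear weights and the choice to drop observations outside the grid are all hypothetical.

```python
import numpy as np

def grid_weighted_mean(x_obs, values, grid):
    """Distribute each observation onto its two neighbouring grid points with
    linear distance weights, then return the weighted mean and the summed
    weights at each grid point (NaN where no observation contributes)."""
    x_obs = np.asarray(x_obs, dtype=float)
    values = np.asarray(values, dtype=float)
    grid = np.asarray(grid, dtype=float)  # assumed strictly increasing

    # Index of the grid point on the left of each observation
    left = np.clip(np.searchsorted(grid, x_obs, side="right") - 1, 0, grid.size - 2)
    frac = (x_obs - grid[left]) / (grid[left + 1] - grid[left])
    inside = (frac >= 0.0) & (frac <= 1.0)  # drop observations outside the grid

    weight_sum = np.zeros(grid.size)
    value_sum = np.zeros(grid.size)
    for idx, w in ((left, 1.0 - frac), (left + 1, frac)):
        np.add.at(weight_sum, idx[inside], w[inside])
        np.add.at(value_sum, idx[inside], w[inside] * values[inside])

    mean = np.full(grid.size, np.nan)
    sampled = weight_sum > 0.0
    mean[sampled] = value_sum[sampled] / weight_sum[sampled]
    return mean, weight_sum
```

Keeping the summed weights alongside the means gives a measure of how well each grid point is sampled, which a later step could use to filter out poorly sampled cells.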
Another step consists of comparing the seasonal cycles between IAGOS observations and the MOCAGE simulation in the upper troposphere (UT) and the lower stratosphere (LS). It relies on the use of a monthly mean calculated PV field to define a UT and a LS separated by a transition layer, following the same principle as in Thouret et al. (2006). The mean seasonal cycles have been compared over the eight well-sampled regions defined and analysed in Cohen et al. (2018). The application to the assessment of this REF-C1SD experiment by MOCAGE is preceded by an analysis of the changes induced in IAGOS seasonal cycles by the projection onto the model monthly grid. As expected, going from IAGOS-HR to IAGOS-DM systematically leads to an increase (decrease) in upper-tropospheric (lower-stratospheric) O 3, to an increase in lower-stratospheric CO and generally to a slight decrease in upper-tropospheric CO. The use of a monthly mean PV field and the ∼ 800 m vertical resolution in the UTLS of MOCAGE onto which IAGOS observations are projected automatically result in an artificial increase of stratosphere-troposphere exchange. This is explained by the fact that the grid cells in the vicinity of the tropopause are crossed by both tropospheric and stratospheric air masses in the course of a month. It results in a decreased vertical gradient between the UT and the LS. Nevertheless, the seasonal maxima and minima, although less clear, remain visible in IAGOS-DM with respect to IAGOS-HR. The hierarchy between the regions is generally conserved from IAGOS-HR to IAGOS-DM, for both chemical species and both layers: in each of these cases, we find the same regions showing the lowest/highest values in the two IAGOS representations. Also, some specific local behaviours mentioned in Cohen et al. (2018) remain visible in IAGOS-DM.
Concerning O 3 , we highlighted the consistency of the lowest quantities in the UT above western North America and, substantially less significant, in the LS above the Middle East. Concerning CO, we showed the conservation of the spring-summer maximum in northeastern Asia in the UT and its summertime maximum in the LS.
The evaluation of the MOCAGE REF-C1SD simulation (MOCAGE-M) with IAGOS-DM shows a good representation of O 3 in the LS in terms of seasonal cycle magnitudes and geographical variability, thus highlighting the well-reproduced main stratospheric processes. In the UT, for all the regions, the model overestimates the O 3 mixing ratios and shows a typical lower-stratospheric seasonality, suggesting an overestimation in the transport from the stratosphere. The modelled CO field shows similarities with the observations in the UT, with a 1-month shift in the seasonal maxima. One possible reason is the decadal linear interpolation in anthropogenic emissions implemented in REF-C1SD, leading to a lack of year-to-year variability in modelled CO fields. In the LS, CO is generally higher in the simulation and shows a seasonal cycle in phase with the UT, in contrast to IAGOS-DM. This suggests an overestimated tropospheric influence in this layer during springtime.
The methodology shown in this paper has proven useful for assessing the REF-C1SD experiment from MOCAGE in the UTLS, further highlighting the model strengths and weaknesses when compared to the densest in situ IAGOS data set in the UTLS. Particularly, the use of the IAGOS-DM product instead of IAGOS-HR systematically reduced the biases characterizing the simulation, thus avoiding an underestimation of the model abilities to reproduce the chemical composition of the UT and the LS in a recent climate time period.
The present methodology could easily be applied to CCMI REF-C1SD simulations from other models, both for an intermodel comparison and for assessing CCMI products against the IAGOS database, notably intermodel-averaged fields. More broadly, it can be used on a wide range of long-term simulations, including both nudged and free-running CCM simulations, in order to perform climatological comparisons. Caution is needed when extending this work to the specified-dynamics simulations from CCMs, regarding the loss of consistency between chemical and dynamical variables that is introduced by nudging, as highlighted in Orbe et al. (2020). Notably, inconsistencies between ozone and potential vorticity are likely to introduce noise in the simulated upper-tropospheric and lower-stratospheric behaviours. Last, the assessment illustrated in this study is based on two chosen applications of our methodology, i.e. the analyses of long-term seasonal and yearly averages on different vertical grid levels and the mean seasonal cycles in the UT and the LS, but a wide diversity of complementary comparisons remains possible. We thus recommend this new product to the CCMI community.