On the impact of recent developments of an atmospheric general circulation model on the simulation of CO2 transport

The quality of the representation of greenhouse gas (GHG) transport in atmospheric General Circulation Models (GCMs) largely determines the ability of inverse systems to retrieve GHG surface fluxes. In this work, the transport of CO2 is evaluated in the latest version of the LMDz GCM, developed for the Coupled Model Intercomparison Project 6 (CMIP6), relative to the LMDz version developed for CMIP5. Several key changes have been implemented between the two versions. These include a more elaborate radiative scheme, new sub-grid scale parameterizations of convective and boundary layer processes, and a refined vertical resolution. We performed a set of LMDz simulations with the different physical parameterizations, two different horizontal resolutions and different land surface schemes, in order to test the impact of these configurations on the overall transport simulation. By modulating the intensity of vertical mixing, the physical parameterizations control the interhemispheric gradient and the amplitude of the seasonal cycle in the northern hemisphere summer, as shown by the comparison with observations at surface sites. However, the effect of the new parameterizations depends on the region considered. The impact is strong over South America (Brazil, Amazonian forest) but smaller over Europe, Eastern Asia and North America. A finer horizontal resolution reduces the representation errors at observation sites near emission hot spots or along coastlines. In comparison, the sensitivities to the land surface model and to the increased vertical resolution are marginal.


Introduction
The accumulation of carbon dioxide (CO2) in the atmosphere as a result of anthropogenic activity is one of the primary drivers of climate change (Ciais et al., 2013). This trace gas therefore receives particular attention and benefits from various observation networks and systems at the surface, in the atmosphere and from space (e.g. Ciais et al., 2014). These data streams can be used to locate and quantify the sources and sinks of CO2 through the inversion of atmospheric transport in a Bayesian framework.
However, despite the large monitoring effort, such estimates still suffer from large uncertainties (Peylin et al., 2013). For instance, the atmospheric inverse systems used in the latest Global Carbon Budget of the Global Carbon Project (Le Quéré et al., 2018) disagree on the decadal land sink integrated over the northern extratropics by about 1 GtC per year. Several factors could explain such an inconsistency, but uncertainties in the modelling of atmospheric transport have long […] boundary layer tracers by cumulus (Locatelli et al., 2015a). Other deficiencies are the unrealistic phasing of the diurnal cycle of convection over continents, with the precipitation peak generally simulated too early in the day (Guichard et al., 2004), and the lack of tropical variability (Lin et al., 2006).
In order to address these deficiencies, a new version of the LMDz GCM, called LMDz5B, was developed for CMIP5 (Hourdin et al., 2013). The new physics treats shallow and deep convection separately. On the one hand, shallow convection is represented in a unified way by combining two components. The first is the diffusive approach of Mellor and Yamada (1974) for small-scale turbulence. The second is a mass flux scheme, the thermal plume model (Rio and Hourdin, 2008), which represents both dry and cloudy thermals in the convective boundary layer. On the other hand, deep convection and downdrafts are represented by the Emanuel (1991) scheme coupled with a parameterization of cold pools (Grandpeix et al., 2009). Deep convection triggering and closure are no longer functions of CAPE; they depend on sub-cloud processes. The convective onset is now controlled by the thermal plume variables, and the maintenance of deep convection after its onset is ensured by the cold pools. The main results agree better with observations. They are a delayed convective initiation, a self-sustainment of convection through the afternoon (Rio and Hourdin, 2008; Rio et al., 2009) and a drastic increase in the tropical variability of precipitation (Hourdin et al., 2013). This version has not been implemented in the above-mentioned inversion system for CO2. Preliminary CO2 transport simulations showed unrealistically large seasonal cycles at some southern stations such as Palmer Station (PSA) in Antarctica (unpublished results). However, it was successfully used for aerosol data assimilation around North Africa by Escribano et al. (2016). It also showed promising improvements in the representation of the magnitude of diurnal variations of surface concentrations (Locatelli et al., 2015a).
For CMIP6, configuration 5B of LMDz has further evolved from Hourdin et al. (2013). It has a different formulation of the triggering assumptions and a different radiative transfer code, and it accounts for the thermodynamical effect of ice. The convective triggering is now based on the evolving statistical properties of the thermal plumes, by considering a distribution of thermal sizes instead of a bulk thermal (Rochetin et al., 2013). The motivation was to depart from the quasi-equilibrium (QE) hypothesis. It also allows a more gradual transition between shallow and deep convection through a three-step process: appearance of clouds, crossing of the inhibition layer, and deep convection triggering. In the shortwave, the code is an extension to 6 bands of the original 2-band code used in LMDz5A (Fouquart and Bonnel, 1980), as implemented in a previous version of the ECMWF numerical weather prediction model. In the longwave, LMDz uses the Rapid Radiative Transfer Model (RRTM) (Mlawer et al., 1997).
This version is now called 6A.
For the energy and water fluxes between the land surface and the atmosphere, LMDz can be coupled with one of two components. One is the ORCHIDEE (ORganizing Carbon and Hydrology In Dynamic Ecosystems, version 9) terrestrial model (Krinner et al., 2005). The other is a simple bulk parameterization of the surface water budget.
The reference configuration of LMDz5A used in CMIP5 had 39 eta-pressure layers and 96 × 96 grid points, i.e. a horizontal resolution of 1.89° in latitude and 3.75° in longitude. Current reference simulations of IPSL-CM for CMIP6 use the new configuration LMDz6A. It has a refined grid of 144 grid points in both latitude and longitude and a vertical resolution extended to 79 layers. The number of layers below 1 km has increased from 5 to 16. The remaining additional layers are mostly located in the stratosphere. As a result, the vertical spacing ∆z in the lower stratosphere (between 100 and 10 hPa) is approximately 1 km in this model setup. For the inverse system, LMDz is currently run in an offline version of configuration 5A with the same 39 eta-pressure layers and 96 × 96 grid points.
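The quoted horizontal resolutions follow from the grid-point counts. The sketch below shows the arithmetic. It assumes that the longitude points are periodic over 360° and that the latitude points include both poles. This is an assumption about the grid layout and is not stated in the text. `grid_spacing` is a hypothetical helper.

```python
def grid_spacing(n_lon: int, n_lat: int) -> tuple:
    """Return (dlon, dlat) in degrees for a regular longitude-latitude grid.

    Assumption (not stated in the text): the n_lon longitudes are periodic over
    360 degrees, and the n_lat latitudes include both poles, so latitude is
    divided into n_lat - 1 intervals.
    """
    dlon = 360.0 / n_lon
    dlat = 180.0 / (n_lat - 1)
    return dlon, dlat


dlon, dlat = grid_spacing(96, 96)
print(f"{dlon:.2f} deg in longitude, {dlat:.2f} deg in latitude")
# prints "3.75 deg in longitude, 1.89 deg in latitude"
```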

Description of the simulations
We ran the two versions of the physics described above, 5A and 6A, at several resolutions over the years 1988 to 2014.
A summary of the simulations is given in Table 1. The identification number of the LMDz code used here (which contains both physics versions) is 2791. We discard the first 10 years (1988-1997) to allow enough spin-up for the tracer simulations, considering the interhemispheric exchange time of about 1 year for passive tracers (Law et al., 2003). The dynamics is nudged towards the 6-hourly horizontal winds from the ECMWF reanalysis (Dee et al., 2011) with a relaxation time of 3 hours (Hourdin and Issartel, 2000). The CO2, SF6 and 222Rn initial values are set uniformly over all model grid boxes on 1 January 1988. The values are 350 µmol/mol (abbreviated as ppm), 1.95 pmol/mol (abbreviated as ppt) and 0 Bq/m3, respectively. 350 ppm is the global mean given for that date by the forward simulation associated with the CAMS CO2 inversion used here (see Section 2.3). 1.95 ppt is the initial value used for SF6 in the TransCom protocol of Denning et al. (1999). The initial value of 222Rn does not matter given the short lifetime of this radionuclide. Model output is written hourly. Numerical approximations in the advection scheme and subgrid parameterizations prevent LMDz from strictly conserving mass. For CO2, for instance, the model loses about 1 GtC over 10 years in the reference version and twice as much in the new version. We have therefore applied a global mass correction to both the CO2 and the SF6 three-dimensional mole fraction fields every hour.
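The text does not specify the form of the global mass correction. The sketch below uses a single multiplicative factor that restores a target global burden while keeping the spatial pattern of the field. The multiplicative form, the function name and the way the target is obtained are all assumptions for illustration only.

```python
def apply_global_mass_fixer(mole_fraction, air_moles, target_tracer_moles):
    """Rescale a tracer field so that its global burden equals a target value.

    mole_fraction: dry-air mole fractions in every grid box (flattened 3-D field).
    air_moles: moles of dry air in the same grid boxes, in the same order.
    target_tracer_moles: burden the field should have, e.g. the burden before
        the time step plus the net surface flux over the step (hypothetical
        bookkeeping; the text does not describe how the target is obtained).
    """
    burden = sum(x * m for x, m in zip(mole_fraction, air_moles))
    if burden == 0.0:
        raise ValueError("cannot rescale a field with zero burden")
    factor = target_tracer_moles / burden
    # A single multiplicative factor preserves the spatial pattern of the field.
    return [x * factor for x in mole_fraction]


field = [400e-6, 401e-6, 399e-6]
air = [1.0e18, 2.0e18, 1.5e18]
# Pretend the advection step lost 0.01 % of the tracer.
target = 1.0001 * sum(x * m for x, m in zip(field, air))
corrected = apply_global_mass_fixer(field, air, target)
```

An additive correction, which shifts every box by the same mole fraction, is an equally simple alternative. It leaves spatial gradients unchanged but can produce negative values for tracers with small concentrations.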

2.3 Prescribed tracer fluxes at the surface

CO2 surface fluxes are prescribed every 3 hours from version 15r4 of the CO2 atmospheric inversion product of the Copernicus Atmosphere Monitoring Service (CAMS).
The 3-hourly resolution comes from the prior information of the inversion system. The surface air-sample measurements constrain the fluxes only at weekly or coarser resolution. They also correct a mean day-night difference every day, but this correction is marginal. The inversion system uses an offline version of LMDz close, but not identical, to 5A-96L39. Fluxes from another atmospheric inversion could have been used instead. However, the most robust atmospheric inversions share the same surface measurements to a large extent, so the lack of independence of our CO2 simulations from the surface measurements would remain anyway. Of interest here is the use of fossil fuel emissions from the Emission Database for Global Atmospheric Research version 4.2 (EDGAR, http://edgar.jrc.ec.europa.eu/), scaled to the annual global values of the Global Carbon Budget (Le Quéré et al., 2015). Details of the prescribed fluxes are given in Chevallier (2017).

The inversion system
For use at the 2.50° × 1.30° resolution, the natural component of the optimized fluxes has been interpolated from its native 3.75° × 1.90° resolution. It has been completed by a fossil fuel component interpolated directly from the native 0.1° × 0.1° EDGAR resolution, in order to avoid artificial smoothing. All grid changes here conserve mass.
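A mass-conserving grid change assigns each target cell the overlap-weighted mean of the source cells it covers, so the integral of the flux is unchanged. The sketch below is a one-dimensional illustration under that assumption. On the sphere the overlap lengths would be replaced by overlap areas. The function is hypothetical and is not the remapping code used for the CAMS fluxes.

```python
def conservative_remap_1d(src_edges, src_values, dst_edges):
    """Remap cell-mean values from source to destination cells, conserving the integral.

    Edges are increasing coordinates; cell i spans [edges[i], edges[i + 1]].
    Both grids are assumed to cover the same interval.
    """
    dst_values = []
    for j in range(len(dst_edges) - 1):
        lo, hi = dst_edges[j], dst_edges[j + 1]
        acc = 0.0
        for i in range(len(src_edges) - 1):
            # Length of the intersection between source cell i and target cell j.
            overlap = min(hi, src_edges[i + 1]) - max(lo, src_edges[i])
            if overlap > 0:
                acc += src_values[i] * overlap
        dst_values.append(acc / (hi - lo))
    return dst_values


coarse = conservative_remap_1d([0.0, 1.0, 2.0, 3.0], [1.0, 2.0, 3.0], [0.0, 1.5, 3.0])
# The integral is 6.0 on both grids: 1 + 2 + 3 = (4/3 + 8/3) * 1.5
```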
Monthly averages of SF6 emission fluxes at 1° × 1° are taken from the EDGAR 4.0 inventory for the period 1988-2008, as corrected by Levin et al. (2010). The global emissions increased steadily from 934 mmol/s in 1988 to 1599 mmol/s in 2010.
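The global SF6 emissions can also be expressed in the more familiar mass units. The sketch below converts the two quoted fluxes to Gg per year. It assumes an approximate SF6 molar mass of 146.06 g/mol and a 365.25-day year.

```python
SF6_MOLAR_MASS_G_PER_MOL = 146.06  # approximate molar mass of SF6 (S + 6 F)
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def mmol_per_s_to_gg_per_yr(flux_mmol_s: float) -> float:
    """Convert a global SF6 emission from mmol/s to Gg/yr."""
    grams_per_s = flux_mmol_s * 1e-3 * SF6_MOLAR_MASS_G_PER_MOL
    return grams_per_s * SECONDS_PER_YEAR / 1e9  # 1 Gg = 1e9 g


for year, flux in ((1988, 934.0), (2010, 1599.0)):
    print(year, f"{mmol_per_s_to_gg_per_yr(flux):.1f} Gg/yr")
# 1988 4.3 Gg/yr
# 2010 7.4 Gg/yr
```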
SF6 sources are mostly in the northern hemisphere and there are no sinks. SF6 has therefore been widely used to gain further insight into interhemispheric (IH) transport and stratosphere-troposphere exchanges (STEs). We additionally prescribe 222Rn surface fluxes according to Patra et al. (2011).
With its short half-life (3.8 days), 222Rn is used here to gain some insight into vertical mixing within the column.

Model sampling strategy
For each species, the simulated concentration fields were sampled at the grid point nearest to the observation location, both horizontally and vertically. They were also sampled at the model output hour nearest to the observation time.
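Nearest-neighbour sampling picks, along each axis, the index closest to the observation coordinate. The sketch below does this with the standard `bisect` module. It makes three assumptions for illustration: the field is stored as `field[t][k][j][i]` on sorted axes, vertical levels are given as altitudes, and longitude wrap-around is ignored. The function names and the observation dictionary layout are hypothetical.

```python
import bisect


def nearest_index(axis, value):
    """Index of the element of a sorted (ascending) axis closest to value."""
    pos = bisect.bisect_left(axis, value)
    if pos == 0:
        return 0
    if pos == len(axis):
        return len(axis) - 1
    # Compare distances to the two neighbours; ties go to the lower index.
    return pos if axis[pos] - value < value - axis[pos - 1] else pos - 1


def sample_nearest(field, times_h, levels_m, lats, lons, obs):
    """Value of field[t][k][j][i] at the output hour and grid point nearest to obs.

    obs is a dict with keys "time_h", "alt_m", "lat", "lon" (hypothetical layout).
    Longitude wrap-around is not handled in this sketch.
    """
    t = nearest_index(times_h, obs["time_h"])
    k = nearest_index(levels_m, obs["alt_m"])
    j = nearest_index(lats, obs["lat"])
    i = nearest_index(lons, obs["lon"])
    return field[t][k][j][i]
```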
Observations are all dry-air mole fraction measurements calibrated relative to the World Meteorological Organization (WMO) CO2 mole fraction scale. For comparison, the corresponding dry-air variables of the model simulations are used. The model simulations in Section 3 are not compared to measurements. Even so, the model sampling there still follows an observation selection, as indicated in the corresponding text: afternoon samples for the zonal-mean profiles, or a satellite retrieval pattern for the total column.
The simulated CO2 mole fractions were compared with some of the atmospheric surface measurements that were assimilated when optimizing the surface CO2 fluxes prescribed here. The locations of these assimilated surface stations are shown in Figure 1. As in the CAMS inverse modelling framework, we retain only early-afternoon data (12:00-15:00 LST) for continuous stations below 1000 m a.s.l. For continuous stations above 1000 m a.s.l., we retain only night-time data (00:00-03:00 LST). All flask measurements below 1000 m a.s.l. have been kept. This hour selection has two motivations. Transport models generally fail to represent accurately the accumulation of tracers near the surface at night. At mountain sites, upslope winds over sunlit slopes advect air masses in the afternoon (Geels et al., 2007). A description of the surface observations used for the inversion can be found in Chevallier (2017), but only a subset is used here. This subset comes from three sources:
- the obspack_co2_1_GLOBALVIEWplus_v3.2_2017-11-02 archive (Cooperative Global Atmospheric Data Integration Project, 2017);
- the World Data Center for Greenhouse Gases archive (https://ds.data.jma.go.jp/gmd/wdcgg/);
- the Réseau Atmosphérique de Mesure des Composés à Effet de Serre monitoring network (https://www.lsce.ipsl.fr/).

We selected the sites with more than 3 years of record and with enough data density in time to compute the statistics.
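The time-of-day selection described above can be written as a small filter. The sketch below is illustrative and rests on several assumptions not stated in the text:
- hour windows are half-open, e.g. [12, 15);
- a site at exactly 1000 m counts as high-altitude;
- flasks above 1000 m follow the night-time window. The text only specifies that window for continuous stations.

`keep_observation` is a hypothetical name.

```python
def keep_observation(lst_hour: float, altitude_m: float, continuous: bool) -> bool:
    """Return True if a sample passes the time-of-day selection described above.

    lst_hour: local solar time of the sample, in hours (0 <= lst_hour < 24).
    altitude_m: site altitude in metres above sea level.
    """
    low_site = altitude_m < 1000.0
    if not continuous and low_site:
        return True  # all flask samples below 1000 m a.s.l. are kept
    if low_site:
        return 12.0 <= lst_hour < 15.0  # early afternoon, deep mixed layer
    # High-altitude sites: night-time window. Applying it to flasks too is an
    # assumption; the text only specifies it for continuous stations.
    return 0.0 <= lst_hour < 3.0
```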
In addition, we use some unassimilated surface observations in the Tropics: bkt, cri, hkg, hko, lln and hat. Lower case is used here to denote sites not assimilated in the Chevallier (2017) inversion. These sites help to evaluate the quality of the inversion over the tropics, which are not well constrained. We sampled the model output at the elevation (above sea level) of each site. Sites hkg and hko only provide daily-mean CO2 mole fractions.

Vertical profile samples from aircraft measurements
We compared the simulated CO2 mole fractions against observed CO2 vertical profiles from three sampling programs:
- the Comprehensive Observation Network for TRace gases by AIrLiner (CONTRAIL);
- the NOAA/ESRL Global Greenhouse Gas Reference Network Aircraft Program;
- the lower-tropospheric greenhouse gas sampling program over the Amazon described in Gatti et al. (2014).

Aircraft measurements have not been assimilated in the CAMS inversion product and are therefore called independent in the following. CONTRAIL (Machida et al., 2008; http://www.cger.nies.go.jp/contrail/index.html) provides high-frequency CO2 measurements over 43 airports worldwide and during commercial flights between Japan and other countries. The calibration of the data is accurate to within 0.2 ppm (Machida et al., 2008). We selected from the CONTRAIL dataset all the CO2 vertical profiles from ascending and descending flights for the period 2006-2011 over the regions shown in Figure 1. The regions are similar to those of Niwa et al. (2011) and have been chosen according to the number and location of the vertical profile samples. The number of hourly-mean measurements at 5.5 km per model grid box is shown in Figure 2.

The NOAA/ESRL Global Greenhouse Gas Reference Network Aircraft Program consists here of air samples collected every few days or months. The samples come from 22 aircraft profiling sites over continental North America (shown in blue in Figure 1), at altitudes between 300 and 8000 m a.s.l. The CONTRAIL measurements are sampled near commercial airports, whereas these measurements are not affected by local emissions at the lowest altitudes. Statistics were computed over the 974 available vertical profiles.
The lower-tropospheric greenhouse gas sampling program over the Amazon provides biweekly air sample profiles in 2010 at 4 sites (san, tab, alf and rba). The profiles extend from above the forest canopy (300 m) to 4.4 km above sea level. The locations of the airborne platforms are shown in blue in Figure 1. During their descending flights, small aircraft filled flasks between 12:00 and 13:00 LST, when the boundary layer is fully developed. Most of the samples are representative of air masses carried by the dominant easterly flow from the tropical Atlantic Ocean across the Amazon Basin. Air masses at sites tab and rba mainly reflect sources and sinks over a large fraction of the Amazonian forest. Air masses at […]

For each station, the annual gradient to MLO is calculated by subtracting the annual mean CO2 mole fraction at MLO (Mauna Loa, 19°52' N 155°58' W) from the annual mean of the smooth curve of the station of interest. For the seasonal cycle, the amplitude is calculated from the smooth curve as the absolute peak-to-peak difference within a year at each site. These yearly amplitudes are then averaged over the period 1998-2014. The seasonal phase is evaluated using the Pearson correlation coefficient between observed and simulated smooth curves. The synoptic signal is extracted at each site as the residual between the raw time series and the smooth curve. To plot the seasonal latitudinal gradient of CO2, we choose the following marine boundary layer sites:
- ZEP (Zeppelin, Ny-Alesund, Svalbard, Norway and Sweden);
- ICE (Storhofdi, Vestmannaeyjar, Iceland);
- SHM (Shemya Island, Alaska, USA);
- AZR (Terceira Island, Azores, Portugal);
- MID (Sand Island, Midway, USA);
- MNM (Minamitorishima, Japan);
- KUM (Cape Kumukahi, Hawaii, USA);
- GMI (Mariana Islands, Guam);
- CHR (Christmas Island, Republic of Kiribati);
- SMO (Tutuila, American Samoa);
- CGO (Cape Grim, Tasmania, Australia).
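The annual gradient to MLO, the mean seasonal amplitude and the phase score reduce to simple statistics on the smooth curves. The sketch below assumes the curves are already split into years, which is a hypothetical data layout. Averaging the yearly gradients into one number is also an assumption made for illustration. Pearson's coefficient is written out explicitly to avoid depending on a particular Python version.

```python
import math


def average(values):
    values = list(values)
    return sum(values) / len(values)


def annual_gradient_to_mlo(station_annual_means, mlo_annual_means):
    """Station-minus-MLO difference of annual means, averaged over years (ppm)."""
    return average(s - m for s, m in zip(station_annual_means, mlo_annual_means))


def mean_seasonal_amplitude(smooth_curve_by_year):
    """Peak-to-peak range of the smooth curve within each year, averaged over years."""
    return average(max(year) - min(year) for year in smooth_curve_by_year)


def pearson(x, y):
    """Pearson correlation coefficient between two equally long series."""
    mx, my = average(x), average(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)
```

The ratio of simulated to observed seasonal amplitude used later in the comparison is then `mean_seasonal_amplitude(simulated) / mean_seasonal_amplitude(observed)`.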
The synoptic variability is evaluated using two quantities computed between the observed and simulated residual time series. The first is the Pearson correlation coefficient. The second is the model-to-observation ratio of standard deviations (normalized standard deviation, NSD). For each site, the diurnal amplitude is calculated from the residual between the raw CO2 time series and its daily mean.
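The NSD and the diurnal statistics can be computed as shown below. The sketch rests on assumptions that are not stated in the text:
- the residuals to the daily mean are averaged by hour of day into a mean diurnal cycle before the peak-to-peak amplitude is taken;
- the phase is taken per day as the hour of the minimum value, which is the phase measure used in the diurnal-cycle discussion;
- each day holds 24 hourly values.

All function names are hypothetical.

```python
import statistics


def normalized_std(simulated_residual, observed_residual):
    """NSD: ratio of simulated to observed standard deviation of the residuals."""
    return statistics.pstdev(simulated_residual) / statistics.pstdev(observed_residual)


def mean_diurnal_cycle(days):
    """Average over days of the hourly residual to each day's daily mean.

    days: list of days, each a list of 24 hourly mole fractions.
    """
    residuals = [[v - sum(day) / len(day) for v in day] for day in days]
    return [sum(r[h] for r in residuals) / len(residuals) for h in range(24)]


def diurnal_amplitude(days):
    """Peak-to-peak range of the mean diurnal cycle."""
    cycle = mean_diurnal_cycle(days)
    return max(cycle) - min(cycle)


def hours_of_daily_minimum(days):
    """Hour of the minimum value for each day (first occurrence if tied)."""
    return [day.index(min(day)) for day in days]
```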
For the airborne measurements from the NOAA/ESRL Global Greenhouse Gas Reference Network and from CONTRAIL, only the CO2 samples taken in the afternoon (between 11:00 and 20:00 LST) have been retained. The resulting samples have been averaged into 1-km vertical bins for each hour. They were then averaged spatially over each region of Figure 1 and over each month. For each subregion and each 1-km altitude bin, a detrended signal at 3.5 km has been subtracted from the time series.
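The vertical binning and the subtraction of the 3.5 km reference can be sketched as follows. Two interpretations are assumptions. The bins are taken as [n, n+1) km with centres at n + 0.5 km, so the reference is the 3-4 km bin. The detrending of the reference signal is omitted here. Function names are hypothetical.

```python
import math
from collections import defaultdict


def bin_profile(altitudes_m, values, bin_km=1.0):
    """Average samples into vertical bins; keys are bin-centre altitudes in km."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for z, v in zip(altitudes_m, values):
        centre = (math.floor(z / 1000.0 / bin_km) + 0.5) * bin_km
        sums[centre] += v
        counts[centre] += 1
    return {c: sums[c] / counts[c] for c in sorted(sums)}


def relative_to_reference(profile, ref_centre_km=3.5):
    """Subtract the value of the reference bin (3-4 km, centred at 3.5 km)."""
    ref = profile[ref_centre_km]
    return {c: v - ref for c, v in profile.items()}


profile = bin_profile([500.0, 700.0, 3200.0, 3900.0], [410.0, 412.0, 405.0, 407.0])
anomalies = relative_to_reference(profile)
```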
Over the Amazon, a background time series has been subtracted from the simulated and observed vertical profiles, using the method described in Gatti et al. (2014).
3 General behaviour

Zonal mean structures
We first study the zonal-mean structure of the 222Rn, SF6 and CO2 simulations. We focus on boreal summer (JJA), when convection is most active over Northern Hemisphere continents and the spread among the versions is the largest. Figure 3(a) shows the vertical structure of the zonal-mean 222Rn mole fraction in 5A. 222Rn is a short-lived radioactive tracer naturally emitted by continental surfaces, with a half-life of 3.8 days. Its lifetime is comparable to that of mesoscale convective systems over the tropics, which last 10 hours on average but can reach 2-3 days (Houze, 2003). For this reason, 222Rn has been widely used by modellers to evaluate vertical transport by subgrid-scale processes in the PBL and lower troposphere (Genthon and Armengaud, 1995; Belikov et al., 2013). The vertical profile has a maximum at ground level and decreases with height. Between 10° N and 70° N, it mainly reflects transport by convective processes from the boundary layer to the tropopause.
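The 3.8 days quoted for 222Rn is a half-life. The corresponding e-folding (mean) lifetime is the half-life divided by ln 2. The sketch below computes it, together with the fraction of 222Rn that survives a 10-hour convective event. The 10-hour duration is taken from the text above; the helper name is illustrative.

```python
import math

RN222_HALF_LIFE_DAYS = 3.8  # value quoted in the text


def remaining_fraction(elapsed_days, half_life_days=RN222_HALF_LIFE_DAYS):
    """Fraction of an initial 222Rn amount left after elapsed_days of decay."""
    return 0.5 ** (elapsed_days / half_life_days)


# e-folding (mean) lifetime tau = half-life / ln 2
tau_days = RN222_HALF_LIFE_DAYS / math.log(2.0)
print(f"e-folding lifetime: {tau_days:.1f} days")  # 5.5 days
print(f"left after 10 hours: {remaining_fraction(10 / 24):.2f}")  # 0.93
```

Most of the 222Rn lifted by a single convective event therefore survives to reach the upper troposphere. This is why the tracer is sensitive to the depth of convective transport.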
Recalling Table 1, Figure 3(d) shows that the modified physics depletes radon with respect to 5A over the entire troposphere above 7.5 km, between 30° S and 80° N. The largest relative depletion occurs in the northern mid-latitude troposphere around 10 km and amounts to about half of the 222Rn concentration in 5A. The lower 222Rn concentrations suggest that, on average, less convection penetrates into the upper troposphere with the new physics. However, the increase of 222Rn at 2.5 km and the decrease at the surface reflect the thermal plumes that transport tracers from the surface to the top of the boundary layer. The 222Rn mole fraction indicates a mean reduction in active convection over the continents. This suggests that the stochastic triggering based on thermal activity prevents the triggering of spurious deep convection. This is consistent with previous findings that thermal activity reduces the strength of deep convection (Rio et al., 2009; Locatelli et al., 2015a). Compared to the physics, the following factors have only a modest effect on the vertical structure of 222Rn:
- the land-surface model (Figure 3(g));
- the horizontal resolution (Figure 3(j));
- the vertical resolution (Figure 3(m)).
They enhance (land-surface scheme) or attenuate (vertical resolution) the changes induced by the new physics in the northern mid-latitudes. For instance, Figure 3(m) shows a slight increase around 10 km (10% of the total concentration). This means that more deep convection penetrates into the upper troposphere with a finer vertical resolution.
SF6 is a quasi-inert gas released into the atmosphere by the electrical and metal industries (Maiss et al., 1996). Because of its quasi-inert nature (lifetime over 1000 years; Ravishankara et al., 1993; Morris et al., 1995; Kovács et al., 2017) and its weak seasonality, we use SF6 to gain insight into large-scale transport in our simulations. Figures 3(f), (i), (l) and (o) highlight the effects of the model setups described earlier on the zonal-mean distribution of SF6. The modified subgrid-scale parameterization has much more impact on the zonal mean of SF6 in the stratosphere than in the troposphere. The stratosphere is less well mixed than the troposphere. This results in a longer exchange time scale and in an accumulation of the differences over time. The higher SF6 mole fraction means that the air is younger, suggesting an accelerated Brewer-Dobson circulation. Hsu and Prather (2014) also noticed an effect of the physical parameterizations on STE fluxes. They used two cycles (with two different physics) of the ECMWF fields as input to their offline transport model. The cause of this modified stratospheric dynamics is unclear and worth further investigation. Outside the stratosphere, differences between simulations are […]

In contrast to SF6, the zonal-mean distribution of CO2 exhibits a strong seasonality in the northern mid-latitudes. In boreal winter, the prevalence of fossil fuel emissions along with stable boundary layer conditions increases CO2 in the boundary layer. In boreal summer, the CO2 sink by photosynthesis outweighs fossil fuel emissions and terrestrial sources (respiration, land use). This leads to a net drawdown of the CO2 mole fraction at the surface north of 50° N, as seen in Figure 3(b).
As a result, the physics has an effect of opposite sign on the CO2 distribution compared to SF6. It produces a negative anomaly larger than 1.5 ppm in the PBL and a positive anomaly of 0.5 ppm around 10 km. The new physics amplifies the trapping of negative CO2 anomalies near the surface, consistent with less efficient vertical transport. The land-surface scheme and the resolutions have a modest impact on the vertical distribution of CO2.

Simulated xCO2 convolved with the OCO-2 space-time coverage
As for the zonal-mean distribution, we analyze the seasonal climatology of the column-average dry-air mole fraction of CO2, denoted xCO2. It is convolved with the space-time coverage of the retrievals from NASA's Orbiting Carbon Observatory-2 (OCO-2; Eldering et al., 2017). We used all version 8r retrievals for the year 2017 that are flagged as "good" by the retrieval algorithm (O'Dell et al., 2012). Recalling Table 1, Figure 4 shows that the physics has the strongest impact on the annual and seasonal climatology of the xCO2 fields. In boreal winter, the differences between the two physics exceed 0.5 ppm over tropical South America and tropical southern Africa. In boreal summer, the differences are negative and exceed 0.3 ppm in absolute value north of 50° N. This is due to the weaker vertical mixing of the new physics, which limits IH exchanges: the negative xCO2 anomalies are more trapped in the northern hemisphere. The land surface scheme and the horizontal and vertical resolutions have a more modest effect on xCO2 than the physics, with most differences below 0.3 ppm. The values of 0.3 and 0.5 ppm mentioned here refer, respectively, to the breakthrough and threshold requirements for systematic errors in satellite retrievals. These requirements are defined in the User Requirement Document of ESA's GreenHouse Gas Climate Change Initiative project (GHG-CCI, 2016). Comparing model performance to retrieval requirements is motivated by the similar roles that model and retrieval errors play in atmospheric inversions. In our case, 6% of 5% of the summer land grid points exceed the 0.5-ppm minimum requirement in terms of the differences between the two physics.
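A column average of this kind can be approximated by weighting each model layer by its pressure thickness. The model column is first sampled at each retrieval location and time, and the share of grid points exceeding a requirement then follows directly. The sketch below makes two simplifying assumptions: layer pressure thicknesses stand in for dry-air mass, and OCO-2 averaging kernels and prior profiles are ignored. The text does not say how these were handled. Function names are hypothetical.

```python
def column_average(mole_fractions, pressure_edges_hpa):
    """Pressure-weighted column mean of layer mole fractions.

    Layer k spans pressure_edges_hpa[k] to pressure_edges_hpa[k + 1], so there
    is one more edge than layers. Weights are layer pressure thicknesses; this
    ignores water vapour and OCO-2 averaging kernels (simplifying assumptions).
    """
    if len(pressure_edges_hpa) != len(mole_fractions) + 1:
        raise ValueError("need one more pressure edge than layers")
    weights = [abs(pressure_edges_hpa[k] - pressure_edges_hpa[k + 1])
               for k in range(len(mole_fractions))]
    return sum(c * w for c, w in zip(mole_fractions, weights)) / sum(weights)


def fraction_exceeding(differences_ppm, threshold_ppm=0.5):
    """Share of grid points whose absolute difference exceeds the threshold."""
    return sum(abs(d) > threshold_ppm for d in differences_ppm) / len(differences_ppm)
```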
The horizontal resolution has a modest effect on xCO2 values at large scale. Locally, however, its impact can be much larger and exceed 0.5 ppm at individual grid points. The impact of the horizontal resolution is particularly noticeable when zooming over northern India. In comparison, the effects of the land-surface scheme and of the vertical resolution are modest.

A classical approach to evaluating the intensity of IH exchanges is to plot the latitudinal distribution of the SF6 mole fraction at the surface (Denning et al., 1999). Figure 5 shows the 5-year mean of the model-minus-observation mole fraction difference at the 11 background surface stations. It suggests that IH exchanges are too weak in all versions, since the gradient is systematically overestimated. The model spread is 0.01 ppt at all latitudes, smaller than the ensemble absolute bias of about 0.02 ppt. Both the ensemble spread and the ensemble bias usually remain smaller than the measurement calibration uncertainty of 0.03 ppt (96% confidence interval; NOAA ESRL GMD, 2015). The new physics induces a consistent negative difference of 0.01 ppt in the southern hemisphere. This increases the surface latitudinal gradient and relates to the weaker vertical mixing. The vertical resolution cancels the effect of the physics by decreasing the latitudinal gradient, and even slightly improves it.

We also assess the ability of the different versions to represent unassimilated observations at surface sites located in the tropics. In the prescribed surface fluxes, the tropics represent 1.6 ± 0.9 PgC/a of the 4.3 PgC/a global total flux averaged over the years 2004-2011. Despite its importance, the region is not well constrained by inverse modelling systems (Peylin et al., 2013).
Last, we briefly look at the quality of the model simulations between 5 and 6 km above sea level by comparison to aircraft measurements. Aircraft measurements will be used more extensively, in terms of profiles, in Section 4.3.

Table 2. Simulated mean gradients of CO2 mixing ratio (ppm) between MLO and other stations located in three domains: the Northern Hemisphere (latitudes > 30° N), the Tropics (30° S ≤ latitudes ≤ 30° N) and the Southern Hemisphere (latitudes < 30° S). For each of the three domains, the corresponding sites are weighted by the inverse of their standard deviation. The value in brackets is the associated weighted mean standard deviation.

Simulation | Northern Hemisphere | Tropics | Southern Hemisphere
6A-144L79 | 1.6 (0.1) | 0.4 (0.1) | -2.8 (0.1)
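Table 2 weights each site by the inverse of its standard deviation; this is 1/σ, not the more common inverse variance 1/σ². The sketch below computes the weighted mean gradient and the weighted mean standard deviation of a domain under that reading of the caption. The function name and example values are hypothetical.

```python
def inverse_std_weighted_mean(values, stds):
    """Weighted mean of values and of stds, with weights 1/std (as in the Table 2 caption)."""
    weights = [1.0 / s for s in stds]
    total = sum(weights)
    mean_value = sum(w * v for w, v in zip(weights, values)) / total
    # With weights 1/std this equals len(stds) / sum(1/std), a harmonic mean.
    mean_std = sum(w * s for w, s in zip(weights, stds)) / total
    return mean_value, mean_std


gradient, spread = inverse_std_weighted_mean([1.0, 3.0], [0.1, 0.3])
```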

Annual surface gradient to MLO
The annual gradient between stations reflects both large-scale transport and fluxes integrated over large areas. Table 2 shows the mean and standard deviation of the annual gradient to MLO for the stations in the Northern Hemisphere, in the Tropics and in the Southern Hemisphere. On average over these latitudinal bands, the differences among simulations do not exceed 0.3 ppm and remain within the range of the measurement calibration objective defined by the WMO. Ten continental or coastal stations out of 65 assimilated surface sites (BRW, SHM, KAS, HUN, UTA, AMY, PAL, WLG, LEF, MHD) show differences larger than 0.3 ppm.
We performed the same analysis with a regional grouping of the stations, using the division of the globe into 22 regions defined by the TransCom 3 protocol (Gurney et al., 2002). The largest systematic difference among the simulations is found for the Boreal North America region (0.4 ppm). There, the standard deviation around the annual mean is roughly 0.3 ppm for each simulation. In this case, Boreal North America is represented only by the single site BRW, which may not be representative of the whole region.

Seasonal variability
The impact of the model setups on the seasonal cycle at each station is documented by two characteristics: the phase and the amplitude. The ratio of modelled to observed amplitude of the seasonal cycle is depicted for each station in the upper panel of Figure 6. The phase is displayed in the lower panel. For comparison, each plot shows the results of two versions against each other.

Figure 6. […] CO2 seasonal amplitude from 6A-96L39 (x axis) and 5A-96L39 (y axis) for all available stations. (f): Same as (a) but for 6A-96L39 (x axis) and 6AWOR-96L39 (y axis). (g): Same as (f) but for 6A-96L39 (x axis) and 6A-144L39 (y axis). (h): Same as (f) but for 6A-144L79 (x axis) and 6A-144L39 (y axis). The stations are numbered by increasing latitude, with the identifier correspondence given at the bottom of the panel. They are colored according to their category: blue, maritime stations; black, mountain stations; yellow, coastal stations; brown, continental stations. Stations written in lowercase (uppercase) are unassimilated (assimilated) stations.
Regarding the phase (bottom row), most station points lie close to the 1:1 line. This means that the phase is well captured (correlation above 0.9) and is not much affected by the model setups for most of the assimilated stations. This includes station PSA (ratio of 1.1 and correlation of 0.99 with 6A-96L39), which was not well simulated by a previous intermediate version of the physics (see Section 2.1). However, the seasonal features of the unassimilated stations (lln, hat, dmv, hkg, hko, cri) appear to be much more sensitive to the model setups, especially to the resolution. Station dmv is not depicted here. Its correlation coefficient is less than 0.3, and its amplitude ratio ranges between 0.3 and 0.6 depending on the model setup. Lin et al. (2017) already noticed the poor representation of the seasonal cycle at dmv. They attributed this deficiency to inaccurate prior Net Ecosystem Exchange (NEE) and/or fire emissions in the prescribed surface fluxes. Their reasoning was that the simulated CH4 seasonal cycle agreed better with observations than the simulated CO2 in their model. This explanation is plausible, given that the region is poorly constrained by observations. Because of their strong sensitivity to the model setups, these stations would have to be associated with a large error if they were assimilated in the inverse system. This explains why they have so far been discarded from the inversion system. The new physics increases the seasonal amplitude at (assimilated) mid-latitude sites over land. As a result of the convective inhibition, 9 out of 26 stations have an amplitude shift larger than or equal to 0.2 ppm. The horizontal resolution affects only 3 assimilated stations, which show an amplitude shift larger than or equal to 0.2 ppm. This is due to a change in topography and land fraction map. The amplitude at most mountain stations (7) is underestimated by more than 0.1 ppm in all versions, even though they have been assimilated.
Figure 7 depicts the seasonal-mean latitudinal structure of the CO2 bias (modeled - observed) at marine surface sites and at 5.5 km in boreal winter (JFM) and boreal summer (JAS). In winter, the model spread exceeds 0.5 ppm both at the surface and at 5.5 km. In summer, the model spread reaches 1.5 ppm near the surface north of 40°N, mainly because of the physics. Consistent with the less efficient mixing inferred from the zonal-mean structure (Figure 3), the new physics increases (decreases) the latitudinal gradient in the boreal mid-latitudes in summer at the surface (at 5.5 km), as the negative anomalies are more strongly trapped in the boundary layer. For all simulations, the latitudinal gradient at 5.5 km between 50°N and 40°S is well reproduced, as the bias does not exceed 0.5 ppm.

Synoptic variability at the surface
The synoptic variability characteristics, Normalized Standard Deviation (NSD) and correlation with observations, are depicted for each station on a Taylor diagram in Figure 8. NSD refers to the ratio of the simulated to observed standard deviation. Consistent with the design of Taylor diagrams, the distance between an actual model result and the reference (the star) is equal to the centered root mean square difference normalized by the observed standard deviation (RMS). Unsurprisingly, the model-observation agreement is poorer than for the seasonal variability, since the synoptic scale has not been constrained by the inverse modelling system. In the reference version, most stations (58 out of 72) have correlations around 0.8 and an NSD around 0.7. The lack of synoptic variability in 5A-96L39 has been reported over Europe (Locatelli et al., 2013) and over Asia (Lin et al., 2017). All versions of the model have difficulties in accurately reproducing the synoptic variability at the mountain stations. The new physics enhances the standard deviation at some sites located in the northern mid-latitudes. The horizontal resolution has a mixed impact: it slightly increases the amplitude but increases or decreases the correlation coefficient depending on the site. This may be attributed to the coarse resolution of the prescribed fluxes or to NWP forcing uncertainties. The synoptic variability is affected neither by the land surface scheme nor by the vertical resolution. As for the seasonal variability, the impact of the improved horizontal resolution on the simulated synoptic variability is limited to only 3 assimilated sites (KZM, CHL, HUN), in terms of both amplitude and correlation with observations. All versions poorly simulate the synoptic variability at site hko, since the site is located in an urban area and is affected by local emissions that are not well described in the prescribed surface fluxes.
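The Taylor-diagram geometry used for the synoptic statistics can be checked numerically: with population standard deviations, the normalized centered RMS difference E, the NSD and the correlation R obey the law of cosines E² = 1 + NSD² − 2·NSD·R. The sketch below assumes the inputs are synoptic residuals (for instance, observations and simulations with a smooth seasonal curve already removed); the function name `taylor_statistics` and the synthetic series are illustrative only.

```python
import numpy as np


def taylor_statistics(model, obs):
    """Taylor-diagram statistics for two time series of equal length.

    Returns (NSD, R, E): NSD = sigma_model / sigma_obs, R is the Pearson
    correlation and E is the centred RMS difference normalised by sigma_obs.
    Population standard deviations (ddof=0) are used so that
    E**2 == 1 + NSD**2 - 2 * NSD * R holds exactly (law of cosines).
    """
    model = np.asarray(model, dtype=float)
    obs = np.asarray(obs, dtype=float)
    m_anom = model - model.mean()
    o_anom = obs - obs.mean()
    sigma_m = m_anom.std()
    sigma_o = o_anom.std()
    nsd = sigma_m / sigma_o
    r = np.corrcoef(model, obs)[0, 1]
    e = np.sqrt(np.mean((m_anom - o_anom) ** 2)) / sigma_o
    return nsd, r, e


# Synthetic residuals: a damped and partly decorrelated "model".
rng = np.random.default_rng(0)
obs = rng.normal(size=500)
model = 0.6 * obs + 0.4 * rng.normal(size=500)
nsd, r, e = taylor_statistics(model, obs)
print(f"NSD={nsd:.2f}  R={r:.2f}  E={e:.2f}")
print("law of cosines holds:", np.isclose(e**2, 1 + nsd**2 - 2 * nsd * r))
```

With the reference-version values quoted above (NSD ≈ 0.7, R ≈ 0.8), the relation gives E = sqrt(1 + 0.49 − 1.12) ≈ 0.61.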

Diurnal cycle at the surface
The simulated CO2 diurnal variation reflects the day-night contrast in both the prescribed fluxes and the vertical mixing in the PBL (Planetary Boundary Layer). Since the fossil fuel emission inventory used here is constant within each month, most of the diurnal variability comes from the prior biospheric fluxes, to which the inverse modelling system has brought only marginal corrections. Another part of the diurnal variability is induced by boundary layer processes: at night, CO2 accumulates near the surface within the shallow stable boundary layer, whereas during the day, the low CO2 concentrations caused by photosynthetic uptake are diluted over a deeper convective PBL. Because of this covariance between the surface flux and the mixing depth, the daily-mean near-surface CO2 anomaly would be positive even when the flux integrated over the day is zero (Denning et al., 1995). This diurnal rectification highlights the importance of representing the diurnal cycle correctly, since a lack of realism at this scale may have repercussions on longer timescales.
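The rectification argument can be illustrated with a toy one-box PBL in which the near-surface concentration tendency is the surface flux divided by the mixing depth. This is a sketch under strong assumptions (no advection, no entrainment, sinusoidal flux and PBL depth in phase, arbitrary units); it is not how LMDz computes transport, but it shows how a flux with zero daily mean still produces a positive mean tendency.

```python
import numpy as np

# Toy one-box PBL: concentration tendency = surface flux / mixing depth.
# Advection and entrainment are ignored; all numbers are illustrative.
hours = np.linspace(0.0, 24.0, 2400, endpoint=False)
diurnal = np.cos(2.0 * np.pi * (hours - 13.0) / 24.0)  # peaks at 13 LT

flux = -10.0 * diurnal                 # daytime uptake, night-time release
pbl_depth = 1000.0 + 800.0 * diurnal   # deep convective PBL by day, shallow at night

tendency = flux / pbl_depth

print(f"daily-mean flux:     {flux.mean():+.2e}")
print(f"daily-mean tendency: {tendency.mean():+.2e}")  # positive: rectifier effect
```

The daily-mean flux is zero to rounding error, whereas the mean tendency is about +8.3e-03 (in these arbitrary units), because the night-time release acts on a shallow layer and the daytime uptake on a deep one.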
Figure 9 shows the peak-to-peak amplitude of the CO2 mole fractions for 8 sites with an amplitude greater than 1.5 ppm in the boreal summer months (JJA). Although similar conclusions can be drawn for boreal winter, we depict the diurnal cycle characteristics only for summer, when the diurnal amplitude is strongest. For most sites, version 5A underestimates the diurnal amplitude, with the exception of AMY, in agreement with previous studies (Geels et al., 2007; Locatelli et al., 2015a). At a slight majority of sites, the new physics increases the amplitude of the diurnal cycle, especially the extremes. Locatelli et al. (2015a) showed in their supplementary material that the Mellor and Yamada (1974) scheme strongly increases nighttime 222Rn compared to the Louis (1979) scheme used in version 5A. Similar experiments with 222Rn lead to the same conclusion here (not shown). The strongest increase in amplitude (up to 10 ppm) is obtained with the finer vertical resolution at the continental stations NGL and AMY. A possible explanation is that the CO2 input from the surface is distributed within a thinner layer. The lower panel of Figure 9 shows boxplots of a measure of the phase of the diurnal cycle at the same sites in boreal summer, for both the simulated CO2 mole fraction and the prescribed CO2 fluxes. This measure is defined as the local time of the minimum CO2 mole fraction. The minimum typically occurs in the afternoon, after convection has ventilated the PBL and photosynthetic activity has drawn down CO2 at the surface. In the GCM, the minimum of the fluxes to the atmosphere seems to propagate to the sampling level within a few hours at each site. The new physics affects the amplitude without noticeably improving the timing of the diurnal cycle. The timing is improved at the mountain site SNB, whereas it is degraded at site PAL (516 m). The other sites are not affected by the change of physics. In contrast, the horizontal resolution seems to have a positive effect on both the timing and the amplitude at the coastal site MHD. All versions seem to underestimate the mean amplitude and to shift the daytime minimum earlier at the mountain sites CMN and bkt compared to lower-altitude sites. However, the amplitude depends largely on the sampling location and model level: models typically show high amplitudes at model levels close to the surface and smaller amplitudes aloft (Law et al., 2008). To improve the representation of the diurnal cycle, it might therefore be preferable to sample the model level that best fits the observations.
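The amplitude and phase measures described above can be sketched as follows: each day's mean is removed, the residuals are averaged by hour of day, and the peak-to-peak amplitude and the local hour of the minimum are read off the mean cycle. The sketch assumes consecutive hourly values covering whole days and starting at 00 local time; the function name `diurnal_cycle_metrics` is hypothetical, and the boxplots in Figure 9 may be built from a different aggregation (for example, per-day amplitudes).

```python
import numpy as np


def diurnal_cycle_metrics(hourly_values):
    """Mean diurnal cycle, peak-to-peak amplitude and local hour of the minimum.

    `hourly_values` holds consecutive hourly values covering whole days and
    starting at 00 local time (an assumption of this sketch). Each day's mean
    is removed first, so only the residual diurnal signal is averaged.
    """
    x = np.asarray(hourly_values, dtype=float)
    if x.size % 24 != 0:
        raise ValueError("expected whole days of hourly data")
    days = x.reshape(-1, 24)
    residual = days - days.mean(axis=1, keepdims=True)
    cycle = residual.mean(axis=0)
    amplitude = cycle.max() - cycle.min()
    hour_of_min = int(np.argmin(cycle))
    return cycle, amplitude, hour_of_min


# Synthetic month: day-to-day drift plus a 10-ppm peak-to-peak cycle with the
# maximum at 04 LT and the minimum at 16 LT.
n_days = 30
hour = np.tile(np.arange(24), n_days)
day = np.repeat(np.arange(n_days), 24)
series = 400.0 + 0.1 * day + 5.0 * np.cos(2.0 * np.pi * (hour - 4) / 24.0)
cycle, amp, tmin = diurnal_cycle_metrics(series)
print(f"amplitude = {amp:.1f} ppm, minimum at {tmin:02d} LT")
```

Removing the daily mean first prevents synoptic drifts from leaking into the diurnal amplitude.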

Validation against independent measurements of vertical profiles of CO 2
Errors in CO2 flux estimates from inverse modelling are thought to be proportional to the vertical mixing efficiency within a column (Stephens et al., 2007; Saito et al., 2013). If a model transports too much tracer from the boundary layer to the free atmosphere, the inverse system will compensate for the induced tracer deficit at the surface by adjusting the CO2 fluxes. One means of validating the flux estimates is to compare the simulated vertical profiles with independent (unassimilated) observations of vertical profiles (Pickett-Heaps et al., 2011). Since only surface measurements have been assimilated, the simulated vertical gradient mainly reflects the intrinsic mixing efficiency of the model within the column. In this section, we evaluate the simulated vertical profiles against independent aircraft measurements over several regions (Europe, North America, Brazil, East Asia, Greater Northern India and Northern Southeast Asia) at the annual and seasonal scales. The benefit of using the newly developed version is also assessed over these regions.

North America and Europe
Over North America, the surface flux pattern has a strong seasonality. In winter, positive fluxes to the atmosphere driven by fossil fuel emissions are mainly located along the East Coast, whereas in summer, the strongest sink is located over the Midwest states. Because of a large net ecosystem production (NEP) of organic carbon during crop growth, the Midwest states can contribute half of the summer uptake in North America (Crevoisier et al., 2010; Sweeney et al., 2015). CO2 fluxes over North America are relatively well constrained by surface observations, as seen in Figure 1. Figure 10 shows the seasonal and annual climatologies of the CO2 mole fraction bias (model - observations) averaged over all the North American airborne platforms depicted in Figure 1. On the whole, the simulated value at the lowest level is overestimated by about 0.5 ppm on an annual basis and by about 1 ppm in winter. This behaviour is seen both for profile sites close to assimilated stations (ESP, LEF, THD, SGP) and for profile sites further away (not shown). Meanwhile, the profile above 2 km is well simulated except in summer, when the bias is about 0.5 ppm. This leads to an overestimated vertical gradient between 1 and 3 km in winter. In the inversion system, this overestimated winter gradient would artificially decrease the estimated fluxes to the atmosphere. The model spread does not exceed 0.5 ppm throughout the year except in summer, when it reaches 1.5 ppm at 1.5 km and 1 ppm at ground level. It explains only a small share of the variability (standard deviation) of the misfits (about 1-1.5 ppm), and this misfit variability is comparable among the model versions. The difference between the two physics is responsible for a large portion of the model spread. This can be explained in part by the fact that the air mass composition is more influenced by local processes during summer than at any other time of the year. At each site, westerly flow prevails throughout the year in the entire free troposphere. As the air masses move across the continent, they progressively mix with air influenced by the biosphere and by fossil emissions. In summer, the decrease in wind speed over the mid-continent and over the East Coast results in less homogeneous vertical profiles in the free troposphere (Sweeney et al., 2015). Combined with enhanced convection, this effect might accentuate the divergence between the two physics.
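The profile statistics of Figure 10 (misfits averaged in 1-km altitude bins per hour and per site, then across sites per month, with the bias and standard deviation taken over the monthly values) can be sketched in NumPy as below. The bin edges (0 to 8 km), the array layout and the function names are assumptions for illustration; the model spread is taken here as the max-minus-min of the bias profiles across model versions, which may differ from the exact definition used for the figures.

```python
import warnings

import numpy as np

# Hypothetical 1-km altitude bins from 0 to 8 km (bin edges, in km).
BIN_EDGES_KM = np.arange(0.0, 9.0, 1.0)
N_BINS = BIN_EDGES_KM.size - 1


def bin_profile(alt_km, misfit):
    """Average the model-minus-observation misfits of one hourly profile in 1-km bins.

    Returns an array of length N_BINS, with NaN where the profile has no data.
    """
    alt_km = np.asarray(alt_km, dtype=float)
    misfit = np.asarray(misfit, dtype=float)
    idx = np.digitize(alt_km, BIN_EDGES_KM) - 1  # -1 or N_BINS when outside the bins
    out = np.full(N_BINS, np.nan)
    for b in range(N_BINS):
        in_bin = idx == b
        if in_bin.any():
            out[b] = misfit[in_bin].mean()
    return out


def misfit_statistics(monthly_site_profiles):
    """Bias and standard deviation of the monthly, site-averaged misfits.

    `monthly_site_profiles` has shape (n_months, n_sites, N_BINS) and holds the
    binned misfits averaged per site and month (NaN where data are missing).
    """
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", category=RuntimeWarning)  # all-NaN bins
        monthly = np.nanmean(monthly_site_profiles, axis=1)  # average over sites
        bias = np.nanmean(monthly, axis=0)
        std = np.nanstd(monthly, axis=0)
    return bias, std


def model_spread(biases):
    """Max-minus-min of the bias profiles across model versions (axis 0)."""
    biases = np.asarray(biases, dtype=float)
    return biases.max(axis=0) - biases.min(axis=0)


print(bin_profile([0.2, 0.8, 1.5, 3.2], [1.0, 3.0, 2.0, 4.0])[:4])
```

Averaging over sites before computing the statistics keeps sites with dense sampling from dominating the regional profile.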
The convective inhibition (Figure 3) resulting from the new physics translates into a CO2 concentration lower by 1 ppm at 1.5 km and higher by 0.6 ppm in the mid-troposphere, as the trapping of negative CO2 anomalies within the PBL is enhanced. The CO2 depletion around 1.5 km induced by the new physics may be due to the vertical transport of negative anomalies by the thermals. Combined with the new physics, the land surface scheme also has a strong impact on the summer vertical profile, as the amount of water vapour and the temperature directly influence vertical mixing through surface buoyancy. By inhibiting deep convection, it increases the upper-tropospheric concentration by 0.5 ppm and decreases the surface concentration by 0.5 ppm. The effect of the resolution is modest here.
The figure for Europe (EUR, Figure 11) shows features similar to those for North America, but with smaller values (absolute biases, standard deviations, model spread), except for the standard deviations of the misfits in the lower atmosphere, which are about 50% larger.

Indo-Pacific region
Figure 12 presents the profile misfit statistics for the CONTRAIL CO2 data over Eastern Asia, Northern Southeast Asia and Greater Northern India. The profiles mostly have the same shape: a negative bias close to the surface (reaching -8 ppm for Greater Northern India in OND) and a negligible bias above. The decrease of the misfit standard deviations with height and the small model spread (under 1 ppm) are similar to EUR, except for Greater Northern India in the lower atmosphere, where the model spread reaches 2 ppm (up to 4 ppm) at the seasonal scale, in particular at the end of the monsoon season (OND).
In NSA and IND, the negative bias within the boundary layer at the annual scale is likely related to urban sources close to the airports used by these commercial flights. The negative bias in NSA, and in IND for OND, was also noticed by Lin et al. (2017). We also note that the prescribed surface fluxes have not been well constrained for IND and NSA. […] masses moving through the Arabian desert and North Africa in winter and those coming from Southeast Asia in summer (Suresh Babu et al., 2011; Lin et al., 2015). The impact of the model setups reaches 3 ppm in this region during AMJ and OND, the two intermediate seasons. Special care should therefore be taken when assimilating new stations in this area. In addition to this lack of measurement constraints, the prescribed flux variability in NSA and IND mainly reflects the prior flux variability, while in EAS, the fluxes are more robust (Thompson et al., 2016) and the misfits appear comparable to those over EUR.

Figure 12. […] for NSA, and for the whole year. In order to highlight the differences in profile shape, the annual mean of the bias at 3.5 km has been removed for each simulated vertical profile (5A-96L39: -2.0 ppm, 6A-96L39: -2.0 ppm, 6AWOR-96L39: -2.0 ppm, 6A-144L39 and 6A-144L79: 1.3 ppm).
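For Figure 12, the bias at 3.5 km is removed from each simulated profile so that only the profile shapes are compared. A minimal sketch of that operation for a single bias profile, assuming increasing bin-centre altitudes and linear interpolation to the reference level, is given below; in the figure it is the annual-mean bias at 3.5 km that is subtracted, and the function name and example profile are hypothetical.

```python
import numpy as np


def remove_reference_offset(levels_km, bias_profile, ref_km=3.5):
    """Shift a bias profile so that it is zero at the reference altitude.

    The value at `ref_km` is obtained by linear interpolation; `levels_km`
    must be increasing. Returns the shifted profile and the removed offset.
    """
    levels_km = np.asarray(levels_km, dtype=float)
    bias_profile = np.asarray(bias_profile, dtype=float)
    offset = np.interp(ref_km, levels_km, bias_profile)
    return bias_profile - offset, offset


levels = np.arange(0.5, 8.0, 1.0)      # 1-km bin centres
bias = -2.0 - 1.5 * np.exp(-levels)    # hypothetical near-surface negative bias
shape, offset = remove_reference_offset(levels, bias)
print(f"removed offset: {offset:.2f} ppm")
```

After the shift, differences between model versions at other levels reflect differences in vertical structure rather than a uniform offset.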

Amazon basin
The CO2 surface fluxes over the Amazon basin have not been directly constrained by observations and mainly reflect the variations of the prior fluxes used in the inverse system. The two closest assimilated stations are located along the Atlantic coast (Figure 1). They are representative of the air masses coming off the tropical Atlantic Ocean with the tropical easterly winds (Gatti et al., 2014). Moreover, the assimilation of additional surface and airborne observations has so far not improved the variability of the CO2 fluxes, at least with this inversion system (Molina et al., 2015). Molina et al. (2015) concluded, from several experiments with both global and regional models, that this limitation mainly stems from model transport errors and from uncertainties in biospheric and fire emissions. In this context, we evaluate the sensitivity of the simulated CO2 concentrations to the model setups at the four airborne stations featured in Figure 1: tab, rba, alf and san. The simulated and observed CO2 vertical profiles averaged over the wet period (January-June) and the dry period (July-October) of 2010 are depicted in Figure 13. All versions poorly represent the shape of the mean observed CO2 vertical profiles in the lower troposphere.
The mismatch is particularly pronounced during the dry season. The vertical gradients of the reference 5A-96L39 and of the observations between 1 km and 3 km have opposite signs, suggesting issues in the prior fluxes (NEE and/or fire emissions).
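Figure 13 shows the CO2 profiles as differences (ΔCO2) from an oceanic background estimated at the Atlantic stations ASC and RPB, with error bars equal to the standard deviation divided by the square root of the number of profiles. The sketch below reproduces that computation under stated assumptions: the background is taken as a plain average of the two station records matched to each profile date (a hypothetical choice; the paper's own method may differ), and the standard deviation uses `ddof=1`.

```python
import numpy as np


def atlantic_background(asc_ppm, rpb_ppm):
    """Hypothetical background: plain average of the ASC and RPB records
    matched to the profile dates."""
    return 0.5 * (np.asarray(asc_ppm, dtype=float) + np.asarray(rpb_ppm, dtype=float))


def delta_co2_profile(profiles_ppm, background_ppm):
    """Mean ΔCO2 profile and its standard error.

    profiles_ppm: (n_profiles, n_levels), NaN where a level was not sampled.
    background_ppm: (n_profiles,) background value for each profile.
    Returns the mean of (profile - background) per level and the standard
    deviation (ddof=1) divided by the square root of the number of profiles.
    """
    profiles = np.asarray(profiles_ppm, dtype=float)
    background = np.asarray(background_ppm, dtype=float)
    delta = profiles - background[:, None]
    n = np.sum(~np.isnan(delta), axis=0)
    mean = np.nanmean(delta, axis=0)
    sem = np.nanstd(delta, axis=0, ddof=1) / np.sqrt(n)
    return mean, sem


bg = atlantic_background([399.0, 401.0], [401.0, 399.0])
mean, sem = delta_co2_profile([[401.0, 402.0], [403.0, 404.0]], bg)
print(mean, sem)
```

Expressing both observed and simulated profiles relative to the same background removes the large-scale offset, so that the comparison focuses on the regional signal accumulated over the basin.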
The simulated profile is also very sensitive to the subgrid-scale parameterizations at each site and, to a lesser extent, to the land surface model. At the surface, the differences between the two physics range from 2 ppm at san in the dry season to 6 ppm at tab during the wet period. The other setups have a modest impact compared to the physics.
The CO2 vertical profiles suggest a more mixed lower and mid-troposphere with the new physics. In order to visualize the behaviour of the two physics, we additionally calculate the corresponding simulated 222Rn profiles with the same sampling strategy, even though no observations are available for comparison. The lower panel of Figure 13 shows that less radon is transported above 5 km, suggesting a less dominant role of deep convection. This is confirmed by comparing the simulated mean precipitation during the wet and dry periods with reference data from NASA's Global Precipitation Climatology Project (Figure 14). In the Tropics, precipitation is an indicator of convective activity, and the new physics decreases the mean (mainly convective) precipitation during both periods without showing better agreement with the reference data. Modelling precipitation in this region has been shown to be particularly challenging (Lintner et al., 2017). The simulated radon profiles suggest that more radon is detrained above the boundary layer by the thermals in the new physics, especially during the dry season. The strengthening of the thermals when the deep convective scheme is inhibited is a known behaviour of the new physics (Rio and Hourdin, 2008). As a result, the boundary layer of the new physics is more mixed and extends higher.
The lack of realism of the simulated transport does not yet impact the CO2 fluxes estimated by inverse modelling in this region, since they have so far mostly reflected the prior fluxes. However, it limits the potential benefit of assimilating new surface observations there, in line with Molina et al. (2015).

Conclusion
We have compared two reference versions of a GCM, LMDz, prepared respectively for CMIP4 and CMIP6, from the point of view of tracer transport. The more recent version benefits from a more elaborate radiative scheme and subgrid-scale parameterizations, in addition to a refined vertical resolution. The main changes in the physical parameterizations concern boundary layer mixing due to vertical diffusion (Mellor and Yamada, 1974), shallow convection (Rio and Hourdin, 2008; Rio et al., 2009), thermodynamic effects of ice, cold pools (Grandpeix et al., 2009) and convective triggering and closure assumptions (Rochetin et al., 2013). These main changes have been accompanied over the years by other evolutions of the model physics, by continuous tuning (Hourdin et al., 2017) and by continuous technical changes (including bug introductions and bug fixes) with diverse impacts. Within this flow of modifications from a large developer group, our evaluation of the two versions is based on a snapshot of the LMDz code at release 2791, a few months before the start of the CMIP6 simulations.
We performed a set of CO2, SF6 and 222Rn simulations using these two versions of LMDz at two horizontal resolutions, nudged towards the ECMWF wind reanalysis, over nearly two decades (1998-2014). In addition, we compared two simulations with different land surface schemes, one using the ORCHIDEE terrestrial surface model and the other a simplified bulk scheme. In this case, the land surface scheme only controls the sensible and latent heat fluxes at the land-atmosphere interface. The SF6 and 222Rn emissions were prescribed following the TransCom 3 protocol. The CO2 surface fluxes had been optimized beforehand by assimilating surface observations in a version of LMDz close to the older model version studied here. We compared the resulting ensemble of simulations with both assimilated and unassimilated CO2 observations from a large dataset covering different parts of the globe. This study enabled us to benchmark the effects of the resolution, land surface scheme and sub-grid scale parameterizations on simulated CO2 values, which is a fundamental step before implementing the recent developments in our inverse modelling system. At the surface, the comparison with the assimilated CO2 measurements showed that the land surface scheme and the vertical resolution have a limited impact compared to the horizontal resolution and the sub-grid scale parameterizations. The new physics tends to weaken the vertical mixing within the column over continental areas. The annual mean mole fractions are little modified, but the variability at the seasonal, synoptic and diurnal scales is enhanced at continental and coastal sites. The larger seasonal cycle in the northern hemisphere, resulting from less efficient vertical mixing, affects the latitudinal CO2 gradient in boreal summer by about 1 ppm, a value that should impact the geographical distribution of the CO2 surface fluxes estimated by inverse modelling. At the synoptic scale, the higher variance does not lead to an improved correlation. As for the diurnal cycle, even though the amplitude shows better agreement with the observations, the phasing is not improved by the model setups at most CO2 monitoring stations, and it depends heavily on the prior fluxes used in the inversion system. Even though the improved amplitude is promising for assimilating a larger fraction of hourly data at continental surface stations, further efforts should be made on the prior biospheric fluxes and on sub-grid scale parameterizations to better simulate the diurnal cycle. The atmospheric transport at mountain stations is still poorly captured by all versions, even with the refined vertical and horizontal resolutions. This may mean that the resolution is still too coarse to accurately reproduce the atmospheric flow around complex topography. The annual-mean latitudinal gradient of SF6 is still slightly too strong in all model versions, likely reflecting insufficient interhemispheric (IH) exchanges.
The assimilation of column-average mole fraction retrievals from satellites such as OCO-2 offers a promising perspective for atmospheric inversion, because their spatial density, combined with their vertical integration, reduces the impact of transport errors (Basu et al., 2018). From that perspective, we quantified the impact of the model setups on the simulated xCO2 convolved with the OCO-2 space-time coverage for a given year. The model-ensemble spread is mainly due to the physics and exceeds 0.5 ppm at high latitudes in boreal summer, in Northern Africa and in Brazil, or locally around emission hot-spots. In boreal summer, the new physics decreases the latitudinal gradient by decreasing the xCO2 values at high latitudes, owing to less efficient vertical mixing. This may decrease the northern sink inferred by inverse modelling with the satellite data and LMDz. In austral summer, the mean xCO2 shows large discrepancies (up to 1 ppm) between the simulations over the Amazon basin, this region being particularly sensitive to the parameterization assumptions. As for the surface fields, the xCO2 fields are sensitive to the horizontal resolution around emission hot-spots.
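A simplified view of the xCO2 diagnostic is a pressure-weighted column average of the model layers, averaged only where and when a satellite sounding exists. The sketch below makes that explicit under loud simplifications: it ignores the water-vapour correction and the OCO-2 averaging kernels and prior profiles that a full retrieval-consistent comparison would apply, and the function names and sounding mask are hypothetical.

```python
import numpy as np


def pressure_weighted_xco2(p_edges_hpa, co2_ppm):
    """Column-average mole fraction from layer values.

    p_edges_hpa: layer interface pressures ordered from the surface upwards
    (length n_layers + 1); co2_ppm: layer-mean mole fractions (length n_layers).
    Each layer is weighted by its pressure thickness (simplification: no
    water-vapour correction and no satellite averaging kernel).
    """
    p = np.asarray(p_edges_hpa, dtype=float)
    c = np.asarray(co2_ppm, dtype=float)
    dp = p[:-1] - p[1:]
    if np.any(dp <= 0):
        raise ValueError("pressure edges must decrease upwards")
    return np.sum(dp * c) / np.sum(dp)


def sampled_mean(field, sounding_mask):
    """Average a model field only where/when a sounding exists (mask True)."""
    field = np.asarray(field, dtype=float)
    mask = np.asarray(sounding_mask, dtype=bool)
    return field[mask].mean()


x = pressure_weighted_xco2([1000.0, 800.0, 500.0, 0.0], [410.0, 400.0, 390.0])
print(f"xCO2 = {x:.1f} ppm")  # layer weights 0.2, 0.3 and 0.5
```

Because the boundary layer carries only a small fraction of the column mass, a given surface signal is strongly diluted in xCO2, which is one reason column data are less sensitive to errors in near-surface mixing.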
The comparison with unassimilated airborne measurements enabled us to assess the quality of the inversion as well as the sensitivity of the vertical profiles of the CO2 mixing ratio to the model setups. The results show that the accuracy of the simulated CO2 vertical profiles, as well as their sensitivity to the parameterizations, depends on the region of interest and the season. Profiles in regions well covered by observations, such as Europe, Eastern Asia and North America, tend to be better captured than in poorly constrained regions (Greater Northern India, Northern Southeast Asia, Brazil). In these regions nearly devoid of assimilated data, the optimized fluxes mainly reflect the prior fluxes. Over the Amazon basin, the present study indicates that the uncertainty in the vertical profile mainly comes from the physical parameterizations and, to a lesser extent, from the land surface model, with a model spread reaching 6 ppm in the boundary layer. Here again, a finer resolution does not noticeably modify the shape of the vertical profile. This underlines the large uncertainties associated with the optimized fluxes and the difficulties in assimilating new observations over these regions, confirming the findings of Molina et al. (2015). Given the leading role of the Amazon basin in the global carbon cycle, it appears important to improve the realism of the vertical mixing over this region. For example, radon profiles sampled during airborne campaigns could help the modelling community to improve convective parameterizations at specific sites.
In terms of CPU time, the most advanced version tested here (6A-144L79) is about twenty times more expensive than the reference version (5A-96L39), owing to refined spatial and temporal resolutions and to more sophisticated sub-models. If adapted to the off-line configuration used in the atmospheric inversion system, it would be at least five times more expensive than the current version because of the refined horizontal and vertical grids, and the time step may also have to be reduced for the whole code, and even more so within an off-line version of the new thermal plume model. It will be possible to distribute the computational load over a large number of processing units with the new icosahedral dynamical core of LMDz, once it has been coupled to the LMDz physical package and then adapted to the off-line model (Dubos et al., 2015). In the meantime, we may wonder whether the benefit of the new version for CO2 atmospheric inversion counterbalances its numerical cost. To address this question, the sensitivity of the surface CO2 values to the model setups gives some insight into their impact on the inferred surface fluxes. On a seasonal basis, the updated physics would likely decrease the northern sink in boreal summer as a result of weaker vertical mixing within the column. However, the robustness of the simulated surface concentration gradients relative to MLO suggests that, on an annual basis, the large-scale surface fluxes inferred from surface measurements with an updated version should remain the same, meaning that the increased boreal summer uptake would be compensated during the rest of the year. Furthermore, LMDz versions developed in the last few months, after the ones tested here, appear to strengthen vertical mixing within the column again (results not shown). In this context, only the horizontal resolution is expected to bring some improvement in the estimated natural fluxes, depending on the quality of the prior fossil fuel emission inventory.
However, when assimilating satellite observations, annual-mean flux estimates at high latitudes should change because of the interaction between the changed flux seasonal cycle and the seasonally varying satellite sampling (Byrne et al., 2017). The improved vertical resolution, from 39 to 79 layers, has a marginal impact on the simulated CO2 values, unlike the previous change from 19 to 39 layers, which brought a major benefit to the inversion system (Locatelli et al., 2015b).
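The interaction between a seasonal cycle and seasonally varying sampling can be illustrated with a toy calculation: two monthly anomalies with the same (zero) annual mean but different amplitudes give different sampling-weighted means when retrievals are available only in the sunlit months. This is only a hypothetical analogy for the mechanism discussed by Byrne et al. (2017), not their method; the sounding counts and the signal shape are invented for illustration.

```python
import numpy as np

months = np.arange(12)  # 0 = January
# Hypothetical high-latitude sounding counts: no retrievals from October to March.
n_soundings = np.array([0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0], dtype=float)


def sampling_weighted_mean(signal, counts):
    """Mean of a monthly signal weighted by the number of soundings per month."""
    return np.sum(signal * counts) / np.sum(counts)


def seasonal_signal(amplitude):
    """Zero-annual-mean seasonal anomaly with its minimum in July (month 6)."""
    return -amplitude * np.cos(2.0 * np.pi * (months - 6) / 12.0)


for amp in (1.0, 1.5):
    s = seasonal_signal(amp)
    print(f"amplitude {amp}: annual mean {s.mean():+.2f}, "
          f"sampled mean {sampling_weighted_mean(s, n_soundings):+.2f}")
```

Both signals have a zero annual mean, yet the summer-only sampling sees roughly -0.62 and -0.93, so a change in seasonal amplitude alone shifts what the sampled data imply for the annual budget.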
Even in the cases where the model setups have a significant impact, our experiments, which are classical in the TransCom community, did not really allow us to demonstrate the superiority of one version over another. All versions appear to represent valid transport modelling options (at least with the current data selection in the inversion system), and the motivation to implement the most sophisticated one in the inversion system would mainly come from the wish not to diverge from the core technical and scientific developments of LMDz. This situation is paradoxical given the major improvements brought to LMDz for the representation of meteorology and climate, the benefit of which for variables other than tracer concentrations can be seen even when the horizontal winds are nudged, as here (Hourdin et al., 2015). However, we may lack measurement programs dedicated to the transport of tracers in the column. Observations of boundary layer mixing heights from radiosondes, ceilometers or satellites may also give some insight into the model quality, as may comparisons with highly detailed models (e.g., Randall et al., 2003).
[…] assimilated the surface measurements for the period 1979-2015 in an off-line version of LMDz5A at a horizontal resolution of 3.75° × 1.90° (longitude × latitude). As a consequence, the surface fluxes carry some imprint of a version […]
Geosci. Model Dev. Discuss., https://doi.org/10.5194/gmd-2018-164. Manuscript under review for journal Geosci. Model Dev. Discussion started: 5 July 2018. © Author(s) 2018. CC BY 4.0 License.

Figure 1. CO2 sampling locations. Red dots denote the subset of the assimilated site locations that are used here. Yellow dots denote unassimilated site locations. Blue dots denote independent aircraft measurement locations in America (other aircraft sites for the rest of the world are shown in Figure 3). Specific areas for our study are shown in red: Europe (EUR: 40-70°N, 10°W-50°E), Greater Northern India (IND: 20-30°N, 70-100°E), East Asia (EAS: 20-50°N, 100-150°E), Northern Southeast Asia (NSA: 10-20°N, 90-160°E). Stations RPB and ASC, shown in black even though they have been assimilated, are the NOAA tropical Atlantic sites used to define the background concentrations of CO2 and SF6 coming into the Amazon basin.

Figure 2. Number of CONTRAIL measurements used here at 5.5 km above sea level within the model grid boxes (3.75° × 1.90°). The specific areas of Figure 1 are also shown. Prior to the calculation of this number, the measurements were averaged hourly in each grid box.

Figure 4. Map of the differences in xCO2 (ppm) between 6A-96L39 and 5A-96L39 (top, effect of the new physics), 6A-96L39 and 6AWOR-96L39 (second row, effect of the land surface), 6A-144L39 and 6A-96L39 (third row, effect of the horizontal resolution), and 6A-144L79 and 6A-144L39 (last row, effect of the vertical resolution). The left column shows the average over the 2005-2010 boreal summers (June-August) and the right column shows the average over the 2005-2010 boreal winters (December-February). The simulated xCO2 values have been temporally convolved with the sampling of the OCO-2 satellite retrievals for the year 2017.

Figure 7. Latitudinal mean distribution of the CO2 bias (modeled - observed) between 5 and 6 km above sea level in the free troposphere (upper) and at the marine boundary layer (MBL) sites (lower) for January-February-March (JFM) (left) and July-August-September (JAS) (right) during the period 2007-2010. The MBL sites are ZEP, ICE, SHM, AZR, MID, MNM, KUM, GMI, SMO and CGO. The 5-6 km measurements come from the CONTRAIL database.

Figure 8. Taylor diagrams showing correlations and normalized standard deviations (NSD: the ratio of the simulated to observed standard deviation) between the simulated and observed CO2 synoptic variability for all surface stations. The stations are numbered and coloured as in Figure 6.

Figure 9. Top: Boxplots of the peak-to-peak amplitude (maximum concentration minus minimum concentration) of the mean diurnal cycle for July-September for observed (grey) and modelled (colours) CO2 for each model simulation during the years 2011-2012. The diurnal amplitude is calculated from the residual between the raw data and the daily mean. The sites are listed on the x axis. Bottom: Boxplots of the time of the minimum for each model. The times for the prescribed CO2 fluxes are displayed for both horizontal resolutions in yellow (96 × 95) and purple (144 × 143). Only the sites with a diurnal amplitude greater than 1 ppm are depicted here. The colour code for stations is the same as in the previous figures.

Figure 10. Bias (model - observations, thick lines) and standard deviation (shaded areas) of the monthly CO2 vertical profile misfits over North America during the period 2008-2014. The data were first averaged in 1-km altitude bins per hour and per site, before being averaged among the 12 North American sites of Figure 1 per month. The statistics are drawn from that ensemble of monthly, spatially averaged values. They are shown for each season (January-March, JFM; April-June, AMJ; July-September, JAS; October-December, OND).

Figure 11. Same as Figure 10 but over Europe, from the CONTRAIL dataset during the period 2006-2011. The domain is shown in Figure 2.

Figure 13. Top: Mean difference (ΔCO2) between the CO2 profiles measured and simulated in 2010 at the four Amazonian aircraft sampling sites and an oceanic CO2 background, during the dry (left of each panel) and wet (right of each panel) seasons (solid lines), and the standard deviation divided by the square root of the number of profiles (dashed lines and error bars). The background is estimated from in situ measurements at the monitoring stations ASC and RPB, as described in the main text. Bottom: Same as the top but for 222Rn (ppm). The dry season (red lines) is affected by fires at most sites and is defined here as the period July-October for illustrative purposes only; it does not correspond to all months within the fire season.

Figure 14. Observed and simulated mean precipitation (mm/day) during the wet and dry seasons over each Amazonian sampling site (tab, rba, san, alf). The black dots depict monthly mean precipitation derived from NASA's Global Precipitation Climatology Project.

Table 1. Description of the simulations.