Description and evaluation of aerosol in UKESM1 and HadGEM3-GC3.1 CMIP6 historical simulations

. We document and evaluate the aerosol schemes as implemented in the physical and Earth system models, the Global Coupled 3.1 conﬁguration of the Hadley Centre Global Environment Model version 3 (HadGEM3-GC3.1) and the United Kingdom Earth System Model (UKESM1), which are contributing to the sixth Coupled Model Inter-comparison Project (CMIP6). The simulation of aerosols in the present-day period of the historical ensemble of these models is evaluated against a range of observations. Up-dates to the aerosol microphysics scheme are documented as well as differences in the aerosol representation between the physical and Earth system conﬁgurations. The additional Earth system interactions included in UKESM1 lead to differences in the emissions of natural aerosol sources such as dimethyl sulﬁde, mineral dust and organic aerosol and subsequent evolution of these species in the model. UKESM1 also includes a stratospheric–tropospheric chemistry scheme which is fully coupled to the aerosol scheme, while GC3.1 employs a simpliﬁed aerosol chemistry mechanism driven by prescribed monthly climatologies of the relevant oxidants. Overall, the simulated speciated aerosol mass concentrations compare reasonably well with observations. Both models capture the negative trend in sulfate aerosol concentrations over Europe and the eastern United States of America (US) although the models tend to underestimate sulfate concentrations in both regions. Interactive emissions of biogenic volatile organic compounds in UKESM1 lead to an improved agreement of organic aerosol over the US. Simulated dust

potentially impacting our ability to accurately simulate historical and future climate change (Shindell et al., 2013; of the aerosol absorption. Subsequent developments by Mulcahy et al. (2018) implemented in GC3.1 reduced the aerosol effective radiative forcing by approximately 50% but an in-depth evaluation of the underlying aerosol properties in this model was not conducted as part of that study.
In this paper we document the GLOMAP aerosol scheme as implemented in UKESM1 and GC3.1 for CMIP6 and highlight differences in the aerosol representation between the two models. In particular, the additional Earth system processes included 5 in UKESM1 will lead to fundamental differences in aerosol sources, evolution and sinks that we characterize and evaluate where possible. Subsequent impacts on the aerosol forcing will be explored further in a companion paper. The present-day aerosol climatology is evaluated against observations using the coupled historical simulations conducted for CMIP6. While many process-based evaluations utilize nudged simulations, where the model's meteorology is relaxed to reanalysis data, freerunning coupled climate simulations enable feedbacks to more fully evolve due to a consistent treatment of the dynamical 10 physical climate, biogeochemical (in the full ESM) and composition states. Evaluation of these simulations is important in establishing confidence in the predictive skill of aerosols and their feedbacks in historical and future climate simulations. This study therefore provides an assessment of the suitability of this model for wider aerosol-climate studies being conducted as part of CMIP6.
The paper is outlined as follows: Section 2 describes the host model configurations and the GLOMAP and mineral dust 15 aerosol schemes including science updates implemented in GLOMAP since Mann et al. (2012) and Mann et al. (2010). Section 3 outlines the model simulations used in this study. Section 4 describes the observations used in the evaluation of the aerosol properties. A detailed evaluation of the tropospheric aerosol properties is presented in Section 5 followed by a discussion in Section 6. In this study we evaluate the simulation of aerosols in the CMIP6 historical integrations carried out by the UK community using two global models, the physical climate model HadGEM3-GC3.1 (hereafter referred to as GC3.1) and its Earth system counterpart, UKESM1. GC3.1 is a global coupled atmosphere-ocean-ice model and details of its components and coupling are described and evaluated at length in Kuhlbrodt et al. (2018); Williams et al. (2017). In brief, GC3.1 is comprised of the 25 Global Atmosphere 7.1 (GA7.1) atmosphere configuration of the Unified Model (UM) (Walters et al., 2019;Mulcahy et al., 2018), NEMO ocean model (Storkey et al., 2018), CICE sea-ice model (Ridley et al., 2018) and JULES land surface model (Best et al., 2011). Here we use the low resolution version of GC3.1 (Kuhlbrodt et al., 2018) which has a horizontal resolution of approximately 135 km in the atmosphere and 1 • in the ocean. In the vertical, the atmosphere consists of 85 levels with a model lid at 85 km above sea level with 50 of these levels below 18 km and 75 vertical levels in the ocean. 30 The GA7.1 model is described in detail by Walters et al. (2019) but we briefly document some of the key parameterizations of relevance for the composition and distribution of aerosols in the model. Large scale advection is modelled using the ENDGame (Even Newer Dynamics for general atmospheric modelling of the environment) dynamical core (Wood et al., 2014). ENDGame is a non-hydrostatic, semi-implicit, semi-langrangian deep atmosphere model on a regular latitude-longitude grid. Hermite cubic vertical interpolation is used for the advection of moist prognostic variables while an improved version of the Priestley (1993) conservation scheme has been implemented for a consistent treatment of the moisture and atmospheric composition tracers. Large-scale precipitation is modelled using a single-moment scheme based on Wilson and Ballard (1999) and includes an improved treatment of drizzle rates (Abel and Shipway (2007)). A prognostic treatment of rain allows the 3-dimensional 5 advection of precipitation. The introduction of the latter required modifications to be made to the aerosol wet scavenging processes described in Mann et al. (2010) and is described in more detail in Section 2.6. The warm rain microphysics has undergone significant development following Boutle and Abel (2012) and Boutle et al. (2014). The Khairoutdinov and Kogan (2000) scheme for autoconversion and accretion replaces the scheme of Tripoli and Cotton (1980) that was used prior to GA7 and has been found to significantly improve the representation of stratocumulus clouds (Boutle and Abel, 2012) and is 10 expected to simulate more realistic aerosol-cloud-precipitation feedbacks (Boutle and Abel, 2012;Wilkinson et al., 2013;Hill et al., 2015). Large-scale cloud uses the prognostic cloud fraction and prognostic condensate (PC2) scheme (Wilson et al., 2008a, b) with modifications described in Morcrette (2012). The atmospheric boundary layer is modelled with the turbulence closure scheme of Lock et al. (2000) with modifications described in Lock (2001) and Brown et al. (2008). Convection is based on mass flux scheme of Gregory and Rowntree (1990) with various extensions to include down-draughts (Gregory and Allen, atmospheric timestep of the model physics is 20 minutes, due to the inherent computational cost of the chemistry and aerosol components, both components are called once per hour.

Aerosol scheme: GLOMAP-mode
GLOMAP is a 2-moment modal aerosol microphysics scheme simulating speciated aerosol mass and number across 5 lognormal size modes. The basic aerosol model is described in detail in Mann et al. (2010) and Mann et al. (2012). Here we briefly 5 describe the main components of the model and document in detail any updates to the previous documentation.
The configuration described and evaluated here simulates the sources, evolution and sinks of four aerosol species: sulphate (SO 4 ), black carbon (BC), organic matter (OM) and sea salt. The mineral dust scheme is described in Section 2.7. Aerosol size modes respresented are the nucleation (geometric mean dry radius, r < 5 nm), Aitken (5 < r < 50 nm), accummulation (50 < r < 500 nm) and coarse (r > 500 nm) soluble modes and an Aitken insoluble mode (see Table 1). A fixed geometric 10 standard deviation, σ g , for each mode is assumed. Table 1  The aerosol composition within each mode and the mean radius of each mode is simulated according to the microphysical processes represented in the model. These include condensation of gas-phase sulphuric acid (H 2 SO 4 ) and a condensible 15 secondary organic vapour (SEC_ORG) onto pre-existing particles, aerosol coagulation within and between modes and cloud processing. Mode-merging to the next largest mode occurs when the particle geometric diameter exceeds the permitted size (see Table 1) in any given mode. New particle formation from the binary homogeneous nucleation of H 2 SO 4 and water follows Vehkamäki et al. (2002) and occurs mainly in the free troposphere. Additional nucleation of new particles in the boundary layer is not yet included. The chemical components are treated as internal mixtures within the aerosol modes. This allows for a more 20 accurate determination of the aerosol optical properties as outlined in Section 2.9.
GLOMAP includes a prognostic treatment of sea-salt and secondary organic aerosol (SOA). Sea-salt emissions are described in Section 2.4.1. SOA is produced from the gas-phase oxidation of land-based monoterpene sources by OH, NO 3 and O 3 . The molar yield of SOA from these reactions was increased from the 13% used in Mann et al. (2010) and Mann et al. (2012) to 26% in GA7 (Walters et al., 2019). This increase accounts for a wide range of uncertainty in the representation of biogenic SOA 25 Scott et al., 2014), including the large range in the observed yield of SOA; uncertainty in the emissions of precursor gases (Biogenic volatile organic compounds, BVOCs), a lack of anthropogenic and marine VOC sources as well as no contribution from isoprene in the current configuration. While BC and anthropogenic OM are emitted as insoluble species they can subsequently undergo ageing into the soluble modes by coagulation or condensation after being coated by 10 monolayers of soluble material. It is worth noting that the Aitken insoluble mode does permit particles with geometric mean dry radii greater 30 than 50 nm. This is a result of the emitted particles from biofuel and biomass sources having emission radii of 75 nm (Stier et al., 2005). These particles will subsequently remain in this mode until they have been coated by the required 10 monolayers of soluble material. Mineral dust is simulated in UKESM1 and GC3.1 but not in the modal framework as the development of Table 1. Properties of the aerosol size distribution in GLOMAP including the permitted size range of the aerosol modes, their geometric standard deviation, σg, and aerosol species contributing to each mode. Species represented are sulphate, black carbon, organic matter and sea salt.
UKCA which are also all emitted at the surface. As in Mann et al. (2010), we assume 2.5% of the anthropogenic SO 2 emissions are emitted as primary sulfate particles with an emission size distribution specified according to Stier et al. (2005).

Natural aerosol emissions
One of the main differences between the aerosol configurations of UKESM1 and GC3.1 lies in their treatment of natural aerosol. The fully-coupled UKESM1 model, incorporating as it does a dynamic vegetation and ocean biogeochemical model, 5 interactively simulates emissions of marine DMS, BVOCs and primary marine organic aerosol (PMOA) .
Changes in land and ocean ecosystems have the potential to influence these natural emissions and so coupling these emissions in UKESM1 enables additional ES-aerosol climate feedbacks to be simulated. In contrast, GC3.1 either prescribes these emissions based on fixed, present-day observation-based climatologies (DMS and BVOCs) or does not include the aerosol source at all (PMOA). These natural emissions are described in more detail below. Emissions of SO 2 from continuously degassing 10 volcanoes are prescribed in both models using the present-day three-dimensional climatology of Dentener et al. (2006). This is a temporally fixed data set with no seasonal variation.

Sea Salt
Primary emissions of sea-salt aerosols are calculated using the bin-resolved, windspeed-dependent flux parameterization of Gong (2003). The emitted sea-salt is mapped to the accumulation and coarse soluble modes depending on whether the bin 15 radius mid-point is below or above the upper limit of the accumulation mode size range (around 250 nm). The treatment of sea-salt emissions is the same in UKESM1 and GC3.1 apart from the specified sea-salt density which has been increased in UKESM1 from 1600 kg m −3 to 2165 kg m −3 . The smaller density value represents a hydrated salt particle (Schulz et al., 2006) however given GLOMAP treats the aerosol water content independently it is more accurate to use the actual dry salt density.

20
In UKESM1, seawater concentrations of DMS used to drive the ocean-atmosphere flux of DMS are simulated interactively by the ocean biogeochemistry component, MEDUSA, using the parameterization of Anderson et al. (2001). As discussed in  this parameterization was tuned as part of the development of the fully coupled UKESM1 model to ensure energy balance at the top-of-atmosphere (TOA). In the Anderson et al. (2001) scheme, DMS is parameterized as a function of chlorophyll (C), light (J) and a nutrient term (Q) : The fitted parameter values were originally set to be a = 2.29, b = 8.24 and s = 1.72. In UKESM1, the value of a was tuned from 2.29 to 1.0. This essentially reduces the minimum allowed value for DMS while maintaining the slope of the fit to observations reported in Anderson et al. (2001). DMS seawater concentrations in GC3.1 are prescribed from Lana et al. (2011).
The DMS emission flux to the atmosphere is calculated in both models using the Liss and Merlivat (1986) emission scheme. In 30 GC3.1 this emission flux is scaled by DM S (1 + 0.7), where the additional 0.7 DMS flux represents a missing marine organic aerosol source (Mulcahy et al., 2018).

Primary marine organic aerosol
There is an increasing body of literature supporting the existence of an organic source of aerosol over the oceans from organic enriched sea-spray aerosol emitted via bubble-bursting and emission from gas-phase VOCs in the ocean surface layer (McCoy 5 et al., 2015;O'Dowd et al., 2004;Meskhidze and Nenes, 2006;Facchini et al., 2008). Primary marine organic aerosol (PMOA) emissions are believed to constitute the majority of the marine OA emissions (de Leeuw et al., 2011) and have been shown to have a high correlation with surface chlorophyll (Rinaldi et al., 2013;Spracklen et al., 2008). Recognising this as a potentially important source of cloud condensation nuclei (CCN) in remote marine regions, such as the Southern Ocean, GC3.1 represents this source by applying a scaling of 1.7 to the marine emissions of DMS as noted above. This is an over-simplified approach but 10 improved the agreement of simulated and observed cloud droplet number concentrations over the Southern Ocean (Mulcahy et al., 2018).
In UKESM1 emissions of primary marine organic aerosol are explicitly modelled following the emission parameterization of Gantt et al. (2011) with updates from Gantt et al. (2012). The organic mass fraction of the emitted sea spray aerosol, OM SSA , is calculated as a function of the biological productivity (based on surface chlorophyll-a, C), the 10 m windspeed (U 10 ) and 15 the sea-salt dry diameter (D p ) according to: 1 + 0.03 exp(6.81D p ) + 0.03 1 + exp(X(−2.63C) + X(0.018U 10 )) (2) In UKESM1 we use a value of 3 for the X parameter, which acts to enhance the positive correlation of OM SSA with C and negative correlations with U 10 (Gantt et al., 2012). Chlorophyll concentrations are taken from the MEDUSA ocean biogeochemistry model with a coupling frequency of 3 hr. When used in the PMOA emission parameterization the chlorophyll 20 concentrations are scaled by a half. This is due to systematic positive biases in the MEDUSA chlorophyll concentrations, in particular in the Southern Ocean (Yool et al., 2013), which was found to have a detrimental impact on the emissions of PMOA.
The PMOA mass emission flux is then given by: where V SS is the volume flux of emitted sea salt (in cm 3 m −2 s −1 ) and ρ SSA is the apparent density of the emitted sea spray 25 aerosol (in g cm −3 ) calculated as: where ρ OM and ρ salt are the densities of organic matter (defined here as 1500 kg m −3 ) and sea salt (defined here as 2165 kg m −3 ). Gantt et al. (2012) apply a global scale factor of 6 to Eqn 3 above. However, given our global PMOA emissions compare well with Gantt et al. (2012)  (2012) we do not apply a global scaling factor here. This global scale factor is expected to be model dependent given the dependence of Eqn 2 on U 10 and V SS which will in themselves be model and resolution dependent.
In order to apply the mass emission flux calculated in Eq. 3 to GLOMAP, an assumption about the size of the emissions is required. In a mesocosm study of sea spray aerosol composition and size Prather et al. (2013) find a nascent sea spray emission 5 mode centred at 162 nm diameter (at 15% relative humidity). Additionally, Prather et al. (2013) find that the number fraction of this size range is dominated by primary marine organic particles, with inorganic sea salt particles dominating in the supermicron size range. These contributions are consistent with the observed mass fractions of marine aerosol in O'Dowd et al. .
As implemented here, the PMOA emission calculated in Eq. 3 is distributed across the soluble (25% of mass) and insoluble 10 (75% of mass) Aitken modes assuming a 160 nm size. The split between soluble and insoluble is guided by the measurements in O' Dowd et al. (2004), who show insoluble marine organic aerosol concentrations a factor of three greater than soluble marine organic aerosol concentrations.

Biogenic volatile organic compounds (BVOCs)
In UKESM1 emissions of the most abundant terrestrial biogenic VOC compounds, isoprene and monoterpenes, are simulated 15 using an interactive BVOC emission scheme, iBVOC (Pacifico et al., 2011(Pacifico et al., , 2012. In iBVOC, emissions of biogenic isoprene are based on a simplified mechanistic scheme of Pacifico et al. (2011) and the BVOC emission parameterisation from Guenther et al. (1995) for monoterpenes. While biogenic isoprene emissions are coupled to the gas-phase chemistry in the UKCA model and thus directly affect tropospheric ozone production and methane lifetime, due to the simple SOA chemical formation mechanism currently employed in the model (Table 2 and 3), only emissions of monoterpenes contribute to the formation of 20 SOA. As already described above the yield of SOA from monoterpene has been doubled from 0.13 used in Mann et al. (2010) to 0.26 here in part to account for missing BVOC sources. This simplified approach preserves the global emission magnitude but may introduce a bias in the geographic distribution of SOA. Isoprene is emitted mainly in the tropics and sub-tropics while the largest sources of monoterpenes are found in the northern hemisphere boreal regions. The bias will, therefore, manifest on the regional and local scale rather than the global and is expected to be small compared to the large uncertainty associated with 25 modelling BVOC emissions Arneth et al. (2008).
Under present-day conditions iBVOC produces an annual global total monoterpene emission flux of approximately 130 Tg[C]/yr (see Figure 1 in Supplementary Information). Global annual total monoterpene emissions are highly uncertain (Arneth et al., 2008) and are poorly constrained by measurements. Past estimates range from 29 Tg[C]/yr to 135 Tg[C]/yr at present-day conditions (Guenther et al., 1995;Arneth et al., 2008;Stavrakou et al., 2009;Guenther et al., 2012;Sindelarova et al., 2014;30 Messina et al., 2016;Hantson et al., 2017). UKESM1 is within the upper-range of these estimates. In GC3.1, emissions of monoterpenes are prescribed as monthly averages from the Global Emissions Inventory Activity (GEIA) database which used the Guenther et al. (1995) model. The annual global emission flux is higher than UKESM1 at 137 Tg[C]/yr and is just outside the upper range of previous estimates reported above. The GC3.1 monoterpene emissions are temporally fixed and so do not resond to changes in vegetation or climatic conditions.

Aerosol chemistry
The aerosol chemistry in UKESM1 is fully coupled to the UKCA stratospheric-tropospheric (StratTrop) chemistry scheme (Archibald et al., 2020;Morgenstern et al., 2009;O'Connor et al., 2014). Therefore the chemical oxidants involved in the 5 oxidation of gas phase aerosol precursors, namely the hydroxyl radical (OH), ozone (O 3 ), and nitrate radical (NO 3 ), are interactively simulated and therefore have their own production and loss mechanisms. The aerosol chemical production is therefore more tightly coupled to the chemical state of the atmosphere throughout the historical simulations than in GC3.1 reflecting an additional level of realism. Changes in the concentrations of these trace gas oxidants since pre-industrial times have been shown to have important impact on the evolution of the historical aerosol forcing (Karset et al., 2018).
10 Table 2 describes the aerosol chemistry included in the StratTrop scheme. The reader is referred to Archibald et al. (2020) for a complete description of the StratTrop chemical mechanism in UKESM1. Gas phase emissions of SO 2 , DMS and monoterpenes are oxidised via gas phase reactions with OH, NO 3 and O 3 to eventually produce H 2 SO 4 and SEC_ORG (see Table 2).
MSA is treated as an inert sink of sulphur and is neither transported nor advected in the model. Dissolution of SO 2 in cloud droplets follows the Henry's Law equilibrium approach (Warneck, 2000) and uses a global fixed value of cloud pH of 5.0. The 15 aqueous phase oxidation rate of SO 2 is determined from the reaction of HSO − 3 and SO 2− 3 with H 2 O 2 and O 3 . In the aqueous phase there is no explicit product to these oxidation reactions. Instead the reaction fluxes are used to update the accumulation and coarse mode sulphate aerosol mass. In the current configuration these calculated fluxes are reduced by 25 % to account for the lack of a removal mechanism of the in-cloud SO 4 aerosol produced in this way.
In the stratosphere additional sulphur cycle aerosol-chemistry processes are included which are appropriate for non-volcanic 20 sources in the stratosphere (Dhomse et al., 2014;Weisenstein et al., 1997). These include the photolytic and thermal reactions of COS, SO 2 , SO 3 and H 2 SO 4 . Reactions of COS and DMS with O( 3 P) are also included. Volcanic sources of SO 2 in the stratosphere are not treated interactively but are specified from a climatology (see Section 2.8).
In GC3.1 the chemical oxidants involved in the gas phase and aqueous phase oxidation of aerosol precursors are prescribed as monthly mean climatologies. As the StratTrop chemistry configuration used in UKESM1 was not finalised by the time the 25 GC3.1 configuration was frozen, the oxidant fields were taken from HadGEM3 simulations run for the Chemistry Climate Model Initiative (CCMI) Morgenstern et al., 2017). The simplified "offline oxidant" chemistry scheme (see Table 3) used has only 7 atmospheric chemical tracers compared with 84 tracers used in the full StratTrop scheme and so significantly reduces the computational cost of the model. The chemistry is only solved below 20 km and so does not explicitly simulate stratospheric aerosol chemistry. The offline oxidant scheme includes the degradation of SO 2 and DMS, together 30 with the oxidation of monoterpene to form SEC_ORG. The chemical fields O 3 , OH, NO 3 , HO 2 and H 2 O 2 are input as time varying monthly mean fields, with only the aerosol precursor species, such as DMS, DMSO, SO 2 , H 2 SO 4 , monoterpene and SEC_ORG retained as advected tracers. H 2 O 2 is represented by both an advected tracer and an offline field. As it is a highly soluble species and is therefore affected by wet deposition, it is given a chemistry production and loss mechanism. The Table 2. Aerosol precursor chemistry included in UKESM1. For full details on reaction rate coefficients see Archibald et al. (2020).  Previously aerosol removal by convective precipitation was carried out after the convection scheme was called and so was removed from the levels at which the convective precipitation was formed and therefore did not interact with convective updraught. Kipling et al. (2013) found too much aerosol aloft in the tropical upper troposphere, where the aerosol had been transported above the heights at which aerosol removal by impaction and nucleation scavenging processes are most active. By 5 implementing an explicit treatment of the wet scavenging of aerosol in the convective plume they found statistically significant improvements in model biases in the mass burden and vertical profiles of black carbon aerosol in remote regions. Furthermore, with the introduction of prognostic rain into the HadGEM3 model (Walters et al., 2011), the amount of large-scale precipitation was greatly reduced improving model biases, in particular excessive light rain or drizzle events were reduced in the model.
This significantly reduced the wet deposition of aerosol and led to an increase in aerosol burden and aerosol optical depth, 10 exacerbating the biases found by Kipling et al. (2013).
In order to avoid excessive lofting of aerosol, an in-cloud convective plume scavenging scheme is employed following the approach described by Kipling et al. (2013). The aerosol number and mass mixing ratios are depleted in the rising plume depending on the scavenging coefficients for the mode, the precipitation rate and convective updraught mass flux together with the mass mixing ratio of liquid water and ice. The scavenging coefficients are set to be the same as those for dynamical cloud 15 with the scavenging coefficients for accumulation and coarse modes set to 1.0. The scavenging coefficient is set to 0.5 for the soluble Aitken mode in recognition of the higher updraught velocities in convective cloud which likely lead to supersaturations high enough to activate some of the Aitken mode aerosol. The nucleation mode is not scavenged.

Nucleation scavenging
Previously the occurence of nucleation scavenging in large-scale rain was determined by the rain rate differences between a 5 given level and the level above (Mann et al., 2010). With the implementation of prognostic rain a new approach was required.
Nucleation scavenging by large-scale rain now occurs when the autoconversion and accretion rates calculated by the largescale precipitation scheme exceed a minimum rate of 10 −10 kg kg −1 s −1 . Removal of aerosol then occurs at the rate of the conversion of cloud water to rain. All soluble mode particles with a dry radius greater than 103 nm are subject to in-cloud scavenging and insoluble mode particles are scavenged only in cold environments when temperatures are below 258 K. There 10 is currently no representation of aerosol re-evaporation whereby aerosol particles are returned to the atmosphere when a falling cloud liquid droplet evaporates. A representation of the sub-grid variability of precipitation is incorporated by scaling the nucleation scavenging rate by the grid box mean cloud liquid fraction and assuming a raining fraction of 0.3. Removal by ice particles is included by assuming that removal by ice occurs at the same rate as the riming rate of ice crystals and aggregates in the cloud microphysics. 15 Impaction scavenging of aerosol below clouds is based on Slinn (1982), using a modified Marshall-Palmer size distribution for raindrops (Sekhon and Srivastava (1971)) with raindrop terminal velocities from Easter and Hales (1983) and scavenging coefficients from Flossmann et al. (1985). Below-cloud scavenging by snow is now included following the approach described in Wang et al. (2011) whereby a power law function is used to derive a snow scavenging rate, k snow , for each aerosol mode, where P is the snowfall precipitation rate. The scavenging coefficients, a and b for each mode are taken from overly efficient wash-out of the larger aerosol particles, particularly at high latitudes. For a snowfall rate of 1 mm hr −1 and a typical coarse mode size of 2 µm, the value of a can be as low as 0.1 mm −1 (Croft et al., 2009;Feng, 2009

Dry deposition
Dry deposition and sedimentation of aerosol follows that described in Mann et al. (2010) with a modification to how the sedimentation is calculated. Previously, as the aerosol deposition processes occur on the hourly timestep, the aerosol flux into and out of each model grid box was artificially restricted to ensure numerical stability. This can overly-restrict the sedimentation velocities and a sub-stepping of the sedimentation has been implemented to circumvent this problem. For computational effi-30 ciency, the sub-stepping is applied to coarse and accumulation modes only. Tests led to an optimal timestep of 15 minutes and 30 minutes being applied to the coarse and accumulation modes respectively. These changes increased coarse mode deposition velocities, in particular impacting the sea-salt aerosol distribution. Impacts on other modes are found to be small.

Mineral dust: CLASSIC aerosol scheme
Mineral dust aerosol is simulated independently of the other aerosol species using the CLASSIC dust scheme . Dust aerosol can therefore be considered to be externally mixed with the GLOMAP aerosols. The CLASSIC dust scheme is used in both GC3.1 and UKESM1, though some settings differ between the models. The scheme is described in detail in Woodward and Sellar (2019) and was developed from the scheme described by Woodward (2001) (1999). Dust is produced from bare soil in both models, and also from seasonally vegetated areas of grass and shrub in GC3.1 and no preferential sources 10 are imposed. Seasonally vegetated sources are omitted from UKESM1 in order to limit the potential impact of biases in the interactively simulated vegetation calculated by the TRIFFID scheme, which are inevitably larger than those of the IGBP climatology (Loveland and Belward, 1997) used in GC3.1. This is a relatively minor change as seasonal sources generate less than 10% of the GC3.1 dust load.
Dust is transported as six independent tracers corresponding to the emission bins and is subject to deposition through sedi-15 mentation, turbulent mixing and below-cloud scavenging. In UKESM1 the total dust deposition flux to the ocean is passed to the MEDUSA ocean biogeochemistry scheme as a source of iron for plankton growth. Unlike GLOMAP, the dust scheme is called on each 20 minute model timestep.
Three emission variables are tunable in the dust scheme: multipliers to the friction velocity and soil moisture dependence and a global dust emission multiplier. The first two are needed to compensate for the differences between the instantaneous point 20 measurements used to derive the algorithms and the model resolved variables at the grid-scale and they also compensate for model biases. The global multiplier is a common feature of most dust emission schemes. The dust scheme was fully retuned for UKESM1 in order to minimise biases across a number of dust metrics including near-surface dust concentrations, dust aerosol optical depth (AOD), deposition rates and size distribution against observations. This resulted in the UKESM1 emission size distribution being shifted towards larger particle sizes than in GC3.1. This affects the many size-dependent dust processes and 25 results in changes to the global dust load, distribution and the radiative effects of dust. This evaluation is documented in detail in Woodward and Sellar (2019).

Stratospheric volcanic aerosols
Stratospheric volcanic aerosols, produced from explosive volcanic injections of SO 2 , are not simulated interactively but are prescribed using the CMIP6 stratospheric aerosol climatology. This implementation is detailed in Sellar et al. (2020) and so is 30 only briefly described here. The aerosol optical properties are based on the climatology decribed by Thomason et al. (2018) and use a combination of satellite observations from the period 1979-2014 and chemical transport modelling (Arfeuille et al., 2014). This approach ensures better consistency in stratospheric aerosol radiative forcing between the CMIP6 models that use this climatology.

Aerosol-radiation and aerosol-cloud interactions
Aerosol particles can modify radiation fluxes through the direct scattering and absorption of shortwave (SW) and longwave (LW). The aerosol optical properties (refractive index, mass extinction and absorption coefficients and asymmetry parameter)

Model simulations
For the purpose of this evaluation we make use of the ensemble of historical simuations that were conducted with both GC3.1 and UKESM1 for CMIP6. The historical simulations cover the period from 1850 to the end of 2014 and therefore model the evolution of climate since the pre-industrial era. These simulations are forced by transient external forcings of solar variability, 25 well-mixed greenhouse gases and other trace gas emissions and aerosols. The volcanic forcing due to the stratospheric injection of SO 2 from volcanic eruptions is prescribed as a zonal mean climatology of the stratospheric aerosol optical properties over the historical period. All forcings and how they are implemented in both models are described fully in (Sellar et al., 2020). The GC3.1 historical simulations are evaluated in full in Andrews et al. (2019). For CMIP6 a total of 19 and 4 ensemble members were run for UKESM1 and GC3.1 respectively. Each ensemble member of each model was initialised from a different date in 30 the respective model's pre-industrial control simulation. For the evaluation presented in this study we use the first 9 members of the UKESM1 ensemble and all 4 members of the GC3.1 ensemble. Unless otherwise stated, the ensemble mean of these models is presented in the evaluation below.
In addition we utilise the atmosphere-only configuration of each model (otherwise known as AMIP configuration) for detailed aerosol budget analysis as all of the required diagnostics were not output in the historical simulations. Driven by observed sea surface temperature (SST) and sea ice fields simulations were run from 1979 to the end of 2000 and allow additional simula-5 tions to be carried out at a much reduced computational cost. The UKESM1 AMIP configuration does not include the additional dynamic ocean and land surface components (Eyring et al., 2016). Instead the required vegetation (vegetation fractions, leaf area index, canopy height) and surface ocean biology fields (DMS and chlorophyll) are taken from a single UKESM1 historical member and are prescribed as ancillary data, thereby maintaining traceability to the fully coupled model.
As aerosol observations for the complete historical period do not exist, we focus our evaluation on the present-day period 10 of the historical simulations. High temporal daily or sub-daily aerosol data is not available from these simulations due to the large data volume already requested from the CMIP6 simulations. This may introduce some uncertainty into our analysis as

Aerosol optical depth
Two different types of AOD observations are used in this evaluation: ground-based measurements and satellite retrievals of AOD. Ground-based measurements are taken from the globally extensive Aerosol Robotic Network (AERONET) (Holben et al., 1998) which provides quality assured measurements of aerosol optical properties from a range of different aerosol regimes across the globe (Holben et al., 2001). In this study, AOD at 440 nm is used from the version 2 level 2.0 product.

Aerosol number concentrations
To evaluate simulated aerosol number concentrations and hence cloud condensation nuclei we use observations of N 50 (the total particle concentration with diameter >50 nm) and N tot (the total particle concentration within the detectable range of the instrument, generally 3 nm). N 50 and N tot observations were derived from size distribution measurements from a combination of ground-based measurements, ship-based and aircraft campaign data. The data was largely compiled as part of the Global 30 Aerosol Synthesis and Science Project (GASSP) (Reddington et al., 2017) and is described in detail in Appendix B of Johnson et al. (2019b). The campaign data used represent campaigns in predominantly marine environments and they took place between 1995 and 2012. All ground-based and campaign data was interpolated onto to model horizontal and vertical grid. The station data was converted to monthly means while the campaign data was assumed to be representative of the month(s) in which the campaign took place. This monthly mean data is compared with monthly mean model data averaged over the 20 year period covering 1995 to 2014 inclusive.

Cloud droplet number concentration
Observations of cloud droplet number concentrations (N d ) are limited although there has been recently more effort in this field 5 (Grosvenor et al., 2018b). We use two satellite N d products both derived from the MODIS sensor. The first is a monthly global N d climatology at 1 • resolution (Grosvenor and Wood, 2014;Grosvenor et al., 2018a, b) which retrieves N d over both land and ocean. In this dataset, retrievals of cloud optical depth and effective radius at 3.7 µm from the Level 2 MODIS Collection 5.1 cloud products are used to estimate the liquid N d . See Grosvenor et al. (2018b) for details of the retrieval method and further references. Retrievals are filtered to include only liquid cloud fractions greater than 80% and low clouds (below 3.2 km). 10 The second N d monthly dataset is described in Bennartz and Rausch (2017) and retrieves information over ocean only. This dataset does not filter for high solar zenith angles and so is likely to contain overestimates at high latitudes (Grosvenor and Wood, 2014;Grosvenor et al., 2018a). It also does not filter for low-altitude clouds. Validation of N d retrievals in deeper clouds has not been performed and it is likely that some of the assumptions made for the retrieval are wrong for such clouds. In an attempt to reduce the effect of MODIS effective radius biases, this dataset is filtered to only include data points for which the 15 effective radius from the 3.7 µm retrieval is greater than that from 2.1 µm and which is in turn greater than that from 1.6 µm, since this is the expected order based on aircraft measured droplet size profiles and the different vertical penetration depths into cloud top of the different wavelengths (e.g., see Grosvenor et al. (2018a)). However, in stratocumulus regions where N d retrievals are likely to be most reliable and where they validate well against aircraft observations the three wavelengths produce similar effective radii values (Painemal and Zuidema, 2011). This suggests that this filtering may throw away data even in such 20 regions and this may lead to a low N d bias (Grosvenor et al., 2018b). The time series of both products covers the period 2003 to 2014.
Annual mean climatologies of simulated N d at 1 km from years 2003 to 2014 are compared with annual means generated from the satellite products. Lack of high temporal outputs from the historical simulations prevents consistent filtering methods from being applied to both satellite and model data. High solar zenith angles and sea-ice are screened for in Grosvenor et al. Table 4 shows the global annual mean aerosol budget including the gas phase budget of aerosol precursors for UKESM1 and GC3.1. A full breakdown of the SO 2 budget is provided in Table 5. Additional spatial plots comparing the natural emissions from both models are provided in the Supplement. For sulphate aerosol, the primary source is reflecting the 2.5% of the emitted 5 anthropogenic SO 2 that is assumed to directly enter the aerosol phase, while the secondary source includes contributions from chemical production, condensation and nucleation of H 2 SO 4 . Both of these sources are in good agreement with the corresponding values reported in Mann et al. (2010). The lifetime of SO 4 is 5.57 days and 4.95 days in UKESM1 and GC3.1 respectively and is over 2 days longer than Mann et al. (2010) (3.7 days). The UKESM1 SO 4 lifetime is also on the upper end of the range shown in the AeroCom-I models (3-5.4 days, Textor et al. (2006)) but compares well with a previous version of 10 GLOMAP in HadGEM3 (5.1 days, Bellouin et al. (2013)). The differences in lifetime across the AeroCom models and previous GLOMAP configurations reflect not only the diversity in aerosol processes and aerosol chemistry across the different aerosol schemes but also the differences in the host climate models driving processes such as the aerosol tracer transport, water uptake and aerosol removal rates. For example, the frequent occurence of very low precipitation rates has been improved considerably in recent HadGEM3 configurations (Walters et al., 2011(Walters et al., , 2014 which has significantly reduced the aerosol nucleation and . This is due to the different treatment of the DMS chemistry as well as differences in the availability of oxidants. The production of SO 2 from the oxidation by NO 3 is over 4 times larger in GC3.1 representing approximately 50% of the DMS loss versus 25% loss in UKESM1 (see Table 5). Surface level oxidants from both models and their fractional differences are shown in Figure 1. Over global oceans, NO 3 is significantly lower in UKESM1 and concentrations are over 80% lower in the important DMS source region of the Southern Ocean. Furthermore, GC3.1 includes the 30 additional production pathway for SO 2 via the intermediate formation of DMSO.

Aerosol budget and burdens
The global burden and lifetime of SO 2 is smaller in UKESM1 than in GC3.1. This is driven largely by the different emission injection heights of anthropogenic SO 2 . Inputting all anthropogenic SO 2 at the surface, as is done in UKESM1, leads to higher surface SO 2 concentrations close to source regions. This additional SO 2 is then more efficiently removed via dry deposition, with dry deposition lifetimes in UKESM1 of 6.6 days compared to 8 days in GC3.1, while wet deposition timescales are longer (14 versus 12 days). A sensitivity simulation, with the emission injection heights for anthropogenic SO 2 set-up to be the same as GC3.1, increased the SO 2 burden and lifetime to 0.61 Tg and 2.21 days respectively. These values compare better with GC3.1. Notably, the simulation did not significantly impact the SO 4 budget (see Section 5.2.1). This highlights the important role of the aerosol chemistry and driving oxidants in determining the SO 4 budget.

5
While the timescales for the oxidation of DMS are longer in UKESM1, the oxidation timescales of SO 2 are shorter by 10% (3.7 days compared to 4.3 days, see Table 5). Therefore despite GC3.1 having DMS emissions that are more than double that of UKESM1 and higher global SO 2 burdens, the secondary production of SO 4 is only 15% higher. The globally shorter oxidation timescales in UKESM1 result in comparable SO 4 burdens and contribute to the longer SO 4 lifetime in UKESM1.
The timescales for oxidation will be determined largely by the differences in oxidants between the two models (see Figure 1).
Variations in anthropogenic sources are to be expected due to the choice of different analysis year and emission source data.
The higher primary OM emission in UKESM1 reflects the additional source from primary marine organics which contributes of the order of 4.5 Tg/yr. The inclusion of the iBVOC emission scheme in UKESM1 and different oxidants leads to differences in the secondary OM emission source. The production of SEC_ORG from the oxidation by NO 3 and to a lesser extent by OH is much higher in GC3.1 than in UKESM1 further demonstrating the role of the different oxidants in the two models. In particular, the lack of a NO 3 sink in the offline oxidant scheme in GC3.1 is leading to a perpetual supply of NO 3 . The global mean burdens in both models are also within the reported range of 0.13-0.26 Tg and 1.0-2.2 Tg for BC and OM respectively 5 (Tegen et al., 2019;Textor et al., 2006). The lifetime of BC at 5.1 days in both models is much shorter in UKESM1 and GC3.1 than in HadGEM2-ES which suffered from an excessively long lifetime of 15 days (Bellouin et al., 2013). Overall, the BC and OM lifetimes in both models are in good agreement with each other and are approximately 1 day shorter than the AeroCom median lifetimes of 6.5 and 6.2 days respectively (Textor et al., 2006).
Mineral dust emissions in UKESM1 (7398.5 Tg/yr) are much higher than other models and are more than twice as large There is a very large uncertainty in global sea-salt emissions (Lewis and Schwartz, 2004) with the UKESM1 and GC3.1 global means well within this range. Both emission and lifetime are in good agreement with the AeroCom median value of 6280 Tg/yr and 0.41 days respectively (Textor et al., 2006). The global mean sea-salt emissions and burdens are higher 25 in UKESM1 by approximately 35% due to the change in the prescribed sea-salt density as well as differences in the 10m windspeeds.
The spatial distributions of the annual mean gas-phase SO 2 and aerosol burdens from the historical ensemble mean are broadly similar in both models (Figures 2 and 3) with a few noteworthy differences. The higher SO 2 burden in GC3.1 ( Figure   2a) is globally widespread with particularly notable enhancements in the tropical regions. In this region the Lana et al. (2011) 30 seawater DMS concentrations in GC3.1 peak and will be significantly higher than the seawater concentrations calculated interactively in UKESM1. As discussed above SO 4 burdens are globally comparable between the two models. Lower burdens are found in the Southern Hemisphere high latitudes in GC3.1 (Figure 2b). This is likely caused by the longer lifetime in UKESM1 leading to long-range transport of SO 4 to remote high latitudes. Over North America, both models systematically underestimate the observations in the east of the country but show a high correlation (r 2 > 0.8). In the west, the models generally tend to overestimate the observations and have a lower correlation (0.2 < r 2 < 0.4) due to the larger degree of scatter in this region. This region is expected to be relatively remote from emission 20 sources which are predominantly in the east and so the positive bias highlights potential issues in the amount of sulphate or SO 2 transported from source, excessive oxidation away from source or too low removal rates. The simulated SO 4 is underpredicted by both models across East Asia at the EANET measurement locations (Figures 4c and d). Overall, the simulated SO 4 in UKESM1 tends to underpredict the observations to a greater degree than GC3.1 with a NMB of -0.21 compared to -0.18 in GC3.1.   Over North America, a small negative trend in SO 4 concentrations is found in the East IMPROVE sites (Figure 5c) which is generally well captured by both the models although again a negative model bias is found in both models. The absolute 5 SO 4 concentrations and negative trend is smaller at these sites than over Europe although the observations cover a shorter time period. Over the western measurement sites (Figure 5d) there is little to no trend seen in the observations while the models exhibit a small negative trend and overestimate the observed values. Finally, over East Asia (Figure 5b) an increase in the SO 4 concentrations is found reflecting the increase in anthropogenic emissions in this region over the observed period, particularly in China. Both models capture the rising trend and while they generally underpredict the observed values the models are well At most surface measurement sites the difference between UKESM1 and GC3.1 surface SO 4 concentrations are generally much less than the difference from the observations with both models exhibiting similar biases. The models generally tend to underestimate the observed surface concentrations except over the western US sites where both models overestimate the observations. One notable exception where the models deviate from one another is over Europe. Here UKESM1 has a consistent negative bias during all years (NMB=-0.25) while GC3.1 is in good agreement with the observations (NMB=-0.03). The 5 similarity between both models is in many ways surprising given the different vertical distribution of the anthropogenic SO 2 emissions in both models. In UKESM1 all SO 2 emissions are emitted at the surface and therefore one might expect higher surface SO 4 concentrations as a result. The SO 2 surface concentrations at these sites are higher in UKESM1 (not shown).
However as discussed in the previous section most of this excess surface SO 2 is efficiently removed by dry deposition (Table   5). A sensitivity simulation was conducted which prescribed the SO 2 emission injection heights in the same way as in GC3.1.

10
While this decreased the surface SO 2 concentrations to be more comparable with GC3.1 it has only a small impact on the surface SO 4 comparison (see Supplementary Figures 3 and 4). For example, the NMB at EMEP sites reduces slightly from -0.26 to -0.23 while the correlation coefficient increases from 0.37 to 0.39 compared to a NMB and correlation coefficient of -0.03 and 0.44 respectively in GC3.1. Furthermore, differences in DMS emissions will not contribute significantly to the source terms at the measurement sites assessed here. This demonstrates that simulated SO 4 production in the key anthropogenic source 15 regions assessed here is oxidant limited rather than SO 2 limited.
While globally the oxidation timescales of SO 2 to SO 4 are faster in UKESM1 (see Table 4 and 5) regional analysis of the budget over Europe shows the oxidation timescales are slower by 15% leading to a longer regional lifetime of 1.6 compared to 1.3 days in GC3.1. Lower concentrations of O 3 over Europe (Figure 1c) lead to a nearly 60% lower production of SO 4 from the aqueous phase oxidation by O 3 . The different vertical profile of SO 4 production also leads to a shorter dry deposition 20 lifetime of SO 4 in UKESM1 (3.7 versus 5.6 days) although this is compensated for by a longer lifetime for wet removal.
Regional budget analysis of the SO 2 and SO 4 budget over North America found very similar oxidation rates and timescales in both models despite some notable differences in oxidant concentrations shown in Figure 1. This supports the comparable performance of both models against the observations across the North American sites and suggests the larger contributing role of emissions and deposition processes (common to both models) to biases in this region.   1980 1984 1988 1992 1996 2000 2004 2008 Year 0 1 2 3 4 5 6 7 8 9 10 SO4 Concentration ( g m 3 ) a) EMEP 1980 1984 1988 1992 1996 2000 2004 2008 Year 0 1 2 3 4 5 6 SO4 Concentration ( g m 3 ) b) EANET 1980 1984 1988 1992 1996 2000 2004 2008 Year 0 1 2 3 4 5 6 7 SO4 Concentration ( g m 3 ) c) East IMPROVE 19801984198819921996 Year 0   conditions. Accurate temporal and spatial sampling is known to be important in model-observation comparisons and evaluation of annual data here is likely to introduce some uncertainties in our comparison (Schutgens et al., 2017). However given the relatively strict criteria applied to the observed data in building the annual mean observed climatology we believe the observed data are representative of the annual climatology at each site. Figure 6c and d compares simulated annual mean OM mass concentrations from UKESM1 and GC3.1 respectively against 5 ground-based measurements from IMPROVE and EMEP. The RMSE (1.14 for UKESM1 and 1.96 for GC3.1) is higher for OM than for BC in both models and the correlation coefficients are lower at 0.36 and 0.35 for UKESM1 and GC3.1 respectively.
Overall, both models are positively biased against the observations. GC3.1 has a much larger positive bias than UKESM1 with a normalized mean bias (NMB) that is 3 times larger than UKESM1 (NMB=0.87 versus NMB=0.24) (Figure 6d). Given the strong weighting of this comparison to measurement sites across North America the different behaviour in the models for 10 surface OM mass concentrations is likely due to lower contributions to the OM mass from SOA in UKESM1 in this region.
Emissions of monoterpenes over North America from the iBVOC model in UKESM1 are different to the prescribed emissions in GC3.1 (see Figure 2 in the Supplement). It should be noted that the prescribed emissions used in GC3.1 are also model based and may suffer from biases in simulated vegetation fractions and types. The oxidation of emitted monoterpene to SEC_ORG is also different in these two models. The global production of SEC_ORG in GC3.1 is over 50% larger than in UKESM1, in where there is no sink for this species. The inclusion of an interactive emission source which more accurately reflects the change in vegetation distribution and response to changing temperatures combined with interactive oxidants in UKESM1 leads to reduced biases against surface measurements of OM in this region.
To further evaluate the different treatment of biogenic sources in the physical and ES models and its influence on the evolu-5 tion of SOA we now examine the seasonal cycle of organic aerosol at a remote forested site, Hyytiälä in Finland (Figure 7). As monoterpene emissions from vegetation are highly temperature dependent a strong seasonal cycle in OM is observed (see also

Aerosol optical depth
To evaluate the evolution and global distribution of aerosol optical depth in UKESM1 and GC3.1 we use a combination of ground-based and satellite retrievals of AOD. The timeseries of global annual mean AOD at 550 nm from the UKESM1 and GC3.1 ensembles is plotted from 1979 to 2014 in Figure 8. The global mean AOD in GC3.1 is consistently higher than UKESM1 in all years by approximately 0.01 but both models show the same inter-annual variability driven primarily by 5 changes in emissions. The individual historical members of each model ensemble are also plotted as is the UKESM1-AMIP simulation. The variability in global mean AOD among the individual members is small for both models and the AMIP simulation is also in good agreement with the UKESM1 ensemble mean demonstrating good traceability from the fully coupled ES to the atmosphere-only configurations.
Also plotted in Figure 8 are the annual mean AOD from a number of satellite retrieval products using the MODIS, ATSR and 10 AATSR-2 sensors (see Section 4.2) The satellite retrievals are plotted for the retrieval period of each satellite sensor. While a comparison of the different satellite products is beyond the scope of this work it is clear there is a significant uncertainty in the retrieved AOD from the different satellite sensors and aerosol retrieval algorithms. There are numerous sources of uncertainty in satellite remote sensing datasets that can explain the differences shown (Povey and Grainger, 2015). The MODIS and ATSR instruments have different swath widths and overpass times, such that they can observe significantly different aerosol regimes. Biases due to surface albedo, differences in the cloud clearing algorithms and the assumed aerosol microphysical properties are also likely (Popp et al., 2016). Multiple retrievals are used here to provide an indication of the observational uncertainty in AOD. Indeed the inter-and intra-model spread in AOD (less than 0.01) is much smaller than the spread in satellite retrieved AOD which spans a range of appoximately 0.04. During the period of satellite observations (from 1995 onwards), UKESM1 and GC3.1 are within the range of the satellite AOD although UKESM1 sits at the lower end of the observed range. Both models 5 are in good agreement with the MODIS-DT (MYD/MOD), Swansea (SU) and ADV global mean AOD products. Higher AOD in the merged MODIS (MODIS c6) product is expected due to the addition of the Deep Blue retrieval which will include AOD retrievals over dust source regions. But as is discussed below differences in the spatial distribution of the satellite AOD can result in misleading interpretation of global mean values. We now compare the seasonal spatial distributions of a subset of the satellite retrievals shown in Figure 8 for winter (December-January-February, DJF) and summer (June-July-August, JJA). 10 The seasonal spatial distribution of AOD, shown in Figure 9, highlights some notable regional differences in AOD between UKESM1 and GC3.1 (Figures 9a-d). Reflecting the annual AOD differences already noted above, the global mean AOD in UKESM1 is lower than GC3.1 by approximately 0.01 in both DJF and JJA. Across high latitude ocean basins UKESM1 has higher AOD in the respective winter seasons of each hemisphere. This reflects the higher sea-salt emission in UKESM1 which will peak in the winter months. In JJA, GC3.1 has higher AOD across most of the northern hemisphere. This is driven 15 in part by the elevated DMS emissions in JJA in the Northern Hemisphere but also the different rate and location of SO 4 production from other sources in the models. For instance, a higher optical depth is seen over the Mediterranean downwind of the volcanic SO 2 source from Mount Etna in Sicily. Dust AOD is lower in UKESM1 in both seasons over dust source and outflow regions of the Sahara desert. Despite higher emissions and burden in UKESM1 the difference in the dust size distribution between the two models, as highlighted above, results in larger sized particles which are less optically efficient at the mid-visible wavelengths studied here. Over South America, differences in monoterpene emissions plus differences in the oxidising capacity of atmosphere leads to lower AOD in UKESM1.

5
The spatial distribution of AOD among the satellite AOD products also show interesting regional features (Figure 9e-j).
ORAC AOD tends to be higher than both MODIS and SU AOD in both seasons but has lower AOD over dust source regions.
SU retrieves much lower AOD over ocean regions but has higher dust AOD than other products picking up dust sources over Australia and South America not captured by the other satellite datasets or models. In comparing the simulated and observed AOD distributions it is important to note that the models and satellite datasets have not been consistently temporally 10 or spatially sampled due to the low temporal resolution AOD output of the CMIP6 historical simulations. This will likely lead to uncertainties in the resulting evaluation (Schutgens et al., 2017(Schutgens et al., , 2016. Notwithstanding these observational differences and sampling uncertainties the spatial distribution and seasonal cycle of simulated AOD agrees well with the satellite observations, capturing the elevated AOD in JJA and lower AOD in DJF ( Figure   9). In JJA the models broadly capture the elevated AOD over the Northern Hemisphere continents including the contrasting 15 AOD signal over North America with lower AOD across the western United States and higher AOD on the East coast. Boreal forest fires, the primary contributor to summertime AOD across Canada, Alaska and Russia, are accurately captured and AOD from biomass burning sources in the tropics is within the observational range. As already mentioned, both models underestimate the dust AOD over and downwind of key source regions. This leads to low biases in AOD over the Sahara and across the tropical Atlantic Ocean and Arabian Sea. Across the Southern Ocean, AOD appears in reasonable agreement with 20 the observations but both models tend to underestimate in DJF and overestimate in JJA. This is in agreement with the recent evaluation of Southern Ocean aerosol in the atmosphere only configuration of GC3.1 conducted by Revell et al. (2019). The evaluation over high latitude regions should be treated with particular caution due to the relatively low number of satellite retrievals at these latitudes in winter.
The model disparity in AOD over the North Atlantic and Pacific Oceans in JJA is difficult to assess given the relatively 25 low AOD in the SU dataset and much higher AOD in MODIS and ORAC datasets. UKESM1 underestimates the MODIS and ORAC AOD in this region but agrees well with SU AOD and the opposite biases are found in GC3.1. Disparity amongst the satellite observations here makes quantitative evaluation of the simulated AOD over the remote oceans difficult. Figure 10 compares the annual mean AOD from both models with ground-based AERONET observations. AERONET sunphotometers provide a direct measurement of the attenuation of sunlight due to aerosol and so are not affected by the same 30 large uncertainties as are satellite retrievals of AOD, for instance uncertainty in the underlying surface properties. The high measurement frequency from these long-term observing sites provides us with a globally representative climatology at 67 sites shown in Figure 10a  While both models show a high degree of correlation with the observations (Figure 10c and d) (correlation coefficient, R > 0.79) GC3.1 shows a slightly higher correlation due to the smaller bias in the dust-prone sites in North Africa.

Aerosol number
One of the key advantages of 2-moment aerosol schemes such as GLOMAP over simpler bulk schemes is the ability to interactively simulate the evolution of the aerosol number size distribution. Here we assess the skill of the simulated aerosol number 5 concentrations at size ranges important for influencing cloud droplet formation. There is some uncertainty in this evaluation given the observations largely represent campaign data from short (often a single month) periods that often target specific aerosol regimes. While the model and observations have been sampled consistently from the same month and altitude, the extracted N 50 and N tot concentrations have then been averaged in the vertical and over all months to illustrate the overall annual mean bias at each observation location. This provides a more representative view of the coupled model's ability to simulate Overall, higher number concentrations are found in UKESM1 compared to GC3.1 across all size modes (not shown). This could be due to the different treatment of natural emissions, notably DMS and the inclusion of PMOA, but also could reflect an increase in the binary homogeneous nucleation rate when coupled to the interactive chemistry model. Figure 11 plots the concentrations of N 50 from UKESM1 and GC3.1 at the locations of the observations. The ratio of model to observed values ( Figure 11d and e) demonstrates that both models are generally within a factor of 2 of the observations. In general, UKESM1 5 has higher N 50 than GC3.1 globally which acts to reduce negative biases over the Northern Hemisphere continents and high latitude oceans but increases a positive bias in other ocean region basins. N 50 in the stratocumulus region off the west coast of North America is also positively biased in both models. While there is a notable absence of observations in the Southern Hemisphere high latitudes, data from ACE1 (Clarke et al., 1998) off the Southeast coast of Australia shows an underestimation of N 50 in GC3.1. The introduction of the PMOA source in UKESM1 increases the N 50 in this region (by up to 50 cm −3 in the 10 austral summer) and generally improves the bias although a positive bias is introduced at some grid points.
Similar to N 50 , simulated N tot (Figure 12) in both models is generally within a factor of 2 of the observations. Over most ocean regions, with the exception of high latitudes, both models overestimate N tot . A low bias is found over Northern Hemisphere continents in both models with GC3.1 also underestimating N tot downwind of anthropogenic source regions off the east coast of North America. In high latitude regions, such as the Southern and Arctic oceans, both models underestimate 15 N tot with a larger negative bias found in GC3.1. In the Southern Ocean, the overestimation of N 50 and underestimation of N tot could reflect inaccuracies in the prescribed emission size distribution of the PMOA potentially leading to an excess of PMOA residing in the accumulation mode. It could also point to missing sources such as the absence of a boundary layer nucleation scheme in the models.
Overall, the differences in the N 50 and N tot model biases are small although the model to observed ratio is higher for 20 N 50 over the remote oceans than for N tot highlighting potential biases in N 50 and subsequently CCN. However, definitive conclusions on the model performance here are difficult to draw, given the potentially large inter-annual variability in these simulated variables combined with the uncertainty in the observations (Johnson et al., 2019a;Watson-Parris et al., 2019).

Cloud droplet number concentration (N d )
The annual mean spatial distribution of N d from the two satellite N d products, UKESM1 and GC3.1 is shown in Figure 13. It is 25 first important to note the large difference in N d between the two satellite products, with the Grosvenor N d being systematically higher than the Bennartz N d (on average 30% higher over ocean, Figure 13a and b). This is likely due to the different filtering techniques applied based on the effective radii retrieved at different wavelengths in the Bennartz dataset, which is likely to lead to an underestimate in N d as discussed in Section 4.4. In addition, the restriction to cloud fractions greater than 80% in the Grosvenor dataset, whilst likely giving more accurate retrievals, may lead to overestimates compared to datasets where 30 this sampling is not performed due to the positive correlation between cloud fraction and N d . This highlights the inherent difficulties in retrieving N d from space, further complicating our ability to constrain this variable in models.
In contrast the difference in N d between UKESM1 and GC3.1 is much smaller, with differences of 3-4% over the ocean and 2% over land. UKESM1 has higher N d than GC3.1 over most ocean basins but differences are marginal over land. The small Over land, UKESM1 underestimates over most continents apart from Asia and parts of North Africa where N d is overestimated ( Figure 14a). The lower land N d in GC3.1 in the Southern Hemisphere will improve the bias in GC3.1 in this region but degrade the bias over North America and China. Differences in N d between the models over land will be due to a combination of factors including differences in natural emissions of terrestrial biogenic sources, aerosol scavenging and aerosol activation processes. The aerosol activation over land surfaces is driven by the boundary layer turbulent kinetic energy flux which deter-5 mines the sub-grid variability of the updraught velocities. Given the differences in the land surface properties of UKESM1 and GC3.1 we expect differences in the turbulent mixing which will also impact the vertical distribution of the aerosol.

Evaluation of primary marine organic aerosol
The emission of PMOA is a new source of marine aerosol in UKESM1. We therefore explore and evaluate the impact of this additional source of OM in the model and in comparison with GC3.1 which instead scales the marine DMS emission as a 10 proxy for this missing source (Mulcahy et al., 2018). We focus our evaluation on the Southern Ocean region due to the high occurence of pristine air masses in this region (Hamilton et al., 2014) and therefore a low risk of the OM mass concentrations being contaminated by anthropogenic emissions. The annual average global emission of PMOA is 4.5 Tg yr −1 during the present-day period evaluated here ( Figure 15). This is in good agreement but slightly below the global PMOA emission rate of 6.3 Tg yr −1 found in Gantt et al. (2012) who use the same emission parameterization but apply an additional global scaling factor of 6 in Equation 3. The value is also within the range of emission rates calculated using different emission parameterizations in the same model (Gantt et al., 2012) and in numerous other modelling studies which span a range of 2-70 Tg yr −1 (see Gantt and Meskhidze (2013) for a review) 5 but with many studies showing values in the region of 10 Tg yr −1 (Spracklen et al., 2008;Lapina et al., 2011;Vignati et al., 2010). Scaling the PMOA emissions by 6 in Gantt et al. (2012)  In order to evaluate the PMOA we run an additional UKESM1-AMIP simulation where we deactivate the PMOA emission source. This "NoPMOA" simulation is required as we don't track the PMOA as a separate tracer in the model but add the emitted mass and number to the existing OM tracers which will, in addition, be composed of both anthropogenic and terrestrial    (Yool et al., 2013). Overall, the scaled monthly Chl-a in UKESM1 agrees well with Globcolor Chl-a in this region when scaled and while small biases exist the mean monthly Chl-a is within 1 standard deviation of the observations. It is worth noting that the simulated seasonal peaks in both OM surface concentration and N d in Figure 16 are correlated with the peaks in simulated Chl-a but not DMS concentrations demonstrating that the 10 seasonal cycle of N d in this region is controlled largely by the organic aerosol in the model. The observations suggest however that the seasonal peak in N d is driven by DMS than Chl-a although a lagged response to Chl-a is possible (Rinaldi et al., 2013).
While the UKESM1 Chl-a peaks in November and DMS peaks in December, the observations show a peak in December and January for Chl-a and DMS respectively. UKESM1 simulated DMS is overpredicted in spring, underpredicted in summer and agrees well with observations in the autumn (17). The inability of the simulated Chl-a and DMS to capture the correct seasonal cycle highlights some deficiencies in the ability of the ocean biogeochemistry model, MEDUSA, to capture the complex biological productivity in the Southern Ocean (Yool et al., 2019(Yool et al., , 2013.

Discussion
UKESM1 and GC3.1 offer a unique opportunity to explore and improve understanding of aerosol-climate interactions through the exploitation of a traceable hierarchy of global climate models. These two models employ the same physical atmosphere-5 ocean components and in essence the same aerosol scheme but differ in their level of interaction with the full Earth system and in the specification of a small number of other aerosol properties, most notably the inclusion of an additional organic aerosol source over the ocean in UKESM1. A number of notable differences in the simulation of aerosols between the models have been highlighted in the current study. This paper attempts to characterize the overall climatology of aerosol and aerosol-cloud properties in both models with the aim of facilitating a broad understanding of the key drivers of the underlying differences and 10 associated model uncertainties. Subsequent future analysis will consider in more detail interactions between simulated marine and terrestrial biogeochemistry, atmospheric chemistry and aerosol properties in UKESM1.
The additional ES components in UKESM1 add complexity in particular with respect to aerosols. Coupling of the aerosol emissions and chemistry to dynamic vegetation, ocean biogeochemistry and a complex chemistry scheme introduces extra degrees of freedom in fully coupled ES models. This leads to the potential for biases in the interactively simulated processes 15 where in GC3.1 they are prescribed, in most cases from present-day observational based climatologies. The GC3.1 treatment of emissions and chemistry does not allow for future changes in these variables, for instance climate feedbacks on ocean productivity influencing marine emissions of DMS or ozone depletion influencing the aerosol oxidation pathways. Therefore one might expect smaller model biases in GC3.1 given the present-day nature of the evaluation presented here, although in many instances we find this is not the case.
The inclusion of an interactive emission scheme for BVOCs enables the BVOC emissions to change in response to changes 5 in land use and climate over the industrial period. This is found to lead to improvements in the simulation of organic aerosol over North America. The nature of the aerosol chemical oxidation is also found to be important with oxidation of monoterpene by NO 3 for instance, significantly higher in GC3.1 due to lack of a removal mechanism for this species. The different aerosol chemistry also leads to notable differences in the aerosol sulphur cycle. Despite nearly double the emissions of marine DMS in GC3.1 the annual mean sulphate loads are comparable. This is due to the different oxidation and scavenging lifetimes in 10 the models. The former is likely driven by a combination of different oxidants as well as differences in the DMS chemistry.
Limitations in the offline oxidant scheme are apparent where perpertual sources of oxidants could significantly influence the amount of aerosol produced in oxidant limited regions. This has potentially important implications for the aerosol forcing as recently highlighted by Karset et al. (2018).
While dust emissions in UKESM1 are more than double the emissions in GC3.1, the relative increase in dust burden of 25% 15 is much smaller. The different tuned parameters for dust employed in UKESM1 is a balance between achieving an optimal performance of dust metrics such as surface dust concentrations and aerosol optical depth against present day observations and achieving a realistic dust size distribution which has impacts for the remote transport of dust and subsequent deposition into the ocean in the fully coupled UKESM1. Recent observational studies (Ryder et al., 2019) support the existence of giant dust particles close to source and highlight the potential important longwave and shortwave radiative effect associated with 20 such large dust particles.
Biogenic emissions from the ocean are believed to be responsible for the observed seasonal cycle in aerosol and consequently N d in the Southern Ocean (Behrenfeld et al., 2019;Sanchez et al., 2018;McCoy et al., 2015;Meskhidze and Nenes, 2006) and also in clean marine regions such as the North Atlantic during periods of high ocean productivity. Biogenic sources include DMS, methanesulphonoic acid (MSA) as well as marine organics. Uncertainty in the CCN from these sources has 25 large implications for aerosol forcing in remote marine regions (Carslaw et al., 2017;Moore et al., 2013). The implementation of PMOA in UKESM1 clearly brings improvements in terms of the seasonal cycle of organic aerosol mass and N d in the Southern Ocean. However an underestimation of N d still remains. Given the global PMOA emissions are to the lower end of published ranges discussed above one could argue for the inclusion of a global scaling factor as used in Gantt et al. (2012).
We adopt a conservative approach here in order to balance the low bias in N d with the impact on top-of-atmosphere radiation 30 biases. Furthermore, the good agreement of OM mass with observations at Amsterdam Island suggests an alternative source of error, possibly a low bias in the underlying DMS concentration in summer. The apparent low sensitivity of N d to DMS shown in Figure 16 is inconsistent with a previous study (Korhonen et al., 2008) Korhonen et al. (2008) use the Nightingale et al. (2000) parameterization for the air-sea emission flux of DMS while both UKESM1 and GC3.1 use the parameterization of Liss and Merlivat (1986). The emitted flux in Nightingale et al. (2000) varies as a function of U 2 10 in comparison to the linear dependence on U 10 in the Liss and Merlivat (1986) parameterization. The stronger windspeed dependence will lead to higher DMS emissions particularly at high windspeeds when the Nightingale et al.

5
(2000) parameterization is used. The use of Liss and Merlivat (1986) is supported by recent direct measurements of DMS air-sea exchange (Yang et al., 2011) and studies which show that the high solubility of DMS results in lower air-sea transfer velocities at high winds compared to less soluble gases like CO 2 (Bell et al., 2013;Wanninkhof, 2014). Revell et al. (2019) highlight further possible biases in the simulation of sea salt and role of DMS chemistry in aerosol biases in the Southern Ocean. Our evaluation of AOD and N d is in agreement with Revell et al. (2019) who use a variant of the UKESM1 and GC3.1 10 atmosphere model, GA7.1, with the full StratTrop chemistry scheme so further improvements in these areas will be investigated in the future.
Over NH continents, such as Europe and northeast America, missing species such as nitrate aerosol is likely to contribute to low biases found in NH AOD. However, underestimation of the surface SO 4 and BC aerosol mass is also found. In-depth analysis here of the oxidation timescales in both models highlights the important contribution of differences in the aerosol 15 chemistry schemes and driving oxidant fields. Consistent with the underestimation of continental aerosol mass, N tot and N 50 are also underestimated in these regions. In addition, the evaluation of the size segregated number concentrations highlights a potential overestimation of both N 50 and N tot over most ocean regions, with the exception of high latitude oceans. This issue appears to be worse in UKESM1 due to the higher number concentrations in all size modes in UKESM1. This is believed to be driven by difference in the treatment of natural aerosol, different oxidation and scavenging lifetimes, but also aerosol 20 higher nucleation rates in UKESM1 which increase nucleation mode number concentrations which can subsequently grow via condensation and coalescence to larger CCN sizes. Model differences in N d , the variable most important for the aerosol-cloud forcing, are relatively small and are typically less than 10%. Indeed over ocean, where the satellite data is most reliable, both models underestimate N d in high latitude regions but have a positive bias in the marine stratocumulus cloud regimes which are consistent with an overestimation of N 50 found in Californian stratocumulus clouds. 25 There are a number of caveats pertaining to the evaluation of the aerosol number concentration and N d , most importantly the lack of representative measurements on the global scale. The evaluation of aerosol number presented in this study comprises observations from a large number of measurement campaigns compiled by GASSP as well as ground-based measurements.
Use of campaign data which are often targeting specific physical processes or aerosol regimes may not be appropriate for evaluation of a global climate model but to-date this remains the most comprehensive dataset available (Reddington et al.,30 2017). Similarily, satellite retrievals of N d have large uncertainties and have not been fully validated in different cloud regimes.
Having globally representative aerosol measurements is essential in order to constrain global aerosol microphysical processes and subsequent aerosol forcing in models. Long-term monitoring networks are predominantly close to source with often sparse aerosol information in remote oceans regions which are often key regions of aerosol forcing uncertainty (Regayre et al., 2018).
Notwithstanding these limitations, the overall simulation of aerosol in the historical free-running climate simulations of both models compares remarkably well with the observations in this study.

Conclusions
The aerosol scheme employed by the physical and Earth system models, HadGEM3-GC3.1 and UKESM1, is documented in detail. Differences in the aerosol representation relate to the interactive simulation of the natural aerosol emissions in UKESM1, 5 including dust, DMS and terrestrial BVOCs as well as the inclusion of a new marine organic aerosol source replacing the scaled marine DMS emissions in GC3.1. The impact of these differences on the aerosol distributions are fully characterized and a detailed evaluation of the present-day period of the historical CMIP6 simulations is conducted.
Overall, both models compare well with observations and capture the global spatial distributions in AOD and cloud droplet number concentrations. Some regional biases are noted including an overestimation of N d in the marine stratocumulus cloud 10 regimes and an underestimation of aerosol optical depth in dust-dominated regions. Regional trends in surface sulphate concentrations are well represented in the models although they generally tend to underestimate the absolute magnitude of the sulpate concentrations over Europe and the eastern US while overestimations are apparent over the western US are found. The inclusion of the interactive BVOC emission scheme and marine organic aerosol source in UKESM is found to improve surface mass concentrations of organic aerosols. The inclusion of marine organic aerosol is furthermore found to improve the seasonal 15 cycle of cloud droplet number concentration in the Southern Ocean although biases associated with the interactive simulation of DMS and Chl-a in UKESM1 are evident. Future model developments will focus on improving these prognostically coupled components and an in-depth evaluation of the chemistry-aerosol coupling will be conducted via detailed evaluation of the complete sulphur cycle including sulphate aerosol production rates.
In the development of UKESM1 we consciously worked to ensure as many of the process and cross-component couplings 20 were fully prognostic and interactive as possible allowing the model to simulate a large set of future feedbacks. Based on this we believe UKESM1 is one of the most process and coupling complete ESMs available today, in particular with respect to aerosols. It is therefore highly encouraging that such interactions with the terrestrial and ocean biogeochemical and atmospheric chemistry systems not only do not degrade present-day model performance but in many instances improve the present-day comparison against observations. This builds confidence in the use of this model in the wide-ranging forcing and feedback 25 experiments being conducted as part of CMIP6 and the potentially important role of aerosol in modulating or amplifying future climate feedbacks. While GC3.1 also compares well on the whole against observations, limitations with respective to the simplified chemistry scheme employed and the representation of natural aerosol sources are evident. The implications of such ES interactions on the aerosol forcing will be explored in more detail in a future study.
Code and data availability. Due to intellectual property rights restrictions, we cannot provide either the source code or documentation papers Obtaining the UM. The Met Office Unified Model is available for use under licence. A number of research organisations and national meteorological services use the UM in collaboration with the Met Office to undertake basic atmospheric process research, produce forecasts, develop the UM code, and build and evaluate Earth system models. For further information on how to apply for a licence, see http://www.metoffice.gov.uk/research/ modelling-systems/unified-model (last access: 13 December 2019).
Obtaining JULES. JULES is available under licence, free of charge. For further information on how to gain permission to use JULES for research purposes see http://jules-lsm.github.io/access_req/JULES_access.html (last access: 13 December 2019).
The HadGEM3-GC3.1 historical simulations are identified by the following Variant Labels: r1i1p1f3 to r4i1p1f3. UKESM1 historical simulations are identified by the following Variant Labels: r1i1p1f2 to r4i1p1f2, r5i1p1f3 to r7i1p1f3, and r8i1p1f2 to r9i1p1f2. We acknowledge the use of MODIS AOD data from https://earthdata.nasa.gov and ESA CCI AOD data from http://www.esa-aerosol-cci.org.
Information on the EMEP network can be found in Tørseth et al. (2012). For making their measurement data available to be used in this study we would like to acknowledge the EMEP (http://ebas.nilu.no/), IM-PROVE (https://views.cira.colostate.edu/fed/), and EANET (https://www.eanet.asia/product/index.html) measurement networks along with any data managers involved in data collection. IMPROVE is a collaborative association of state, tribal, and federal agencies and international partners. The U.S. Environmental Protection Agency is the primary funding source, with contracting and research support from the National Park Service.