The CHIMERE v2020r1 online chemistry-transport model

Abstract. The CHIMERE chemistry-transport model v2020r1 replaces the v2017r5 version and provides numerous novelties. The most important of these is the online coupling with the Weather Research and Forecasting (WRF) meteorological model via the OASIS3 – Model Coupling Toolkit (MCT) external coupler. The model can still be used in offline mode; the online mode enables us to take into account the direct and indirect effects of aerosols on meteorology. This coupling also enables using the meteorological parameters with sub-hourly time steps. Some new parameterizations are implemented to increase the model performance and the user's choices: dimethyl sulfide (DMS) emissions, additional schemes for secondary organic aerosol (SOA) formation with volatility basis set (VBS) and H2O, improved schemes for mineral dust, biomass burning, and sea-salt emissions. The NOx emissions from lightning are added. The model also includes the possibility to use the operator-splitting integration technique. The subgrid-scale variability calculation of concentrations due to emission activity sectors is now possible. Finally, a new vertical advection scheme has been implemented, which is able to simulate more correctly long-range transport of thin pollutant plumes.


• ntotprocs and ratioproc: with the previous offline version, the user had the possibility to specify the number of subdomains in x and y directions, and only for CHIMERE. With the online coupling in this version, two models are running at the same time, WRF and CHIMERE, and they have to share the whole amount of available processors. The user has only to specify the 60 total number of available processors. If the user wants to favor one of the two models, it is possible to change the ratioproc value. For example, if 60 processors are available on the cluster, ntotprocs=60 and for ratioproc=2, 20 processors will be assigned to WRF and 40 to CHIMERE.
• accurmet: Enables to select what meteorological variables are written in the out file. Possible values are low, medium or full.
• imethod and splitcfl: These flags are added to manage the pollutants' concentrations calculation. Historically, CHIMERE uses a method integrating all processes with a production and loss term. Many models are also using operator-splitting: this later possibility was added in CHIMERE. The flag splitcfl is here to separate the time-step calculation between meteorology and chemistry when the operator splitting technique is chosen. Details are presented in section 10.2.
• timeverb: A flag to have, verbose, on screen, the time spent in each routine or part of the model. It is mainly for developers, i.e. if you decide to add a new parameterization in the code and want to check the time used.

70
• outloc: A flag to write output results in the same directory as the executable file. Sometimes, depending on the computer architecture, to write output files on remote disks may cause instabilities and bus errors. To avoid this problem, output files are written locally, then moved to remote disks at the end of the simulation only.
• imegan: It enables to select the landuse database used by MEGAN to calculate the biogenic emission fluxes. Two datasets are available in the CHIMERE distribution:

75
• imegan=1 for the dataset with a resolution of 2.5 min, with monthly LAI.
• iuse_firemis and ifirevprof: These flags are for the use of fire emissions. These emissions have to be pre-calculated, using external data and models. Details are provided in section 8.4.
• ilinox: To take into account NO x emissions by lighning.
Values must be real and positive, and zero is possible.
• online: The management of the coupling. It selects if the user wants a simulation in offline or online mode.
• cpl_case: Type of coupling in case of online=1. Four possibilities are implemented: 100 1. cpl_case=1: The exchanges only go from WRF to CHIMERE. This is the same as in offline mode, except that in this case the flow is through OASIS and not by reading previously prepared WRF NetCDF files. In addition, CHIMERE receives the meteorological state at the physical time-step (selected by the user and generally about 10min) and not only hourly.
2. cpl_case=2: The Aerosol Direct Effect. This case is equivalent to case 1 with, in addition to, the exchange of aerosol 105 optical properties from CHIMERE to WRF.
3. cpl_case=3: The Aerosol Indirect Effect. This case is equivalent to case 1 with, in addition to, the exchange of the cloud fraction and the droplet number from CHIMERE to WRF.
• geodata, dgrb, wrfrestart, wrf_restart_dir, wrf_tstep_grid: These flags are linked to the use of WRF in online mode. They 110 are not used in offline mode. geodata and dgrb are data paths for input data. wrfrestart and wrf_restart_dir are for the management of the restart files in case of chained simulations. wrf_tstep_grid is a parameter to control the time-step estimated for WRF and is mandatory in its namelist. In general, it is recommended to have a maximum time step, in seconds, equal to 6 × dx, with dx as the grid cell size in km. Here, the user can modulate this value.
• nest_topconc: The users can select the way to use model data at the top of the domain in the case of nesting.

115
• ithermo: This option is used by SOAP to compute the partitioning of semi-volatile compounds.
3. ithermo=3: non-ideal partitioning with interactions with inorganics ions. This option is used only with the H 2 O mechanism.
and compensate the errors in mixing ratio introduced thereby, even though this introduces errors in mass conservations.
• idiagblh: A diagnostic report of the Boundary Layer Height (BLH) has been available in CHIMERE for many versions. With this new option, it is now possible to select the BLH to use: in case of coupling, the BLH calculated in WRF may be directly 130 read and used. The way to estimate this BLH therefore depends on the user's choice with the WRF's namelist. The same is true for the diagnostic report or use of the friction velocity with idiagustar and the surface sensible heat flux with idiagshf.
• istopdom: This flag enables the creation of a horizontal domain using the WPS and WRF tools and then stop. It is useful if the user wants to check the domain before launching a simulation or use the created geography file for the calculation of surface emissions (with the emiSURF or APIFLAME programs for example).

135
• cfl_max: A limit to ensure that the Courant Friedrich Levy (CFL) number is respected. The maximum is 1. But for numerical stability during horizontal transport, cfl_max=0.8 is recommended.
• sgmodel: The subgrid scale model for anthropogenic emissions. Note that in this case, anthropogenic emissions must be provided with activity sectors for each chemical species.  For all simulations, the domain was chosen large enough to take into account all possible kinds of sources, Figure 2 . By including the northern half of Africa and Europe, the modelled region is able to take into account anthropogenic, biogenic, biomass burning, sea-salt and dust sources, with an horizontal resolution of 60 × 60km. The whole year of 2014 was modelled.

180
Meteorological parameters have also been validated using two datasets. The ECA&D project provides the E-Obs database containing measurements for a large number of stations over the European region. This dataset provides values for daily temperature (average, minimum and maximum), daily average wind speed and direction, relative humidity and precipitation; the data can be downloaded both in a gridded format or per station files. The data provided by the British Atmospheric Data Centre network (BADC, http://data.ceda.ac.uk/badc) is also integrated into the validation chain, providing hourly values for 185 temperature, wind speed and precipitation. While these results are not presented in this article, they are checked in regular validation processes performed for all simulations.
All datasets are integrated into the Evaltools python package (https://opensource.umr-cnrm.fr/projects/evaltools, version 1.0.6r) which is designed as a free assessment tool for atmospheric models versus in-situ observations. Given the output for all simulations and all the observations, this tool provides a statistical comparison between the aforementioned simulations and 190 the provided measurements. The tables and images presented in this paper are all produced by this tool.

Improvement of v2020r1 compared to v2017r5
In order to quantify the quality of this new version, two simulations are performed, one with v2017r5 and one with v2020r1.

230
The time-series of surface concentrations of PM 10 is presented in Figure 5 . The two model configurations are able to simulate the annual variablity of Particulate Matter, to the same order of magnitude. The new version gives lower values than v2017r5, but with a better day to day variability (correlation shown in Table 2 ).  correlations for each model level are shown in Figure 6 . The coordinates and the number of sondes are given on top of each section.

Vertical profiles of O 3
Apart from surface measurement validations, we have also included a vertical profile comparison between v2017r5 and 235 v2020r1. Since the model is used for national air quality forecast simulations, it is important to have a good vertical repre- level is, the more visible the differences between the two versions become: in most stations, the 2020r1 manages to simulate higher altitudes better regarding the correlation, spread of the concentrations, and bias. This is especially visible looking at the correlation of each level shown in Figure 6 , which in the majority of cases is higher for the v2020r1 compared to the 2017r5.
The average statistics for all stations are also included at the end of Table 3 , showing general improvement on correlation, bias, and RMSE in the v2020r1. Online models are dedicated to take into account the retroactions between radiation, clouds and aerosols in the atmosphere, 255 (Zhang, 2008), (Baklanov et al., 2014). The model developed here is in the category online access model and couples the Weather and Research Forecasting (WRF), (Powers et al., 2017), and CHIMERE, , regional models with the OASIS3-MCT coupler, (Craig et al., 2017). Two kinds of retroactions are considered: the direct and the indirect effects, respectively detailed for WRF-CHIMERE in Briant et al. (2017) and Tuccella et al. (2019). The variables exchanged between the two models are described in Figure 7 . The exchange frequency ranges from 1 to 30 min, depending on the horizontal 260 resolution and the user's choice. The meteorology is calculated using the version 3.7.1 of WRF and with a configuration identical to the one used in Menut et al. (2018). The model uses spectral nudging to follow the large scale dynamics, (Von Storch et al., 2000). The National Centers for Environmental Prediction (NCEP) fields are nudged for pressure, temperature, humidity, and wind and for wavelengths greater than ≈ 2000 km and only for pressure above 850 hPa. The WRF model remains thus free to create its own dynamics 265 in the boundary layer. This configuration allows the regional model to create its own dynamics within the boundary layer. The large scale follows the thermodynamics fields from the NCEP analyses.
For the model distribution, note that, for the first time, CHIMERE is distributed with OASIS3-MCT3.4 and WRF v3.7.1.
OASIS is used as is but WRF was modified in order to be able to receive data via the OASIS coupler.

Direct effects 270
The direct effect impacts the thermodynamics below an aerosol layer after scattering and absorption of solar radiation, (Haywood and Boucher, 2000;Helmert et al., 2007;Zhang, 2008). To take into account this effect, the CHIMERE model calculates Aerosol Optical depth (AOD), Single Scattering Albedo (SSA), and an asymmetry factor and sends it to WRF (in place of the climatology used by the meteorological model when it is used alone). These new informations varies every physical time-step (i.e a few minutes) and are used in WRF by the RRTMG scheme (Iacono et al., 2008). Twomey's effect). A variation in CCN burden induces also changes in cloud lifetime and precipitation pattern (second indirect effect or Albrecht effect), (Andreae and Rosenfeld, 2008).

310
For cpl_case=1, there is no aerosol feedback from CHIMERE to WRF but the meteorological parameters is sent from WRF with the coupling time step. This is still a step forward when compared to the previous CHIMERE versions that were only able to read hourly meteorological outputs. For cpl_case=2, only the direct effects are taken into account. For cpl_case=3, only the indirect effects. With cpl_case=4, the direct and indirect effects are both taken into account.

340
Results show the bias increases with the coupling for the mean ozone concentrations values. But for the daily maximum ozone values, the bias decreases. The use of the online coupling is more realistic but may be also an improvement for the forecast, when searching for exceedances based on threshold values.
Statistical scores for these simulations are also presented in Table 5 . For NO 2 , the bias remains the same and decreases for PM 10 , but increases for PM 2.5 . For the correlation, the results show low changes among the different configurations. This 345 means that the online coupling is an improvement for modelling realistic physico-chemical interactions, but is not a major change on average for surface pollutants concentrations.  Regional scale chemistry transport models such as CHIMERE do not resolve physical and chemical processes occurring at spatial scales bellow 1km 2 . If such effects are to be accounted for, they must be parametrized. Heterogeneous surface emissions 350 at the unresolved subgrid scale (i.e. at spatial scales smaller than the model resolution) have been shown to have a large impact on grid-averaged pollutant concentrations, (Galmarini et al., 2008;Valari and Menut, 2010;Korsakissok and Mallet, 2010;Cassiani et al., 2010). Especially over urban areas with large population density, very contrasted emission sources, such as roads and buildings co-exist within small distance. To account for this subgrid scale effect the new version of CHIMERE includes the scheme described in Valari and Menut (2010) following which the CHIMERE model grid cell area is divided 355 into several subgrid areas, each representing a different emission source. During the CHIMERE simulation, at each model time-step, the calculation of chemistry splits into the different subgrid volumes.  Figure 10 . It is shown that the 'on-road' component of the subgrid-scale simulation models significantly higher NO 2 levels compared to the grid-averaged value especially during rush hour. On the contrary, as expected, over the urban background monitor site the grid-averaged concentration is generally closer to observations. 7 Secondary organic aerosol (SOA)

365
The Secondary Organic Aerosols calculations changed a lot with this model version. Mainly, two different schemes were added: the H 2 O (Hydrophilic/Hydrophobic Organics) mechanism, , and the VBS scheme, (Zhang et al., 2013), with a version including the fragmentation process, (Cholakian et al., 2018).

The several mechanisms
There are two different choices available for the Secondary Organic Aerosols (SOA) modeling. The first one is the flag carb The values of carb are: • carb=0: no primary carbonaceous species • carb=1: primary carbonaceous species (OCAR and BCAR) are used.
Important: when selecting the carb flag, it is important to be consistent with the choice made for this variable in the emisurf program/namelist if the user is using this tool.  Table 7 . Note that it is possible to use carb=0 and soatyp=4 or 5. In this case, POA are not used to estimate the relative part of organics in SOAP (see  for details).

Single-step oxidation mechanism
For soatyp equal 1, 2, or 3, the formation of SOA is taken into account with single-step oxidation schemes. Anthropogenic precursors include: • TOL (benzene, toluene and other mono-substituted aromatics), • TMB (Trimethylbenzene and other poly-substituted aromatics),
The base SOA module was tested against the smog chamber data of Odum et al. (1997) for anthropogenic compounds and those of Griffin et al. (1999) for biogenic compounds, and was shown to satisfactorily reproduce SOA formation for these compounds. Higher alkanes and isoprene were added to the original chemical mechanism of Pun et al. (2006). The formation of SOA from higher alkanes follows the formulation of Zhang et al. (2007) for the stoichiometric SOA yield and it is assumed 415 that the SOA species can be represented by a hydrophobic surrogate compound with moderate saturation vapor pressure. The formation of SOA from the oxidation of isoprene by hydroxyl radicals is represented with two surrogate products and follows the formulations of Kroll et al. (2006) and Zhang et al. (2007).

The H 2 O mechanism
The H 2 O mechanism is presented in detail in Couvidat et al. (2012) and its performance was evaluated over Europe in Couvidat 420 et al. (2018). This mechanism is an evolution of the Pun et al. (2006) mechanism. It uses the molecular surrogate approach, into which semi-volatile organic compounds with similar thermodynamical properties are lumped together and are represented with a single molecular structure. By attaching a molecular structure to the surrogate structure, several processes can be represented that are often not taken into account. For example, absorption of organic compounds into the aqueous phase of aerosols, the acidic dissociation and non-ideality (interactions between organic compounds and with inorganics compounds) can be estimated. The model distinguishes two types of compounds: hydrophilic compounds (compounds with a high affinity for water that can condense into the aqueous phase of particles) and hydrophobic compounds (compounds with a low affinity for water that condense into an organic-rich phase).
The mechanism takes into account SOA formation from the oxidation of isoprene, monoterpene, sesquisterpenes and aromatic compounds. The properties of the surrogate compounds are described in the parameter file species.cxx and are summa-430 rized in the Table 2 of Couvidat et al. (2018). It should be noted that some compounds (like AnClP or POA) do not have a predefined molecular structure. In that case, a hydrophobic default structure is used.
The complete mechanism (soatyp=4) has 15 species and has been reduced to 7 species (soatyp=5). Contrary to the other SOA mechanisms, equilibrium is computed with SOAP to account for non-ideality and aqueous-phase absorption. ithermo in chimere.par must be above 1.
SOAP computes the gas-particle partitioning into an organic phase based on Raoult's law by taking into account the nonideality (interactions between organic compounds) of the organic phase. The partitioning is computed with the following algorithm.
with T, the temperature (in K), M ow , the mean molar mass of the organic phase (in g mol −1 ), and the saturation vapour 460 pressure, P 0 i , in Torr. In SOAP, activity coefficients γ i,org are computed with the thermodynamic model UNIFAC (UNIversal Functional group Activity Coefficient, (Fredenslund et al., 1975)).
Similarly, the partitioning into the aqueous phase of particles (due to water being absorbed by inorganic compounds) is computed with a non-ideal Henry's law: with H i (in M.atm −1 ), ρ water , the density of the aqueous phase (in kg m −3 ), M aq , the molar mass of the aqueous phase (in g/mol), which can be slightly different from the molar mass of water due to the presence of other compounds, and ζ i , the activity coefficient by reference to infinite dilution. ζ i is computed with: where γ ∞ i,aq is the activity coefficient at infinite dilution in water, which is computed with UNIFAC. However, UNIFAC 470 only computes the activity coefficients due to short-range interactions and does not take into account medium and long range interactions due to the presence of electrolytes in the aqueous phase. In the aqueous phase, activity coefficients are computed with: γ LR , γ M R , and γ SR are, respectively, the activity coefficients at long, medium, and short range interactions. γ SR is computed 475 with UNIFAC, whereas γ LR and γ M R are computed with the AIOMFAC method where parameters, (Zuend et al., 2008(Zuend et al., , 2011Zuend and Seinfeld, 2012), therefore the influence of inorganic ions on the partitioning of organic compounds, are taken into account.
SOAP can be used with three ithermo flag values in chimere.par. For ithermo=1, ideality is assumed, with reference to Raoult's law for hydrophobic partitioning (γ i,org =1) and Henry's law for hydrophobic partitioning (ζ i =1). For ithermo=2, non-480 ideality is computed with UNIFAC, but long and medium range interactions are not taken into account (i.e. no interactions with electrolytes). Hygroscopicity of organic compounds is also taken into account. For ithermo=3, long and medium range interactions are taken into account. The effect of non-ideality on SOA formation has been studied by Couvidat et al. (2012) and Kim et al. (2019). For carb equal to 2 or 3 in chimere.par, the POA are distributed in bins covering the range of 10 −2 µg.m −3 to 10 6 µg.m −3 , (Robinson et al., 2007). These volatility bins cover three groups:
The POA compounds are listed in the Table 9 .
1. VBS scheme with functionalization process: For carb=2, the fraction of POA in the gazeous phase can be oxidized by OH radical to form an oxidized POA noted OPOAx (x represents the bin of volatility defined in Table 9 ).  Cholakian et al. (2018). Two processes were added to the VBS scheme presented above. Fragmentation processes, corresponding to the breakup of oxidized OA compounds in the atmosphere into smaller and thus more volatile molecules, (Shrivastava et al., 2011) and the formation of nonvolatile SOA, where SOA can become nonvolatile after forming, (Shrivastava et al., 2015).

VBS scheme with functionalization process
For soatyp=6, the formation of SOA from VOC is taken into account with the VBS scheme with functionnalization process. Anthropogenic precursors include: • OLE1: Alkens k OH <7x10 4 • OLE2: Alkens k OH >7x10 3

505
• ALK4: Alkans 5x10 3 <k OH <1x10 4 • ALK5: Alkans 1x10 4 <k OH • ARO1: Aromatics k OH <2x10 4 • ARO2: Aromatics k OH >2x10 4 SOA yields were kept the same as in the standard VBS scheme described in Zhang et al. (2013); however, instead of 510 using the low-NO x or the high-NO x regimes, an interpolation between the yields of these two regimes was added to the model. For this purpose, a parameter is added to the scheme, which calculates the ratio of the reaction rate of RO 2 radicals with NO (high-NO x regime) with respect to the sum of the reaction rates of the reactions with HO 2 and RO 2 (low-NO x regime). For this purpose, a parameter (α) was added to the scheme, which calculates the ratio of the reaction rate of RO 2 radicals with NO (ν N O , and high-NO x regime) with respect to the sum of reaction rates of the reactions with It represents the part of RO 2 radicals reacting with NO (which leads to applying "high NO x yields"). It is calculated for each grid cell by using the instantaneous NO, HO 2 , and RO 2 concentrations in the model. Then, the following equation is used to calculate an adjusted SOA yield using this α value, (Carlton et al., 2009).

VBS scheme with functionalization and fragmentation process
For soatyp=7, the formation of SOA from VOC is taken into account with the VBS scheme with the functionalization and fragmentation process. The fragmentation processes for the SVOC start after the third generation of oxidation because fragmentation is favored with respect to functionalization for more oxidized compounds. Therefore, three series of 525 species in different volatility bins were added to present each generation, similar to the approach setup in Shrivastava et al. (2013). For biogenic VOC, fragmentation processes come into effect starting from the first generation, as in Shrivastava et al. (2013), because the intermediate species are considered to be more oxidized. A fragmentation rate of 75% (with 25% left for functionalization) is used here for each oxidation step following Shrivastava et al. (2015). The formation of non-volatile SOA is performed by moving a part of each aerosol bin to nonvolatile bins with a reaction constant 530 corresponding to a lifetime of 1h, similar to Shrivastava et al. (2015).

Natural emissions
Numerous changes were made in this model version for natural emissions: Dimethylsulfide emissions were added, sea-salt emissions have new parameterizations, mineral dust emissions were improved with mineralogical speciation, biomass burning can be calculated either with CAMS or APIFLAME models and with new vertical profiles; NO x emissions by lightning were 535 added.

Dimethylsulfide emissions
The dimethylsulfide (DMS) emissions may now be calculated over the oceans. DMS is a major source of atmospheric sulfur.
It contributes to cloud condensation nuclei (CCN) and has thus an impact on cloud formation. DMS atmospheric emission fluxes are parameterized and correspond to a fraction of aqueous DMS sea surface concentrations (nM). These concentrations In CHIMERE, these monthly mean global fields are projected on the current simulation domain, using landuse information to realistically downscale the values. Using these concentrations, the emitted flux (from ocean to air) is expressed as, (Zavarsky 545 et al., 2018): where k (m s −1 ) is the transfer velocity (also called "piston velocity" in some studies). c air and c water are the concentrations of dimethylsulfide in the air and the water (ocean), respectively, then in gas and liquid phase. H is the Henry's law constant and represents the gas solubility, depending on sea salinity and temperature. The most important assumption in the flux calculation 550 is to consider that c water >> c air . This is the case in all situations where studies showed that there is at least an order of magnitude between the two. The equation reduces to: The key point is to estimate k, the transfer velocity. Elliott (2009)  velocity k w . These four schemes are implemented in CHIMERE and may be selected using a dedicated user's option. The schemes are Liss and Merlivat (1986), Wanninkhof (1992), Nightingale et al. (2000), and Elliott (2009). The last one is a blend of the first three schemes.
The schemes correspond to best-fit functions built to parameterize the k value. This value depends on the wind speed in the atmosphere (the 10m wind speed is used here) and the sea surface temperature via the calculation of the DMS Schmidt number.

560
Equations used are those explained in Kettle and Andreae (2000).
After the emission, the DMS reacts chemically in the atmosphere. Chemical reactions were added in the model, both in the MELCHIOR and SAPRC mechanims. DMS is oxidized mainly by OH during the day and by NO 3 radicals during the night.
It produces SO 2 , then sulfate SO 2− 4 (Tesdal et al. (2016)). Another pathway of the OH oxidation forms dimethylsulfoxide (DMSO), intermediate methane sulfinic acid (MSIA), and methane sulfinic acid (MSA) (Chen et al. (2018)). The complete 565 chemistry of DMS being extremely complicated, a reduced version is implemented in CHIMERE. The chemical scheme suggested by Mihalopoulos et al. (2007) is used. While aqueous reactions are also available in this scheme, they were not implemented in CHIMERE, since their importance compared to the gas-phase DMS chemistry is small. The final set of added reactions is: F is the flux of sea salt particle number in (particles m −2 s −1 µm −1 ), r is the particle radius (µm), and U 10 is the 10m wind speed (m s −1 ). The sea salt flux is calculated with several limitations: only for 10m wind speed less than or equal to 30 m s −1 and with a fixed size distribution. This scheme is for the indirect mechanism (bubbles bursting) and not the direct mechanism (spume) as explained by Grini et al. (2002). For the direct mechanism, the parameterization proposed by Monahan (1986) was 585 found to produce too much and is thus not implemented in our case. This scheme is designed to be used for particles bigger than 0.8 µm, but is used for all sizes in many models, such as in CHIMERE's previous version. In order to have an alternative method for small size particles, two additional schemes are implemented: (Martensson et al., 2003) and (Grythe et al., 2014) schemes. In addition, a mix is possible between (Monahan, 1986) and (Martensson et al., 2003) for the size distribution. This mix can be selected by the user by using iusesalt=2. In this case, the flux is calculated using Monahan (1986), but the way to The scheme of (Martensson et al., 2003) is available with iusesalt=3. This scheme is designed for an aerosol mean mass median diameter of until D p =2.8 µm. Up to D p =2.8 µm, the mass flux is estimated using (Monahan, 1986). The interest of this scheme is that the size distribution is temperature dependent. This scheme makes also the differences between the dry and wet radius of the particle for the calculation.

595
The same temperature influence is taken into account with the scheme of Grythe et al. (2014)  The problem of using a threshold is that the transition between a zero and a non-zero value is very sensitive to the accuracy of the forcing parameter and on its spatial and temporal representativity. For the mineral dust emissions calculation, the key parameters are the 10m wind speed and the surface and soil type, including the aeolian roughness length : , Shao (2001); Engelstaedter et al. (2006). Their spatial representativity is the one of the model cell , in general from 5km to 1 o ; it can be very large and much higher ::: and :::: may :: be ::::::: different : than the local phenomenon to model.

615
The best way to represent such processes is to calculate distributions: the input parameters are not used with a single value, which has high uncertainty, but with their distribution. This distribution represents at the same time the uncertainty and the spatio-temporal variability around the mean averaged value.
For the 10m wind speed, the subgrid scale variability was first represented using a Weibull distribution, (Menut, 2008). The Weibull k shape parameter was constant and equal to k=4. It was shown that the emissions were very sensitive to a change of 620 the k value. In fact, there is no reason for the k parameter to be constant. Its value could depend on many parameters to express the surface and soil variability in the modelled cell. In Menut (2018), the k parameter is linked to the subgrid scale variability of the orography. The differences in emissions are quantified over Africa: the correlation is slightly improved but the RMSE, compared to AERONET AOD data, is well decreased. The use of the k parameter variability is a user's option.
For soil and surface properties, a subgrid scale variability was also implemented. In general, global, and regional models use

Mineralogy
Among many others, one interest of the mineral dust modeling is to calculate accurate aerosol deposition fluxes : , : Sokolik and Toon (1999). For biogeochemical studies, the deposition of mineral dust over the ocean is of interest, in particular for phytoplancton growth, : Zhang et al. (2015). In this new model version, the detailed mineralogy of dust was implemented : , :::::::: following ::: the :::::::::: development ::: of Balkanski et al. (2007). Instead of one mineral dust species called 'dust', twelve mineral species 635 are emitted, transported, and deposited. They are: calcite, chlorite, feldspar, goethite, gypsum, hematite, illite, kaolinite, mica, quartz, smectite, and vermiculite : , Claquin et al. (1999); Balkanski et al. (2007). Each one has a specific density, silt/clay ratio, and refractive index. As with the mean mineral dust, these species are described with a binned size distribution (in general 10 bins as recommended in the parameter file). The complete parameterization and results are presented in Menut et al. (2020b).
The use of this mineralogy may be numerically expensive due to the larger number of emitted species (twelve), which are then 640 transported and deposited. This doubles the number of aerosols in the CHIMERE model. The pre-processor for ECMWF CAMS data formatting is a program distributed with the new CHIMERE version. The goal is to download the ECMWF model outputs of Kaiser et al. (2012) and adapt them to the CHIMERE format. For this formatting, several conversions are needed: projecting the data onto the CHIMERE simulation grid, converting the proposed chemical model species to the CHIMERE chemical mechanism species, and temporally refining the data by shifting the time resolution 655 from daily to hourly fluxes. All available chemical species are downloaded and used to create the CHIMERE emitted species.

Biomass
Temporally, having no realistic information to move from daily to hourly emissions fluxes, an academic diurnal profile is applied to the daily data as: with the constant values: h, the current GMT hour, d w =0.2, d s =2.0, t 0 =13.5, and δ=longitude/360x24. An example is 660 displayed in Figure 11 .

APIFLAME v2
APIFLAME was designed to calculate the emissions from biomass burning at high spatial and temporal resolution with flexibility in terms of the input data used. While the CAMS approach is based on fire radiative power (FRP) detection (active fires), APIFLAME calculates emissions based on the burned area and resulting biomass consumed (in terms of carbon), as described the ORCHIDEE land surface model (Turquety et al., 2014). Emissions for inventory species are then calculated using a list 670 of emission factors, and converted to emission fluxes for CHIMERE species using aggregation matrices accounting for the reactivity to OH (either the SAPRC or MELCHIOR chemical scheme).
The main improvement included in version 2  is related to temporal variability, with the possibility to: (i) redistribute the total monthly burned area (MCD64 data) according to the FRP value (MOD14 data) so that higher emissions will correspond to enhanced fire intensity; (ii) redistribute daily emissions using the diurnal variability of the FRP 675 in the geostationary observations (SEVIRI above Europe and Africa). Small fires that may be observed with the active fire detection (FRP), but that are not present in the burned area product, may also be included.
The flexibility of the use of input datasets for fire activity or vegetation type burned allows for the calculation models in an ensemble that provide information on the uncertainties related to these choices. The analysis of a case study in Portugal during the summer of 2016 shows an average dispersion of ∼ 75 % of daily carbon emissions. The list of species included may also 680 be modified without modifying the core of the model (input datasets).

The height and shape of the vertical injection
The CAMS or APIFLAME data provide a biomass burning surface flux. It is necessary to redistribute vertically the flux along a profile representing pyroconvection. The injection height, H p , estimation is based on the Sofiev et al. (2012) scheme, as: with α=0.24, β=170m, γ=0.35, and δ=0.6, P f 0 =10 6 W, and N 2 0 =2.5 × 10 −4 s −2 . N F T is the Brünt-Väisälä frequency and is calculated between H abl and the pressure level corresponding to 300hPa (to ensure it remains in the free troposphere). The FRP is expressed in [W] and is provided by the emission flux data. In the case of 'huge' fires, i.e. if the H p value is greater than 1500m AGL, the calculation is redone following Veira et al. (2015), after replacing F RP with F RP * = F RP H p /H deep , with H deep = 1500m.

690
After the estimation of the height of the mass injection, it is possible to estimate the shape of the injection profile. Three user options have been implemented in the model, Figure 12 .
-PR1: The mass flux is separated into two parts: 20% is injected between the surface and 0.9 × H p and 80% between 0.9 × H p and 1.1 × H p . Inside these two intervals, the flux is constant with altitude. This profile is close to the shape of the convective profile obtained using thermal parameterizations. -PR2: The emissions are injected between the surface and H p , following a profile close to the one used in Veira et al. (2015) with a K z -like shape: with EF as the emission flux and z n = z / H p .
-PR3: This profile also uses a K z -like shape, but with two maxima: one in the surface layer and one around H p .

700
These profiles were tested in Menut et al. (2018), with the simulation of biomass burning episodes in the center of Africa.
It was shown that, after long range transport, the atmospheric concentration of pollutants is not very sensitive to the shape of the vertical profile. It is mainly due to the fact that pyroconvection has higher temperature and more turbulence than the environment: the emitted species are quickly and homogeneously mixed in the whole column between the surface and H p .

NO x by lightning 705
The emissions of NO x from lightning were added in this version. These emissions occur in case of deep convection and convective precipitation. The detailed implementation and its impact on chemical concentration is explained in Menut et al. (2020a). The scheme implemented is simple and follows the Price and Rind (1993) scheme, based on the presence and vertical structure of convective clouds. The use of this parameterization is more realistic, in particular at tropical latitudes, when lightning flashes are the most often observed. These emissions improve the Aerosol Optical Depth in Africa but have limited 710 impact on the surface concentrations of pollutants in Europe.

Impact of several changes in the model
The previous sections presented all the changes made in this model version. For some of these changes, simulations were carried out in order to examine their impact. Note that compared to the section 3, the reference simulation was done here with WRF forced by NCEP global meteorological fields.

715
The following sensitivity tests were performed: • 'ref' is the reference simulation. The following simulations are done by changing one (and only one) parameter of this simulation.
• 'BCclim"': in place of the daily boundary conditions from CAMS, we use the climatological boundary conditions, distributed with the model and performed using the GOCART and LMDz-INCA global models. Results are presented in Figure 13 , with the daily mean bias (in µg m −3 ) and correlation (between -1 and 1) for surface ozone and PM 10 concentrations. For ozone, the reference case shows a mean daily bias of +3.29 µg m −3 . By changing the boundary conditions, this bias increased to +4.81 µg m −3 . The fact to not use biomass burning emissions reduced this bias to 3.10 µg m −3 . It means that the addition of fires emissions does not have a large mean impact on the ozone surface 730 concentrations. But this is normal as fires events, even if they may induce a large increase in ozone concentrations, are only present for a few days during a year. The change of the dry deposition scheme has a large impact for ozone, decreasing the bias a lot to 2.35 µg m −3 . The largest impact on ozone bias was from the new vertical transport scheme of Després and Lagoutière (1999): the bias was reduced to -0.05 µg m −3 . Finally, the change of the anthropogenic emissions inventory, by using HTAP in place of CAMS, also has an impact, with a decreased bias of 1.48 µg m −3 .

735
For PM 10 , the bias also changed a lot, depending on the used parameterization. With the reference case, the bias was -1.42 µg m −3 . The bias became larger and was always negative for the change in boundary conditions and when not taking into account biomass burning, with biases of -2.12 and -1.87 µg m −3 , respectively. The most important change was the bias becoming positive (+2.89 µg m −3 ) with the dry deposition scheme of Wesely (1989). For the use of the vertical transport scheme of Després and Lagoutière (1999), the bias remained negative but was reduced to -0.92 µg m −3 . Finally, in another 740 Figure 13. Bias (µg m −3 ) and correlation (-1:1) for O3 and PM10 surface concentrations, for several sensitivity tests. positive case, was the change of the emissions inventory, with a bias of +0.43 µg m −3 with the use of HTAP. For the correlation, for ozone and PM 10 , the changes were not as large as the biases. For ozone, the values were between 0.51 and 0.60 µg m −3 , when the reference was 0.56 µg m −3 , and, for PM 10 , the values were between 0.34 and 0.48 µg m −3 , when the reference was 0.43 µg m −3 . For several cases, some compensation errors were possible, but, it was noticeable that the use of the new vertical transport scheme of Després and Lagoutière (1999) was the only case where scores were improved for all pollutants. Reduction 745 on the positive ozone bias close to the ground could be linked to the antidiffusive properties of the Després and Lagoutière (1999) scheme, reducing erroneous downward transport of ozone from the higher troposphere which has been shown by, e.g., Emery et al. (2011), to be a major source of overestimation of ozone concentration in the midlatitudes. Implementation of the Després and Lagoutière (1999) scheme in CHIMERE is discussed in detail in Lachatre et al. (2020), which shows the strong reduction in vertical diffusion obtaine with the use of this scheme relative to Van Leer (1977). Mailler et al. (2021) discusses 750 in more detail the strength and weaknesses of this scheme relative to Van Leer (1977).
The model's time integration benefits from two main changes for this new version: (i) the three time-steps were optimized and the adaptive time-step is more efficient and robust. (ii) the process integration may now be done with the original approach of 'prod-loss budget' or through using the newly implemented approach of 'operator splitting'.

Time management
The time management in the model was already defined in Menut et al. (2013). In this model version, some changes were done, mainly due to the need to manage variable time-steps with the online version. Three time-steps are defined: the hour, a physical time-step, called dtphys, and a chemical time-step called dtchem.
The hour is obviously a fixed time step of one hour. This interval is mandatory since many input model data are provided at 760 an hourly frequency: meteorological fields, emissions, and boundary conditions, for example. It is also the frequency used to read the data in offline mode.
The physical time-step dtphys is defined by the user in the chimere parameter file. In offline mode, it indicates the value that the user wants to interpolate the hourly meteorology at for the transport ::::::::: calculation. The meteorological data is :: are : linearly interpolated between two consecutive hours, therefore the number of steps may change the simulated chemical concentrations 765 in the case of a rapid change in meteorology from one hour to the next one. In online mode, this time-step has another meaning and represents the exchange frequency of variables between WRF and CHIMERE.
The chemical time-step dtchem is also defined in the chimere parameter file. But, in this case, it is just a recommendation.
The CFL is calculated in each cell and at each time-step and if a lower dtphys value is necessary, the calculated value is used.
The dtphys value is estimated as: where CFL max is the maximum acceptable CFL value. Theoretically, CFL max =1, but to avoid numerical instabilities during the transport, CFL max =0.8 is recommended. If the user experiences instabilities, it is possible to reduce this value in the parameter file. ∆x is the grid cell width (m) and U is the wind speed (m s −1 ).

Operator splitting
The integration of all processes to update the concentrations in a chemistry-transport model is a well-known problem and several methods exist. Until the 2017 version, the update in time of the chemical concentrations was done using the Production/Loss approach, as presented in Figure 14 . In this case, all production (P) and loss (L) terms are calculated independently.
The update of a chemical concentration is thus done by integrating the equation: All production and loss terms are calculated at the same time and integrated all together. To solve this complex system, the numerical method used is the two-step algorithm developed by Verwer (1994). It consists of a Gauss-Seidel scheme to have a stable update of all concentrations, whatever their lifetime. Its strength is that there is no splitting error, linked to the order of the processes. Its weakness is that the time-step required is constrained to the smallest time-step, which is, general, the 790 time-step required for the chemical modelling. This means that many of the concentrations are updated for the transport and the mixing much more often than necessary. The results are more accurate but slower to obtain than with the splitting method.
In this version, the splitting-operator approach was added. It is a well known approach, widely used in chemistry-transport models, (McRae et al., 1982). Contrary to the two-step approach, the chemical concentration is updated with each process one by one. A new concentration is thus calculated with the transport, then with the mixing, then with the chemical reactions, etc.

795
Its strength is that the time-step used may be different between each process: knowing that the chemical reactions need a fast time-step, it is possible to update the chemistry with a small time-step, and to later update the concentrations with the transport, which uses a larger time-step. Globally, this speeds up the model integration. But its weakness is that the results depends on the order of the processes. The results are less accurate but obtained faster than with the prod/loss approach. Finally, knowing the strengths and weaknesses, the user can select their own approach, with, for example, the prod/loss approach for analysis 800 and the operator splitting approach for forecasting.

Flexibility on outputs
The output files may be very large, depending on the grid size and the simulation duration. More flexibility was added in this model version to have less variables in the output files if the user is interested in basic output variables only. This may be convenient for forecasting users, for example, where only tracked pollutants are used and not all defined variables in the model.

805
For the output concentration species, there is no change in this version. The difference between the low and full flags are whether to write the aerosols bin per bin or not. In all cases, the total mass of each aerosol species is written. For the deposition, the novelty is that by selecting nsdepos=0, the output file for dry and wet deposition is not created. This is a gain in runtime and disk space if the user is not interested in analyzing the deposition fluxes. For the meteorology, a new flag called accurmet is added in the parameter file, with three different possible values: low, medium and full. With accurmet=low, only the two-810 dimensional meteorological fields are written in the output: the 2m temperature and 2m relative humidity, the 10m wind speed, and the boundary layer height. With accurmet=medium, the previous fields are written, with, in addition, the three-dimensional fields of pressure, temperature, specific humidity, and wind components u and v. Finally, with accurmet=full, all meteorological fields treated by the model are written in output. This includes deep convection mass fluxes, ice, snow, and rain.
11 Nesting and nudging 815 ::: For ::: the ::::::: nesting, ::: the :::: way :::: how :: it :: is ::::::: managed ::::: with ::: this :::: new ::::::: version :: is :::::::: presented :: in : Figure 15 . : As with the previous version, CHIMERE authorizes nesting up to three levels. However, the namelist files can be easily changed to increase the depth of the nesting. For this version :: In ::::: offline ::::: mode, CHIMERE can be used as usual in the offline mode with meteorological forcing through WRF, ECMWF, or from other sources of data. In this case, WRF can be run in 1-way or 2-way nesting, then CHIMERE is run after, as usual, 820 for each domain with a 1-way nesting. If the online flag is activated (online=1), the model is fully coupled with WRF, with different complexities of coupling (cpl_case flag). For this case, the 2-way nesting is not permitted, each level of nesting is run, WRF and CHIMERE exchange information frequently but there is no online feedback to the level above to update both the species concentrations and meteorological fields. To achieve this 1-way nesting procedure the ndown.exe WRF executable is used, permitting the use of meteorological outputs coming from the above domain to drive the nested domain. Therefore, for a 825 given level of nesting, WRF and CHIMERE are always run using the downscaled information from the level above and never benefiting from the interior level, if it exists. A nested domain can use different numbers of vertical levels, as long as the top height of the nested domain is less than the top height of the parent domain.
The :: For ::: the :::::::: nudging, ::: the CHIMERE-WRF coupled suite can use the WRF nudging option to avoid strong divergences inside the models. This used in coarse model simulations (typically above 25km resolution) that are driven by global scale simulations, 830 commonly issued from GFS or ECMWF datasets that have been re-analysed with synoptic observations. At higher resolutions with nested domains, model nudging should be not necessary to allow the model to accurately develop the local physics and dynamics. For the chemistry, model nudging is not used, but the assimilation of observation data (from satellite and ground stations) is an area of interest that is not yet fully operational in this version of CHIMERE.

835
Computing performances and scalability are studied with a series of 1 day long run performed on the Irene Joliot-Curie machine of TGCC with varying number of cores dedicated to WRF and CHIMERE for the coupling case (cpl_case) 1 and 4. To increase the number of points and therefore allow the use of more cores, the MED20 domain (same domain as MED60 but at 20 km horizontal resolution) is used. The effective computational times spent in both components of the coupled system (WRF and CHIMERE) are obtained with the LUCIA tool of the OASIS3-MCT coupler (Maisonnave and Caubel, 2014) which enables 840 separation of computational time and waiting time. The scalability is presented in Figure 16 using a time to solution (in minutes per simulated day) criteria omitting any waiting time due to load imbalance between the two components. In both coupling cases, the CHIMERE behaviour remains the same with a good scalability up to 200 hundred cores. WRF behaviour differs in the cpl_case 4 when direct and indirect aerosol effects are at stake due to specific calculations in this case, however scalability remains correct up to 100 cores. In both cases WRF is faster than CHIMERE, therefore the speed of CHIMERE is 845 key in the overall performance of the coupled model.
The time to solution of the coupled model as a whole is presented in Figure 17 as a function of the ratio of cores attributed to CHIMERE to the cores attributed to WRF for 3 different constant total number of cores. Essentially for the cpl_case 1, the more cores that are given to CHIMERE, the better the performance is, especially if the total number of cores is relatively small.
given to WRF becomes too small, the performance decreases. In this particular experiment, a ratio between 2 and 4 seems to be optimal.
Those results show computational time for both components but in a coupled framework, they do not infer scalabiliity when used in standalone mode. Moreover, scalability and performance are strongly dependent on the machine and the size of the domain. Custom sensitivity analysis on each setup is always better, but as a rule of thumb a ratio of 2 to 4 chimere cores to 1 855 wrf core is recommended for good performance.
Data availability. All data used in this study, as well as the data required to run the simulations, are available on the CHIMERE web site download page https://doi.org/10.14768/8afd9058-909c-4827-94b8-69f05f7bb46d.
python package (https://opensource.umr-cnrm.fr/projects/evaltools). This work was granted access to the HPC resources of TGCC under the allocation 10274 made by GENCI (Grand Equipement National de Calcul Intensif). We thank the OASIS modeling team for their support with the OASIS coupler, the WRF developers team for the free use of their model. We thank the National Aeronautics and Space Agency for the availability of the MODIS data, the investigators and staff who maintain and provide the AERONET data (https://aeronet.gsfc.nasa.gov/), the University of Wyoming for the availability of the atmospheric soundings, along with the European Environment Agency for provid-

885
ing the AirBase data. We acknowledge the E-OBS dataset from the EU-FP6 project UERRA (http://www.uerra.eu) and the data providers in the ECA&D project (https://www.ecad.eu). European Environmental Agency (EEA) is acknowledged for their air quality station data that is provided and freely downloadable (https://www.eea.europa.eu/data-and-maps/data/aqereporting-8). We also thank the World Ozone and Ultraviolet Radiation Data Centre (WOUDC), the Environment and Climate Change Canada, Toronto and the World Meteorological Organization-Global Atmosphere Watch Program (WMO-GAW) for providing ozoneSonde data, downloaded from the WOUDC site