GENerator of reduced Organic Aerosol mechanism (GENOA v1.0): An automatic generation tool of semi-explicit mechanisms

This paper describes the GENerator of Reduced Organic Aerosol Mechanisms (GENOA) that produces semi-explicit mechanisms for simulating the formation and evolution of secondary organic aerosol (SOA) in air-quality models. Using a series of predefined reduction strategies and evaluation criteria, GENOA trains and reduces SOA mechanisms from explicit chemical mechanisms (e.g., the Master Chemical Mechanism (MCM)) under representative atmospheric conditions. As a consequence, these trained SOA mechanisms can preserve the accuracy of explicit VOC mechanisms on SOA formation (e.g., molecular structures of crucial compounds, effect of non-ideality and hydrophilic/hydrophobic partitioning of aerosols), with a size (in terms of reaction and species numbers) that is manageable for three-dimensional (3-D) aerosol modeling (e.g., regional chemical transport models). Applied to the degradation of sesquiterpenes (as β-caryophyllene) from MCM, GENOA builds a concise SOA mechanism (2 % of the MCM size), consisting of 23 reactions and 15 species, six of them being condensable. The generated SOA mechanism has been evaluated for its ability to reproduce SOA concentrations under varying atmospheric conditions encountered over Europe, with an average error lower than 3 %.

The state of knowledge on VOC chemistry can be reflected by explicit gas-phase chemical mechanisms, which contain all known essential reaction pathways in VOC degradation. For instance, Jenkin et al. (1997) and Saunders et al. (2003) developed the near-explicit Master Chemical Mechanism (MCM), which describes detailed gas-phase chemical processes related to VOC oxidation. Another example is the Generator for Explicit Chemistry and Kinetics of Organics in the Atmosphere (GECKO-A), which uses a prescribed protocol to assign complete reaction pathways and kinetic data to the degradation of VOCs. Explicit mechanisms represent the current understanding of atmospheric chemistry, including information about reaction pathways, kinetic data, and chemical structures (which may be used to deduce thermodynamic properties based on structure-activity relationships). The MCM mechanism has been used by 2-D Lagrangian models to simulate the chemical evolution of major air pollutants and some SOAs in plumes (e.g., Evtyugina et al., 2007; Sommariva et al., 2008; Zhang et al., 2021). Moreover, it has been used for simulating the formation of more complex SOAs at a regional level in 3-D models over a few weeks (e.g., modified MCM with 4642 species and 13,566 reactions in the simulations of Ying and Li (2011), and with 5727 species and 16,930 reactions in the simulations of Li et al. (2015)). Even so, explicit mechanisms of that size are too computationally intensive to be widely employed in 3-D AQMs for SOA formation.
For computational efficiency, AQMs generally use implicit gas-phase chemical mechanisms. Two major approaches are frequently adopted to build implicit chemical mechanisms:
-The lumped-species approach, which gathers into one surrogate compounds with analogous formulas and properties (e.g., SAPRC-07, Carter, 2010; RACM2, Goliff et al., 2013).
-The carbon-bond or lumped-structure approach, which assumes that organic molecules have chemical behaviors equivalent to those of their decomposed functional groups (e.g., CB05, Sarwar et al., 2008).
Implicit gas-phase mechanisms were developed and validated to simulate the concentrations of oxidants and other conventional air pollutants such as ozone and NO₂. In these mechanisms, VOCs have been grouped into a limited number of model species because of computational considerations, and the SOA formation is usually not considered.
To complete implicit gas-phase mechanisms, implicit SOA mechanisms have been developed (Kim et al., 2011), which model the SOA formation specifically without modifying ozone and radical concentrations. In 3-D modeling, implicit SOA mechanisms or parameterizations are usually added to implicit gas-phase mechanisms, conserving the oxidant chemistry of the implicit gas-phase mechanism.
Implicit SOA mechanisms are often established based on experimental data from smog chamber experiments to represent the formation and evolution of SOA, such as the two-product empirical SOA model (Odum et al., 1996) and the volatility basis set (VBS) that splits VOC oxidation products into a uniform set of volatility "bins" (Donahue et al., 2006). In the VBS approach, the successive evolution of oxidation products by aging is determined regardless of the chemical composition and structure of the species. Another option is the molecular surrogate approach (e.g., Griffin et al., 2003; Pun et al., 2006; Couvidat et al., 2012). Similarly to the gas-phase chemistry lumped-species approach, the VOC oxidation products are represented via the formation of a few SOA surrogates that are attached to a molecular structure (assumed to be representative of a myriad of semi-volatile compounds). By attaching a molecular structure to the surrogate, several processes otherwise not [...]
[...] Pandis, 2015), and their degradation mechanism (as BCARY) is well documented in the near-explicit MCM mechanism (Jenkin et al., 2012). Studies have also compared SOA yields simulated using the MCM mechanism to chamber data for sesquiterpenes (e.g., Xavier et al., 2019). BCARY is, therefore, an ideal candidate for model development and demonstration of the reduction methodology. In this paper, the near-explicit MCM BCARY degradation scheme serves as a reliable benchmark for GENOA.
The experimental data from Tasoglou and Pandis (2015) and Chen et al. (2012) are also compared with the newly developed reduced mechanism in Appendix A.

Model development
The GENerator of Reduced Organic Aerosol Mechanisms (GENOA) is an algorithm that generates semi-explicit chemical mechanisms focusing on SOA formation. The generated semi-explicit mechanisms are designed to preserve the accuracy of explicit mechanisms for SOA formation, while keeping the number of reactions/species low enough to be suitable for large-scale modeling, particularly in 3-D AQMs. The focus of the semi-explicit mechanism is solely on the accurate modeling of SOA. Because ozone, major radicals, and other inorganics are also affected by inorganic and other VOC chemistry, their concentrations are not tracked with the semi-explicit mechanism. Instead, they are simulated using existing implicit gas-phase chemical mechanisms.
a GENOA uses the first value of the targeted variables for initialization, and passes to the next values for subsequent parameter updates.
b Activated under certain circumstances.
As illustrated in Fig. 1, the processes in GENOA can be divided into two main sections: training and testing. The training section, as detailed in Fig. 1, can be divided into two parts:
-Parameter selection, which relies on user-specified or preset values.
-Reduction cycle, where the actual reduction of the mechanism occurs.
In the parameter selection, GENOA first assigns the error tolerance, defined as the largest acceptable error induced by each change in the mechanism (see Sect. 2.5), and then employs one of the reduction strategies along with its required parameters (see Sect. 2.2).
Afterward, in the reduction cycle, GENOA searches for potential reductions according to the selected reduction strategy.

The new mechanism with the first found reduction is then simulated over the conditions from the training dataset (a limited set of conditions used through all the reduction processes, see Sect. 2.3.1) or from the pre-testing dataset (a more extensive set of conditions used only at the end of the reduction process, see Sect. 2.3.2). The simulated total SOA concentrations are then compared to those simulated with the reference mechanism, and the differences are used to evaluate the potential reduction (see Sect. 2.5). If the SOA differences are under the pre-defined error tolerances, the mechanism with the current reduction is accepted and serves as the basis for the next search for reduction. If the reduction is refused, the following reduction attempt starts with the previously validated mechanism. Once no more reduction is found, the current reduction cycle ends. The next step is either selecting the subsequent error tolerance and/or reduction strategy in the next parameter selection, or terminating the GENOA training section. Finally, the performance of the final reduced mechanism is evaluated under a variety of environmental conditions denoted as the testing dataset (see Sect. 2.3.3). The 0-D aerosol model SSH-aerosol is used to simulate SOA concentration and composition, which is required in all the GENOA sections (e.g., the initialization of reduction parameters, and the evaluation of the reduced mechanism).
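The accept/reject logic of the reduction cycle can be sketched as a greedy loop. This is a minimal illustration and not the GENOA implementation: the names `reduction_cycle`, `propose`, and `error_vs_reference` are hypothetical, a mechanism is represented by a plain tuple of reaction labels, and the toy error function stands in for the SSH-aerosol simulations over the training conditions.

```python
from typing import Callable, Iterable, Tuple

# Hypothetical stand-in: a mechanism is a tuple of reaction labels.
Mechanism = Tuple[str, ...]


def reduction_cycle(
    mechanism: Mechanism,
    propose: Callable[[Mechanism], Iterable[Mechanism]],
    error_vs_reference: Callable[[Mechanism], float],
    tolerance: float,
) -> Mechanism:
    """Accept the first candidate within tolerance, then search again from it."""
    current = mechanism
    found = True
    while found:
        found = False
        for candidate in propose(current):
            if error_vs_reference(candidate) <= tolerance:
                current = candidate  # accepted: basis for the next search
                found = True
                break
            # rejected: keep the previously validated mechanism
    return current


def remove_one(mech: Mechanism) -> Iterable[Mechanism]:
    """Toy strategy: try removing one reaction, from the bottom of the list up."""
    for i in reversed(range(len(mech))):
        yield mech[:i] + mech[i + 1:]


# Toy data (assumed): error contributed by the absence of each reaction.
FULL: Mechanism = ("R1", "R2", "R3", "R4")
IMPORTANCE = {"R1": 0.50, "R2": 0.02, "R3": 0.005, "R4": 0.30}


def toy_error(mech: Mechanism) -> float:
    return sum(IMPORTANCE[r] for r in FULL if r not in mech)


reduced = reduction_cycle(FULL, remove_one, toy_error, tolerance=0.03)
print(reduced)
```

In this toy case the two least important reactions are removed and the cycle ends once every remaining candidate exceeds the tolerance; a subsequent parameter selection would then loosen the tolerance or switch strategy.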

Pre-reduction
A pre-reduction process is conducted on the original MCM mechanism before it is used as the reference mechanism for the reduction. This process skips extremely fast reactions (i.e., those with a reaction rate of 10⁶ s⁻¹, corresponding to a lifetime of 1 µs) [...] Fig. C1.
b Unit in s⁻¹ for unimolecular reactions and molecule⁻¹ cm³ s⁻¹ for bimolecular reactions.
-Removing: reactions, species, or gas-particle partitioning with negligible effects on SOA formation are removed from the mechanism.
-Jumping: one compound is substituted by its oxidation product, as if the compound had been "jumped over" in the reaction pathway.
-Lumping: compounds with similar properties are combined to form a new compound.
-Replacing: one compound is replaced by another existing compound with similar properties.
The reduction strategies are illustrated with examples from the BCARY reduction in sections 2.2.1 to 2.2.4. A detailed list of all the options and parameters controlling the BCARY reduction is summarized in the supplementary material.
For the BCARY reduction, the reduction strategies are employed in the following order: removing reactions, jumping, lumping, replacing, removing species, and finally removing gas-particle partitioning. The reduction strategies are ordered based on their potential influences on the mechanism. The first applied strategies, removing reactions and jumping, trim trivial reactions and species without altering the properties of the species. They are followed by lumping and replacing (as an extension to lumping), which refine the mechanisms considerably by merging the species and reactions involved. Afterward, the "removing species" strategy attempts to delete all merged and unmerged species. Finally, the strategy of removing gas-particle partitioning is applied in order to remove the partitioning of condensable species, which cannot be removed by removing species. This current order has been tested and found to be efficient for the BCARY mechanism, but it can be changed by the user along with other user-chosen parameters.

Removing strategy
The removing strategy assumes that chemical reactions and/or species with a low probability of contributing to the formation and evolution of SOA can be eliminated from the mechanism. In general, three types of removing are applied depending on the removed subject:
-Removing reactions.
-Removing compounds in both the gaseous and particle phases (completely removing a species from the scheme).
-Removing the gas-particle partitioning of semi-volatile compounds (considering the semi-volatile compounds as VOCs that do not condense to the particle phase, but retain their gas-phase chemistry).

There is no particular restriction to exclude species from the reduction attempt via the strategy of removing compounds or removing gas-particle partitioning. However, for removing reactions, a threshold on the branching ratio of the reaction is applied to the reduction. The branching ratio is defined as the ratio of the destruction rate of one reaction to the sum of the destruction rates of all reactions of the targeted species. In the BCARY reduction, a maximum branching ratio (B_rm) is defined as a restriction criterion. All reactions with an hourly branching ratio (averaged over the training conditions) under this value (reactions that are likely to have a minimal effect on SOA formation) are considered candidates for removal.
To avoid over-reduction, a small B_rm is applied at the beginning of reduction. After going through the reductions for all reduction strategies, the value of B_rm is then incremented. In the reduction of BCARY, an ascending list of B_rm values equal to 5 %, 10 %, and 50 % is employed, which is changed to 10 %, 50 %, and 100 % at the late stage (explained in Sect. 2.5). When B_rm equals 100 %, GENOA evaluates the removal of each reaction.
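The branching-ratio filter described above can be illustrated with a short sketch. The function names and the destruction rates below are hypothetical; the rates are assumed to be already averaged over the training conditions, and the comparison uses "less than or equal" so that a threshold of 100 % selects every reaction.

```python
from typing import Dict, List


def branching_ratios(destruction_rates: Dict[str, float]) -> Dict[str, float]:
    """Share of each reaction in the total destruction rate of one species."""
    total = sum(destruction_rates.values())
    if total <= 0.0:
        return {name: 0.0 for name in destruction_rates}
    return {name: rate / total for name, rate in destruction_rates.items()}


def removal_candidates(destruction_rates: Dict[str, float], b_rm: float) -> List[str]:
    """Reactions whose branching ratio does not exceed the threshold b_rm."""
    ratios = branching_ratios(destruction_rates)
    return [name for name, ratio in ratios.items() if ratio <= b_rm]


# Hypothetical averaged destruction rates (arbitrary units) of one species X.
rates = {"X + OH": 9.0e-4, "X + NO3": 8.0e-5, "X + O3": 2.0e-5}

for b_rm in (0.05, 0.10, 0.50):  # ascending thresholds quoted in the text
    print(b_rm, removal_candidates(rates, b_rm))
```

Raising the threshold step by step widens the candidate list, which mirrors the conservative-first ordering of the B_rm values.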

Jumping strategy
The jumping strategy relies on the assumption that compounds can be skipped in successive reactions, as long as it does not adversely impact the SOA concentration. In other words, the predecessor of an organic compound may directly form its destruction products. The jumping strategy is perfectly suited to intermediate compounds whose fast degradation may cause numerical stiffness, commonly including radicals such as oxy radicals (RO) or alkoxy radicals (ROO), as well as Criegee intermediates.
As shown in Table 2, the alkoxy radical BCALOO formed from ozonolysis of BCAL (reaction No. 11 in Table 1) is jumped over to its only destruction product BCLKET. Consequently, reactions No. 12 to 16 are removed, and reaction No. 11 is updated to reaction R1 ("R" for reaction after reduction strategy). Currently, the jumping strategy is considered when the destruction of a single compound (to be jumped) results in the production of a single compound (jumping). The difference in carbon numbers between reduced species cannot exceed three in order to prevent significant differences in organic mass before and after jumping.
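A minimal sketch of the jumping rule follows, assuming a simplified data model in which each reaction has one organic reactant and a tuple of organic products. The class `Reaction`, the helper names, the species labels, and the carbon numbers are all hypothetical placeholders rather than MCM data.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Reaction:
    reactant: str
    products: Tuple[str, ...]  # organic products only (simplification)


def jump_target(
    species: str,
    reactions: List[Reaction],
    n_carbon: Dict[str, int],
    max_carbon_diff: int = 3,
) -> Optional[str]:
    """Return the single destruction product of `species`, or None if not jumpable."""
    products = {p for r in reactions if r.reactant == species for p in r.products}
    if len(products) != 1:
        return None  # more than one (or no) destruction product
    (target,) = products
    if abs(n_carbon[species] - n_carbon[target]) > max_carbon_diff:
        return None  # would change the organic mass too much
    return target


def apply_jump(species: str, target: str, reactions: List[Reaction]) -> List[Reaction]:
    """Drop the destruction reactions of `species` and redirect its producers."""
    out = []
    for r in reactions:
        if r.reactant == species:
            continue
        products = tuple(target if p == species else p for p in r.products)
        out.append(Reaction(r.reactant, products))
    return out


# Placeholder pathway: PARENT -> INTERMEDIATE -> PRODUCT (two parallel channels).
reactions = [
    Reaction("PARENT", ("INTERMEDIATE",)),
    Reaction("INTERMEDIATE", ("PRODUCT",)),
    Reaction("INTERMEDIATE", ("PRODUCT",)),
]
n_carbon = {"PARENT": 15, "INTERMEDIATE": 15, "PRODUCT": 14}  # illustrative values

target = jump_target("INTERMEDIATE", reactions, n_carbon)
reduced = apply_jump("INTERMEDIATE", target, reactions) if target else reactions
print(reduced)
```

The sketch leaves kinetics untouched; how the rate coefficient of the updated production reaction is set is not covered here.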

Lumping strategy
The lumping strategy (i.e., lumping different compounds into a single surrogate compound) assumes that organic compounds with similar chemical structures may exhibit similar properties and undergo similar physico-chemical processes and may therefore be lumped together. With lumping, both the number of species and reactions decrease.
The lumping strategy is illustrated by the comparison of Table 3 (reactions before lumping) and Table 4 (reactions after lumping). In this example, a total of 13 chemical reactions (No. 17 to 29) involving three organic compounds are reduced to five reactions (production reaction R2 and four destruction reactions R3 to R6 of the new surrogate).
As demonstrated in the tables, the organic compounds BCAO2, BCBO2, and BCCO2 from the original MCM scheme are the peroxy radicals formed from the OH-initiated oxidation of β-caryophyllene (Table 3). It is evident from their structures (shown in Fig. C1) that they are isomers and may share similar chemical properties. When applying the lumping strategy, BCAO2, BCBO2, and BCCO2 are merged into a new surrogate named "mBCAO2" (Table 4) [...]
The key parameter that drives the reduction accuracy is the weighting ratio f_w of lumping, corresponding to the weight of the original species in the new surrogate compound. As detailed in Table 4, f_w is computed as a function of the chemical lifetime τ, following the computation of Seinfeld and Pandis (2016), and the reference concentrations C_r, which are the arithmetic mean concentrations calculated from 0-D simulations using the explicit VOC mechanism. Both τ and C_r are based on averages of simulations across all training conditions. The properties of the new surrogate compound (e.g., molecular structure, saturation vapor pressure, molar mass, degradation kinetics) are estimated by weighing the properties of the initial compounds, while the stoichiometric coefficients and the kinetic rate coefficient of the new reaction are obtained by weighing those of the initial reactions.
In practice, GENOA attempts to lump only two species in a single reduction in order to ensure accuracy and effectiveness.
Lumping is subject to certain restrictions:
-No lumping between a compound and its oxidation products.
-The difference in the molecular weight should be negligible (i.e., smaller than 100 g mol⁻¹).
-The difference in the carbon number should be no more than two.
-The difference in the chemical lifetime should be less than 10-fold.
-Lumping is not considered for biradicals (ROO) that degrade rapidly into closed shell molecules, as jumping is considered to be more appropriate for these compounds.

Table 4 (excerpt and footnotes). f_w,a: weighting ratio of BCAO2, f_w,a = τ_a C_r,a / (τ_a C_r,a + τ_b C_r,b + τ_c C_r,c).
a Name of new surrogate contains the letter "m" revealing lumping and the name of the relatively dominant lumped species. This notation of lumping is used hereafter.
b Reaction number after lumping, where reactions R3 to R6 preserve the destruction of BCAO2, BCBO2, and BCCO2, and reaction R2 presents the production.
c Reaction numbers before lumping as presented in Table 3.
d Subscripts a, b, and c stand for BCAO2, BCBO2, and BCCO2, respectively.
e The calculation method also applies to other BCARY-derived organics.
f [X] in the calculations is the reference concentration of radical and other inorganic species, where X is HO2, NO, NO3, or RO2 in this case. For radicals derived from the SOA precursor, the reference concentration is the produced concentration without considering their rapid destruction.
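The weighting ratio and the numerical lumping restrictions can be sketched as follows. This is an illustration under stated assumptions: the `Species` class and function names are hypothetical, the species values are invented for the example, the restriction on oxidation products and on biradicals is not checked (it would need the reaction graph), and the surrogate molar mass is taken as a linear f_w-weighted mean, which is one possible reading of "weighing the properties".

```python
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass(frozen=True)
class Species:
    name: str
    molar_mass: float  # g mol-1
    n_carbon: int
    lifetime: float    # chemical lifetime tau (s), averaged over training conditions
    ref_conc: float    # reference concentration C_r (same unit for all species)


def may_lump(a: Species, b: Species) -> bool:
    """Numerical restrictions only: molar mass, carbon number, lifetime."""
    if abs(a.molar_mass - b.molar_mass) >= 100.0:
        return False
    if abs(a.n_carbon - b.n_carbon) > 2:
        return False
    longer, shorter = max(a.lifetime, b.lifetime), min(a.lifetime, b.lifetime)
    return longer < 10.0 * shorter


def weighting_ratios(species: Sequence[Species]) -> Dict[str, float]:
    """f_w of each species: tau * C_r divided by the sum of tau * C_r."""
    weights = {s.name: s.lifetime * s.ref_conc for s in species}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


def lumped_molar_mass(species: Sequence[Species]) -> float:
    """Assumed linear weighting of the molar mass by f_w."""
    ratios = weighting_ratios(species)
    return sum(ratios[s.name] * s.molar_mass for s in species)


# Invented example values, not taken from MCM.
a = Species("A", 250.0, 15, 10.0, 3.0)
b = Species("B", 250.0, 15, 10.0, 1.0)
c = Species("C", 266.0, 15, 20.0, 0.5)

f_w = weighting_ratios([a, b, c])
print(f_w, lumped_molar_mass([a, b, c]))
```

Because f_w depends on τ and C_r averaged over the training conditions, the surrogate leans toward the species that is both long-lived and abundant in the reference simulations.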

Replacing strategy
The replacing strategy assumes that a compound with a negligible contribution to SOA formation can be substituted by a compound having a similar structure or undergoing the same reactions. In comparison to lumping, the replacing strategy reduces the number of reactions/species without creating new surrogate species.
The replacing strategy (Table 5) is expected to reduce more computational time than the lumping one (Table 4), since all reactions originating from the replaced species are removed from the mechanism. Moreover, it does not require the computation of weighting ratios and new surrogates. However, as a compromise, replacing could be less accurate than lumping, because replacing may discard some compounds and part of the mechanism, and therefore lead to larger errors.
Thus, in efforts to prioritize the accuracy of reduction, GENOA currently employs replacing only after lumping and exclusively on species from the same reaction. In this way, species that were not lumped (because the lumping was rejected or because they do not respect the lumping restriction) can be reduced by replacing. During the training of BCARY reduction, a restriction is applied on small organic compounds with a molar mass less than 100 g mol⁻¹, which are excluded from replacing.
Overall, the searches for viable reductions are conducted in reverse order of the reaction/species list. For removing, GENOA attempts to remove reactions from the bottom of the list and moves to the previous reactions. The same reverse sequence is followed for other strategies. When applied to the jumping strategy, for instance, GENOA tries to jump the species that has the highest generation and then moves down to the species that has the lowest generation. Among all reduction strategies, only lumping alters the saturation vapor pressure of condensable species. Therefore, a rank of saturation vapor pressure is used exclusively in lumping to determine the most appropriate lumpable species. At each reduction, GENOA attempts to reduce only one species/reaction via removing, or one pair of compounds via lumping/replacing/jumping. This restriction allows exhaustive tracking of every detailed modification and its effect on SOA concentrations.

Datasets of atmospheric conditions applied to reduction
All the atmospheric conditions applied to the reduction are extracted from a 3-D simulation spanning the latitudes from 32° N to 79° N and the longitudes from 17° W to 39.8° E over continental Europe in a one-year period (2015) using the chemistry-transport model CHIMERE. The CHIMERE model and the configuration used for the simulation are described in Lanzafame et al. (2022). The 3-D CHIMERE simulation was conducted with the implicit gas-phase MELCHIOR2 mechanism (Derognat et al., 2003), which contains 120 reactions and fewer than 80 lumped species. The MELCHIOR2 mechanism describes the degradation of sesquiterpenes by three oxidant-initiated reactions (HUMULE reacts with OH, O₃, and NO₃, respectively), where the species HUMULE represents the lumped class of all sesquiterpenes.
The monthly diurnal profiles of hourly meteorological data (e.g., temperature, relative humidity), and hourly concentrations of oxidant, radical, and other inorganic species were extracted from each location. That information is required in the 0- [...] is used to estimate the SQT concentration. For the purpose of calculating reduction parameters (e.g., weighting ratio f_w, branching ratio B) and evaluating the reduced mechanisms, a dataset of representative physico-chemical conditions extracted from CHIMERE simulation results is employed in GENOA. Depending on their usage, three groups of conditions are defined: the training dataset, the pre-testing dataset, and the testing dataset.

Training dataset
The training dataset is the set of conditions used to initialize the reduction parameters, estimate the reference concentrations, as well as to evaluate the mechanism at each potential reduction. For a mechanism containing over 1000 reactions and 500 species, a complete reduction may require more than 10 000 SOA simulations to evaluate all the reduction attempts. To reduce the number of simulations and the computational cost, a limited number of conditions can be evaluated at each reduction attempt.
For the reduction of BCARY degradation, a training dataset of eight conditions is selected, which contains six chemistry-relevant conditions and two additional meteorological conditions. The geographic and meteorological information of each condition is described in Table 6, where the conditions cover a broad range in time (summer and winter conditions), temperatures ranging from 260 K to 302 K, and relative humidity from 39 % to 89 %.
Table 6. Geographic and meteorological conditions of the training dataset.
[...] in Appendix Table C1):
-The reacting ratios of the precursor with the oxidants O₃ (R_O3), OH (R_OH), and NO₃ (R_NO3), whose sum equals 1, indicate the relative reactivity of the first-generation oxidation pathways that lead to the formation of distinct kinds of RO₂ species.
[...] These ratios indicate the competition between autoxidation and bimolecular reactions that result in different SOA types.
A combination of these seven reacting ratios determines the chemical regime and favorable reaction pathways under a given atmospheric condition.
[...] a high R_OH of 95 % at midnight is not due to an abundance of OH, but rather to extremely low concentrations of O₃ (2.9 × 10⁻⁴ ppb) and NO₃ (1.1 × 10⁻⁹ ppb) that lead to an absence of nighttime reactivity.

Pre-testing dataset
The pre-testing dataset contains a greater number of conditions than the training dataset, providing a more accurate estimate of the effect of the reduction on SOA formation. After the mechanism has been significantly reduced, the pre-testing dataset is included along with the training dataset in order to evaluate the reduction attempts at the late-stage reduction. At this point of reduction, a slight change in the mechanism significantly impacts the SOA concentrations; therefore, merely evaluating reduction based on the training dataset may not be adequate. Meanwhile, the size of the mechanism has already been significantly reduced, which makes the evaluation of each reduction attempt on the pre-testing dataset less computationally expensive.
In principle, the pre-testing dataset should be able to provide a fairly accurate representation of the testing dataset. However, this may not always be the case, since the pre-testing dataset is selected almost randomly from the testing dataset. Therefore, an adjustment may be required to increase the representativeness of the pre-testing dataset by adding or removing a few conditions.

For the application to BCARY, a pre-testing dataset with 150 atmospheric conditions is selected from the testing dataset, with 50 conditions for each level (low, medium, and high) of SQT emissions. The locations of the training and pre-testing conditions are presented in Fig. 3.

Testing dataset
The final reduced mechanism, obtained from training, is eventually evaluated with a large number of atmospheric conditions in the testing section. This set of conditions for the final evaluation is referred to as the testing dataset. Among all datasets, the results on the testing dataset are most likely to reflect the performance of the reduced mechanism for 3-D modeling.
In the BCARY reduction, the testing dataset is selected based on the concentrations of the CHIMERE sesquiterpene surrogate. Its maximum hourly concentration C_SQT in ppb is used to exclude conditions with negligible SQT concentration. A testing dataset with a total of 12 159 conditions is applied (see Sect. 3.2), including all conditions (2 159 conditions) with high SQT concentration (C_SQT ≥ 0.1 ppb), 5 000 randomly selected ones with medium SQT concentration (C_SQT between 0.01 and 0.1 ppb), and 5 000 randomly selected ones with low SQT concentration (C_SQT in (0.001, 0.01] ppb). The conditions with extremely low SQT concentration (C_SQT < 0.001 ppb) are not included in the testing dataset. Fig. B1 indicates the locations of the testing dataset as well as the testing results for BCARY reduction.
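The stratified selection of testing conditions can be sketched as below. The function names, the condition identifiers, and the fixed random seed are assumptions made for the illustration; the treatment of the exact bin edges follows the intervals quoted above, with the medium bin taken as (0.01, 0.1) ppb.

```python
import random
from typing import Dict, List, Optional


def sqt_level(c_sqt_ppb: float) -> Optional[str]:
    """Bin a condition by its maximum hourly SQT concentration (ppb)."""
    if c_sqt_ppb >= 0.1:
        return "high"
    if c_sqt_ppb > 0.01:
        return "medium"
    if c_sqt_ppb > 0.001:
        return "low"
    return None  # negligible SQT concentration: excluded


def select_testing_conditions(
    c_sqt_by_condition: Dict[str, float],
    n_medium: int = 5000,
    n_low: int = 5000,
    seed: int = 0,
) -> List[str]:
    """Keep every high-SQT condition, sample the medium and low bins."""
    rng = random.Random(seed)
    groups: Dict[str, List[str]] = {"high": [], "medium": [], "low": []}
    for cond, c_sqt in c_sqt_by_condition.items():
        level = sqt_level(c_sqt)
        if level is not None:
            groups[level].append(cond)
    selected = list(groups["high"])
    selected += rng.sample(groups["medium"], min(n_medium, len(groups["medium"])))
    selected += rng.sample(groups["low"], min(n_low, len(groups["low"])))
    return selected


# Toy input: condition id -> maximum hourly SQT concentration in ppb.
toy = {"a": 0.2, "b": 0.05, "c": 0.03, "d": 0.005, "e": 0.0001}
selection = select_testing_conditions(toy, n_medium=1, n_low=1)
print(selection)
```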

The chemical composition and time variation of SOA due to gas-phase chemistry and condensation/evaporation are simulated using the 0-D aerosol module SSH-aerosol (Sartelet et al., 2020). As detailed in Couvidat and Sartelet (2015), the gas/particle partitioning is estimated with Raoult's law (for the partitioning between the gas phase and the organic phase) and Henry's law (for the partitioning between the gas phase and the aqueous phase). Therefore, some properties of condensable compounds, such as the saturation vapor pressure P_sat and the decomposition in functional groups, are crucial for modeling. For BCARY-derived organics, P_sat is calculated using UManSysProp (Topping et al., 2016). The vapor pressure is computed using the method of Nannoolal et al. (2008) and the boiling point estimation from Joback and Reid (1987). This functional group method was selected since it provides the best performance when compared with the chamber experiment data of Chen et al. [...]
Unless stated otherwise, two simulations are performed for each condition, starting at midnight (0 h) and noon (12 h), to take into account both daytime and nighttime chemistry. All 0-D simulations are run for five days in order to consider SOA formation and aging processes adequately. The initial BCARY concentration is set to 1 µg m⁻³ in order to ensure high SOA production (the SOA concentration is always greater than 1 µg m⁻³ at all evaluated conditions). For optimal computational efficiency, the gas-particle partitioning is assumed to be at thermodynamic equilibrium.

Settings for evaluation
For the different datasets, the performance of the reduced mechanism on SOA concentrations is evaluated using the fractional mean error (FME) computed with Eq. 1, where C_val,i and C_ref,i denote the SOA mass concentration at time step i simulated with the reduced and the reference mechanisms, respectively.
The error of one simulation is defined as the larger of the FME on day one and the FME on days two to five, in order to address the difference in the performance of the reduced mechanisms at the early stage of the simulations (SOA formation dominates) and at the later stage (SOA aging dominates). This error is used to evaluate reduction by comparing it to the error tolerance specified in training. For the evaluation on the training dataset, two errors are estimated: one compared to the previously verified reduced mechanism, with a tolerance denoted ε_pre, and one compared to the MCM mechanism, with a tolerance denoted ε_ref. The error tolerances are used to restrict both the maximum and the average (half of the tolerance) errors of the training conditions. As for the evaluation on the pre-testing dataset, only the error compared to the MCM mechanism is calculated. The error tolerances ε^ave_pre-testing and ε^max_pre-testing apply to the average and maximum errors, respectively.
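The error evaluation can be sketched as follows. The body of Eq. 1 is not reproduced in this text, so the sketch assumes the commonly used form FME = (1/n) Σ 2 |C_val,i − C_ref,i| / (C_val,i + C_ref,i); the function names and the `steps_per_day` parameter are hypothetical, and concentrations are assumed to be strictly positive.

```python
from typing import Sequence


def fme(c_val: Sequence[float], c_ref: Sequence[float]) -> float:
    """Fractional mean error over paired time steps (assumed standard form)."""
    terms = [2.0 * abs(v - r) / (v + r) for v, r in zip(c_val, c_ref)]
    return sum(terms) / len(terms)


def simulation_error(
    c_val: Sequence[float], c_ref: Sequence[float], steps_per_day: int
) -> float:
    """Larger of the FME on day one and the FME on the remaining days."""
    day_one = fme(c_val[:steps_per_day], c_ref[:steps_per_day])
    later = fme(c_val[steps_per_day:], c_ref[steps_per_day:])
    return max(day_one, later)


def within_tolerance(errors: Sequence[float], tolerance: float) -> bool:
    """Maximum error within the tolerance and average within half of it."""
    average = sum(errors) / len(errors)
    return max(errors) <= tolerance and average <= tolerance / 2.0


# Toy series: perfect agreement on day one, large deviation afterwards.
c_ref = [1.0, 1.0, 1.0, 1.0]
c_val = [1.0, 1.0, 3.0, 3.0]
print(fme(c_val, c_ref), simulation_error(c_val, c_ref, steps_per_day=2))
```

Taking the larger of the two periods keeps a good day-one fit from masking a drift during aging, which the plain FME over the full run (0.5 in the toy case) would partly hide.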

In order to begin with a conservative BCARY reduction, the initial values of ε_pre and ε_ref are both set to 1 %. The values of these error tolerances are then increased to larger values, reflecting the looser criteria used throughout the reduction. ε_ref is used to track the performance of the reduction, while ε_pre is used to avoid large errors introduced by one reduction attempt. [...] mechanism, and then accepts reductions that introduce larger errors up to ε_ref.
The maximum values for both ε_ref and ε_pre are set to 10 %. When ε_ref reaches 3 %, the mechanism is expected to be largely reduced. From then on, the evaluation under the pre-testing dataset is added to the reduction. This means that all subsequent reductions are evaluated using both the training and pre-testing datasets. The average and maximum errors (ε^ave_pre-testing and ε^max_pre-testing) are restricted to be lower than 3 % and 20 %, respectively. As a result of the above error tolerances, a reduced SQT-SOA mechanism with an average inaccuracy on SOA formation lower than 3 % (maximum 20 %) is expected.
Additionally, another error factor noted as the fractional bias (FB, computed as detailed in Eq. 2) is used to visualize the temporal performance of the reduced mechanism at each simulation time step. As examples, Fig. 8 and Fig. 10 show the average FB at each time step for the pre-testing conditions.
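Since the body of Eq. 2 is not reproduced in this text, the expression below gives the commonly used definition of the fractional bias at time step i, stated here as an assumption. It uses the same symbols as the FME and keeps the sign of the deviation, so that positive values indicate overestimation by the reduced mechanism.

```latex
\mathrm{FB}_i = \frac{2\,\left(C_{\mathrm{val},i} - C_{\mathrm{ref},i}\right)}{C_{\mathrm{val},i} + C_{\mathrm{ref},i}}
```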

When trying to remove reactions, GENOA first removes reactions with low hourly branching ratios (B_rm ≤ 5 %), since removing reactions with a low branching ratio is likely to have a minimal effect on SOA formation. After no reduction is accepted by all applied reduction strategies under the defined error tolerance, the value of B_rm is increased to 10 % and then 50 %.

Settings for aerosol-oriented treatments
In late-stage training, an intense competition between different potential reductions is observed, and a minor modification may induce significant uncertainty in the mechanism and prevent further reduction. Besides, because the formation of aerosols costs more CPU time than gas-phase chemistry, specific treatments are employed in the late stage of training to reduce the number of condensable species preferentially. These treatments, which reduce species rather than reactions, are done when the size of the mechanism is below a certain threshold (20 for BCARY reduction). Consequently, late-stage treatments encourage the reduction via removing condensable species and are referred to as aerosol-oriented treatments. The treatments consist in:
-Restricting the reduction of the number of reactions. Thus, strategies that reduce the number of species are favored, which results in fewer condensable species.
-Bypassing the evaluation of aerosol-oriented reductions on the training dataset when applied to lumping, replacing, and jumping. As a result, the aerosol-oriented reduction is evaluated only on the pre-testing dataset to avoid being rejected [...]
The additional reduction strategy of removing elementary-like reactions is targeted to reactions with multiple products. After rewriting the reaction into a set of elementary-like reactions, each with one oxidation product and an integer stoichiometric coefficient, GENOA investigates the possibility of removing the elementary-like reactions one by one. In practice, removing elementary-like reactions is inserted after the strategy of removing reactions and before jumping, when no further reduction that reduces condensable species can be found with the current parameters.
3 Application to β-caryophyllene mechanism

GENOA is applied to the SQT degradation mechanism of the Master Chemical Mechanism v3.3.1 (Jenkin et al., 2012). Here β-caryophyllene (BCARY) is considered a surrogate for SQT primary VOCs. The degradation of β-caryophyllene in the original MCM mechanism consists of 1 626 reactions and 579 species (223 radicals and 356 stable species). After pre-reduction, the mechanism contains 1 241 reactions and 493 species (137 radicals and 356 stable species), which is employed as the starting point and the reference for the reduction (hereafter referred to as MCM).
Moreover, at the beginning of the GENOA training, all the stable species are assumed to be condensable (referred to as condensables), and their saturation vapor pressures and activity coefficients are calculated based on their molecular structures (as detailed in Sect. 2). Applying the effective partitioning coefficients (K_p at 298 K) described by Seinfeld and Pandis (2016), condensables can be classified into: semi-volatile organic compounds (SVOC, K_p between 10⁻² and 10¹ m³ µg⁻¹), low-volatility organic compounds (LVOC, K_p between 10¹ and 10⁴ m³ µg⁻¹), and extremely low-volatility organic compounds (ELVOC, K_p larger than 10⁴ m³ µg⁻¹).
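The volatility classification above can be written as a small helper. This is a minimal sketch: the function name `volatility_class` is hypothetical, the treatment of values falling exactly on a class boundary is an assumption (the text does not specify it), and the label "VOC" for species more volatile than the SVOC range is used here only for illustration.

```python
def volatility_class(kp: float) -> str:
    """Classify a condensable by its partitioning coefficient K_p.

    kp is expressed in m3 ug-1 at 298 K. Boundary values are assigned
    to the less volatile class, which is an assumption of this sketch.
    """
    if kp > 1e4:
        return "ELVOC"
    if kp >= 1e1:
        return "LVOC"
    if kp >= 1e-2:
        return "SVOC"
    return "VOC"  # more volatile than the SVOC range (label assumed)


# Placeholder K_p values, not taken from the mechanism.
for kp in (5e-4, 0.5, 2e2, 3e5):
    print(kp, volatility_class(kp))
```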

The semi-explicit SQT-SOA mechanism "Rdc." presented in this section is trained from MCM with GENOA. used for global modeling. As presented in Sect. 3.2, the "Rdc." mechanism accurately reproduces the SOA concentration and composition simulated by MCM with only six condensables. Table B3 summarizes the new surrogates and the lumped MCM species that are included in the final "Rdc." mechanism.

Building of the reduced SOA mechanism
As shown in Fig. 4, the "Rdc." mechanism is built from 113 validated reduction steps. In GENOA, a reduction step refers to all reduction attempts based on the performed reduction strategy and reduction parameters, while a validated reduction step
- Early stage, from the 1st to the 74th reduction step. By the end of the 74th reduction step, the mechanism is reduced to 68 reactions and 41 species (including 20 condensables). The early-stage reduction is trained only on the training dataset with the seven pre-described reduction strategies. After ε_ref reaches 3 %, the list of B_rm values is changed from [0.05, 0.10, 0.50] to [0.10, 0.50, 1.0].
- Late stage I, from the 75th to the 107th reduction step. By the end of the 107th reduction step, the reduced mechanism consists of 38 reactions and 19 species (including seven condensables), and no further reduction can be found within ε_ref ≤ 10 % and ε_pre ≤ 10 %. In this stage, the reduction is trained on the pre-testing dataset if condensables are removed with lumping, replacing, or jumping. Reductions with other types of reduction strategies are first trained on the training dataset and then on the pre-testing dataset. From all reduced mechanisms with seven condensables, GENOA selected the one with the minimum average error on the pre-testing dataset (2.44 %) to start the next stage.
- Late stage II, from the 108th to the 113th reduction step. At this stage, the reduction strategy of removing elementary-like reactions is added to the training. All reductions that reduce the condensables are evaluated exclusively on the pre-testing dataset. The reduced mechanism is brought down to 23 reactions and 15 species, among which the number of condensables is reduced to six. The average (maximum) error of the final reduced mechanism "Rdc." is 2.65 % (17.00 %) on the pre-testing dataset compared to MCM.
The extent of the reduction due to each strategy is summarized in Table 7. Compared to MCM, up to 99 % of reactions and 97 % of species are reduced in "Rdc.". As expected, the reduction strategy of removing reactions contributes the most to the decrease in the number of reactions (48 %), followed by the strategy of removing species with a contribution of 37 %. Meanwhile, both lumping and removing species are significant in the reduction of species, by 35 % and 31 %, respectively. The number of condensables decreases in proportion to the number of species, except for the strategy of removing partitioning. In that case, the gas-particle partitioning is removed and the species remains in the gas phase with no changes in the chemical mechanism.

Table 7. Reduction accomplished by each reduction strategy during the building process of the "Rdc." mechanism.
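As a quick arithmetic check of the overall reduction, the sizes quoted in this section can be compared directly. The sketch below uses only numbers given in the text (the original MCM, the pre-reduced MCM used as reference, and the final "Rdc." mechanism); the helper name `reduction_percent` is hypothetical, and which baseline the quoted "up to 99 %" refers to is not stated here, so both are shown.

```python
def reduction_percent(before: int, after: int) -> float:
    """Percentage of items removed when going from `before` to `after`."""
    return 100.0 * (1.0 - after / before)


# (reactions, species) for each mechanism, as quoted in the text.
original_mcm = (1626, 579)
pre_reduced_mcm = (1241, 493)
rdc = (23, 15)

for label, (n_rxn, n_spc) in (("original MCM", original_mcm),
                              ("pre-reduced MCM", pre_reduced_mcm)):
    print(label,
          round(reduction_percent(n_rxn, rdc[0]), 1),
          round(reduction_percent(n_spc, rdc[1]), 1))
```

Relative to the original MCM this gives about 98.6 % of reactions and 97.4 % of species removed; relative to the pre-reduced reference it gives about 98.1 % and 97.0 %.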
As shown in Fig. 5, which describes the chemical scheme of the "Rdc." mechanism, the three oxidants (i.e., O3, OH, and NO3) initiate reactions, leading to common oxidation products (e.g., mBCSOZ, mBCALO2) that dominate the successive

Figure 5. Representation of the chemical scheme of the "Rdc." mechanism. VOC, LVOC, and ELVOC are presented in ellipse, square, and diamond boxes, respectively. Radicals are written in plain text, without boxes. Reactions with OH, O3, NO3, NO, H2O, and HO2 are shown by arrows in red, blue, green, orange, purple, and sky-blue, respectively. The complete species and reaction lists of the "Rdc." mechanism are in Appendix Tables B1 and B2, respectively.

through the "Rdc." species mBCSOZ, which is a lumped surrogate of several representative BCARY-derived oxidation products in MCM: BCSOZ (the major secondary ozonide, with a molar yield ≥ 65 % reported by Jenkin et al. (2012)), BCAL (the primary product formed from both OH- and O3-initiated chemistry), and BCKET (from OH-initiated reactions).

Reproduction of the SOA concentrations
During the testing procedure, the "Rdc." mechanism is evaluated at 12 159 locations, with two different starting times (0 h and 12 h). The testing for "Rdc." took approximately 2 % of the CPU time consumed by MCM.
Compared to MCM, "Rdc." presents a high level of accuracy with an average error of 2.66 % and a maximum error of 17.29 %. The monthly distribution of the number of the testing conditions as well as the testing errors are described in Fig. 6.

The error is lower than 10 % for more than 99 % of the simulations. The summer conditions, between June and September, covering more than half of the testing conditions (

An error map of testing conditions in July and August is displayed in Fig. 7. It indicates the locations of the testing conditions and the error of each condition, highlighting in particular the outliers during this period. Detailed error maps of all testing conditions can be found in Appendix B. The map shows that the "Rdc." mechanism induces low errors, lower than 6 %, for most of the testing conditions. The conditions with errors over 6 % are mainly concentrated in northern Africa near the Atlas Mountains and in the Eastern Mediterranean, where the conditions most likely correspond to a dry Mediterranean climate with low RH and high temperature. Other conditions with errors above 6 % are dispersed in the Po Valley of northern Italy and along the coasts of southern Spain. More accurate results could be obtained with stricter parameters for reduction (e.g., a lower error tolerance), or by updating the training and pre-testing datasets to cover more extreme conditions in the training process.

Reproduction of the SOA composition
The SOA concentrations and chemical composition simulated with the "Rdc." mechanism and with MCM are compared in this section. The temporal profiles of the total SOA concentrations, averaged over the pre-testing dataset under non-ideal conditions, are displayed in Fig. 8. Throughout the entire five-day simulation period, there is excellent agreement between hourly SOA concentrations simulated with MCM and those obtained from the "Rdc." mechanism. The SOA concentration builds up rapidly in the first few hours, where the results of the "Rdc." mechanism present relatively larger fluctuations (the maximum FB of 3.74 % is observed at 1 h on the average pre-testing results).

The average SOA concentrations per volatility class on the pre-testing dataset at two simulation times (8 h and 72 h) are listed in Table 8. At both 8 h and 72 h, the "Rdc." mechanism accurately reproduces the total SOA mass with a relative difference lower than 0.1 % compared to MCM. An accumulation of the SOA mass into the ELVOC class is observed (51 % of the total SOA mass at 8 h and 66 % at 72 h) with both the MCM and the "Rdc." mechanisms, as the aging of SOA produces compounds of low and extremely low volatility. Regarding the volatility classes, the "Rdc." mechanism tends to slightly overestimate the SOA resulting from ELVOCs and underestimate the SOA resulting from LVOCs, especially at 72 h. This suggests that aging leads to "Rdc." condensables of slightly lower volatility than the MCM ones. However, the differences are low, up to 0.4 µg

mechanism (black, plain) mechanisms under the non-ideal conditions. The average (blue, plain) and maximum FB (blue, shadow) between MCM and the "Rdc." mechanism are also presented.

The average SOA composition per functional group simulated on the pre-testing dataset at 72 h is displayed in Fig. 9. No significant change in the functional group distributions is found between 8 h and 72 h of oxidation. The alkyl (RC) and carbonyl (RCO) groups contribute the most to the SOA mass, by more than 1 µg m⁻³ each, whereas the other functional groups contribute less than 1 µg m⁻³. Overall, the "Rdc." mechanism satisfactorily reproduces the MCM-simulated SOA composition for most functional groups, except for nitrogen-containing groups. In comparison to MCM, only two condensables containing nitrogen are retained in the "Rdc." mechanism: NBCOOH and C131PAN, leading to an underestimation of the organic nitrate group (0.31 in MCM and 0.04 in "Rdc.") and an overestimation of the mass of the peroxyacetyl nitrate group (0.10 µg m⁻³ in MCM and 0.30 in "Rdc.").
To obtain better results on the reproduction of nitrogen groups, GENOA may be further restricted to distinguish nitrogen compounds in training. Additionally, the peroxyacetyl acid group results in an extremely low SOA mass in MCM (less than 0.01 %) and is therefore not kept in the "Rdc." mechanism.

Moreover, the temporal profiles of the OM/OC ratio, as well as the H/C, O/C, and N/C atomic ratios, are presented in Fig. 10. Comparable patterns are observed in the OM/OC (1.65 in MCM and 1.63 in "Rdc." on average), the O/C (0.37 in MCM and 0.36 in "Rdc."), and the H/C ratios (1.62 in MCM and 1.60 in "Rdc."). During the first 8 hours of simulation, "Rdc." tends to slightly overestimate the OM/OC and O/C ratios, while the H/C ratio remains fairly stable throughout the entire simulation with a negligible difference (0.02) between MCM and "Rdc.". The N/C ratio, however, is underestimated by the "Rdc." mechanism by 37 % on average (ratio equal to 0.019 in MCM and to 0.012 in "Rdc."), indicating an over-reduction of organic nitrates in "Rdc.". A total of three nitrogen-containing organics (NBCO2, NBCOOH, and C131PAN) are preserved in "Rdc.", of which two (NBCO2, NBCOOH) are first-generation products. Therefore, during the first 10 hours, the N/C ratio curve simulated by "Rdc." drops, whereas in MCM it increases as higher-generation nitrates are produced.

Sensitivity on environmental parameters
The sensitivities of the "Rdc." mechanism to temperature, relative humidity (RH), and SOA mass conditions are investigated with the pre-testing dataset. The default value of BCARY concentration is 5 µg m⁻³, and the default RH and T are set to constant values of 50 % and 298 K, respectively. As presented in Fig. 11, the SOA yields simulated by the "Rdc." mechanism with different environmental parameters show a remarkable resemblance to the SOA yields simulated by MCM.
Under 10 µg m⁻³, the simulated SOA yields are not affected by the SOA mass loading. This result is consistent with the large contribution of ELVOC reported in Table 8. A discrepancy of 25 % in the average SOA yield at 1 h with an SOA mass loading of 10³ µg m⁻³ and a discrepancy of 8 % at 72 h with an SOA mass loading of 10⁻³ µg m⁻³ are observed. The result indicates that the "Rdc." mechanism may introduce relatively large uncertainty under extreme SOA loading (larger than 500 µg m⁻³), which was outside the range of conditions used for the construction of the "Rdc." mechanism.

SOA formation is affected by RH, because of both the gas-phase chemistry (reaction with H2O vapor) and the gas-particle transfer (condensation of hydrophilic SOA precursors on aqueous aerosols). The sensitivity tests show that the "Rdc." mechanism reproduces well (differences lower than 2 %) the SOA yields of MCM with relative humidity ranging from 5 % to 95 %. For temperature, the "Rdc." mechanism reproduces very well the SOA aging at 72 h, but larger discrepancies are observed in the earlier period, when the oxidation products are more volatile. However, the discrepancies in SOA yield stay low: differences up to 7 % (at 1 h and 72 h) and 10 % (at 8 h) are observed for temperatures equal to 263 K and 323 K, respectively. This finding is consistent with the testing results. To sum up, the reduced mechanism performs quite well, although larger discrepancies with MCM are observed under conditions that are outside the range of conditions used during training.

types of reduction strategies (lumping, replacing, jumping, and removing) are adopted to locate the potential reduction in the mechanism. Each reduction attempt is evaluated against the explicit mechanism under a sequence of near-realistic atmospheric conditions (the training dataset, and/or the pre-testing dataset at the late stage of reduction). Finally, the reduced mechanism is evaluated on various conditions of a testing dataset. Under each condition, two five-day 0-D simulations starting at midnight and noon are conducted with the aerosol model SSH-aerosol to evaluate the performance of the reduced SOA mechanism.
GENOA successfully generated semi-explicit SOA chemical mechanisms for the degradation of sesquiterpene, for which the

the "Rdc." mechanism and 1.60 in MCM), H/C, and O/C ratios. Nitrogen-containing SOA, which contributes only 7 % of the total mass, is not as well represented as other groups, and the N/C ratio is slightly underestimated in the "Rdc." mechanism (0.016 against 0.021 in MCM). The similarity of the functional group decomposition allows the non-ideality of SOA to be reproduced similarly in the "Rdc." mechanism and in MCM. Additionally, the sensitivity tests on RH, temperature, and organic mass loading show that the SOA simulated with the "Rdc." mechanism is in good agreement with MCM results under most conditions (except for conditions with extremely high temperature or with massive organic aerosol loading, where discrepancies in the SOA yields may reach 8 % (temperature) and 25 % (massive mass loading)). This indicates that the reduced mechanism performs well for conditions in the training range, but that the performance may decrease for conditions outside of this range. To improve the performance of the semi-explicit SOA mechanism under conditions outside the training range, two methods can be employed: the first is to include the outlier conditions in the training procedure if they are considered influential to SOA formation, and the second is to adopt a stricter error tolerance to restrict the reduction.
Code and data availability. The source code for GENOA v1.0 is hosted on GitHub at https://github.com/tool-genoa/GENOA/tree/v1.0 (last

The ozonolysis experimental data reported in Tasoglou and Pandis (2015) and Chen et al. (2012) Yalkowsky (1997) and "v1": Nannoolal et al. (2008)) and three methods to compute the boiling point ("b0": Nannoolal et al. (2004), "b1": Stein and Brown (1994), and "b2": Joback and Reid (1987)). As shown in Fig. A1, the SOA distribution simulated with "v1b2" agrees best with the experimental data. Therefore, this method, with the vapor pressure computed by Nannoolal et al. (2008) and the boiling point computed by Joback and Reid (1987), is used in the BCARY reduction. The results simulated with the final reduced mechanism "Rdc." are also presented in Fig. A1 and closely resemble the experimental data.

Figure A1. The SOA yields versus the total SOA mass from the experimental data reported by Chen et al. (2012) and Tasoglou and Pandis (2015), simulated in SSH-aerosol with the MCM mechanism and different saturation vapor pressure methods, and simulated with the "Rdc." mechanism. The "Rdc." mechanism is trained from the MCM mechanism with the "v1b2" method.
Appendix B: An overview of the "Rdc." mechanism

Table B1. Species list of the "Rdc." mechanism. Notice that the species in the reduced case may be different from the MCM species with identical names.

Figure B1. Maps of (a) error and (b) average SOA concentration (µg m⁻³) of the testing results simulated using the "Rdc." mechanism on all (i.e., 12 159) testing conditions.

Appendix C: Information related to the reduction

C1 Additional examples of lumping
Besides the example shown in Sect. 2.2.3, two additional examples are given from the BCARY reduction: one illustrates the lumping of two similar compounds formed by different reactions, and the other illustrates the lumping of two more distinct compounds. The first example is the MCM species C1313NO3 and C152NO3 (see Fig. C1). These two species come from different reactions. The molecular structures of both compounds are similar (they contain organic nitrates, aldehydes, and alcohols), but C152NO3 contains an additional carboxylic acid where C1313NO3 contains an aldehyde. The corresponding reactions before and after lumping are summarized in Table C2, where the new surrogate "mC1313NO3" is built from C1313NO3 with a weighting ratio of 83 % and C152NO3 with a weighting ratio of 17 %. As a result of this lumping, the average error increase under training conditions is 0.001 % (the tolerance is 0.01 %).
Another example of lumping is the MCM species BCALBOC and C1310OH. Unlike the previous example, these two species are more distinct. According to MCM, BCALBOC is generated through O3-initiated reactions, while C1310OH is generated through high-generation oxidations. There is less similarity in the structures or chemical reactions of the two molecules. MCM contains the OH reaction of BCALBOC, and the O3 and OH reactions of C1310OH. However, this reduction was accepted

Figure C1. Molecular structures of the MCM species that are mentioned in this paper. For more information, please visit the MCM website.

surrogate "mBCALBOC" is constructed from BCALBOC with a weighting ratio of 98 % and C1310OH with a weighting ratio of 2 %.
As C1310OH has a low weighting ratio, the lumping could be substituted by replacing (a special case of lumping), where the weighting ratio of BCALBOC is set to 100 % and that of C1310OH is set to 0 %. In that case, instead of forming a new surrogate, C1310OH is replaced by BCALBOC. In the BCARY reduction, this type of replacing was not used, but it can be activated by the user by setting the weighting ratio threshold.
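The relation between lumping and replacing described above can be illustrated with a short sketch. It is not the GENOA implementation: the function `lump_or_replace`, the `threshold` parameter, and the use of a simple weighted average for a surrogate property are assumptions made for illustration, and the property values are placeholders rather than real molecular data.

```python
from typing import Dict, Tuple


def lump_or_replace(
    weights: Dict[str, float],
    prop: Dict[str, float],
    threshold: float = 0.0,
) -> Tuple[Dict[str, float], float]:
    """Return the final weighting ratios and the weighted surrogate property.

    Species whose normalized weight is below `threshold` are dropped
    (replacing); the dominant species is always kept.
    """
    total = sum(weights.values())
    norm = {s: w / total for s, w in weights.items()}
    dominant = max(norm, key=norm.get)
    kept = {s: w for s, w in norm.items() if w >= threshold or s == dominant}
    kept_total = sum(kept.values())
    kept = {s: w / kept_total for s, w in kept.items()}  # renormalize
    value = sum(w * prop[s] for s, w in kept.items())
    return kept, value


ratios = {"BCALBOC": 0.98, "C1310OH": 0.02}  # weighting ratios from the text
placeholder_prop = {"BCALBOC": 300.0, "C1310OH": 200.0}  # not real data

lumped, lumped_value = lump_or_replace(ratios, placeholder_prop)
replaced, replaced_value = lump_or_replace(ratios, placeholder_prop, threshold=0.05)
print(lumped, lumped_value)
print(replaced, replaced_value)
```

With a threshold of zero both species contribute to the surrogate; with a threshold above 2 %, C1310OH is dropped and the surrogate reduces to BCALBOC alone, which corresponds to the replacing case.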
Author contributions. ZW developed the model code and performed the simulations. ZW, FC, and KS designed the research and developed the methodology. ZW wrote the manuscript with contributions from FC and KS. FC and KS were responsible for funding acquisition.
Competing interests. The authors declare that no competing interests are present.