<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing with OASIS Tables v3.0 20080202//EN" "https://jats.nlm.nih.gov/nlm-dtd/publishing/3.0/journalpub-oasis3.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:oasis="http://docs.oasis-open.org/ns/oasis-exchange/table" xml:lang="en" dtd-version="3.0" article-type="research-article">
  <front>
    <journal-meta><journal-id journal-id-type="publisher">GMD</journal-id><journal-title-group>
    <journal-title>Geoscientific Model Development</journal-title>
    <abbrev-journal-title abbrev-type="publisher">GMD</abbrev-journal-title><abbrev-journal-title abbrev-type="nlm-ta">Geosci. Model Dev.</abbrev-journal-title>
  </journal-title-group><issn pub-type="epub">1991-9603</issn><publisher>
    <publisher-name>Copernicus Publications</publisher-name>
    <publisher-loc>Göttingen, Germany</publisher-loc>
  </publisher></journal-meta>
    <article-meta>
      <article-id pub-id-type="doi">10.5194/gmd-19-4601-2026</article-id><title-group><article-title>S2AS v1.0 and 2D polarity–volatility lumping framework v1.0: automated compound classification and scalable lumping for organic aerosol modelling</article-title><alt-title>2D polarity–volatility framework</alt-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author" corresp="no" rid="aff1">
          <name><surname>Amaladhasan</surname><given-names>Dalrin Ampritta</given-names></name>
          
        </contrib>
        <contrib contrib-type="author" corresp="no" rid="aff1">
          <name><surname>Hassan-Barthaux</surname><given-names>Dan</given-names></name>
          
        </contrib>
        <contrib contrib-type="author" corresp="yes" rid="aff1">
          <name><surname>Zuend</surname><given-names>Andreas</given-names></name>
          <email>andreas.zuend@mcgill.ca</email>
        <ext-link>https://orcid.org/0000-0003-3101-8521</ext-link></contrib>
        <aff id="aff1"><label>1</label><institution>Department of Atmospheric and Oceanic Sciences, McGill University, Montreal, Quebec, H3A 0B9, Canada</institution>
        </aff>
      </contrib-group>
      <author-notes><corresp id="corr1">Andreas Zuend (andreas.zuend@mcgill.ca)</corresp></author-notes><pub-date><day>28</day><month>May</month><year>2026</year></pub-date>
      
      <volume>19</volume>
      <issue>10</issue>
      <fpage>4601</fpage><lpage>4631</lpage>
      <history>
        <date date-type="received"><day>23</day><month>September</month><year>2025</year></date>
           <date date-type="rev-request"><day>30</day><month>January</month><year>2026</year></date>
           <date date-type="rev-recd"><day>24</day><month>April</month><year>2026</year></date>
           <date date-type="accepted"><day>11</day><month>May</month><year>2026</year></date>
      </history>
      <permissions>
        <copyright-statement>Copyright: © 2026 Dalrin Ampritta Amaladhasan et al.</copyright-statement>
        <copyright-year>2026</copyright-year>
      <license license-type="open-access"><license-p>This work is licensed under the Creative Commons Attribution 4.0 International License. To view a copy of this licence, visit <ext-link ext-link-type="uri" xlink:href="https://creativecommons.org/licenses/by/4.0/">https://creativecommons.org/licenses/by/4.0/</ext-link></license-p></license></permissions><self-uri xlink:href="https://gmd.copernicus.org/articles/19/4601/2026/gmd-19-4601-2026.html">This article is available from https://gmd.copernicus.org/articles/19/4601/2026/gmd-19-4601-2026.html</self-uri><self-uri xlink:href="https://gmd.copernicus.org/articles/19/4601/2026/gmd-19-4601-2026.pdf">The full text article is available as a PDF file from https://gmd.copernicus.org/articles/19/4601/2026/gmd-19-4601-2026.pdf</self-uri>
      <abstract><title>Abstract</title>

      <p id="d2e100">Advancements in near-explicit chemical reaction mechanisms, such as the Master Chemical Mechanism (MCM) or the Generator for Explicit Chemistry and Kinetics of Organics in the Atmosphere (GECKO-A), have enabled highly detailed simulations of atmospheric chemistry. Such simulations offer a bottom-up approach to accompany and inform laboratory chamber experiments of organic aerosol formation or to model the complex chemistry of mixtures of volatile aerosol precursors for specific tropospheric conditions. These chemical reaction mechanisms, while comprehensive, generate hundreds to millions of organic components, creating computational challenges for subsequent applications in multiphase equilibrium gas–particle partitioning models to predict secondary organic aerosol (SOA) mass concentrations, phase compositions, and hygroscopicity. The wealth of simulated reactions and components also requires substantial simplifications for reduced-complexity representations in large-scale atmospheric models. This study introduces a suite of software tools to automate relevant pure-component property predictions as well as a 2-dimensional (2D) polarity–volatility lumping framework to systematically reduce the complexity of chemical mechanism outputs. We introduce a new polarity metric for use in the 2D framework, a ratio of a component's activity coefficients in water and an organic solvent (hexanediol). This ratio is computed using the Aerosol Inorganic–Organic Mixtures Functional groups Activity Coefficients (AIOMFAC) model. The 2D framework offers grid-based and cluster-based methods to select an adjustable number of surrogate species and offers flexibility in the choice of polarity axis. Our methods utilize the Simplified Molecular Input Line Entry System (SMILES) description of molecular structures. A new tool, SMILES to AIOMFAC subgroups (S2AS), is introduced to automatically generate AIOMFAC-model input files and to handle exception cases consistently. 
We demonstrate the application of our framework using systems of hundreds to thousands of components generated by near-explicit chemical mechanisms. The new framework enables tailored reduced-complexity representations of gas–particle systems.</p>
  </abstract>
    
<funding-group>
<award-group id="gs1">
<funding-source>Natural Sciences and Engineering Research Council of Canada</funding-source>
<award-id>RGPIN/04315-2014</award-id>
<award-id>RGPIN-2021-02688</award-id>
</award-group>
<award-group id="gs2">
<funding-source>Government of Canada</funding-source>
<award-id>GCXE26S058</award-id>
</award-group>
</funding-group>
</article-meta>
  </front>
<body>
      

<sec id="Ch1.S1" sec-type="intro">
  <label>1</label><title>Introduction</title>
      <p id="d2e112">Secondary organic aerosol material (SOA) is formed through chemical processing and gas–particle partitioning of volatile organic precursors. SOA can consist of hundreds to millions of distinct kinds of molecules, stemming from biogenic and anthropogenic emission sources <xref ref-type="bibr" rid="bib1.bibx25 bib1.bibx35" id="paren.1"/>. In addition, atmospheric aerosols frequently contain primary organic aerosol material (POA), water and dissolved electrolytes, as well as insoluble species. The complexities in sources, chemical and physical transformations, and resulting gas- and particle-phase mixtures introduce computational challenges when attempting to predict component properties and the partitioning behaviour of such organic–inorganic aerosol systems. Atmospheric chemical transport models often resort to the use of highly simplified volatility binning or approaches relying on surrogate components for gas–particle partitioning predictions of organics. Those models often also assume ideal condensed-phase mixing behaviour, in part due to computational time considerations and in part due to a lack of efficient thermodynamic mixing models <xref ref-type="bibr" rid="bib1.bibx11 bib1.bibx54" id="paren.2"/>.</p>
      <p id="d2e121">Laboratory experiments, field studies and theory suggest that nonideal mixing in condensed particulate matter (PM) phases impacts the gas–particle partitioning process and influences the physicochemical properties of the condensed phase, often leading to liquid–liquid phase separation over a wide range of environmental conditions <xref ref-type="bibr" rid="bib1.bibx42 bib1.bibx20 bib1.bibx55 bib1.bibx8 bib1.bibx72 bib1.bibx28 bib1.bibx51" id="paren.3"/>. Models have been developed to predict SOA formation based on the thermodynamic equilibrium partitioning of semivolatile organic oxidation products, including versions for application in atmospheric large-scale models <xref ref-type="bibr" rid="bib1.bibx47 bib1.bibx24 bib1.bibx12 bib1.bibx63 bib1.bibx45 bib1.bibx64" id="paren.4"/>. The model by <xref ref-type="bibr" rid="bib1.bibx24" id="text.5"/> allows for gas–particle equilibrium of organic and inorganic compounds considering an organic and an inorganic phase, but it does not allow organics and salts to partition between the two PM phases. A gas–particle partitioning approach using the activity coefficient model X-UNIFAC proposed by <xref ref-type="bibr" rid="bib1.bibx12" id="paren.6"/>, an extension of the UNIquac Functional group Activity Coefficients (UNIFAC) model <xref ref-type="bibr" rid="bib1.bibx21" id="paren.7"/>, enabled equilibration of all species among all phases present, yet it was restricted to single electrolyte components. Since then, improved thermodynamic multiphase modelling frameworks have been introduced <xref ref-type="bibr" rid="bib1.bibx72" id="paren.8"><named-content content-type="pre">e.g.</named-content></xref> for applications in box models, yet all such models reach computational limitations, such as excessive memory requirements, when applied to highly complex aerosol systems containing many hundreds to thousands of interacting components.</p>
      <p id="d2e145">Based on theory, the equilibrium gas–particle partitioning of a given semivolatile (organic) compound is mainly governed by three key properties: (1) the pure-component saturation vapour pressure, (2) the effective activity coefficient in the absorbing aerosol phase and (3) the total mass concentration of all the material in the absorbing condensed phase <xref ref-type="bibr" rid="bib1.bibx42 bib1.bibx16 bib1.bibx72" id="paren.9"><named-content content-type="pre">e.g.,</named-content></xref>. Thus, the pure-component saturation vapour pressure is a critical input for equilibrium partitioning models, including for box models based on the Aerosol Inorganic–Organic Mixtures Functional groups Activity Coefficients (AIOMFAC) model <xref ref-type="bibr" rid="bib1.bibx71 bib1.bibx73" id="paren.10"/>, X-UNIFAC and other UNIFAC variants <xref ref-type="bibr" rid="bib1.bibx26 bib1.bibx43 bib1.bibx13 bib1.bibx14" id="paren.11"/>.</p>
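      <p>As a rough illustration of how these three quantities combine, the following sketch evaluates a textbook absorptive-partitioning relation of the Pankow type for a single compound. It is not the AIOMFAC-based equilibrium solver used in this study; the function names, argument names and numerical inputs are assumptions chosen for illustration only.</p>
      <preformat preformat-type="code"><![CDATA[
```python
R_GAS = 8.314462618  # molar gas constant, J mol^-1 K^-1


def effective_saturation_conc(p_sat_pa, gamma, mean_molar_mass, temperature):
    """Effective saturation concentration C* in ug m^-3.

    p_sat_pa:        pure-component saturation vapour pressure (Pa)
    gamma:           mole-fraction-based activity coefficient in the absorbing phase
    mean_molar_mass: mean molar mass of the absorbing phase (g mol^-1)
    temperature:     temperature (K)
    """
    return 1.0e6 * mean_molar_mass * gamma * p_sat_pa / (R_GAS * temperature)


def particle_fraction(c_star, c_pm):
    """Equilibrium fraction of a compound residing in the absorbing condensed
    phase, given the total absorbing mass concentration c_pm (ug m^-3)."""
    return 1.0 / (1.0 + c_star / c_pm)


# Illustrative (assumed) inputs: 1e-5 Pa, ideal mixing, 200 g/mol, 298.15 K.
c_star = effective_saturation_conc(1.0e-5, 1.0, 200.0, 298.15)
fraction = particle_fraction(c_star, c_pm=10.0)
```
]]></preformat>
      <p>In this simplified picture, a compound whose effective saturation concentration equals the absorbing mass concentration is split evenly between the gas and the particle phase; a larger activity coefficient or vapour pressure shifts it towards the gas phase.</p>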
      <p id="d2e159">One way to generate the chemical composition of an air parcel is by predicting the gas- and/or particle-phase composition by means of integrating a chemical reaction scheme over time. In this study, we focus on the development and discussion of necessary tools for handling the output of detailed reaction mechanisms. The aim is to process such output for subsequent equilibrium gas–particle partitioning computations, which in turn predict the composition, PM mass concentration and other SOA properties.</p>
<sec id="Ch1.S1.SS1">
  <label>1.1</label><title>Near-explicit chemical mechanisms</title>
      <p id="d2e170">The Master Chemical Mechanism (MCM, v3.3.1) is a near-explicit reaction scheme of the gas-phase chemistry, covering a substantial set of (volatile) aliphatic and aromatic hydrocarbon compounds in atmospheric chemistry models <xref ref-type="bibr" rid="bib1.bibx29 bib1.bibx50" id="paren.12"/>. Oxidation products of volatile or intermediate-volatility compounds may be of sufficiently low volatility to contribute to condensed aerosol mass. In past work, molecular concentrations of a subset of oxidized compounds simulated by MCM have been used as input composition information in the gas–particle partitioning model by <xref ref-type="bibr" rid="bib1.bibx69" id="text.13"/> to predict SOA mass concentrations at varying levels of relative humidity (RH). Another state-of-the-art, near-explicit model is the Generator for Explicit Chemistry and Kinetics of Organics in the Atmosphere (GECKO-A) by <xref ref-type="bibr" rid="bib1.bibx6 bib1.bibx37" id="text.14"/>. GECKO-A is a chemical mechanism generator, which automates the creation of thousands of reactions and thousands to millions of oxidation and fragmentation products from a single precursor or a mixture of precursors (depending on the structural complexity of the precursors). GECKO-A achieves this by algorithmically generating the likely chemical products and related kinetic rate constants for multiple generations of reactions of a precursor and its derivatives <xref ref-type="bibr" rid="bib1.bibx6 bib1.bibx37" id="paren.15"/>. A box model (as part of GECKO-A) can then be run under given conditions of temperature, RH, reaction time and oxidant concentrations to generate the molecular output concentrations at specified times of interest <xref ref-type="bibr" rid="bib1.bibx6 bib1.bibx37" id="paren.16"/>. 
The processing of the wealth of component information from such near-explicit methods and related box model simulations requires the use of automated compound classification tools – the motivation for this study.</p>
</sec>
<sec id="Ch1.S1.SS2">
  <label>1.2</label><title>Cheminformatics tools</title>
      <p id="d2e196">Mapping molecular information from near-explicit chemical mechanisms onto a lower-dimensional parameter space enables representations of large data sets at customized resolution and allows for running equilibrium thermodynamic models within their computationally feasible range. Therefore, such mappings aid in achieving adjustable-resolution model–measurement comparisons of aerosol properties. One approach for achieving this dimensionality reduction involves representing molecular structures using methods that capture essential features in a compact format. Molecular structures can be represented using the Simplified Molecular Input Line Entry System (SMILES) <xref ref-type="bibr" rid="bib1.bibx65 bib1.bibx62" id="paren.17"/>, a linear ASCII text string convertible to 2D (or 3D) molecular drawings and related internal representations by cheminformatics packages such as Open Babel <xref ref-type="bibr" rid="bib1.bibx39" id="paren.18"/> and RDKit <xref ref-type="bibr" rid="bib1.bibx34" id="paren.19"/>. Additionally, the SMiles ARbitrary Target Specification (SMARTS) language allows specifying substructure patterns in molecules using pattern matching relations. The SMARTS notation enables the development of customized algorithms to extract targeted molecular structure information of interest for a variety of applications, including the identification of functional (sub)groups used to describe molecular structures within the AIOMFAC model.</p>
      <p id="d2e208">Cheminformatics toolboxes such as the free Open Babel chemistry toolbox <xref ref-type="bibr" rid="bib1.bibx39" id="paren.20"/>, the Chemistry Development Kit <xref ref-type="bibr" rid="bib1.bibx58" id="paren.21"/>, the OEChem (OpenEye Scientific) software <xref ref-type="bibr" rid="bib1.bibx40" id="paren.22"/>, and the open-source RDKit software <xref ref-type="bibr" rid="bib1.bibx34" id="paren.23"/> are capable of converting and managing chemical molecular data and can be utilized to apply existing or newly developed tools for substructure pattern matching <xref ref-type="bibr" rid="bib1.bibx1 bib1.bibx19" id="paren.24"/>. These tools can match the substructures of given functional groups by parsing molecular structures that are internally stored as assemblies of atoms and associated bonds using SMARTS strings for queries <xref ref-type="bibr" rid="bib1.bibx49" id="paren.25"/>. Since several AIOMFAC-based functional groups differ from the UNIFAC-based functional groups, an AIOMFAC-specific SMARTS-based pattern matching algorithm was developed (see Sect. <xref ref-type="sec" rid="Ch1.S2.SS1.SSS1"/>) based on the open-source cheminformatics API from the EPAM Indigo toolkit <xref ref-type="bibr" rid="bib1.bibx46" id="paren.26"/>, which builds on the Open Babel toolbox and offers an efficient and user-friendly option for customizing SMILES–SMARTS applications.</p>
</sec>
<sec id="Ch1.S1.SS3">
  <label>1.3</label><title>Need for reduced-complexity frameworks</title>
      <p id="d2e243">Coupled liquid–liquid phase separation and gas–particle partitioning calculations, such as with the AIOMFAC-based model, are limited to systems containing fewer than <inline-formula><mml:math id="M1" display="inline"><mml:mrow><mml:mo>∼</mml:mo><mml:mn mathvariant="normal">1000</mml:mn></mml:mrow></mml:math></inline-formula> components for reasons of computational speed and limited random access memory – and in many practical applications to systems of fewer than <inline-formula><mml:math id="M2" display="inline"><mml:mrow><mml:mo>∼</mml:mo><mml:mn mathvariant="normal">50</mml:mn></mml:mrow></mml:math></inline-formula> components. Therefore, output from near-explicit chemical mechanism simulations needs to be drastically reduced in complexity. To address this at the system level, a two-dimensional (2D) structure–property space and related component lumping framework is introduced in this study. The main purpose of our framework is to effectively lump the hundreds to millions of system components into a manageable set of representative surrogate components while retaining an overall similar gas–particle partitioning behaviour. Furthermore, the method is designed to select surrogate components in an objective, automated manner, offering applications beyond the main use case discussed in this study.</p>
      <p id="d2e266">Our scheme builds on related prior work by <xref ref-type="bibr" rid="bib1.bibx44" id="text.27"/>, who introduced a similar 2D carbon-number–polarity grid, and on the work by <xref ref-type="bibr" rid="bib1.bibx31 bib1.bibx17 bib1.bibx33" id="text.28"/> and <xref ref-type="bibr" rid="bib1.bibx18" id="text.29"/>, who introduced so-called volatility basis set spaces to characterize chemical compound evolution and/or thermodynamic mixing behaviour of organic aerosol systems. In contrast to a 1-dimensional (1D) volatility basis set (VBS) <xref ref-type="bibr" rid="bib1.bibx16 bib1.bibx56" id="paren.30"><named-content content-type="pre">e.g.,</named-content></xref>, a 2D scheme allows for a more nuanced representation of complex organic aerosol systems by considering both volatility and polarity (hygroscopicity) characteristics of individual components. The 2D space also offers a visual representation of the chemical diversity within organic aerosol systems, enabling researchers to identify patterns and time evolution trends in component behaviour. This approach facilitates a more intuitive understanding of complex aerosol systems and aids in the development of simplified models that retain essential physicochemical characteristics. Our approach introduces a new polarity metric and offers a flexible framework that can be adapted to various levels of detail required for different modelling scenarios. The restriction to two dimensions reflects both the theoretical basis, in which volatility and polarity (and related nonideal mixing) are the dominant factors controlling SOA gas–particle partitioning, and the trade-off between computational cost and resolved detail.</p>
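      <p>To make the idea of lumping on a 2D volatility–polarity space more concrete, the following minimal sketch bins components into rectangular grid cells and represents each occupied cell by one surrogate. The cell widths, the dictionary keys and the rule of picking the most abundant member of a cell as its surrogate are hypothetical choices made for this illustration; they are not a description of the surrogate selection methods of the framework itself.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import math
from collections import defaultdict


def lump_on_grid(components, vol_width=1.0, pol_width=1.0):
    """Lump components onto a 2D (volatility, polarity) grid.

    components: list of dicts with the (assumed) keys
        'name', 'log10_cstar' (volatility axis), 'polarity', 'conc'
    Returns {cell_index: {'surrogate': name, 'conc': lumped concentration}}.
    """
    cells = defaultdict(list)
    for comp in components:
        key = (math.floor(comp["log10_cstar"] / vol_width),
               math.floor(comp["polarity"] / pol_width))
        cells[key].append(comp)

    surrogates = {}
    for key, members in cells.items():
        # Assumed rule: the most abundant member represents the whole cell
        # and carries the summed concentration, so total mass is conserved.
        lead = max(members, key=lambda c: c["conc"])
        surrogates[key] = {"surrogate": lead["name"],
                           "conc": sum(c["conc"] for c in members)}
    return surrogates


# Made-up example data (not taken from any mechanism output).
example = [
    {"name": "A", "log10_cstar": 0.2, "polarity": 0.5, "conc": 1.0},
    {"name": "B", "log10_cstar": 0.7, "polarity": 0.1, "conc": 3.0},
    {"name": "C", "log10_cstar": 2.3, "polarity": -1.2, "conc": 2.0},
]
lumped = lump_on_grid(example)
```
]]></preformat>
      <p>In this toy case, components A and B share one cell and are represented by B, whereas C occupies its own cell; coarser or finer cell widths trade the number of surrogates against resolved detail.</p>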
      <p id="d2e283">Section <xref ref-type="sec" rid="Ch1.S2"/> describes our chain of tools developed for automatic characterization of the relevant pure-component properties as well as the use of different polarity axis choices and surrogate selection methods in our new 2D lumping framework. Section <xref ref-type="sec" rid="Ch1.S3"/> shows applications to example systems generated by MCM or GECKO-A and discusses the performance of the new tools.</p>
</sec>
</sec>
<sec id="Ch1.S2">
  <label>2</label><title>Methods and data</title>
      <p id="d2e299">The equilibrium gas–particle partitioning model applied in this study has been introduced in previous work <xref ref-type="bibr" rid="bib1.bibx72 bib1.bibx69" id="paren.31"/>. Briefly, this thermodynamic equilibrium model is built around the AIOMFAC thermodynamic model of nonideal mixing <xref ref-type="bibr" rid="bib1.bibx71 bib1.bibx73" id="paren.32"/>. AIOMFAC predicts the mixing behaviour of organic–inorganic solutions by calculating the activity coefficients of electrolytes, water, and organics for any given (liquid or amorphous) mixture composition. The gas–particle partitioning calculations include the simultaneous consideration of liquid–liquid phase separation while treating the gas phase as an ideal gas mixture.</p>
      <p id="d2e308">Figure <xref ref-type="fig" rid="F1"/> provides a schematic overview of the set of methods used to automate the processing of molecular-level data for gas–particle partitioning calculations and component property characterization. The different pure-component or system-level tools (blue and yellow boxes) are discussed in separate subsections below. The gas-phase chemical mechanisms targeted in our work provide outputs at a selected time in the form of lists of components whose structures are expressed in (or converted to) SMILES format, along with the corresponding molecular amounts per unit volume of air, usually provided in units of <inline-formula><mml:math id="M3" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">molec</mml:mi><mml:mspace width="0.125em" linebreak="nobreak"/><mml:msup><mml:mi mathvariant="normal">cm</mml:mi><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">3</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula> or <inline-formula><mml:math id="M4" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">mol</mml:mi><mml:mspace width="0.125em" linebreak="nobreak"/><mml:msup><mml:mi mathvariant="normal">m</mml:mi><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">3</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula>. These lists form the inputs to our multi-model toolchain (Fig. <xref ref-type="fig" rid="F1"/>). In terms of aerosol applications, one key question concerns how much of the organic mass remains in the gas phase and how much of it contributes to the condensed PM mass concentration under equilibrium conditions. Additional questions concern the hygroscopicity, the potential multi-phase structure of the aerosol material, and related morphology and surface properties. The methods introduced here support answering such questions quantitatively and systematically, even for cases of highly complex chemical mechanism outputs. 
As indicated in Fig. <xref ref-type="fig" rid="F1"/>, other temperature-dependent pure-component properties could be determined via existing or new SMILES- and SMARTS-based methods. Examples include predictions of pure-component surface tension of interest for cloud droplet activation and liquid–liquid interfacial tension <xref ref-type="bibr" rid="bib1.bibx53 bib1.bibx61 bib1.bibx52" id="paren.33"/>, the solid- and liquid-state densities of organic compounds <xref ref-type="bibr" rid="bib1.bibx59 bib1.bibx23" id="paren.34"/>, and the glass transition temperature and related pure-component viscosity of interest for molecular diffusion and aerosol mixing timescale modelling <xref ref-type="bibr" rid="bib1.bibx22 bib1.bibx5 bib1.bibx15" id="paren.35"/>.</p>

      <fig id="F1"><label>Figure 1</label><caption><p id="d2e363">Schematic overview of the procedures for automatically handling detailed molecular-level data of the numerous components of an organic gas–aerosol system. Component inputs include SMILES for molecular structure, temperature <inline-formula><mml:math id="M5" display="inline"><mml:mi>T</mml:mi></mml:math></inline-formula>, and the system's overall (gas plus particle) composition in terms of mass or molar species concentrations. At the pure-component level (blue boxes), SMARTS libraries are employed for estimating pure-component vapour pressure at <inline-formula><mml:math id="M6" display="inline"><mml:mi>T</mml:mi></mml:math></inline-formula> and AIOMFAC (functional) subgroup characteristics of each molecule. At the system and/or mixture composition level (yellow boxes), 2D product lumping and surrogate selection is performed based on the volatility, polarity and mass concentrations of the system's organic compounds. Thermodynamic equilibrium partitioning computations are then performed using the selected surrogate system.</p></caption>
        <graphic xlink:href="https://gmd.copernicus.org/articles/19/4601/2026/gmd-19-4601-2026-f01.png"/>

      </fig>

      <p id="d2e387">As mentioned in Sect. <xref ref-type="sec" rid="Ch1.S1.SS2"/>, the SMILES format is a plain-text notation for describing a component's chemical structure in great detail. SMILES data can be processed by many existing chemical informatics tools. In this work, we make use of the tools based on the Open Babel project, the related Python bindings (via pybel) and/or the application programming interface (API) from the Indigo cheminformatics library (toolkit version 1.7.0) <xref ref-type="bibr" rid="bib1.bibx46" id="paren.36"/>. These third-party libraries can be imported into Python programs. They offer a straightforward means of processing SMILES input, including molecule completeness verification, conversion of generic SMILES into unique SMILES, and the application of customized SMARTS pattern matching. Of note, the Indigo cheminformatics library, even when accessed via its Python library, runs performance-critical computations using an efficient, compiled version of the Open Babel code and Indigo features written in the C++ language.</p>
<sec id="Ch1.S2.SS1">
  <label>2.1</label><title>Tools for pure-component property predictions</title>
<sec id="Ch1.S2.SS1.SSS1">
  <label>2.1.1</label><title>SMILES to AIOMFAC subgroups (S2AS) tool</title>
      <p id="d2e409">The SMILES to AIOMFAC subgroups (S2AS) tool is a new, automated algorithm written in Python. It is designed to identify and classify the functional groups of organic aerosol components and to express them in the input format required by the AIOMFAC model. Details of AIOMFAC's representation of molecular structures by a so-called subgroup notation are provided elsewhere <xref ref-type="bibr" rid="bib1.bibx71 bib1.bibx73" id="paren.37"/>; see also examples at <uri>https://aiomfac.lab.mcgill.ca/about.html</uri> (last access: 25 May 2026). Briefly, the notation of aromatic and aliphatic organic compounds in AIOMFAC is based on that of UNIFAC; for example, ferulic acid (SMILES code: COc1cc(ccc1O)/C=C/C(=O)O) is composed of the following AIOMFAC subgroups: 1 <inline-formula><mml:math id="M7" display="inline"><mml:mo>×</mml:mo></mml:math></inline-formula> (<inline-formula><mml:math id="M8" display="inline"><mml:mrow class="chem"><mml:mi mathvariant="normal">CH</mml:mi><mml:mo>=</mml:mo><mml:mi mathvariant="normal">CH</mml:mi></mml:mrow></mml:math></inline-formula>), 3 <inline-formula><mml:math id="M9" display="inline"><mml:mo>×</mml:mo></mml:math></inline-formula> (<inline-formula><mml:math id="M10" display="inline"><mml:mrow class="chem"><mml:mi mathvariant="normal">ACH</mml:mi></mml:mrow></mml:math></inline-formula>), 2 <inline-formula><mml:math id="M11" display="inline"><mml:mo>×</mml:mo></mml:math></inline-formula> (<inline-formula><mml:math id="M12" display="inline"><mml:mrow class="chem"><mml:mi mathvariant="normal">AC</mml:mi></mml:mrow></mml:math></inline-formula>), 1 <inline-formula><mml:math id="M13" display="inline"><mml:mo>×</mml:mo></mml:math></inline-formula> (<inline-formula><mml:math id="M14" display="inline"><mml:mrow class="chem"><mml:mi mathvariant="normal">ACOH</mml:mi></mml:mrow></mml:math></inline-formula>), 1 <inline-formula><mml:math id="M15" display="inline"><mml:mo>×</mml:mo></mml:math></inline-formula> (<inline-formula><mml:math id="M16" display="inline"><mml:mrow class="chem"><mml:msub><mml:mi mathvariant="normal">CH</mml:mi><mml:mn mathvariant="normal">3</mml:mn></mml:msub><mml:mi mathvariant="normal">O</mml:mi></mml:mrow></mml:math></inline-formula>), 1 <inline-formula><mml:math id="M17" display="inline"><mml:mo>×</mml:mo></mml:math></inline-formula> (<inline-formula><mml:math id="M18" display="inline"><mml:mrow class="chem"><mml:mi mathvariant="normal">COOH</mml:mi></mml:mrow></mml:math></inline-formula>), where AC denotes aromatic carbon (lower-case c in SMILES). Unlike in SMILES notation, the subgroup input format for AIOMFAC explicitly states the hydrogen atoms, yet the subgroup notation does not contain information about how the subgroups are connected to each other (because AIOMFAC does not require that information). Previously, AIOMFAC subgroup assignments for organic molecules had to be determined either manually (for small sets of structures) or using limited tool-specific pattern matching <xref ref-type="bibr" rid="bib1.bibx59" id="paren.38"><named-content content-type="pre">e.g., the UManSysProp facility;</named-content></xref>. Our S2AS tool automates this process for arbitrary molecules. It can process tens of thousands of compounds in a consistent way, whereas manual assignment would be prohibitively laborious and prone to errors or inconsistencies.</p>
      <p id="d2e524">We note that the existing list of subgroups in AIOMFAC (about 60 subgroups supported for organic compounds plus special subgroups for inorganics) has limitations when it comes to representing rather exotic, highly functionalized compounds, which may not allow for a perfect mapping by the S2AS tool. Consequently, we implemented a mechanism for detecting and handling exceptions, in most cases by introducing additional SMARTS patterns to cover these cases. Encountering an exception typically means one of three things: not all atoms can be uniquely associated with only one AIOMFAC subgroup; a functionality needs to be approximated by an imperfect combination of existing subgroups, e.g. in the case of secondary ozonide functionalities; or, after parsing all existing SMARTS patterns, one or several unmatched atoms remain. Treating such exceptions by an algorithm allows for a consistent, user-independent approximation of the suboptimal mapping. Furthermore, encountered exception cases can be flagged to indicate the potential need for an additional SMARTS pattern to recognize a special case and/or to motivate future improvements by introducing new subgroups into AIOMFAC. In our tests with tens of thousands of compounds generated by GECKO-A or MCM (see Sect. <xref ref-type="sec" rid="Ch1.S3"/>), unhandled exception cases were a rare occurrence.</p>
      <p id="d2e529">The S2AS program carries out the following key steps: (i) parsing of each SMILES string from an input list to determine whether the SMILES input is valid and whether the component falls into the special category of being a pure alcohol or polyol according to the definition used by AIOMFAC; (ii) rendering of molecules for structure visualization as portable network graphics (.png) or scalable vector graphics (.svg) files (this step is optional); (iii) matching substructures to related AIOMFAC subgroups by iterating over a list of SMARTS as outlined in the flowchart of Fig. <xref ref-type="fig" rid="F2"/>. During step (i), a character string filter is also applied to remove chirality information from input SMILES, which is unnecessary for AIOMFAC subgroups, and to replace radical atoms in a SMILES string by the corresponding non-radical atom (e.g., [O.] by O). The latter is done since AIOMFAC does not support radicals. The pure-component properties and thermodynamic mixing behaviour of such a compound are then approximated by those of a similar non-radical molecule.</p>
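      <p>The character string filter of step (i) can be pictured with the short sketch below. It is a simplified stand-in written for illustration, not the filter implemented in S2AS: the helper name, the regular expressions and the optional removal of directional double-bond markers are assumptions, and only radical atoms written in the bracketed form quoted above (e.g., [O.]) are handled.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import re

# Assumed patterns (illustrative only):
_RADICAL = re.compile(r"\[([A-Z][a-z]?)\.\]")  # bracketed radical atom, e.g. [O.]
_CHIRAL = re.compile(r"@+")                     # tetrahedral chirality tags @ / @@


def clean_smiles(smiles, drop_bond_stereo=True):
    """Return a SMILES string without radical and chirality annotations."""
    cleaned = _RADICAL.sub(r"\1", smiles)   # [O.] -> O
    cleaned = _CHIRAL.sub("", cleaned)      # [C@@H] -> [CH]
    if drop_bond_stereo:
        # Assumption: cis/trans bond markers are also irrelevant for subgroups.
        cleaned = cleaned.replace("/", "").replace("\\", "")
    return cleaned


radical_free = clean_smiles("CC([O.])C")
achiral = clean_smiles("N[C@@H](C)C(=O)O")
ferulic = clean_smiles("COc1cc(ccc1O)/C=C/C(=O)O")
```
]]></preformat>
      <p>A production filter would more safely operate on a parsed molecule object from a cheminformatics toolkit than on raw text, since plain string substitution cannot check chemical validity.</p>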

      <fig id="F2" specific-use="star"><label>Figure 2</label><caption><p id="d2e537">Flowchart illustrating the substructure pattern matching algorithm incorporated in the S2AS tool. SMARTS rules corresponding to all available AIOMFAC functional groups (subgroups) have been formulated in a priority-ordered list; see Table <xref ref-type="table" rid="T1a"/>.</p></caption>
            <graphic xlink:href="https://gmd.copernicus.org/articles/19/4601/2026/gmd-19-4601-2026-f02.png"/>

          </fig>

      <p id="d2e548">Our implemented SMARTS pattern matching process follows a hierarchical, priority-based querying approach, with relatively large subgroups, such as peroxy acyl nitrate (SMARTS: [CH0;X3](=O)OO[NX3;0,+1](=O)-,=[O;0,-1]), assigned one of the highest matching priorities, while the SMARTS pattern for a single aliphatic carbon bonded to two non-hydrogen atoms and two hydrogens (SMARTS: [CH2]) is among the last patterns applied. The query list and order of SMARTS patterns are provided in Table <xref ref-type="table" rid="T1a"/>. The company Daylight Chemical Information Systems, Inc., provides manuals, examples and tutorials for understanding and customizing the SMARTS and SMILES languages on its website (<uri>https://www.daylight.com</uri>, last access: 16 June 2025).</p>
      <p id="d2e556">Figures <xref ref-type="fig" rid="F3"/> and <xref ref-type="fig" rid="F4"/> provide examples of the individual mappings of AIOMFAC subgroups by SMARTS. Alkyl groups, having the lowest matching priority, are matched after all other groups. Pure aliphatic alcohols and polyols are initially detected as such and treated in a separate code branch based on a distinct list of SMARTS, as demonstrated in the example of Fig. <xref ref-type="fig" rid="F4"/>. Based on the polyol-specific subgroups and nomenclature introduced into a variant of UNIFAC by <xref ref-type="bibr" rid="bib1.bibx36" id="paren.39"/>, which is also supported in AIOMFAC, the alcohols and polyols make use of a set of special alkyl subgroups for added specificity and better accuracy of AIOMFAC water uptake and liquid–liquid equilibrium predictions for this class of compounds. A key strength of the S2AS tool is its ability to systematically and efficiently account for the special alkyl groups in a wide variety of straight-chain, branched and cyclic aliphatic alcohols/polyols. For these compounds, the algorithm in S2AS follows a three-step procedure. In step (1) we determine and count the <inline-formula><mml:math id="M19" display="inline"><mml:mrow class="chem"><mml:msub><mml:mi mathvariant="normal">CH</mml:mi><mml:mi>n</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> groups (<inline-formula><mml:math id="M20" display="inline"><mml:mrow><mml:mi>n</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">0</mml:mn></mml:mrow></mml:math></inline-formula>, 1, 2, 3) directly bonded to <inline-formula><mml:math id="M21" display="inline"><mml:mrow class="chem"><mml:mi mathvariant="normal">OH</mml:mi></mml:mrow></mml:math></inline-formula> groups, as well as the associated <inline-formula><mml:math id="M22" display="inline"><mml:mrow class="chem"><mml:mi mathvariant="normal">OH</mml:mi></mml:mrow></mml:math></inline-formula> groups, which are separate subgroups. 
In step (2), we match all alkyl groups belonging to hydrophobic tails by first marking alkyl chains that terminate in <inline-formula><mml:math id="M23" display="inline"><mml:mrow class="chem"><mml:mo>-</mml:mo><mml:msub><mml:mi mathvariant="normal">CH</mml:mi><mml:mi>n</mml:mi></mml:msub><mml:mo>-</mml:mo><mml:msub><mml:mi mathvariant="normal">CH</mml:mi><mml:mn mathvariant="normal">3</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula> (where <inline-formula><mml:math id="M24" display="inline"><mml:mrow><mml:mi>n</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">0</mml:mn></mml:mrow></mml:math></inline-formula>, 1, 2) as tails and then iteratively following along those alkyl chains one <inline-formula><mml:math id="M25" display="inline"><mml:mrow class="chem"><mml:msub><mml:mi mathvariant="normal">CH</mml:mi><mml:mi>n</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> group at a time. An alkyl group is part of a hydrophobic tail if, and only if, it connects to at least one other alkyl group known to be part of a hydrophobic tail, while not being bonded to an <inline-formula><mml:math id="M26" display="inline"><mml:mrow class="chem"><mml:mi mathvariant="normal">OH</mml:mi></mml:mrow></mml:math></inline-formula> group (those were already determined in step 1). In step (3) all thus far unassigned alkyl groups are detected and classified as being of “alkyl within alcohols” <inline-formula><mml:math id="M27" display="inline"><mml:mrow class="chem"><mml:msub><mml:mi mathvariant="normal">CH</mml:mi><mml:mi>n</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> (<inline-formula><mml:math id="M28" display="inline"><mml:mrow><mml:mi>n</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">0</mml:mn></mml:mrow></mml:math></inline-formula>, 1, 2, 3) subgroup type.</p>
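      <p>The three-step procedure for pure alcohols and polyols can be illustrated with a small graph-based sketch in Python. It is a simplified reading of the steps described above and not the S2AS implementation, which works with SMARTS patterns; the function name, the label strings and the input layout (hydrogen count and OH flag per carbon, plus a list of carbon–carbon bonds) are hypothetical.</p>
<preformat preformat-type="code">
```python
def classify_polyol_carbons(n_h, has_oh, bonds):
    """Label each carbon of an alcohol/polyol skeleton following steps (1) to (3).

    n_h[i]    : number of hydrogens on carbon i
    has_oh[i] : True if carbon i is bonded to an OH group
    bonds     : list of (i, j) carbon-carbon bonds
    """
    n = len(n_h)
    adj = {i: set() for i in range(n)}
    for a, b in bonds:
        adj[a].add(b)
        adj[b].add(a)

    # Step (1): carbons directly bonded to an OH group.
    labels = {i: "bonded_to_OH" for i in range(n) if has_oh[i]}

    # Step (2a): seed tails with -CHn-CH3 ends (n = 0, 1, 2), neither carbon bearing OH.
    tail = set()
    for i in range(n):
        if n_h[i] == 3 and not has_oh[i]:
            for j in adj[i]:
                if not has_oh[j] and n_h[j] in (0, 1, 2):
                    tail.update((i, j))

    # Step (2b): follow the chains one CHn group at a time.
    grown = True
    while grown:
        grown = False
        for i in range(n):
            if i in tail or has_oh[i]:
                continue
            if not tail.isdisjoint(adj[i]):
                tail.add(i)
                grown = True
    for i in tail:
        labels[i] = "hydrophobic_tail"

    # Step (3): every carbon still unassigned is an alkyl group within alcohols.
    for i in range(n):
        labels.setdefault(i, "alkyl_in_alcohol")
    return labels


# 1,3-hexanediol, HOCH2-CH2-CH(OH)-CH2-CH2-CH3, carbons numbered 0 to 5.
labels = classify_polyol_carbons(
    n_h=[2, 2, 1, 2, 2, 3],
    has_oh=[True, False, True, False, False, False],
    bonds=[(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)],
)
print(labels)
```
</preformat>
      <p>For this example, carbons 0 and 2 are bonded to OH, carbons 3 to 5 form the hydrophobic tail, and carbon 1, which has no tail neighbour, falls into the remaining alkyl class.</p>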

      <fig id="F3" specific-use="star"><label>Figure 3</label><caption><p id="d2e685">Example of AIOMFAC subgroup determination. Sequences 1–8 show the selection of SMARTS patterns for which at least one match was found, while the structure is probed sequentially by SMARTS from the list (ordered from high to low priority; see Table <xref ref-type="table" rid="T1a"/>). Atoms matched by a stated SMARTS pattern are highlighted in orange, e.g. the ketone group in panel 1, and subsequently ignored. Light grey colouring denotes ignored atoms/bonds at a certain SMARTS parsing stage.</p></caption>
            <graphic xlink:href="https://gmd.copernicus.org/articles/19/4601/2026/gmd-19-4601-2026-f03.png"/>

          </fig>

      <fig id="F4" specific-use="star"><label>Figure 4</label><caption><p id="d2e698">Example of the distinct AIOMFAC subgroup pattern matching applied to pure aliphatic alcohols or polyols, following the distinction of several alkyl group types introduced by <xref ref-type="bibr" rid="bib1.bibx36" id="text.40"/>. The figure illustrates our three-step algorithm to correctly identify alkyl groups in structures qualifying as pure aliphatic alcohols/polyols. Highlighted SMARTS matches follow the colour scheme of <xref ref-type="bibr" rid="bib1.bibx36" id="text.41"/>.</p></caption>
            <graphic xlink:href="https://gmd.copernicus.org/articles/19/4601/2026/gmd-19-4601-2026-f04.png"/>

          </fig>

      <p id="d2e714">In general, the implementation of a hierarchical order and processing of the list of SMARTS is highly advantageous. It allows one to write the SMARTS codes for subgroups of lower priority in a far simpler notation than if each SMARTS code were required to work correctly regardless of the order of execution. For example, if [<inline-formula><mml:math id="M29" display="inline"><mml:mrow class="chem"><mml:mi mathvariant="normal">CH</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:mrow></mml:math></inline-formula>] were applied as one of the first SMARTS patterns tested, it would likely result in several unwanted matches, such as matching the <inline-formula><mml:math id="M30" display="inline"><mml:mrow class="chem"><mml:msub><mml:mi mathvariant="normal">CH</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula> atoms associated with a ketone subgroup (e.g., <inline-formula><mml:math id="M31" display="inline"><mml:mrow class="chem"><mml:msub><mml:mi mathvariant="normal">CH</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub><mml:mi mathvariant="normal">CO</mml:mi></mml:mrow></mml:math></inline-formula> in AIOMFAC notation); subsequently, the remaining <inline-formula><mml:math id="M32" display="inline"><mml:mrow class="chem"><mml:mo>&gt;</mml:mo><mml:mi mathvariant="normal">C</mml:mi><mml:mo>=</mml:mo><mml:mi mathvariant="normal">O</mml:mi></mml:mrow></mml:math></inline-formula> atoms of the ketone group would not be detected as being part of a full ketone subgroup (clearly a mistake). 
To avoid this, the SMARTS pattern for only detecting intended <inline-formula><mml:math id="M33" display="inline"><mml:mrow class="chem"><mml:msub><mml:mi mathvariant="normal">CH</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula> groups would need to be much more complicated, such that it would avoid matching atoms that could be matched as part of a bigger substructure. When a SMARTS match is found for the molecule under consideration, the corresponding matched atoms are marked and excluded from further parsing if unmatched atoms remain in the component; the Indigo toolkit provides “ignore atom” and “highlight atom” functions that conveniently aid in avoiding any unwanted double-matching of atoms. A numerical counter corresponding to the key of the matched SMARTS rule and associated subgroup is incremented for each successful pattern match and added to the subgroup-array representation of the compound for later S2AS output. After all atoms have been matched or when the end of the list of SMARTS is reached, a check is performed to determine whether any unmatched atoms remain in a given molecule, potentially indicating an exception case (very rare). Afterwards, the next molecule from the SMILES list is processed. Once all SMILES have been processed, the S2AS program outputs a text file in the format of AIOMFAC-web input files (see examples at <uri>https://aiomfac.lab.mcgill.ca/about.html</uri>, last access: 25 May 2026).</p>
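      <p>The benefit of the priority order, together with the ignore-atom bookkeeping, can be reproduced with a toy example in Python. The sketch below does not use SMARTS or the Indigo toolkit; the hand-written matcher functions, the subgroup names used as keys and the hard-coded butanone graph are all illustrative assumptions.</p>
<preformat preformat-type="code">
```python
from collections import Counter

# Toy molecule: butanone, CH3-C(=O)-CH2-CH3, as (element, n_hydrogens) per atom.
ATOMS = [("C", 3), ("C", 0), ("O", 0), ("C", 2), ("C", 3)]
BONDS = {(0, 1): 1, (1, 2): 2, (1, 3): 1, (3, 4): 1}  # atom pair: bond order


def neighbours(i):
    """Yield (neighbour index, bond order) for atom i."""
    for (a, b), order in BONDS.items():
        if a == i:
            yield b, order
        elif b == i:
            yield a, order


def match_alkyl(n_h):
    """Return a matcher for a single carbon carrying n_h hydrogens."""
    def matcher(ignored):
        for i, (elem, h) in enumerate(ATOMS):
            if elem == "C" and h == n_h and i not in ignored:
                return [i]
        return None
    return matcher


def match_ketone(n_h):
    """Return a matcher for CHnCO: carbonyl C, its O and one CHn neighbour."""
    def matcher(ignored):
        for i, (elem, h) in enumerate(ATOMS):
            if elem != "C" or h != 0 or i in ignored:
                continue
            nbrs = [(j, o) for j, o in neighbours(i) if j not in ignored]
            oxygens = [j for j, o in nbrs if ATOMS[j][0] == "O" and o == 2]
            alkyls = [j for j, o in nbrs if ATOMS[j] == ("C", n_h) and o == 1]
            if oxygens and alkyls:
                return [i, oxygens[0], alkyls[0]]
        return None
    return matcher


def assign_subgroups(rules):
    """Apply rules in list order; matched atoms are ignored by later rules."""
    ignored, counts = set(), Counter()
    for name, matcher in rules:
        while True:
            hit = matcher(ignored)
            if hit is None:
                break
            ignored.update(hit)
            counts[name] += 1
    unmatched = [i for i in range(len(ATOMS)) if i not in ignored]
    return dict(counts), unmatched


PRIORITY = [("CH3CO", match_ketone(3)), ("CH2CO", match_ketone(2)),
            ("CH3", match_alkyl(3)), ("CH2", match_alkyl(2))]

good = assign_subgroups(PRIORITY)                  # large groups first
bad = assign_subgroups(list(reversed(PRIORITY)))   # alkyl groups first
print(good)
print(bad)
```
</preformat>
      <p>With the large groups queried first, all five heavy atoms are assigned. With the reversed order, the alkyl matchers consume the carbons next to the carbonyl, so the carbonyl carbon and oxygen remain unmatched and would be flagged as an exception.</p>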
      <p id="d2e780">To validate the S2AS tool and associated SMARTS patterns, we used a comprehensive data set of aerosol-relevant organic compounds, including those produced by MCM v3.3.1 simulations for monoterpenes and alkanes; an excerpt is shown in Fig. <xref ref-type="fig" rid="F5"/>. These tests serve as proof of concept of the tool's ability to extract and convert detailed molecular information from a diverse range of compounds commonly encountered in atmospheric chemistry simulations.</p>

      <fig id="F5" specific-use="star"><label>Figure 5</label><caption><p id="d2e787">A sample subset of input SMILES strings on the left-hand side and their corresponding output functional subgroups generated by the S2AS tool on the right-hand side. The generated text file can serve as an input file for the AIOMFAC model together with mixture composition information.</p></caption>
            <graphic xlink:href="https://gmd.copernicus.org/articles/19/4601/2026/gmd-19-4601-2026-f05.png"/>

          </fig>

</sec>
<sec id="Ch1.S2.SS1.SSS2">
  <label>2.1.2</label><title>Pure-component vapour pressure estimation</title>
      <p id="d2e804">The UManSysProp project developed by <xref ref-type="bibr" rid="bib1.bibx59" id="text.42"/> is an open-source facility that employs cheminformatics tools for molecular and mixture property predictions. This facility allows users to input molecular information in the SMILES format, from which the relevant information for aerosol property calculations is extracted <xref ref-type="bibr" rid="bib1.bibx59" id="paren.43"/>. This tool utilizes the Open Babel and pybel cheminformatics libraries for the parsing of molecules using tool-specific SMARTS pattern matching, similar to the AIOMFAC-specific S2AS procedure described in Sect. <xref ref-type="sec" rid="Ch1.S2.SS1.SSS1"/>. Of most relevance for this study, the pure-component, liquid-state saturation vapour pressure tool, available as part of the UManSysProp facility, provides several methods for estimating temperature-dependent vapour pressures. It uses method-specific SMARTS pattern matching to calculate the pure-component vapour pressures using predictive models, including the Estimation of VApour Pressure of ORganics, Accounting for Temperature, Intramolecular, and Non-additivity effects (EVAPORATION) model <xref ref-type="bibr" rid="bib1.bibx14" id="paren.44"/>, the model by <xref ref-type="bibr" rid="bib1.bibx38" id="text.45"/>, hereafter called the Nanoolal method, and the SIMPOL method <xref ref-type="bibr" rid="bib1.bibx43" id="paren.46"/>. The source code for the original and subsequent releases of UManSysProp is available from an online repository (see code and data availability). We note that UManSysProp also includes a few tools for aerosol mixture predictions, including a version of the AIOMFAC model and related SMARTS patterns for generating AIOMFAC input data. However, the list of SMARTS patterns for AIOMFAC in UManSysProp differs in several ways from the more extensive SMARTS list and related S2AS program introduced in this study. 
Specifically, <xref ref-type="bibr" rid="bib1.bibx59" id="text.47"/> do not follow the same SMARTS priority order and use a different, less comprehensive approach for handling matching exceptions; the way pure alcohol compounds are detected and mapped may also differ for more complex polyols. Hence, SMARTS codes from their study are not directly transferable for use in the S2AS tool.</p>
      <p id="d2e828">Several studies have compared liquid-state pure-component vapour pressure prediction methods suitable for SOA systems alongside critical evaluations of existing experimental data for the particularly relevant semi-volatile and low-volatility compounds. It has been shown that, when applicable, the EVAPORATION method and the Nanoolal method are among the best-performing options in those volatility ranges of particular importance for the gas–particle partitioning of SOA <xref ref-type="bibr" rid="bib1.bibx7 bib1.bibx41 bib1.bibx9" id="paren.48"><named-content content-type="pre">e.g.,</named-content></xref>. The Nanoolal method is more versatile in terms of the variety of functional groups and chemical elements covered, while the EVAPORATION method is constrained to compounds containing the elements C, H, O and N. However, these are also the main elements supported in AIOMFAC, so we opted to use the EVAPORATION method as our first choice for the lumping framework and the gas–particle partitioning calculations.</p>
      <p id="d2e836">The parameterization of the temperature-dependence of pure-component vapour pressures used in our work is identical to the two-parameter relation introduced for the EVAPORATION model <xref ref-type="bibr" rid="bib1.bibx14" id="paren.49"/>: 

              <disp-formula id="Ch1.E1" content-type="numbered"><label>1</label><mml:math id="M34" display="block"><mml:mrow><mml:msub><mml:mi>log⁡</mml:mi><mml:mn mathvariant="normal">10</mml:mn></mml:msub><mml:mfenced close="]" open="["><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mrow><mml:msubsup><mml:mi>p</mml:mi><mml:mi>j</mml:mi><mml:mo>∘</mml:mo></mml:msubsup></mml:mrow><mml:mrow><mml:msup><mml:mi>p</mml:mi><mml:mtext>ref</mml:mtext></mml:msup></mml:mrow></mml:mfrac></mml:mstyle></mml:mfenced><mml:mo>=</mml:mo><mml:msub><mml:mi>A</mml:mi><mml:mi>j</mml:mi></mml:msub><mml:mo>+</mml:mo><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mrow><mml:msub><mml:mi>B</mml:mi><mml:mi>j</mml:mi></mml:msub></mml:mrow><mml:mrow><mml:msup><mml:mi>T</mml:mi><mml:mi mathvariant="italic">κ</mml:mi></mml:msup></mml:mrow></mml:mfrac></mml:mstyle><mml:mo>.</mml:mo></mml:mrow></mml:math></disp-formula>

            Here, <inline-formula><mml:math id="M35" display="inline"><mml:mrow><mml:msubsup><mml:mi>p</mml:mi><mml:mi>j</mml:mi><mml:mo>∘</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula> denotes the pure-component, liquid-state (saturation) vapour pressure of component <inline-formula><mml:math id="M36" display="inline"><mml:mi>j</mml:mi></mml:math></inline-formula> in units of atmospheres (<inline-formula><mml:math id="M37" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">atm</mml:mi></mml:mrow></mml:math></inline-formula>) and <inline-formula><mml:math id="M38" display="inline"><mml:mrow><mml:msup><mml:mi>p</mml:mi><mml:mtext>ref</mml:mtext></mml:msup><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mrow class="unit"><mml:mi mathvariant="normal">atm</mml:mi></mml:mrow></mml:mrow></mml:math></inline-formula> is the unit reference pressure. <inline-formula><mml:math id="M39" display="inline"><mml:mi>T</mml:mi></mml:math></inline-formula> is the temperature (<inline-formula><mml:math id="M40" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">K</mml:mi></mml:mrow></mml:math></inline-formula>), and <inline-formula><mml:math id="M41" display="inline"><mml:mrow><mml:msub><mml:mi>A</mml:mi><mml:mi>j</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M42" display="inline"><mml:mrow><mml:msub><mml:mi>B</mml:mi><mml:mi>j</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> are two component-specific parameters. A common value of <inline-formula><mml:math id="M43" display="inline"><mml:mrow><mml:mi mathvariant="italic">κ</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1.5</mml:mn></mml:mrow></mml:math></inline-formula> was adopted based on optimization tests by <xref ref-type="bibr" rid="bib1.bibx14" id="text.50"/>. 
It was shown to be appropriate for estimating vapour pressure of hydrocarbons with or without heteroatoms across a wide temperature range. The <inline-formula><mml:math id="M44" display="inline"><mml:mrow><mml:msub><mml:mi>A</mml:mi><mml:mi>j</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M45" display="inline"><mml:mrow><mml:msub><mml:mi>B</mml:mi><mml:mi>j</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> values are directly predicted by the EVAPORATION model. Alternatively, they can be determined by running any pure-component vapour pressure prediction method at two sufficiently distinct temperatures (typically <inline-formula><mml:math id="M46" display="inline"><mml:mrow><mml:mi mathvariant="normal">Δ</mml:mi><mml:mi>T</mml:mi><mml:mo>&gt;</mml:mo><mml:mn mathvariant="normal">30</mml:mn><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mrow class="unit"><mml:mi mathvariant="normal">K</mml:mi></mml:mrow></mml:mrow></mml:math></inline-formula>), including the temperature interval of interest, followed by solving the system of two linear equations for <inline-formula><mml:math id="M47" display="inline"><mml:mrow><mml:msub><mml:mi>A</mml:mi><mml:mi>j</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M48" display="inline"><mml:mrow><mml:msub><mml:mi>B</mml:mi><mml:mi>j</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>. The use of Eq. (<xref ref-type="disp-formula" rid="Ch1.E1"/>) enables the flexibility of <inline-formula><mml:math id="M49" display="inline"><mml:mrow><mml:msubsup><mml:mi>p</mml:mi><mml:mi>j</mml:mi><mml:mo>∘</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula> estimation at any reasonable temperature for the given set of input molecules, serving as a key input to the gas–particle partitioning model. 
Computing the <inline-formula><mml:math id="M50" display="inline"><mml:mrow><mml:msub><mml:mi>A</mml:mi><mml:mi>j</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>, <inline-formula><mml:math id="M51" display="inline"><mml:mrow><mml:msub><mml:mi>B</mml:mi><mml:mi>j</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> parameters for all organic aerosol system components once eliminates the need for calling the EVAPORATION model repeatedly when the temperature changes, thus improving computational efficiency.</p>
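      <p>The two-temperature determination of the parameters in Eq. (<xref ref-type="disp-formula" rid="Ch1.E1"/>) amounts to solving two linear equations in the variable 1/T^1.5. The following Python sketch shows this under stated assumptions: the function names are ours, and the two input vapour pressures are hypothetical placeholder values rather than output of any prediction method.</p>
<preformat preformat-type="code">
```python
import math

KAPPA = 1.5  # exponent adopted in Eq. (1)


def fit_a_b(t1, p1, t2, p2, kappa=KAPPA):
    """Solve Eq. (1) for (A, B) from vapour pressures p1, p2 (atm) at t1, t2 (K)."""
    x1, x2 = t1 ** -kappa, t2 ** -kappa
    y1, y2 = math.log10(p1), math.log10(p2)  # p_ref = 1 atm
    b = (y1 - y2) / (x1 - x2)
    a = y1 - b * x1
    return a, b


def vapour_pressure(a, b, temp, kappa=KAPPA):
    """Evaluate Eq. (1): pure-component vapour pressure in atm at temp (K)."""
    return 10.0 ** (a + b / temp ** kappa)


# Hypothetical vapour pressures (atm) of one compound at two temperatures (K).
a_j, b_j = fit_a_b(278.15, 1.0e-9, 318.15, 1.0e-7)
p_298 = vapour_pressure(a_j, b_j, 298.15)
print(a_j, b_j, p_298)
```
</preformat>
      <p>Once the two parameters are stored, evaluating the vapour pressure at a new temperature requires only the closed-form expression, which is the efficiency gain described above.</p>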
      <p id="d2e1099">This approach streamlines the application of distinct SMILES-based tools for (any) pure-component properties and makes the gas–particle partitioning model more flexible for computations carried out over a range of temperatures. In summary, the UManSysProp pure-component property models (written in Python) are run for a given list of SMILES characterizing a system. The <inline-formula><mml:math id="M52" display="inline"><mml:mrow><mml:msub><mml:mi>A</mml:mi><mml:mi>j</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M53" display="inline"><mml:mrow><mml:msub><mml:mi>B</mml:mi><mml:mi>j</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> parameters and corresponding SMILES of all organic system components are then written to a text file for read-access by other tools. In particular, the list of parameters is read by the Fortran code of the AIOMFAC equilibrium model, which makes subsequent use of Eq. (<xref ref-type="disp-formula" rid="Ch1.E1"/>) to obtain <inline-formula><mml:math id="M54" display="inline"><mml:mrow><mml:msubsup><mml:mi>p</mml:mi><mml:mi>j</mml:mi><mml:mo>∘</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula> at a temperature of interest. The pure-component vapour pressure files are also used in the 2D lumping framework described next.</p>

<table-wrap id="T1a" specific-use="star"><label>Table 1</label><caption><p id="d2e1143">Priority-ordered SMARTS query list for the parsing of non-polyol aliphatic and aromatic organic component SMILES and associated matching to the corresponding AIOMFAC subgroups. The key value indicates the corresponding AIOMFAC subgroup identifier a pattern will be mapped to – or, in exception cases (values <inline-formula><mml:math id="M55" display="inline"><mml:mrow><mml:mo>&gt;</mml:mo><mml:mn mathvariant="normal">300</mml:mn></mml:mrow></mml:math></inline-formula>), an index for exception handling by the S2AS program.</p></caption><oasis:table frame="topbot"><oasis:tgroup cols="3">
     <oasis:colspec colnum="1" colname="col1" align="left"/>
     <oasis:colspec colnum="2" colname="col2" align="justify" colwidth="6.5cm"/>
     <oasis:colspec colnum="3" colname="col3" align="justify" colwidth="9cm"/>
     <oasis:thead>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">Key</oasis:entry>
         <oasis:entry colname="col2" align="left">SMARTS</oasis:entry>
         <oasis:entry colname="col3" align="left">Description and remarks</oasis:entry>
       </oasis:row>
     </oasis:thead>
     <oasis:tbody>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">172</oasis:entry>
         <oasis:entry colname="col2" align="left">[CH0;X3](=O)OO[NX3;0,+1](=O)-,=[O;0,-1]</oasis:entry>
         <oasis:entry colname="col3" align="left">peroxy acyl nitrate <inline-formula><mml:math id="M56" display="inline"><mml:mrow class="chem"><mml:mi mathvariant="normal">C</mml:mi><mml:mo>(</mml:mo><mml:mo>=</mml:mo><mml:mi mathvariant="normal">O</mml:mi><mml:mo>)</mml:mo><mml:msub><mml:mi mathvariant="normal">OONO</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula> subgroup</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">155</oasis:entry>
         <oasis:entry colname="col2" align="left">[C;H2,H3][OH0X2][NX3;+0,+1](=O)-,=[O;0,-1]</oasis:entry>
         <oasis:entry colname="col3" align="left">organonitrate <inline-formula><mml:math id="M57" display="inline"><mml:mrow class="chem"><mml:msub><mml:mi mathvariant="normal">CH</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub><mml:msub><mml:mi mathvariant="normal">ONO</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula> subgroup (also map <inline-formula><mml:math id="M58" display="inline"><mml:mrow class="chem"><mml:msub><mml:mi mathvariant="normal">CH</mml:mi><mml:mn mathvariant="normal">3</mml:mn></mml:msub><mml:msub><mml:mi mathvariant="normal">ONO</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula> as exception case)</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">156</oasis:entry>
         <oasis:entry colname="col2" align="left">[CH1][OH0X2][NX3;+0,+1](=O)-,=[O;0,-1]</oasis:entry>
         <oasis:entry colname="col3" align="left">organonitrate <inline-formula><mml:math id="M59" display="inline"><mml:mrow class="chem"><mml:msub><mml:mi mathvariant="normal">CHONO</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula> subgroup</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">157</oasis:entry>
         <oasis:entry colname="col2" align="left">[CH0][OH0X2][NX3;+0,+1](=O)-,=[O;0,-1]</oasis:entry>
         <oasis:entry colname="col3" align="left">organonitrate <inline-formula><mml:math id="M60" display="inline"><mml:mrow class="chem"><mml:msub><mml:mi mathvariant="normal">CONO</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula> subgroup</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">157</oasis:entry>
         <oasis:entry colname="col2" align="left">[OH0;X2][OH0X2][NX3;+0,+1](=O)-,=[O;0,-1]</oasis:entry>
         <oasis:entry colname="col3" align="left">exception: peroxide organonitrate <inline-formula><mml:math id="M61" display="inline"><mml:mrow class="chem"><mml:mo>-</mml:mo><mml:mi mathvariant="normal">O</mml:mi><mml:mo>-</mml:mo><mml:msub><mml:mi mathvariant="normal">ONO</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula> subgroup mapped as <inline-formula><mml:math id="M62" display="inline"><mml:mrow class="chem"><mml:msub><mml:mi mathvariant="normal">CONO</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula> group due to lack of a specific subgroup;</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">1550</oasis:entry>
         <oasis:entry colname="col2" align="left">[OH0X2][NX3;+0,+1](=O)-,=[O;0,-1]</oasis:entry>
         <oasis:entry colname="col3" align="left">exception: special organonitrate <inline-formula><mml:math id="M63" display="inline"><mml:mrow class="chem"><mml:mo>-</mml:mo><mml:msub><mml:mi mathvariant="normal">ONO</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula> subgroup without the <inline-formula><mml:math id="M64" display="inline"><mml:mrow class="chem"><mml:msub><mml:mi mathvariant="normal">CH</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula> group (remove one <inline-formula><mml:math id="M65" display="inline"><mml:mrow class="chem"><mml:msub><mml:mi mathvariant="normal">CH</mml:mi><mml:mi>n</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> later if possible);</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">54</oasis:entry>
         <oasis:entry colname="col2" align="left">[CH3;X4][NX3;+0,+1](=O)-,=[O;0,-1]</oasis:entry>
         <oasis:entry colname="col3" align="left"><inline-formula><mml:math id="M66" display="inline"><mml:mrow class="chem"><mml:msub><mml:mi mathvariant="normal">CH</mml:mi><mml:mn mathvariant="normal">3</mml:mn></mml:msub><mml:msub><mml:mi mathvariant="normal">NO</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula> nitro group</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">55</oasis:entry>
         <oasis:entry colname="col2" align="left">[CH2;X4][NX3;+0,+1](=O)-,=[O;0,-1]</oasis:entry>
         <oasis:entry colname="col3" align="left"><inline-formula><mml:math id="M67" display="inline"><mml:mrow class="chem"><mml:msub><mml:mi mathvariant="normal">CH</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub><mml:msub><mml:mi mathvariant="normal">NO</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula> nitro group</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">56</oasis:entry>
         <oasis:entry colname="col2" align="left">[C;H1,H0][NX3;+0,+1](=O)-,=[O;0,-1]</oasis:entry>
         <oasis:entry colname="col3" align="left"><inline-formula><mml:math id="M68" display="inline"><mml:mrow class="chem"><mml:msub><mml:mi mathvariant="normal">CHNO</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula> nitro group (or as exception: <inline-formula><mml:math id="M69" display="inline"><mml:mrow class="chem"><mml:msub><mml:mi mathvariant="normal">CNO</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula>)</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">57</oasis:entry>
         <oasis:entry colname="col2" align="left">[cH0][NX3;+0,+1](=O)-,=[O;0,-1]</oasis:entry>
         <oasis:entry colname="col3" align="left">aromatic nitro group <inline-formula><mml:math id="M70" display="inline"><mml:mrow class="chem"><mml:msub><mml:mi mathvariant="normal">ACNO</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula></oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">560</oasis:entry>
         <oasis:entry colname="col2" align="left">[NX3;+0,+1](=O)-,=[O;0,-1]</oasis:entry>
         <oasis:entry colname="col3" align="left">pure nitro group; exception for one carbon having two such groups</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">28</oasis:entry>
         <oasis:entry colname="col2" align="left">[CH3;X4]-[NH2;X3]</oasis:entry>
         <oasis:entry colname="col3" align="left">primary amine <inline-formula><mml:math id="M71" display="inline"><mml:mrow class="chem"><mml:msub><mml:mi mathvariant="normal">CH</mml:mi><mml:mn mathvariant="normal">3</mml:mn></mml:msub><mml:msub><mml:mi mathvariant="normal">NH</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula> subgroup</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">29</oasis:entry>
         <oasis:entry colname="col2" align="left">[CH2;X4]-[NH2;X3]</oasis:entry>
         <oasis:entry colname="col3" align="left">primary amine <inline-formula><mml:math id="M72" display="inline"><mml:mrow class="chem"><mml:msub><mml:mi mathvariant="normal">CH</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub><mml:msub><mml:mi mathvariant="normal">NH</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula> subgroup</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">30</oasis:entry>
         <oasis:entry colname="col2" align="left">[C;H1,H0;X4]-[NH2;X3]</oasis:entry>
         <oasis:entry colname="col3" align="left">primary amine <inline-formula><mml:math id="M73" display="inline"><mml:mrow class="chem"><mml:msub><mml:mi mathvariant="normal">CHNH</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula> subgroup; as exception also for <inline-formula><mml:math id="M74" display="inline"><mml:mrow class="chem"><mml:mi mathvariant="normal">C</mml:mi><mml:mo>-</mml:mo><mml:msub><mml:mi mathvariant="normal">NH</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula></oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">31</oasis:entry>
         <oasis:entry colname="col2" align="left">[CH3;X4]-[NH1;X3]</oasis:entry>
         <oasis:entry colname="col3" align="left">secondary amine <inline-formula><mml:math id="M75" display="inline"><mml:mrow class="chem"><mml:msub><mml:mi mathvariant="normal">CH</mml:mi><mml:mn mathvariant="normal">3</mml:mn></mml:msub><mml:mi mathvariant="normal">NH</mml:mi></mml:mrow></mml:math></inline-formula> subgroup</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">32</oasis:entry>
         <oasis:entry colname="col2" align="left">[CH2;X4]-[NH1;X3]</oasis:entry>
         <oasis:entry colname="col3" align="left">secondary amine <inline-formula><mml:math id="M76" display="inline"><mml:mrow class="chem"><mml:msub><mml:mi mathvariant="normal">CH</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub><mml:mi mathvariant="normal">NH</mml:mi></mml:mrow></mml:math></inline-formula> subgroup</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">33</oasis:entry>
         <oasis:entry colname="col2" align="left">[C;H1,H0;X4]-[NH1;X3]</oasis:entry>
         <oasis:entry colname="col3" align="left">secondary amine CHNH subgroup; as exception also for <inline-formula><mml:math id="M77" display="inline"><mml:mrow class="chem"><mml:mi mathvariant="normal">C</mml:mi><mml:mo>-</mml:mo><mml:mi mathvariant="normal">NH</mml:mi></mml:mrow></mml:math></inline-formula></oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">1027</oasis:entry>
         <oasis:entry colname="col2" align="left">[c]1:[c;H0;X3]:[o]:[c]:[c]1</oasis:entry>
         <oasis:entry colname="col3" align="left">furfural variant, mapped as 2 AC + 1 ACH + 1 ether subgroup</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">1026</oasis:entry>
         <oasis:entry colname="col2" align="left">[c;H0;X3]1:[c]:[o]:[c]:[c]1</oasis:entry>
         <oasis:entry colname="col3" align="left">furfural, mapped as 2 AC + 1 ACH + 1 ether subgroup</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">926</oasis:entry>
         <oasis:entry colname="col2" align="left">[c]1:[c]:[o]:[c]:[c]1</oasis:entry>
         <oasis:entry colname="col3" align="left">furan, mapped as 3 ACH subgroups + 1 ether subgroup</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">161</oasis:entry>
         <oasis:entry colname="col2" align="left">[CX3](=[OH0])[OH0;X2][OH1;X2]</oasis:entry>
         <oasis:entry colname="col3" align="left">peroxy acid <inline-formula><mml:math id="M78" display="inline"><mml:mrow class="chem"><mml:mi mathvariant="normal">C</mml:mi><mml:mo>(</mml:mo><mml:mo>=</mml:mo><mml:mi mathvariant="normal">O</mml:mi><mml:mo>)</mml:mo><mml:mi mathvariant="normal">OOH</mml:mi></mml:mrow></mml:math></inline-formula> subgroup</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">158</oasis:entry>
         <oasis:entry colname="col2" align="left">[C;H2,H3][OH0;X2][OH1]</oasis:entry>
         <oasis:entry colname="col3" align="left">hydroperoxide <inline-formula><mml:math id="M79" display="inline"><mml:mrow class="chem"><mml:msub><mml:mi mathvariant="normal">CH</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub><mml:mi mathvariant="normal">OOH</mml:mi></mml:mrow></mml:math></inline-formula> subgroup (as exception also for <inline-formula><mml:math id="M80" display="inline"><mml:mrow class="chem"><mml:msub><mml:mi mathvariant="normal">CH</mml:mi><mml:mn mathvariant="normal">3</mml:mn></mml:msub><mml:mi mathvariant="normal">OOH</mml:mi></mml:mrow></mml:math></inline-formula>)</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">159</oasis:entry>
         <oasis:entry colname="col2" align="left">[CH1][OH0;X2][OH1]</oasis:entry>
         <oasis:entry colname="col3" align="left">hydroperoxide <inline-formula><mml:math id="M81" display="inline"><mml:mrow class="chem"><mml:mi mathvariant="normal">CHOOH</mml:mi></mml:mrow></mml:math></inline-formula> subgroup</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">160</oasis:entry>
         <oasis:entry colname="col2" align="left">[CH0,cH0][OH0;X2][OH1]</oasis:entry>
         <oasis:entry colname="col3" align="left">hydroperoxide <inline-formula><mml:math id="M82" display="inline"><mml:mrow class="chem"><mml:mi mathvariant="normal">COOH</mml:mi></mml:mrow></mml:math></inline-formula> subgroup (or as exception also aromatic hydroperoxide cOOH)</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">1580</oasis:entry>
         <oasis:entry colname="col2" align="left">[$([OH0;X2]-[C])][OH1]</oasis:entry>
         <oasis:entry colname="col3" align="left">exception: <inline-formula><mml:math id="M83" display="inline"><mml:mrow class="chem"><mml:msub><mml:mi mathvariant="normal">CH</mml:mi><mml:mi>n</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>-ignored hydroperoxide <inline-formula><mml:math id="M84" display="inline"><mml:mrow class="chem"><mml:mo>-</mml:mo><mml:mi mathvariant="normal">O</mml:mi><mml:mo>-</mml:mo><mml:mi mathvariant="normal">OH</mml:mi></mml:mrow></mml:math></inline-formula> subgroup, mapped as <inline-formula><mml:math id="M85" display="inline"><mml:mrow class="chem"><mml:mo>-</mml:mo><mml:msub><mml:mi mathvariant="normal">CH</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub><mml:mo>-</mml:mo><mml:mi mathvariant="normal">O</mml:mi><mml:mo>-</mml:mo><mml:mi mathvariant="normal">OH</mml:mi></mml:mrow></mml:math></inline-formula> subgroup minus 1 alkyl subgroup (<inline-formula><mml:math id="M86" display="inline"><mml:mrow class="chem"><mml:msub><mml:mi mathvariant="normal">CH</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula>, if possible)</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">43</oasis:entry>
         <oasis:entry colname="col2" align="left">[CH1;X3](=O)[OH1;X2]</oasis:entry>
         <oasis:entry colname="col3" align="left">formic acid <inline-formula><mml:math id="M87" display="inline"><mml:mrow class="chem"><mml:mi mathvariant="normal">HC</mml:mi><mml:mo>(</mml:mo><mml:mo>=</mml:mo><mml:mi mathvariant="normal">O</mml:mi><mml:mo>)</mml:mo><mml:mi mathvariant="normal">OH</mml:mi></mml:mrow></mml:math></inline-formula> subgroup/molecule</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">137</oasis:entry>
         <oasis:entry colname="col2" align="left">[CH0;X3](=O)[OH1;X2]</oasis:entry>
         <oasis:entry colname="col3" align="left">carboxylic acid <inline-formula><mml:math id="M88" display="inline"><mml:mrow class="chem"><mml:mi mathvariant="normal">C</mml:mi><mml:mo>(</mml:mo><mml:mo>=</mml:mo><mml:mi mathvariant="normal">O</mml:mi><mml:mo>)</mml:mo><mml:mi mathvariant="normal">OH</mml:mi></mml:mrow></mml:math></inline-formula> subgroup</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">2224</oasis:entry>
         <oasis:entry colname="col2" align="left">[CX4][CH0;X3](=O)[OH0;X2]-[OH0;X2;A]</oasis:entry>
         <oasis:entry colname="col3" align="left">exception: perester group <inline-formula><mml:math id="M89" display="inline"><mml:mrow class="chem"><mml:msub><mml:mi mathvariant="normal">CH</mml:mi><mml:mi>n</mml:mi></mml:msub><mml:mi mathvariant="normal">C</mml:mi><mml:mo>(</mml:mo><mml:mo>=</mml:mo><mml:mi mathvariant="normal">O</mml:mi><mml:mo>)</mml:mo><mml:mo>-</mml:mo><mml:mi mathvariant="normal">O</mml:mi><mml:mo>-</mml:mo><mml:mi mathvariant="normal">O</mml:mi><mml:mo>-</mml:mo></mml:mrow></mml:math></inline-formula>, treated as a special case of the ester group; mapped as <inline-formula><mml:math id="M90" display="inline"><mml:mrow class="chem"><mml:msub><mml:mi mathvariant="normal">CH</mml:mi><mml:mn mathvariant="normal">3</mml:mn></mml:msub><mml:mi mathvariant="normal">COO</mml:mi></mml:mrow></mml:math></inline-formula> + <inline-formula><mml:math id="M91" display="inline"><mml:mrow class="chem"><mml:msub><mml:mi mathvariant="normal">CH</mml:mi><mml:mn mathvariant="normal">3</mml:mn></mml:msub><mml:mi mathvariant="normal">O</mml:mi></mml:mrow></mml:math></inline-formula> subgroups minus one <inline-formula><mml:math id="M92" display="inline"><mml:mrow class="chem"><mml:msub><mml:mi mathvariant="normal">CH</mml:mi><mml:mn mathvariant="normal">3</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula> subgroup to preserve the correct number of C and O atoms</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">2220</oasis:entry>
         <oasis:entry colname="col2" align="left">[CX3;H0,H1](=O)[CH0;X3](=O)[OH0;X2]-[OH0;X2;A]</oasis:entry>
         <oasis:entry colname="col3" align="left">exception: perester + carbonyl group <inline-formula><mml:math id="M93" display="inline"><mml:mrow class="chem"><mml:msub><mml:mi mathvariant="normal">CH</mml:mi><mml:mi>n</mml:mi></mml:msub><mml:mo>(</mml:mo><mml:mo>=</mml:mo><mml:mi mathvariant="normal">O</mml:mi><mml:mo>)</mml:mo><mml:mi mathvariant="normal">C</mml:mi><mml:mo>(</mml:mo><mml:mo>=</mml:mo><mml:mi mathvariant="normal">O</mml:mi><mml:mo>)</mml:mo><mml:mo>-</mml:mo><mml:mi mathvariant="normal">O</mml:mi><mml:mo>-</mml:mo><mml:mi mathvariant="normal">O</mml:mi><mml:mo>-</mml:mo></mml:mrow></mml:math></inline-formula> mapped in part by using aldehyde group and deducting <inline-formula><mml:math id="M94" display="inline"><mml:mrow class="chem"><mml:msub><mml:mi mathvariant="normal">CH</mml:mi><mml:mi>n</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> group</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">2022</oasis:entry>
         <oasis:entry colname="col2" align="left">[CX4]-[CH0;X3](=O)[CH0;X3](=O)[OH0;X2]</oasis:entry>
         <oasis:entry colname="col3" align="left">exception: ester + aldehyde group for <inline-formula><mml:math id="M95" display="inline"><mml:mrow class="chem"><mml:msub><mml:mi mathvariant="normal">CH</mml:mi><mml:mn mathvariant="normal">0</mml:mn></mml:msub><mml:mi mathvariant="normal">COO</mml:mi></mml:mrow></mml:math></inline-formula> + <inline-formula><mml:math id="M96" display="inline"><mml:mrow class="chem"><mml:msub><mml:mi mathvariant="normal">CH</mml:mi><mml:mi>n</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mi mathvariant="normal">O</mml:mi></mml:mrow></mml:math></inline-formula> combination of subgroups</oasis:entry>
       </oasis:row>
     </oasis:tbody>
   </oasis:tgroup></oasis:table></table-wrap>

<table-wrap id="T1b" specific-use="star"><label>Table 1</label><caption><p id="d2e2146">Continued.</p></caption><oasis:table frame="topbot"><oasis:tgroup cols="3">
     <oasis:colspec colnum="1" colname="col1" align="left"/>
     <oasis:colspec colnum="2" colname="col2" align="justify" colwidth="6.5cm"/>
     <oasis:colspec colnum="3" colname="col3" align="justify" colwidth="9cm"/>
     <oasis:thead>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">Key</oasis:entry>
         <oasis:entry colname="col2" align="left">SMARTS</oasis:entry>
         <oasis:entry colname="col3" align="left">Description and remarks</oasis:entry>
       </oasis:row>
     </oasis:thead>
     <oasis:tbody>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">2022</oasis:entry>
         <oasis:entry colname="col2" align="left">[CH1;X3](=O)[CH0;X3](=O)[OH0;X2]-[C]</oasis:entry>
         <oasis:entry colname="col3" align="left">exception: ester + aldehyde group for <inline-formula><mml:math id="M97" display="inline"><mml:mrow class="chem"><mml:msub><mml:mi mathvariant="normal">CH</mml:mi><mml:mn mathvariant="normal">1</mml:mn></mml:msub><mml:mi mathvariant="normal">COO</mml:mi></mml:mrow></mml:math></inline-formula> + <inline-formula><mml:math id="M98" display="inline"><mml:mrow class="chem"><mml:msub><mml:mi mathvariant="normal">CH</mml:mi><mml:mi>n</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mi mathvariant="normal">O</mml:mi></mml:mrow></mml:math></inline-formula> variant combination of subgroups</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">21</oasis:entry>
         <oasis:entry colname="col2" align="left">[CX4;H3][CH0;X3](=O)[OH0;X2]</oasis:entry>
         <oasis:entry colname="col3" align="left">ester <inline-formula><mml:math id="M99" display="inline"><mml:mrow class="chem"><mml:msub><mml:mi mathvariant="normal">CH</mml:mi><mml:mn mathvariant="normal">3</mml:mn></mml:msub><mml:mi mathvariant="normal">COO</mml:mi></mml:mrow></mml:math></inline-formula> subgroup</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">22</oasis:entry>
         <oasis:entry colname="col2" align="left">[CX4;H2][CH0;X3](=O)[OH0;X2]</oasis:entry>
         <oasis:entry colname="col3" align="left">ester <inline-formula><mml:math id="M100" display="inline"><mml:mrow class="chem"><mml:msub><mml:mi mathvariant="normal">CH</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub><mml:mi mathvariant="normal">COO</mml:mi></mml:mrow></mml:math></inline-formula> subgroup</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">22</oasis:entry>
         <oasis:entry colname="col2" align="left">[C;H1,H0][CH0;X3](=O)[OH0;X2]</oasis:entry>
         <oasis:entry colname="col3" align="left">exception: ester <inline-formula><mml:math id="M101" display="inline"><mml:mrow class="chem"><mml:msub><mml:mi mathvariant="normal">CH</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub><mml:mi mathvariant="normal">COO</mml:mi></mml:mrow></mml:math></inline-formula> subgroup also used for <inline-formula><mml:math id="M102" display="inline"><mml:mrow class="chem"><mml:msub><mml:mi mathvariant="normal">CH</mml:mi><mml:mn mathvariant="normal">1</mml:mn></mml:msub><mml:mi mathvariant="normal">COO</mml:mi></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M103" display="inline"><mml:mrow class="chem"><mml:msub><mml:mi mathvariant="normal">CH</mml:mi><mml:mn mathvariant="normal">0</mml:mn></mml:msub><mml:mi mathvariant="normal">COO</mml:mi></mml:mrow></mml:math></inline-formula></oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">18</oasis:entry>
         <oasis:entry colname="col2" align="left">[CH3;X4][CH0;X3](=[OX1])</oasis:entry>
         <oasis:entry colname="col3" align="left">ketone <inline-formula><mml:math id="M104" display="inline"><mml:mrow class="chem"><mml:msub><mml:mi mathvariant="normal">CH</mml:mi><mml:mn mathvariant="normal">3</mml:mn></mml:msub><mml:mi mathvariant="normal">C</mml:mi><mml:mo>(</mml:mo><mml:mo>=</mml:mo><mml:mi mathvariant="normal">O</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> subgroup</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">19</oasis:entry>
         <oasis:entry colname="col2" align="left">[C;H2,H1;X4][CH0;X3](=[OX1])</oasis:entry>
         <oasis:entry colname="col3" align="left">ketone <inline-formula><mml:math id="M105" display="inline"><mml:mrow class="chem"><mml:msub><mml:mi mathvariant="normal">CH</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub><mml:mi mathvariant="normal">C</mml:mi><mml:mo>(</mml:mo><mml:mo>=</mml:mo><mml:mi mathvariant="normal">O</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> subgroup (also for <inline-formula><mml:math id="M106" display="inline"><mml:mrow class="chem"><mml:msub><mml:mi mathvariant="normal">CH</mml:mi><mml:mn mathvariant="normal">1</mml:mn></mml:msub><mml:mi mathvariant="normal">C</mml:mi><mml:mo>(</mml:mo><mml:mo>=</mml:mo><mml:mi mathvariant="normal">O</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> as exception case)</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">20</oasis:entry>
         <oasis:entry colname="col2" align="left">[CH1;X3;+0,+1](=O)</oasis:entry>
         <oasis:entry colname="col3" align="left">aldehyde <inline-formula><mml:math id="M107" display="inline"><mml:mrow class="chem"><mml:mo>-</mml:mo><mml:mi mathvariant="normal">CH</mml:mi><mml:mo>(</mml:mo><mml:mo>=</mml:mo><mml:mi mathvariant="normal">O</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> subgroup</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">20</oasis:entry>
         <oasis:entry colname="col2" align="left">[CH0;X3](=O)</oasis:entry>
         <oasis:entry colname="col3" align="left">exception: aldehyde subgroup if ketone group cannot be used for a carbonyl in multifunctional structure</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">20</oasis:entry>
         <oasis:entry colname="col2" align="left">[CH2]=O</oasis:entry>
         <oasis:entry colname="col3" align="left">formaldehyde; special <inline-formula><mml:math id="M108" display="inline"><mml:mrow class="chem"><mml:msub><mml:mi mathvariant="normal">CH</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub><mml:mo>=</mml:mo><mml:mi mathvariant="normal">O</mml:mi></mml:mrow></mml:math></inline-formula> subgroup mapped to subgroup 20</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">162</oasis:entry>
         <oasis:entry colname="col2" align="left">[CH3][OH0;X2][OH0;X2][CH3]</oasis:entry>
         <oasis:entry colname="col3" align="left">peroxide <inline-formula><mml:math id="M109" display="inline"><mml:mrow class="chem"><mml:msub><mml:mi mathvariant="normal">CH</mml:mi><mml:mn mathvariant="normal">3</mml:mn></mml:msub><mml:msub><mml:mi mathvariant="normal">OOCH</mml:mi><mml:mn mathvariant="normal">3</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula> subgroup</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">163</oasis:entry>
         <oasis:entry colname="col2" align="left">[CH3][OH0;X2][OH0;X2][CH2]</oasis:entry>
         <oasis:entry colname="col3" align="left">peroxide <inline-formula><mml:math id="M110" display="inline"><mml:mrow class="chem"><mml:msub><mml:mi mathvariant="normal">CH</mml:mi><mml:mn mathvariant="normal">3</mml:mn></mml:msub><mml:msub><mml:mi mathvariant="normal">OOCH</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula> subgroup</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">164</oasis:entry>
         <oasis:entry colname="col2" align="left">[CH3][OH0;X2][OH0;X2][CH1]</oasis:entry>
         <oasis:entry colname="col3" align="left">peroxide <inline-formula><mml:math id="M111" display="inline"><mml:mrow class="chem"><mml:msub><mml:mi mathvariant="normal">CH</mml:mi><mml:mn mathvariant="normal">3</mml:mn></mml:msub><mml:mi mathvariant="normal">OOCH</mml:mi></mml:mrow></mml:math></inline-formula> subgroup</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">165</oasis:entry>
         <oasis:entry colname="col2" align="left">[CH3][OH0;X2][OH0;X2][CH0]</oasis:entry>
         <oasis:entry colname="col3" align="left">peroxide <inline-formula><mml:math id="M112" display="inline"><mml:mrow class="chem"><mml:msub><mml:mi mathvariant="normal">CH</mml:mi><mml:mn mathvariant="normal">3</mml:mn></mml:msub><mml:mi mathvariant="normal">OOC</mml:mi></mml:mrow></mml:math></inline-formula> subgroup</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">166</oasis:entry>
         <oasis:entry colname="col2" align="left">[CH2][OH0;X2][OH0;X2][CH2]</oasis:entry>
         <oasis:entry colname="col3" align="left">peroxide <inline-formula><mml:math id="M113" display="inline"><mml:mrow class="chem"><mml:msub><mml:mi mathvariant="normal">CH</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub><mml:msub><mml:mi mathvariant="normal">OOCH</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula> subgroup</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">167</oasis:entry>
         <oasis:entry colname="col2" align="left">[CH2][OH0;X2][OH0;X2][CH1]</oasis:entry>
         <oasis:entry colname="col3" align="left">peroxide <inline-formula><mml:math id="M114" display="inline"><mml:mrow class="chem"><mml:msub><mml:mi mathvariant="normal">CH</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub><mml:mi mathvariant="normal">OOCH</mml:mi></mml:mrow></mml:math></inline-formula> subgroup</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">168</oasis:entry>
         <oasis:entry colname="col2" align="left">[CH2][OH0;X2][OH0;X2][CH0]</oasis:entry>
         <oasis:entry colname="col3" align="left">peroxide <inline-formula><mml:math id="M115" display="inline"><mml:mrow class="chem"><mml:msub><mml:mi mathvariant="normal">CH</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub><mml:mi mathvariant="normal">OOC</mml:mi></mml:mrow></mml:math></inline-formula> subgroup</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">169</oasis:entry>
         <oasis:entry colname="col2" align="left">[CH1][OH0;X2][OH0;X2][CH1]</oasis:entry>
         <oasis:entry colname="col3" align="left">peroxide <inline-formula><mml:math id="M116" display="inline"><mml:mrow class="chem"><mml:mi mathvariant="normal">CHOOCH</mml:mi></mml:mrow></mml:math></inline-formula> subgroup</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">170</oasis:entry>
         <oasis:entry colname="col2" align="left">[CH1][OH0;X2][OH0;X2][CH0]</oasis:entry>
         <oasis:entry colname="col3" align="left">peroxide <inline-formula><mml:math id="M117" display="inline"><mml:mrow class="chem"><mml:mi mathvariant="normal">CHOOC</mml:mi></mml:mrow></mml:math></inline-formula> subgroup</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">171</oasis:entry>
         <oasis:entry colname="col2" align="left">[CH0][OH0;X2][OH0;X2][CH0]</oasis:entry>
         <oasis:entry colname="col3" align="left">peroxide <inline-formula><mml:math id="M118" display="inline"><mml:mrow class="chem"><mml:mi mathvariant="normal">COOC</mml:mi></mml:mrow></mml:math></inline-formula> subgroup</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">1700</oasis:entry>
         <oasis:entry colname="col2" align="left">[C;H0,H1,H2][OH0;X2][OH0;X2]</oasis:entry>
         <oasis:entry colname="col3" align="left">exception: special peroxide <inline-formula><mml:math id="M119" display="inline"><mml:mrow class="chem"><mml:msub><mml:mi mathvariant="normal">CH</mml:mi><mml:mi>n</mml:mi></mml:msub><mml:mo>-</mml:mo><mml:mi mathvariant="normal">O</mml:mi><mml:mo>-</mml:mo><mml:mi mathvariant="normal">O</mml:mi><mml:mo>-</mml:mo></mml:mrow></mml:math></inline-formula> subgroup, used when the carbon atom at the other end of the peroxide is already matched to another group; in this case the <inline-formula><mml:math id="M120" display="inline"><mml:mrow class="chem"><mml:msub><mml:mi mathvariant="normal">CH</mml:mi><mml:mi>n</mml:mi></mml:msub><mml:mi mathvariant="normal">OO</mml:mi></mml:mrow></mml:math></inline-formula> group is mapped to the CHOOC subgroup and one alkyl (<inline-formula><mml:math id="M121" display="inline"><mml:mrow class="chem"><mml:msub><mml:mi mathvariant="normal">CH</mml:mi><mml:mi>n</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>) group detected later is deducted (if possible)</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">1710</oasis:entry>
         <oasis:entry colname="col2" align="left">[OH0;X2;A]-[OH0;X2;A]</oasis:entry>
         <oasis:entry colname="col3" align="left">exception: peroxide <inline-formula><mml:math id="M122" display="inline"><mml:mrow class="chem"><mml:mo>-</mml:mo><mml:mi mathvariant="normal">O</mml:mi><mml:mo>-</mml:mo><mml:mi mathvariant="normal">O</mml:mi><mml:mo>-</mml:mo></mml:mrow></mml:math></inline-formula> subgroup, used when the carbon atoms at both ends are already matched to other groups; in this case <inline-formula><mml:math id="M123" display="inline"><mml:mrow class="chem"><mml:mo>-</mml:mo><mml:mi mathvariant="normal">O</mml:mi><mml:mo>-</mml:mo><mml:mi mathvariant="normal">O</mml:mi><mml:mo>-</mml:mo></mml:mrow></mml:math></inline-formula> is mapped to the COOC subgroup and two alkyl (<inline-formula><mml:math id="M124" display="inline"><mml:mrow class="chem"><mml:msub><mml:mi mathvariant="normal">CH</mml:mi><mml:mi>n</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>) groups detected later are deducted (if possible)</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">1120</oasis:entry>
         <oasis:entry colname="col2" align="left">[CX4;H2]([OX2;H1])[OX2;H1]</oasis:entry>
         <oasis:entry colname="col3" align="left">geminal diol case (<inline-formula><mml:math id="M125" display="inline"><mml:mrow class="chem"><mml:msub><mml:mi mathvariant="normal">CH</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula> aliphatic)</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">1121</oasis:entry>
         <oasis:entry colname="col2" align="left">[CX4;H1]([OX2;H1])[OX2;H1]</oasis:entry>
         <oasis:entry colname="col3" align="left">geminal diol case (<inline-formula><mml:math id="M126" display="inline"><mml:mrow class="chem"><mml:msub><mml:mi mathvariant="normal">CH</mml:mi><mml:mn mathvariant="normal">1</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula> aliphatic)</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">1122</oasis:entry>
         <oasis:entry colname="col2" align="left">[CX4;H0]([OX2;H1])[OX2;H1]</oasis:entry>
         <oasis:entry colname="col3" align="left">geminal diol case (C aliphatic)</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">154</oasis:entry>
         <oasis:entry colname="col2" align="left">[CH2;X4;R0][OH0;X2;R0][CH2;X4;R0;$([CH2][OH0][CH2][CH2][OH0][CH2])]</oasis:entry>
         <oasis:entry colname="col3" align="left">oxyethylene group (<inline-formula><mml:math id="M127" display="inline"><mml:mrow class="chem"><mml:mo>-</mml:mo><mml:msub><mml:mi mathvariant="normal">CH</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub><mml:mo>-</mml:mo><mml:mi mathvariant="normal">O</mml:mi><mml:mo>-</mml:mo><mml:msub><mml:mi mathvariant="normal">CH</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub><mml:mo>-</mml:mo></mml:mrow></mml:math></inline-formula>) in oligomers such as poly(ethylene glycol) (PEG); at least two of these groups must occur in succession</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">27</oasis:entry>
         <oasis:entry colname="col2" align="left">[CX4;H2;r5;R1;$([CH2;R1][CH2;R1][CH2;R1])][OH0;r5;R1]</oasis:entry>
         <oasis:entry colname="col3" align="left">tetrahydrofuran (oxolane), special ether group: THF[CH2O]</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">24</oasis:entry>
         <oasis:entry colname="col2" align="left">[CX4;H3][OH0;X2]</oasis:entry>
         <oasis:entry colname="col3" align="left">ether <inline-formula><mml:math id="M128" display="inline"><mml:mrow class="chem"><mml:msub><mml:mi mathvariant="normal">CH</mml:mi><mml:mn mathvariant="normal">3</mml:mn></mml:msub><mml:mi mathvariant="normal">O</mml:mi><mml:mo>-</mml:mo></mml:mrow></mml:math></inline-formula></oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">25</oasis:entry>
         <oasis:entry colname="col2" align="left">[$([CX4;H2]([A!O]))][OH0;X2]</oasis:entry>
         <oasis:entry colname="col3" align="left">ether <inline-formula><mml:math id="M129" display="inline"><mml:mrow class="chem"><mml:mo>-</mml:mo><mml:msub><mml:mi mathvariant="normal">CH</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub><mml:mi mathvariant="normal">O</mml:mi><mml:mo>-</mml:mo></mml:mrow></mml:math></inline-formula>; first prefer a matching option where the C is not bonded to an OH group</oasis:entry>
       </oasis:row>
     </oasis:tbody>
   </oasis:tgroup></oasis:table></table-wrap>
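
<p>The exception keys 1700 and 1710 listed above imply a bookkeeping step: the peroxide fragment is first counted as a complete CHOOC or COOC subgroup, and the surplus alkyl groups are then removed from the tally where possible. The following Python sketch illustrates one way this could be done. It is not taken from the S2AS source code; the function name <monospace>apply_peroxide_exceptions</monospace>, the dictionary layout, the string labels, and the generic "CHn" alkyl label are assumptions made for illustration only.</p>

<preformat>
```python
from collections import Counter

# Exception key: (target subgroup label, number of alkyl groups to deduct).
# Targets follow the table rows for keys 1700 and 1710; the label strings
# and the generic "CHn" alkyl label are illustrative assumptions.
PEROXIDE_EXCEPTIONS = {
    1700: ("CHOOC", 1),
    1710: ("COOC", 2),
}


def apply_peroxide_exceptions(exception_hits, subgroup_counts, alkyl_label="CHn"):
    """Return new subgroup counts with exception matches folded in.

    exception_hits: mapping of exception key to number of matches.
    subgroup_counts: mapping of subgroup label to count from regular matching.
    Deductions are capped at the alkyl groups available ("if possible").
    """
    counts = Counter(subgroup_counts)
    for key, n_hits in exception_hits.items():
        target, n_deduct = PEROXIDE_EXCEPTIONS[key]
        counts[target] += n_hits
        wanted = n_deduct * n_hits
        counts[alkyl_label] -= min(wanted, counts[alkyl_label])
    # Drop labels whose count has reached zero.
    return {label: n for label, n in counts.items() if n > 0}


# One -O-O- match (key 1710) in a molecule with three alkyl groups and one OH.
result = apply_peroxide_exceptions({1710: 1}, {"CHn": 3, "OH": 1})
```
</preformat>

<p>Capping the deduction at the number of available alkyl groups mirrors the "if possible" qualifier in the table; how S2AS itself handles a shortfall is not specified here.</p>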

<table-wrap id="T1c" specific-use="star"><label>Table 1</label><caption><p id="d2e3000">Continued.</p></caption><oasis:table frame="topbot"><oasis:tgroup cols="3">
     <oasis:colspec colnum="1" colname="col1" align="left"/>
     <oasis:colspec colnum="2" colname="col2" align="justify" colwidth="6.5cm"/>
     <oasis:colspec colnum="3" colname="col3" align="justify" colwidth="9cm"/>
     <oasis:thead>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">Key</oasis:entry>
         <oasis:entry colname="col2" align="left">SMARTS</oasis:entry>
         <oasis:entry colname="col3" align="left">Description and remarks</oasis:entry>
       </oasis:row>
     </oasis:thead>
     <oasis:tbody>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">25</oasis:entry>
         <oasis:entry colname="col2" align="left">[CX4;H2][OH0;X2]</oasis:entry>
         <oasis:entry colname="col3" align="left">ether <inline-formula><mml:math id="M130" display="inline"><mml:mrow class="chem"><mml:mo>-</mml:mo><mml:msub><mml:mi mathvariant="normal">CH</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub><mml:mi mathvariant="normal">O</mml:mi><mml:mo>-</mml:mo></mml:mrow></mml:math></inline-formula></oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">26</oasis:entry>
         <oasis:entry colname="col2" align="left">[$([CX4;H1]([A!O])[A!O])][OH0;X2]</oasis:entry>
         <oasis:entry colname="col3" align="left">ether <inline-formula><mml:math id="M131" display="inline"><mml:mrow class="chem"><mml:mi mathvariant="normal">CHO</mml:mi><mml:mo>-</mml:mo></mml:mrow></mml:math></inline-formula>; first prefer a matching option where the C is not bonded to an OH group</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">26</oasis:entry>
         <oasis:entry colname="col2" align="left">[CX4;H1,H0][OH0;X2]</oasis:entry>
         <oasis:entry colname="col3" align="left">ether <inline-formula><mml:math id="M132" display="inline"><mml:mrow class="chem"><mml:mi mathvariant="normal">CHO</mml:mi><mml:mo>-</mml:mo></mml:mrow></mml:math></inline-formula>, as exception also for ether <inline-formula><mml:math id="M133" display="inline"><mml:mrow class="chem"><mml:mi mathvariant="normal">C</mml:mi><mml:mo>-</mml:mo><mml:mi mathvariant="normal">O</mml:mi><mml:mo>-</mml:mo></mml:mrow></mml:math></inline-formula> (with zero H)</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">5</oasis:entry>
         <oasis:entry colname="col2" align="left">[CH2]=[CH1]</oasis:entry>
         <oasis:entry colname="col3" align="left">alkene <inline-formula><mml:math id="M134" display="inline"><mml:mrow class="chem"><mml:msub><mml:mi mathvariant="normal">CH</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub><mml:mo>=</mml:mo><mml:mi mathvariant="normal">CH</mml:mi></mml:mrow></mml:math></inline-formula> subgroup</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">6</oasis:entry>
         <oasis:entry colname="col2" align="left">[CH1]=[CH1]</oasis:entry>
         <oasis:entry colname="col3" align="left">alkene <inline-formula><mml:math id="M135" display="inline"><mml:mrow class="chem"><mml:mi mathvariant="normal">CH</mml:mi><mml:mo>=</mml:mo><mml:mi mathvariant="normal">CH</mml:mi></mml:mrow></mml:math></inline-formula> subgroup</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">7</oasis:entry>
         <oasis:entry colname="col2" align="left">[CH2]=[CH0]</oasis:entry>
         <oasis:entry colname="col3" align="left">alkene <inline-formula><mml:math id="M136" display="inline"><mml:mrow class="chem"><mml:msub><mml:mi mathvariant="normal">CH</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub><mml:mo>=</mml:mo><mml:mi mathvariant="normal">C</mml:mi></mml:mrow></mml:math></inline-formula> subgroup</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">8</oasis:entry>
         <oasis:entry colname="col2" align="left">[CH1]=[CH0]</oasis:entry>
         <oasis:entry colname="col3" align="left">alkene <inline-formula><mml:math id="M137" display="inline"><mml:mrow class="chem"><mml:mi mathvariant="normal">CH</mml:mi><mml:mo>=</mml:mo><mml:mi mathvariant="normal">C</mml:mi></mml:mrow></mml:math></inline-formula> subgroup</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">70</oasis:entry>
         <oasis:entry colname="col2" align="left">[CH0]=[CH0]</oasis:entry>
         <oasis:entry colname="col3" align="left">alkene <inline-formula><mml:math id="M138" display="inline"><mml:mrow class="chem"><mml:mi mathvariant="normal">C</mml:mi><mml:mo>=</mml:mo><mml:mi mathvariant="normal">C</mml:mi></mml:mrow></mml:math></inline-formula> subgroup</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">149</oasis:entry>
         <oasis:entry colname="col2" align="left">[$([CX4;H3]([OH1X2]))]</oasis:entry>
         <oasis:entry colname="col3" align="left"><inline-formula><mml:math id="M139" display="inline"><mml:mrow class="chem"><mml:msub><mml:mi mathvariant="normal">CH</mml:mi><mml:mn mathvariant="normal">3</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula> alkyl attached to OH (while not counting OH)</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">150</oasis:entry>
         <oasis:entry colname="col2" align="left">[$([CX4;H2]([OH1X2]))]</oasis:entry>
         <oasis:entry colname="col3" align="left"><inline-formula><mml:math id="M140" display="inline"><mml:mrow class="chem"><mml:msub><mml:mi mathvariant="normal">CH</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula> alkyl attached to OH (while not counting OH)</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">151</oasis:entry>
         <oasis:entry colname="col2" align="left">[$([CX4;H1]([OH1X2]))]</oasis:entry>
         <oasis:entry colname="col3" align="left"><inline-formula><mml:math id="M141" display="inline"><mml:mrow class="chem"><mml:msub><mml:mi mathvariant="normal">CH</mml:mi><mml:mn mathvariant="normal">1</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula> alkyl attached to OH (while not counting OH)</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">152</oasis:entry>
         <oasis:entry colname="col2" align="left">[$([CX4;H0]([OH1X2]))]</oasis:entry>
         <oasis:entry colname="col3" align="left">C alkyl attached to OH (while not counting OH)</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">17</oasis:entry>
         <oasis:entry colname="col2" align="left">[cX3][OH1;X2]</oasis:entry>
         <oasis:entry colname="col3" align="left">aromatic carbon alcohol ACOH subgroup (phenol)</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">153</oasis:entry>
         <oasis:entry colname="col2" align="left">[OH1;X2;A]</oasis:entry>
         <oasis:entry colname="col3" align="left">OH group; must always be attached to aliphatic carbon</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">9</oasis:entry>
         <oasis:entry colname="col2" align="left">[cH1]</oasis:entry>
         <oasis:entry colname="col3" align="left">aromatic hydrocarbon ACH subgroup</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">10</oasis:entry>
         <oasis:entry colname="col2" align="left">[cH0]</oasis:entry>
         <oasis:entry colname="col3" align="left">aromatic hydrocarbon AC subgroup</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">1</oasis:entry>
         <oasis:entry colname="col2" align="left">[CX4;H3,H4]</oasis:entry>
         <oasis:entry colname="col3" align="left"><inline-formula><mml:math id="M142" display="inline"><mml:mrow class="chem"><mml:msub><mml:mi mathvariant="normal">CH</mml:mi><mml:mn mathvariant="normal">3</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula> standard alkyl; as exception also for <inline-formula><mml:math id="M143" display="inline"><mml:mrow class="chem"><mml:msub><mml:mi mathvariant="normal">CH</mml:mi><mml:mn mathvariant="normal">4</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula></oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">2</oasis:entry>
         <oasis:entry colname="col2" align="left">[CH2]</oasis:entry>
         <oasis:entry colname="col3" align="left"><inline-formula><mml:math id="M144" display="inline"><mml:mrow class="chem"><mml:msub><mml:mi mathvariant="normal">CH</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula> standard alkyl</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">3</oasis:entry>
         <oasis:entry colname="col2" align="left">[CH1]</oasis:entry>
         <oasis:entry colname="col3" align="left"><inline-formula><mml:math id="M145" display="inline"><mml:mrow class="chem"><mml:msub><mml:mi mathvariant="normal">CH</mml:mi><mml:mn mathvariant="normal">1</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula> standard alkyl</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">4</oasis:entry>
         <oasis:entry colname="col2" align="left">[CH0]</oasis:entry>
         <oasis:entry colname="col3" align="left">C standard alkyl</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">2502</oasis:entry>
         <oasis:entry colname="col2" align="left">[OH0;X2]</oasis:entry>
         <oasis:entry colname="col3" align="left">exception: map ether oxygen without carbon as ether <inline-formula><mml:math id="M146" display="inline"><mml:mrow class="chem"><mml:mo>-</mml:mo><mml:msub><mml:mi mathvariant="normal">CH</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub><mml:mi mathvariant="normal">O</mml:mi><mml:mo>-</mml:mo></mml:mrow></mml:math></inline-formula> minus <inline-formula><mml:math id="M147" display="inline"><mml:mrow class="chem"><mml:msub><mml:mi mathvariant="normal">CH</mml:mi><mml:mi>n</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula></oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">2501</oasis:entry>
         <oasis:entry colname="col2" align="left">[oH0;X2]</oasis:entry>
         <oasis:entry colname="col3" align="left">exception: map aromatic oxygen without carbon as ether <inline-formula><mml:math id="M148" display="inline"><mml:mrow class="chem"><mml:mo>-</mml:mo><mml:msub><mml:mi mathvariant="normal">CH</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub><mml:mi mathvariant="normal">O</mml:mi><mml:mo>-</mml:mo></mml:mrow></mml:math></inline-formula> minus <inline-formula><mml:math id="M149" display="inline"><mml:mrow class="chem"><mml:msub><mml:mi mathvariant="normal">CH</mml:mi><mml:mi>n</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula></oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">2002</oasis:entry>
         <oasis:entry colname="col2" align="left">[OX1;H0]</oasis:entry>
         <oasis:entry colname="col3" align="left">exception: map carbonyl <inline-formula><mml:math id="M150" display="inline"><mml:mrow class="chem"><mml:mo>=</mml:mo><mml:mi mathvariant="normal">O</mml:mi></mml:mrow></mml:math></inline-formula> as aldehyde group minus the <inline-formula><mml:math id="M151" display="inline"><mml:mrow class="chem"><mml:msub><mml:mi mathvariant="normal">CH</mml:mi><mml:mi>n</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> group</oasis:entry>
       </oasis:row>
     </oasis:tbody>
   </oasis:tgroup></oasis:table></table-wrap>

</sec>
</sec>
<sec id="Ch1.S2.SS2">
  <label>2.2</label><title>2D lumping framework</title>
      <p id="d2e3581">We implemented a new variant of a two-dimensional product lumping framework with the aim of (1) categorizing and visualizing a representation of all oxidation and fragmentation products from an organic aerosol system at a given point in time and (2) enabling an objective, yet adjustable, selection of surrogate species for a reduced-complexity representation of the system. We constructed a 2D space for mapping the entire aerosol component system using the logarithm of the pure-component vapour pressure, <inline-formula><mml:math id="M152" display="inline"><mml:mrow><mml:msub><mml:mi>log⁡</mml:mi><mml:mn mathvariant="normal">10</mml:mn></mml:msub><mml:mo>[</mml:mo><mml:msubsup><mml:mi>p</mml:mi><mml:mi>j</mml:mi><mml:mo>∘</mml:mo></mml:msubsup><mml:mo>/</mml:mo><mml:mo>(</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mrow class="unit"><mml:mi mathvariant="normal">Pa</mml:mi></mml:mrow><mml:mo>)</mml:mo><mml:mo>]</mml:mo></mml:mrow></mml:math></inline-formula>, as the volatility dimension (<inline-formula><mml:math id="M153" display="inline"><mml:mi>x</mml:mi></mml:math></inline-formula> axis). 
This choice is similar to that of 1D VBS models and several 2D VBS variants, which typically either use the pure-component saturation vapour concentration (<inline-formula><mml:math id="M154" display="inline"><mml:mrow><mml:msubsup><mml:mi>C</mml:mi><mml:mi>j</mml:mi><mml:mo>∘</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula> in units of <inline-formula><mml:math id="M155" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">µ</mml:mi><mml:mi mathvariant="normal">g</mml:mi><mml:mspace linebreak="nobreak" width="0.125em"/><mml:msup><mml:mi mathvariant="normal">m</mml:mi><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">3</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula>) or the effective saturation concentration (<inline-formula><mml:math id="M156" display="inline"><mml:mrow><mml:msubsup><mml:mi>C</mml:mi><mml:mi>j</mml:mi><mml:mo>⋆</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula> in units of <inline-formula><mml:math id="M157" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">µ</mml:mi><mml:mi mathvariant="normal">g</mml:mi><mml:mspace width="0.125em" linebreak="nobreak"/><mml:msup><mml:mi mathvariant="normal">m</mml:mi><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">3</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula>) on a logarithmic scale as volatility dimension <xref ref-type="bibr" rid="bib1.bibx16 bib1.bibx17" id="paren.51"/>. Assuming the ideal gas law to apply, pure-component vapour pressures and pure-component saturation concentrations can be inter-converted via <xref ref-type="bibr" rid="bib1.bibx69" id="paren.52"/>

            <disp-formula id="Ch1.E2" content-type="numbered"><label>2</label><mml:math id="M158" display="block"><mml:mrow><mml:msubsup><mml:mi>C</mml:mi><mml:mi>j</mml:mi><mml:mo>∘</mml:mo></mml:msubsup><mml:mo>=</mml:mo><mml:msubsup><mml:mi>p</mml:mi><mml:mi>j</mml:mi><mml:mo>∘</mml:mo></mml:msubsup><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mrow><mml:msub><mml:mi>M</mml:mi><mml:mi>j</mml:mi></mml:msub></mml:mrow><mml:mrow><mml:mi>R</mml:mi><mml:mi>T</mml:mi></mml:mrow></mml:mfrac></mml:mstyle><mml:mo>⋅</mml:mo><mml:msup><mml:mn mathvariant="normal">10</mml:mn><mml:mn mathvariant="normal">9</mml:mn></mml:msup><mml:mspace linebreak="nobreak" width="0.33em"/><mml:mo>[</mml:mo><mml:mrow class="unit"><mml:mi mathvariant="normal">µ</mml:mi><mml:mi mathvariant="normal">g</mml:mi><mml:mspace linebreak="nobreak" width="0.125em"/><mml:msup><mml:mi mathvariant="normal">kg</mml:mi><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msup></mml:mrow><mml:mo>]</mml:mo><mml:mo>.</mml:mo></mml:mrow></mml:math></disp-formula>

          Here, <inline-formula><mml:math id="M159" display="inline"><mml:mrow><mml:msub><mml:mi>M</mml:mi><mml:mi>j</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> is the molar mass (<inline-formula><mml:math id="M160" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">kg</mml:mi><mml:mspace width="0.125em" linebreak="nobreak"/><mml:msup><mml:mi mathvariant="normal">mol</mml:mi><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula>), <inline-formula><mml:math id="M161" display="inline"><mml:mi>R</mml:mi></mml:math></inline-formula> the universal gas constant (<inline-formula><mml:math id="M162" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">J</mml:mi><mml:mspace width="0.125em" linebreak="nobreak"/><mml:msup><mml:mi mathvariant="normal">mol</mml:mi><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msup><mml:mspace linebreak="nobreak" width="0.125em"/><mml:msup><mml:mi mathvariant="normal">K</mml:mi><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula>), and <inline-formula><mml:math display="inline"><mml:mi>T</mml:mi></mml:math></inline-formula> the absolute temperature (<inline-formula><mml:math display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">K</mml:mi></mml:mrow></mml:math></inline-formula>). Several metrics exist for representing an organic molecule's effective polarity, including the elemental <inline-formula><mml:math id="M163" display="inline"><mml:mrow class="chem"><mml:mi mathvariant="normal">O</mml:mi><mml:mo>:</mml:mo><mml:mi mathvariant="normal">C</mml:mi></mml:mrow></mml:math></inline-formula> ratio and the average oxidation state of carbon, denoted by <inline-formula><mml:math id="M164" display="inline"><mml:mrow><mml:msub><mml:mover accent="true"><mml:mtext>OS</mml:mtext><mml:mo mathvariant="normal">‾</mml:mo></mml:mover><mml:mrow class="chem"><mml:mi mathvariant="normal">C</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula> <xref ref-type="bibr" rid="bib1.bibx33" id="paren.53"/>. 
In the case of atmospheric organics consisting of the elements carbon, hydrogen, nitrogen and oxygen, typically the approximation <inline-formula><mml:math id="M165" display="inline"><mml:mrow><mml:msub><mml:mover accent="true"><mml:mtext>OS</mml:mtext><mml:mo mathvariant="normal">‾</mml:mo></mml:mover><mml:mrow class="chem"><mml:mi mathvariant="normal">C</mml:mi></mml:mrow></mml:msub><mml:mo>≈</mml:mo><mml:mn mathvariant="normal">2</mml:mn><mml:mo>⋅</mml:mo><mml:mo>(</mml:mo><mml:mrow class="chem"><mml:mi mathvariant="normal">O</mml:mi><mml:mo>:</mml:mo><mml:mi mathvariant="normal">C</mml:mi></mml:mrow><mml:mo>)</mml:mo><mml:mi mathvariant="normal">−</mml:mi><mml:mrow class="chem"><mml:mi mathvariant="normal">H</mml:mi><mml:mo>:</mml:mo><mml:mi mathvariant="normal">C</mml:mi></mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">5</mml:mn><mml:mo>⋅</mml:mo><mml:mo>(</mml:mo><mml:mrow class="chem"><mml:mi mathvariant="normal">N</mml:mi><mml:mo>:</mml:mo><mml:mi mathvariant="normal">C</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> applies <xref ref-type="bibr" rid="bib1.bibx33" id="paren.54"/>. Our framework offers those metrics as optional choices, yet our preferred choice for the polarity axis (<inline-formula><mml:math id="M166" display="inline"><mml:mi>y</mml:mi></mml:math></inline-formula> axis) of the lumping framework is to use a metric based on the logarithm of an activity coefficient ratio (ACR) (as detailed in Sect. <xref ref-type="sec" rid="Ch1.S2.SS2.SSS1"/> and defined by Eq. <xref ref-type="disp-formula" rid="Ch1.E5"/>).</p>
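      <p>As an illustration, the unit conversion of Eq. (<xref ref-type="disp-formula" rid="Ch1.E2"/>) and the approximate average carbon oxidation state can be sketched in a few lines of Python. This is a minimal sketch and not part of the framework code; the function names are chosen here for illustration, and the example component (a C10H16O4 compound with an assumed vapour pressure of 1e-5 Pa) is hypothetical.</p>
      <preformat preformat-type="code" xml:space="preserve"><![CDATA[
```python
import math

R = 8.314462618  # universal gas constant, J mol^-1 K^-1


def saturation_concentration(p_sat_pa, molar_mass_kg_mol, temperature_k=298.15):
    """Eq. (2): pure-component saturation concentration in ug m^-3."""
    return p_sat_pa * molar_mass_kg_mol / (R * temperature_k) * 1.0e9


def mean_carbon_oxidation_state(n_c, n_h, n_o, n_n=0):
    """Approximation 2*(O:C) - H:C - 5*(N:C) for CHNO compounds."""
    return (2.0 * n_o - n_h - 5.0 * n_n) / n_c


# Hypothetical component: C10H16O4 (M = 0.20023 kg mol^-1), assumed p_sat = 1e-5 Pa.
p_sat = 1.0e-5
x_coord = math.log10(p_sat)  # volatility coordinate, log10[p_sat / (1 Pa)]
c_sat = saturation_concentration(p_sat, 0.20023)
os_c = mean_carbon_oxidation_state(n_c=10, n_h=16, n_o=4)
print(x_coord, c_sat, os_c)
```
]]></preformat>
      <p>Because the conversion is linear in the vapour pressure, the two volatility measures differ on a logarithmic axis only by a shift that depends on molar mass and temperature.</p>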
      <p id="d2e3925">In the context of gas–particle partitioning in aerosol systems as a main application of the 2D lumping framework, one way to guide appropriate choices for the two dimensions of the framework is to consider the main factors governing absorptive gas–particle partitioning <xref ref-type="bibr" rid="bib1.bibx42" id="paren.55"/>. <xref ref-type="bibr" rid="bib1.bibx72" id="text.56"/> derived that for equilibrium gas–particle partitioning involving an ideal gas phase and a single (liquid) condensed phase, the following relationship must hold for <inline-formula><mml:math id="M167" display="inline"><mml:mrow><mml:msubsup><mml:mi>K</mml:mi><mml:mi>j</mml:mi><mml:mrow><mml:mtext>PM</mml:mtext><mml:mo>,</mml:mo><mml:mo>(</mml:mo><mml:mi>n</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:msubsup></mml:mrow></mml:math></inline-formula>, the equilibrium partitioning coefficient on a molar basis:

            <disp-formula id="Ch1.E3" content-type="numbered"><label>3</label><mml:math id="M168" display="block"><mml:mrow><mml:msubsup><mml:mi>K</mml:mi><mml:mi>j</mml:mi><mml:mrow><mml:mtext>PM</mml:mtext><mml:mo>,</mml:mo><mml:mo>(</mml:mo><mml:mi>n</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:msubsup><mml:mo>=</mml:mo><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mrow><mml:msubsup><mml:mi>x</mml:mi><mml:mi>j</mml:mi><mml:mtext>PM</mml:mtext></mml:msubsup></mml:mrow><mml:mrow><mml:msub><mml:mi>p</mml:mi><mml:mi>j</mml:mi></mml:msub></mml:mrow></mml:mfrac></mml:mstyle><mml:mi>R</mml:mi><mml:mi>T</mml:mi><mml:mo>=</mml:mo><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mn mathvariant="normal">1</mml:mn><mml:mrow><mml:msubsup><mml:mi mathvariant="italic">γ</mml:mi><mml:mi>j</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>x</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:msubsup><mml:msubsup><mml:mi>p</mml:mi><mml:mi>j</mml:mi><mml:mo>∘</mml:mo></mml:msubsup></mml:mrow></mml:mfrac></mml:mstyle><mml:mi>R</mml:mi><mml:mi>T</mml:mi><mml:mo>.</mml:mo></mml:mrow></mml:math></disp-formula>

          Here, <inline-formula><mml:math id="M169" display="inline"><mml:mrow><mml:msubsup><mml:mi>x</mml:mi><mml:mi>j</mml:mi><mml:mtext>PM</mml:mtext></mml:msubsup></mml:mrow></mml:math></inline-formula> is the mole fraction of component <inline-formula><mml:math id="M170" display="inline"><mml:mi>j</mml:mi></mml:math></inline-formula> in the liquid particulate matter (PM) phase, <inline-formula><mml:math id="M171" display="inline"><mml:mrow><mml:msub><mml:mi>p</mml:mi><mml:mi>j</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> is the partial pressure in the gas phase (ideal gas assumption), and <inline-formula><mml:math id="M172" display="inline"><mml:mrow><mml:msubsup><mml:mi mathvariant="italic">γ</mml:mi><mml:mi>j</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>x</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:msubsup></mml:mrow></mml:math></inline-formula> is the activity coefficient in the liquid phase (superscript <inline-formula><mml:math id="M173" display="inline"><mml:mrow><mml:mo>(</mml:mo><mml:mi>x</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> denotes mole-fraction-based quantities). While <inline-formula><mml:math id="M174" display="inline"><mml:mrow><mml:msubsup><mml:mi>p</mml:mi><mml:mi>j</mml:mi><mml:mo>∘</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula> indicates the importance of a component's saturation vapour pressure, <inline-formula><mml:math id="M175" display="inline"><mml:mrow><mml:msubsup><mml:mi mathvariant="italic">γ</mml:mi><mml:mi>j</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>x</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:msubsup></mml:mrow></mml:math></inline-formula> indicates the influence of nonideal mixing in the liquid phase. The degree of nonideal mixing depends both on the molecular properties of <inline-formula><mml:math id="M176" display="inline"><mml:mi>j</mml:mi></mml:math></inline-formula> as well as its interactions with all other molecules in solution. 
In this context, a polar organic compound present in an aqueous phase will exhibit a lower activity coefficient than a nonpolar compound. Hence, activity coefficients offer a way to express a component's affinity for less or more polar liquid media. This supports the choice of proxies for polarity as the second dimension of our 2D framework. We note that for aerosol systems undergoing liquid–liquid phase separation (LLPS), Eq. (<xref ref-type="disp-formula" rid="Ch1.E3"/>) holds when appropriately modified, such as by introducing a phase-abundance-weighted effective activity coefficient <xref ref-type="bibr" rid="bib1.bibx68" id="paren.57"/>.</p>
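      <p>The roles of the two factors in Eq. (<xref ref-type="disp-formula" rid="Ch1.E3"/>) can be illustrated with a short Python sketch. It is a minimal example under stated assumptions: the function name is chosen here for illustration, and the vapour pressure and activity coefficient values are arbitrary, not results of this work.</p>
      <preformat preformat-type="code" xml:space="preserve"><![CDATA[
```python
R = 8.314462618  # universal gas constant, J mol^-1 K^-1


def partitioning_coefficient(gamma_x, p_sat_pa, temperature_k=298.15):
    """Eq. (3): molar-basis equilibrium partitioning coefficient in m^3 mol^-1."""
    return R * temperature_k / (gamma_x * p_sat_pa)


# Same assumed vapour pressure, two illustrative activity coefficients.
k_ideal = partitioning_coefficient(gamma_x=1.0, p_sat_pa=1.0e-5)
k_nonideal = partitioning_coefficient(gamma_x=10.0, p_sat_pa=1.0e-5)
print(k_ideal, k_nonideal, k_ideal / k_nonideal)
```
]]></preformat>
      <p>In this sketch, a tenfold larger activity coefficient lowers the partitioning coefficient by the same factor as a tenfold larger saturation vapour pressure would, which is why both quantities are natural candidates for the axes of a 2D space.</p>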
<sec id="Ch1.S2.SS2.SSS1">
  <label>2.2.1</label><title>Activity coefficient ratio as polarity metric</title>
      <p id="d2e4145">The idea of using an activity coefficient ratio (ACR) as a polarity metric is inspired by liquid–liquid equilibrium (LLE) thermodynamics. In a macroscopic LLE state, the chemical potential of a component present in both coexisting phases must be identical. Consequently, the way a component <inline-formula><mml:math id="M177" display="inline"><mml:mi>j</mml:mi></mml:math></inline-formula> partitions between two liquid phases, <inline-formula><mml:math id="M178" display="inline"><mml:mi mathvariant="italic">α</mml:mi></mml:math></inline-formula> and <inline-formula><mml:math id="M179" display="inline"><mml:mi mathvariant="italic">β</mml:mi></mml:math></inline-formula>, can be described using an equilibrium partitioning constant <inline-formula><mml:math id="M180" display="inline"><mml:mrow><mml:msubsup><mml:mi>K</mml:mi><mml:mi>j</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>x</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:msubsup></mml:mrow></mml:math></inline-formula> <xref ref-type="bibr" rid="bib1.bibx70 bib1.bibx60" id="paren.58"/>:

              <disp-formula id="Ch1.E4" content-type="numbered"><label>4</label><mml:math id="M181" display="block"><mml:mrow><mml:msubsup><mml:mi>K</mml:mi><mml:mi>j</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>x</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:msubsup><mml:mo>=</mml:mo><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mrow><mml:msubsup><mml:mi>x</mml:mi><mml:mi>j</mml:mi><mml:mi mathvariant="italic">α</mml:mi></mml:msubsup></mml:mrow><mml:mrow><mml:msubsup><mml:mi>x</mml:mi><mml:mi>j</mml:mi><mml:mi mathvariant="italic">β</mml:mi></mml:msubsup></mml:mrow></mml:mfrac></mml:mstyle><mml:mspace linebreak="nobreak" width="0.25em"/><mml:mover><mml:mo>=</mml:mo><mml:mtext>LLE</mml:mtext></mml:mover><mml:mspace linebreak="nobreak" width="0.25em"/><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mrow><mml:msubsup><mml:mi mathvariant="italic">γ</mml:mi><mml:mi>j</mml:mi><mml:mrow><mml:mi mathvariant="italic">β</mml:mi><mml:mo>,</mml:mo><mml:mo>(</mml:mo><mml:mi>x</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:msubsup></mml:mrow><mml:mrow><mml:msubsup><mml:mi mathvariant="italic">γ</mml:mi><mml:mi>j</mml:mi><mml:mrow><mml:mi mathvariant="italic">α</mml:mi><mml:mo>,</mml:mo><mml:mo>(</mml:mo><mml:mi>x</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:msubsup></mml:mrow></mml:mfrac></mml:mstyle><mml:mo>.</mml:mo></mml:mrow></mml:math></disp-formula>

            Here, <inline-formula><mml:math id="M182" display="inline"><mml:mrow><mml:msubsup><mml:mi>x</mml:mi><mml:mi>j</mml:mi><mml:mi mathvariant="italic">α</mml:mi></mml:msubsup></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M183" display="inline"><mml:mrow><mml:msubsup><mml:mi>x</mml:mi><mml:mi>j</mml:mi><mml:mi mathvariant="italic">β</mml:mi></mml:msubsup></mml:mrow></mml:math></inline-formula> are mole fractions in phases <inline-formula><mml:math id="M184" display="inline"><mml:mi mathvariant="italic">α</mml:mi></mml:math></inline-formula> and <inline-formula><mml:math id="M185" display="inline"><mml:mi mathvariant="italic">β</mml:mi></mml:math></inline-formula>; <inline-formula><mml:math id="M186" display="inline"><mml:mrow><mml:msubsup><mml:mi mathvariant="italic">γ</mml:mi><mml:mi>j</mml:mi><mml:mrow><mml:mi mathvariant="italic">α</mml:mi><mml:mo>,</mml:mo><mml:mo>(</mml:mo><mml:mi>x</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:msubsup></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M187" display="inline"><mml:mrow><mml:msubsup><mml:mi mathvariant="italic">γ</mml:mi><mml:mi>j</mml:mi><mml:mrow><mml:mi mathvariant="italic">β</mml:mi><mml:mo>,</mml:mo><mml:mo>(</mml:mo><mml:mi>x</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:msubsup></mml:mrow></mml:math></inline-formula> are the activity coefficients in those phases under LLE conditions. 
Equation (<xref ref-type="disp-formula" rid="Ch1.E4"/>) indicates that knowledge of the <inline-formula><mml:math id="M188" display="inline"><mml:mrow><mml:msubsup><mml:mi mathvariant="italic">γ</mml:mi><mml:mi>j</mml:mi><mml:mrow><mml:mi mathvariant="italic">β</mml:mi><mml:mo>,</mml:mo><mml:mo>(</mml:mo><mml:mi>x</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:msubsup><mml:mo>/</mml:mo><mml:msubsup><mml:mi mathvariant="italic">γ</mml:mi><mml:mi>j</mml:mi><mml:mrow><mml:mi mathvariant="italic">α</mml:mi><mml:mo>,</mml:mo><mml:mo>(</mml:mo><mml:mi>x</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:msubsup></mml:mrow></mml:math></inline-formula> ratio provides an accurate representation of the thermodynamic phase preference of organic components by quantifying the relative enrichment or depletion of a component via the equivalent mole fraction ratio, with a value greater than 1 indicating enrichment in phase <inline-formula><mml:math id="M189" display="inline"><mml:mi mathvariant="italic">α</mml:mi></mml:math></inline-formula>. Since activity coefficients depend on a phase's mixture composition established by all components, the values are sensitive to various forms of molecular interactions (e.g., dipole–dipole, dispersion) and reflect the chemical affinity for a phase. That is, activity coefficients are influenced by molecular structure properties, such as whether an oxygen-bearing functional group is more polar (e.g., hydroxyl, carboxyl) or less polar (e.g., ether, ester) and how it interacts with other organic and inorganic components present. Thus, the ACR encompasses more detailed functional group characteristics than simpler metrics like the <inline-formula><mml:math id="M190" display="inline"><mml:mrow class="chem"><mml:mi mathvariant="normal">O</mml:mi><mml:mo>:</mml:mo><mml:mi mathvariant="normal">C</mml:mi></mml:mrow></mml:math></inline-formula> ratio.</p>
      <p id="d2e4420">In this work, the ACR of a component <inline-formula><mml:math id="M191" display="inline"><mml:mi>j</mml:mi></mml:math></inline-formula> is based on a prediction of the component's activity coefficient as a dilute solute in a weakly polar organic reference solvent (here 1,2-hexanediol) relative to its activity coefficient as a dilute solute in a strongly polar reference solvent (here water). This metric is therefore similar to the well-established octanol–water partitioning coefficient <xref ref-type="bibr" rid="bib1.bibx32 bib1.bibx66" id="paren.59"><named-content content-type="pre">e.g.,</named-content></xref>, but our choice of organic reference solvent differs. We elected to use an organic solvent that is slightly more polar than octanol, with 1,2-hexanediol serving as a more typical representation of an organic-rich phase medium in aerosols.</p>
      <p id="d2e4435">While inspired by the LLE isoactivity condition (Eq. <xref ref-type="disp-formula" rid="Ch1.E4"/>), the procedure for obtaining our ACR metric differs from that of solving a ternary solute–solvent-1–solvent-2 LLE problem. In a ternary LLE system, the two solvents may be partially miscible, whereas we compute the activity coefficients of each component independently by evaluating two binary solute–solvent mixtures. Using binary systems as a gauge is computationally simpler and substantially faster. Specifically, our polarity metric ACR is defined by the following dimensionless quantity:

              <disp-formula id="Ch1.E5" content-type="numbered"><label>5</label><mml:math id="M192" display="block"><mml:mrow><mml:msub><mml:mi>log⁡</mml:mi><mml:mn mathvariant="normal">10</mml:mn></mml:msub><mml:mfenced close="]" open="["><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mrow><mml:msubsup><mml:mi mathvariant="italic">γ</mml:mi><mml:mrow><mml:mi>j</mml:mi><mml:mo>,</mml:mo><mml:mtext>hex</mml:mtext></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mi>x</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:msubsup></mml:mrow><mml:mrow><mml:msubsup><mml:mi mathvariant="italic">γ</mml:mi><mml:mrow><mml:mi>j</mml:mi><mml:mo>,</mml:mo><mml:mi mathvariant="normal">w</mml:mi></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mi>x</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:msubsup></mml:mrow></mml:mfrac></mml:mstyle></mml:mfenced><mml:mo>.</mml:mo></mml:mrow></mml:math></disp-formula>

            Here, <inline-formula><mml:math id="M193" display="inline"><mml:mrow><mml:msubsup><mml:mi mathvariant="italic">γ</mml:mi><mml:mrow><mml:mi>j</mml:mi><mml:mo>,</mml:mo><mml:mtext>hex</mml:mtext></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mi>x</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:msubsup></mml:mrow></mml:math></inline-formula> is the (predicted) mole-fraction-based activity coefficient of solute <inline-formula><mml:math id="M194" display="inline"><mml:mi>j</mml:mi></mml:math></inline-formula> in a binary mixture with the solvent 1,2-hexanediol, in which the mass fraction of <inline-formula><mml:math id="M195" display="inline"><mml:mi>j</mml:mi></mml:math></inline-formula> is <inline-formula><mml:math id="M196" display="inline"><mml:mrow><mml:msub><mml:mi>w</mml:mi><mml:mi>j</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mn mathvariant="normal">0.01</mml:mn></mml:mrow></mml:math></inline-formula> (therefore, <inline-formula><mml:math id="M197" display="inline"><mml:mrow><mml:msub><mml:mi>w</mml:mi><mml:mtext>hex</mml:mtext></mml:msub><mml:mo>=</mml:mo><mml:mn mathvariant="normal">0.99</mml:mn></mml:mrow></mml:math></inline-formula>). The coefficient <inline-formula><mml:math id="M198" display="inline"><mml:mrow><mml:msubsup><mml:mi mathvariant="italic">γ</mml:mi><mml:mrow><mml:mi>j</mml:mi><mml:mo>,</mml:mo><mml:mi mathvariant="normal">w</mml:mi></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mi>x</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:msubsup></mml:mrow></mml:math></inline-formula> is defined analogously, with water as the solvent. These activity coefficients are typically evaluated at a reference temperature of 298.15 <inline-formula><mml:math id="M199" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">K</mml:mi></mml:mrow></mml:math></inline-formula>. Given this definition, one can think of these two separate activity coefficients, as well as their ratio, as pure-component properties. 
In principle, those values would only need to be computed once for each component (each SMILES code) and could then be saved and retrieved from a look-up table.</p>
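      <p>The evaluation of Eq. (<xref ref-type="disp-formula" rid="Ch1.E5"/>) together with the look-up table idea can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the class <monospace>AcrTable</monospace>, the callable <monospace>gamma_model</monospace> and its signature are hypothetical stand-ins for an activity coefficient model, and the numerical values in the placeholder dictionary are arbitrary illustrations, not AIOMFAC output. The helper that converts the fixed solute mass fraction into a mole fraction is included because the activity coefficients are mole-fraction-based.</p>
      <preformat preformat-type="code" xml:space="preserve"><![CDATA[
```python
import math

M_WATER = 0.018015       # molar mass of water, kg mol^-1
M_HEXANEDIOL = 0.118176  # molar mass of 1,2-hexanediol (C6H14O2), kg mol^-1
W_SOLUTE = 0.01          # solute mass fraction used for Eq. (5)


def solute_mole_fraction(m_solute, m_solvent, w_solute=W_SOLUTE):
    """Convert the solute mass fraction of a binary mixture to a mole fraction."""
    n_solute = w_solute / m_solute
    n_solvent = (1.0 - w_solute) / m_solvent
    return n_solute / (n_solute + n_solvent)


def acr(gamma_hex, gamma_w):
    """Eq. (5): log10 of the activity coefficient ratio."""
    return math.log10(gamma_hex / gamma_w)


class AcrTable:
    """Look-up table keyed by SMILES; each ACR is computed once, then reused."""

    def __init__(self, gamma_model):
        # gamma_model(smiles, solvent) -> activity coefficient (hypothetical interface)
        self._gamma_model = gamma_model
        self._cache = {}

    def get(self, smiles):
        if smiles not in self._cache:
            g_hex = self._gamma_model(smiles, "1,2-hexanediol")
            g_w = self._gamma_model(smiles, "water")
            self._cache[smiles] = acr(g_hex, g_w)
        return self._cache[smiles]


# Placeholder activity coefficients for illustration only; not model output.
_PLACEHOLDER = {("CCCCCCCC", "1,2-hexanediol"): 2.0, ("CCCCCCCC", "water"): 2.0e6}
calls = []  # records every model evaluation


def placeholder_model(smiles, solvent):
    calls.append((smiles, solvent))
    return _PLACEHOLDER[(smiles, solvent)]


table = AcrTable(placeholder_model)
print(table.get("CCCCCCCC"))  # computed on the first request
print(table.get("CCCCCCCC"))  # retrieved from the table, no new model call
print(len(calls))             # number of model evaluations so far
print(solute_mole_fraction(0.20023, M_WATER))
```
]]></preformat>
      <p>Keying the table by SMILES code means that a component appearing in several simulations or time steps triggers only two binary-mixture evaluations in total.</p>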
      <p id="d2e4591">The selection of the two reference solvents provides a robust basis for characterizing the behaviour of organic components across a wide range of polarities spanning several orders of magnitude (see Sect. <xref ref-type="sec" rid="Ch1.S3"/>). In our framework, the AIOMFAC model is employed to compute activity coefficients for Eq. (<xref ref-type="disp-formula" rid="Ch1.E5"/>). Since AIOMFAC is run only for binary, single-phase systems, the calculations are fast; they are comparable to or faster than the pure-component vapour pressure predictions with UManSysProp. Of note, for a system consisting of tens of thousands of organic components, computing the activity coefficients of all species simultaneously in the corresponding multicomponent mixture (containing as many components) with AIOMFAC is prohibitively slow, owing to the many functional groups present, the many group–group and molecule–molecule interactions that need to be summed over, and the associated computer memory requirements. In contrast, computing the ACR values based on the binary solute–solvent mixtures for all those components is fast and has a small memory footprint.</p>
</sec>
<sec id="Ch1.S2.SS2.SSS2">
  <label>2.2.2</label><title>Methods for surrogate selection</title>
      <p id="d2e4607">We implemented four distinct methods to analyze organic aerosol data in the 2D lumping framework and to objectively select a set of surrogate components. These methods are designed to reduce the complexity of the gas–aerosol system while preserving important physicochemical aspects, such as the conservation of total mass concentration and the consideration of the system's diversity in terms of volatility and polarity ranges. The four methods are: (a) the grid cell midpoint method, (b) the grid cell medoid method, (c) the grid cell mass-weighted medoid method, and (d) the <inline-formula><mml:math id="M200" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula>-means-based medoids method. The latter is a grid-independent 2D clustering method.</p>
      <p id="d2e4617">The 2D space is subdivided into a number of grid cells (or clusters) based on the targeted reduction in system complexity. This division is accomplished by specifying the number of rows and columns, followed by identification of the component coordinates that set the upper and lower coordinate limits within the range that should be gridded (see use of a volatility threshold below). The grid resolution is adjustable, but typical choices range from <inline-formula><mml:math id="M201" display="inline"><mml:mrow><mml:mn mathvariant="normal">2</mml:mn><mml:mo>×</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:math></inline-formula> to <inline-formula><mml:math id="M202" display="inline"><mml:mrow><mml:mn mathvariant="normal">20</mml:mn><mml:mo>×</mml:mo><mml:mn mathvariant="normal">10</mml:mn></mml:mrow></mml:math></inline-formula> in terms of setting the number of grid subdivisions and associated <inline-formula><mml:math id="M203" display="inline"><mml:mrow><mml:msub><mml:mi>n</mml:mi><mml:mi>x</mml:mi></mml:msub><mml:mo>×</mml:mo><mml:msub><mml:mi>m</mml:mi><mml:mi>y</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> grid cells.</p>
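      <p>As a minimal sketch of the gridding step described above, the following Python function assigns each component to one of the grid cells from its 2D coordinates. The function name <monospace>assign_grid_cells</monospace>, the use of equal-width cells, and the treatment of points located exactly on the upper coordinate limits are illustrative assumptions and are not taken from the published code.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def assign_grid_cells(points, n_x, n_y):
    """Map each (x, y) point to a (column, row) grid cell.

    Hypothetical helper: the grid limits are taken from the extreme
    coordinates of the supplied points and the cells have equal width.
    Returns a dict {(column, row): [point indices]}; empty cells are omitted.
    """
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    # Cell widths; fall back to 1.0 if all points share one coordinate.
    dx = (x_max - x_min) / n_x or 1.0
    dy = (y_max - y_min) / n_y or 1.0
    cells = {}
    for idx, (x, y) in enumerate(points):
        # Points on the upper limit are placed in the last cell.
        col = min(int((x - x_min) / dx), n_x - 1)
        row = min(int((y - y_min) / dy), n_y - 1)
        cells.setdefault((col, row), []).append(idx)
    return cells
```
]]></preformat>
      <p>In this sketch, only the components below the high-volatility threshold would be passed in, so that the grid limits follow from the components of the regular lumping space.</p>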
      <p id="d2e4662">A crucial aspect of our methodology is the primary focus of the gridded domain on the compound volatility ranges of interest for substantial partitioning of organics to the particle phase. These volatility ranges include semi-volatile (SVOC), low-volatility (LVOC) and extremely low-volatility (ELVOC) organic compounds, while intermediate-volatility (IVOC) and volatile organic compounds (VOC) are too volatile to partition substantially under typical tropospheric aerosol mass loading conditions <xref ref-type="bibr" rid="bib1.bibx18 bib1.bibx69" id="paren.60"/>. As such, an adjustable high-volatility vapour pressure threshold (<inline-formula><mml:math id="M204" display="inline"><mml:mrow><mml:msub><mml:mi>p</mml:mi><mml:mtext>high</mml:mtext></mml:msub></mml:mrow></mml:math></inline-formula>) is introduced, typically set to <inline-formula><mml:math id="M205" display="inline"><mml:mrow><mml:msub><mml:mi>log⁡</mml:mi><mml:mn mathvariant="normal">10</mml:mn></mml:msub><mml:mo>(</mml:mo><mml:msub><mml:mi>p</mml:mi><mml:mtext>high</mml:mtext></mml:msub><mml:mo>/</mml:mo><mml:mo>[</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mrow class="unit"><mml:mi mathvariant="normal">Pa</mml:mi></mml:mrow><mml:mo>]</mml:mo><mml:mo>)</mml:mo><mml:mo>&gt;</mml:mo><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:math></inline-formula>. Regardless of the polarity range, all compounds with <inline-formula><mml:math id="M206" display="inline"><mml:mrow><mml:msubsup><mml:mi>p</mml:mi><mml:mi>j</mml:mi><mml:mo>∘</mml:mo></mml:msubsup><mml:mo>&gt;</mml:mo><mml:msub><mml:mi>p</mml:mi><mml:mtext>high</mml:mtext></mml:msub></mml:mrow></mml:math></inline-formula> are identified and lumped into a single high-volatility surrogate component using the mass-weighted medoid method defined below. Examples of this are shown in Sect. <xref ref-type="sec" rid="Ch1.S3"/>, e.g., Figs. 
<xref ref-type="fig" rid="F7"/> and <xref ref-type="fig" rid="F8"/>. Since the <inline-formula><mml:math id="M207" display="inline"><mml:mrow><mml:msub><mml:mi>p</mml:mi><mml:mtext>high</mml:mtext></mml:msub></mml:mrow></mml:math></inline-formula> threshold value is an input parameter, this special VOC lumping step can also be avoided entirely by setting a very high threshold value. Similarly, a low-volatility threshold and related lumping of LVOC and ELVOC compounds into a single (or a few) quasi-nonvolatile surrogate(s) could be introduced to focus the majority of surrogates on representing the SVOC range at comparably higher resolution. However, in this work we have not included a low-volatility threshold since we are interested in a diverse surrogate-based representation of all PM-relevant volatility ranges to better resolve potential trends in polarity with decreasing volatility across the SVOC, LVOC, and ELVOC ranges.</p>
      <p id="d2e4756">Our lumping process adheres to the principle of mass conservation for the entire system. This is achieved by aggregating the mass concentrations of components within each grid cell or cluster into the selected surrogate component. This approach differs from methods that choose to conserve the number of carbon atoms during lumping. The surrogate component selection process for each of the four methods is visually depicted in Fig. <xref ref-type="fig" rid="F6"/>. The shown examples illustrate how one surrogate component is chosen within a single grid cell (or cluster in case of <inline-formula><mml:math id="M208" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula>-means).</p>
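      <p>The threshold-based split and the mass-conserving aggregation can be sketched in a few lines of Python. The function names <monospace>split_by_volatility</monospace> and <monospace>lump_mass</monospace>, the data layout, and the placeholder default threshold are assumptions made for illustration only; the surrogate of each group is taken as given here, since its selection depends on the method used.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def split_by_volatility(log10_p, log10_p_high=-1.0):
    """Split component indices at a high-volatility threshold.

    log10_p holds log10 of the pure-component vapour pressure (in Pa).
    The default threshold is a placeholder; it is an adjustable input.
    Returns (regular, high_vol) lists of indices.
    """
    regular = [i for i, v in enumerate(log10_p) if v <= log10_p_high]
    high_vol = [i for i, v in enumerate(log10_p) if v > log10_p_high]
    return regular, high_vol


def lump_mass(groups, surrogates, conc):
    """Add up the mass concentration of each group into its surrogate.

    groups:     {group key: [member indices]} (grid cells or clusters)
    surrogates: {group key: index of the chosen surrogate component}
    conc:       mass concentration of every component
    Returns {surrogate index: lumped mass concentration}.
    """
    lumped = {}
    for key, members in groups.items():
        surrogate = surrogates[key]
        lumped[surrogate] = sum(conc[i] for i in members)
    return lumped
```
]]></preformat>
      <p>Because every component belongs to exactly one group, the lumped concentrations sum to the total mass concentration of the original system.</p>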

      <fig id="F6" specific-use="star"><label>Figure 6</label><caption><p id="d2e4771">Illustration of the four introduced methods for selecting a single surrogate component (marked in red) within a specific grid cell or cluster of the 2D space. <inline-formula><mml:math id="M209" display="inline"><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mi mathvariant="normal">l</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M210" display="inline"><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mi mathvariant="normal">u</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> indicate the lower and upper <inline-formula><mml:math id="M211" display="inline"><mml:mi>x</mml:mi></mml:math></inline-formula> axis values of the grid cell; analogously for <inline-formula><mml:math id="M212" display="inline"><mml:mrow><mml:msub><mml:mi>y</mml:mi><mml:mi mathvariant="normal">l</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M213" display="inline"><mml:mrow><mml:msub><mml:mi>y</mml:mi><mml:mi mathvariant="normal">u</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>. <bold>(a)</bold> The midpoint method selects a surrogate based on the component closest to the geometric centre of the grid cell, denoted by the dotted lines. <bold>(b)</bold> The medoid method selects the component with the smallest cumulative Euclidean distance from all other components. <bold>(c)</bold> The mass-weighted medoid method prioritizes components with higher mass concentrations. It determines the most representative surrogate as the one closest to the centre of mass established by the components in the cell. <bold>(d)</bold> The <inline-formula><mml:math id="M214" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula>-means-based medoid method is independent of a grid. 
It iteratively assigns components to clusters, then identifies a cluster's optimal surrogate as the nonzero-mass component closest to the cluster centre (indicated by the blue <inline-formula><mml:math id="M215" display="inline"><mml:mo>×</mml:mo></mml:math></inline-formula> symbol). Individual clusters are not confined to predetermined grid cells <bold>(</bold>in <bold>d</bold> denoted by dotted outline<bold>)</bold>; here, the four points are assumed to form a specific cluster <inline-formula><mml:math id="M216" display="inline"><mml:mi>n</mml:mi></mml:math></inline-formula>.</p></caption>
            <graphic xlink:href="https://gmd.copernicus.org/articles/19/4601/2026/gmd-19-4601-2026-f06.png"/>

          </fig>

      <fig id="F7" specific-use="star"><label>Figure 7</label><caption><p id="d2e4877">The toluene SOA system shown in the 2D space of activity coefficient ratio versus saturation vapour pressure at 298 <inline-formula><mml:math id="M217" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">K</mml:mi></mml:mrow></mml:math></inline-formula>. <bold>(a)</bold> The full set of <inline-formula><mml:math id="M218" display="inline"><mml:mrow><mml:mo>∼</mml:mo><mml:mn mathvariant="normal">68</mml:mn><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mn mathvariant="normal">000</mml:mn></mml:mrow></mml:math></inline-formula> components derived from a simulation based on the GECKO-A mechanism with an overlaid <inline-formula><mml:math id="M219" display="inline"><mml:mrow><mml:mn mathvariant="normal">10</mml:mn><mml:mo>×</mml:mo><mml:mn mathvariant="normal">5</mml:mn></mml:mrow></mml:math></inline-formula> grid (dotted) which is used by the different selection methods to determine grid cell surrogates. <bold>(b–d)</bold> Surrogates selected by <bold>(b)</bold> the medoid method, <bold>(c)</bold> the midpoint method and <bold>(d)</bold> the weighted medoid method. The top horizontal axis indicates the approximate pure-component saturation vapour concentration corresponding to the vapour pressure axis (see details in Sect. S2).</p></caption>
            <graphic xlink:href="https://gmd.copernicus.org/articles/19/4601/2026/gmd-19-4601-2026-f07.png"/>

          </fig>

      <fig id="F8" specific-use="star"><label>Figure 8</label><caption><p id="d2e4937">2D space representations of the toluene-derived SOA system at 298 <inline-formula><mml:math id="M220" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">K</mml:mi></mml:mrow></mml:math></inline-formula> using the <inline-formula><mml:math id="M221" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula>-means clustering and surrogate selection method. <bold>(a)</bold> 49 clusters plus 1 high-volatility cluster (pink data points) are shown with individual cluster members identified by the same colour. The cluster centres are denoted by <inline-formula><mml:math id="M222" display="inline"><mml:mo>×</mml:mo></mml:math></inline-formula> and the corresponding selected cluster surrogates by the <inline-formula><mml:math id="M223" display="inline"><mml:mo>⋄</mml:mo></mml:math></inline-formula> symbols. <bold>(b)</bold> The lumped mass concentrations of the surrogate components selected by the mass-weighted <inline-formula><mml:math id="M224" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula>-means method (see Table S5 for related surrogate data).</p></caption>
            <graphic xlink:href="https://gmd.copernicus.org/articles/19/4601/2026/gmd-19-4601-2026-f08.png"/>

          </fig>

</sec>
<sec id="Ch1.S2.SS2.SSS3">
  <label>2.2.3</label><title>Grid cell midpoint method</title>
      <p id="d2e4998">The midpoint method operates on the simple principle that the component closest to a grid cell's midpoint coordinates should be a reasonable choice representing all other components within the same cell. This method involves determining the normalized distance of each grid cell component's location to the grid cell midpoint. The squared Euclidean distance, <inline-formula><mml:math id="M225" display="inline"><mml:mrow><mml:msubsup><mml:mi>D</mml:mi><mml:mrow><mml:mtext>mid</mml:mtext><mml:mo>,</mml:mo><mml:mi>i</mml:mi></mml:mrow><mml:mn mathvariant="normal">2</mml:mn></mml:msubsup></mml:mrow></mml:math></inline-formula>, of each component <inline-formula><mml:math id="M226" display="inline"><mml:mi>i</mml:mi></mml:math></inline-formula> within a grid cell is calculated by

              <disp-formula id="Ch1.E6" content-type="numbered"><label>6</label><mml:math id="M227" display="block"><mml:mrow><mml:msubsup><mml:mi>D</mml:mi><mml:mrow><mml:mtext>mid</mml:mtext><mml:mo>,</mml:mo><mml:mi>i</mml:mi></mml:mrow><mml:mn mathvariant="normal">2</mml:mn></mml:msubsup><mml:mo>=</mml:mo><mml:msup><mml:mfenced open="[" close="]"><mml:mrow><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mtext>mid</mml:mtext></mml:msub><mml:mo>-</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mtext>range</mml:mtext></mml:msub></mml:mrow></mml:mfrac></mml:mstyle><mml:mo>⋅</mml:mo><mml:mi mathvariant="italic">χ</mml:mi></mml:mrow></mml:mfenced><mml:mn mathvariant="normal">2</mml:mn></mml:msup><mml:mo>+</mml:mo><mml:msup><mml:mfenced close="]" open="["><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mrow><mml:msub><mml:mi>y</mml:mi><mml:mtext>mid</mml:mtext></mml:msub><mml:mo>-</mml:mo><mml:msub><mml:mi>y</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow><mml:mrow><mml:msub><mml:mi>y</mml:mi><mml:mtext>range</mml:mtext></mml:msub></mml:mrow></mml:mfrac></mml:mstyle></mml:mfenced><mml:mn mathvariant="normal">2</mml:mn></mml:msup><mml:mo>.</mml:mo></mml:mrow></mml:math></disp-formula>

            Here, <inline-formula><mml:math id="M228" display="inline"><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mtext>mid</mml:mtext></mml:msub></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M229" display="inline"><mml:mrow><mml:msub><mml:mi>y</mml:mi><mml:mtext>mid</mml:mtext></mml:msub></mml:mrow></mml:math></inline-formula> denote the midpoint coordinates of the grid cell, <inline-formula><mml:math id="M230" display="inline"><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M231" display="inline"><mml:mrow><mml:msub><mml:mi>y</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> are the coordinates of component <inline-formula><mml:math id="M232" display="inline"><mml:mi>i</mml:mi></mml:math></inline-formula>, and <inline-formula><mml:math id="M233" display="inline"><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mtext>range</mml:mtext></mml:msub></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M234" display="inline"><mml:mrow><mml:msub><mml:mi>y</mml:mi><mml:mtext>range</mml:mtext></mml:msub></mml:mrow></mml:math></inline-formula> denote the magnitudes of the <inline-formula><mml:math id="M235" display="inline"><mml:mi>x</mml:mi></mml:math></inline-formula> and <inline-formula><mml:math id="M236" display="inline"><mml:mi>y</mml:mi></mml:math></inline-formula> axes ranges of the 2D space over which the lumping grid was placed. They normalize the distances expressed along the two axes. The <inline-formula><mml:math id="M237" display="inline"><mml:mi>x</mml:mi></mml:math></inline-formula>-to-<inline-formula><mml:math id="M238" display="inline"><mml:mi>y</mml:mi></mml:math></inline-formula> aspect ratio <inline-formula><mml:math id="M239" display="inline"><mml:mi mathvariant="italic">χ</mml:mi></mml:math></inline-formula> is introduced to scale the normalized length scales of the grid space. 
That is, <inline-formula><mml:math id="M240" display="inline"><mml:mi mathvariant="italic">χ</mml:mi></mml:math></inline-formula> is the (prescribed) multiplying factor of the normalized, dimensionless value range along the <inline-formula><mml:math id="M241" display="inline"><mml:mi>x</mml:mi></mml:math></inline-formula> axis that is regarded as equivalent to the normalized, dimensionless range along the <inline-formula><mml:math id="M242" display="inline"><mml:mi>y</mml:mi></mml:math></inline-formula> axis. To express the importance of volatility in gas–particle partitioning, we elected to set a <inline-formula><mml:math id="M243" display="inline"><mml:mrow><mml:mn mathvariant="normal">2</mml:mn><mml:mo>:</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:math></inline-formula> aspect ratio as the default choice. If the number of grid lines along the <inline-formula><mml:math id="M244" display="inline"><mml:mi>x</mml:mi></mml:math></inline-formula>-dimension is already set as twice that of the <inline-formula><mml:math id="M245" display="inline"><mml:mi>y</mml:mi></mml:math></inline-formula>-dimension (e.g., 8 by 4), then <inline-formula><mml:math id="M246" display="inline"><mml:mi mathvariant="italic">χ</mml:mi></mml:math></inline-formula> will account for this (<inline-formula><mml:math id="M247" display="inline"><mml:mrow><mml:mi mathvariant="italic">χ</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:math></inline-formula> in that case). 
In this study, the <inline-formula><mml:math id="M248" display="inline"><mml:mi>x</mml:mi></mml:math></inline-formula>-coordinate refers to the logarithm of the pure-component vapour pressure (<inline-formula><mml:math id="M249" display="inline"><mml:mrow><mml:msub><mml:mi>log⁡</mml:mi><mml:mn mathvariant="normal">10</mml:mn></mml:msub><mml:mo>(</mml:mo><mml:msubsup><mml:mi>p</mml:mi><mml:mi>i</mml:mi><mml:mo>∘</mml:mo></mml:msubsup><mml:mo>/</mml:mo><mml:mo>[</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mrow class="unit"><mml:mi mathvariant="normal">Pa</mml:mi></mml:mrow><mml:mo>]</mml:mo><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula>), while the <inline-formula><mml:math id="M250" display="inline"><mml:mi>y</mml:mi></mml:math></inline-formula>-coordinate corresponds to one of the proxies for polarity (ACR, <inline-formula><mml:math id="M251" display="inline"><mml:mrow class="chem"><mml:mi mathvariant="normal">O</mml:mi><mml:mo>:</mml:mo><mml:mi mathvariant="normal">C</mml:mi></mml:mrow></mml:math></inline-formula> ratio, <inline-formula><mml:math id="M252" display="inline"><mml:mrow><mml:msub><mml:mover accent="true"><mml:mtext>OS</mml:mtext><mml:mo mathvariant="normal">‾</mml:mo></mml:mover><mml:mrow class="chem"><mml:mi mathvariant="normal">C</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula>). The value ranges of the components along the <inline-formula><mml:math id="M253" display="inline"><mml:mi>x</mml:mi></mml:math></inline-formula> and <inline-formula><mml:math id="M254" display="inline"><mml:mi>y</mml:mi></mml:math></inline-formula> coordinates may differ substantially, possibly by several orders of magnitude. 
Therefore, we chose to normalize and scale the magnitudes of the <inline-formula><mml:math id="M255" display="inline"><mml:mi>x</mml:mi></mml:math></inline-formula> and <inline-formula><mml:math id="M256" display="inline"><mml:mi>y</mml:mi></mml:math></inline-formula> axes ranges relative to each other, as shown by Eq. (<xref ref-type="disp-formula" rid="Ch1.E6"/>). The <inline-formula><mml:math id="M257" display="inline"><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mtext>range</mml:mtext></mml:msub></mml:mrow></mml:math></inline-formula> (<inline-formula><mml:math id="M258" display="inline"><mml:mrow><mml:mo>=</mml:mo><mml:mo>|</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mo>max⁡</mml:mo></mml:msub><mml:mo>-</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mo>min⁡</mml:mo></mml:msub><mml:mo>|</mml:mo></mml:mrow></mml:math></inline-formula>) is determined by the maximum and minimum <inline-formula><mml:math id="M259" display="inline"><mml:mi>x</mml:mi></mml:math></inline-formula>-coordinates from the set of all system components that contribute nonzero mass concentration and belong to the regular lumping space, i.e., those not lumped to the single high-volatility surrogate compound. The <inline-formula><mml:math id="M260" display="inline"><mml:mrow><mml:msub><mml:mi>y</mml:mi><mml:mtext>range</mml:mtext></mml:msub></mml:mrow></mml:math></inline-formula> is determined analogously. Finally, within each grid cell containing at least one component, the component of minimum <inline-formula><mml:math id="M261" display="inline"><mml:mrow><mml:msubsup><mml:mi>D</mml:mi><mml:mtext>mid</mml:mtext><mml:mn mathvariant="normal">2</mml:mn></mml:msubsup></mml:mrow></mml:math></inline-formula> is selected as surrogate and the mass concentrations of all other grid cell members are lumped (additively) into the surrogate. 
Except for the method-specific expressions for computing a component's distance metric within a grid cell, the variable definitions and steps outlined for this method also apply to the other methods described in the following sub-sections.</p>
      <p id="d2e5456">The grid cell midpoint method is straightforward to understand and implement. It has the advantage of providing unbiased coverage of the 2D space by ensuring that surrogate components are nearly evenly distributed across the grid domain. The approach demonstrates scalability, as it can be easily applied to systems with a large number of components, making it suitable for complex mixture analyses. Additionally, the method offers computational efficiency since the calculation of midpoints and distances is relatively fast, even for large datasets. However, this method may not always select the most representative component, especially when a low-resolution grid is applied in which the few cells may exhibit unevenly distributed components (e.g., all components clustered around the lower left corner of a cell). This method is visualized in Fig. <xref ref-type="fig" rid="F6"/>a.</p>
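      <p>A minimal Python sketch of the selection rule in Eq. (<xref ref-type="disp-formula" rid="Ch1.E6"/>) is given below. The function name <monospace>midpoint_surrogate</monospace> and its argument layout are hypothetical, and the aspect ratio <monospace>chi</monospace> is simply passed in by the caller rather than derived from the grid dimensions.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def midpoint_surrogate(cell_points, cell_bounds, x_range, y_range, chi=1.0):
    """Return the index of the component nearest the cell midpoint.

    cell_points: [(x, y), ...] of the components in one grid cell
    cell_bounds: (x_l, x_u, y_l, y_u), lower and upper limits of the cell
    x_range, y_range: axis ranges of the gridded 2D space (normalization)
    chi: x-to-y aspect ratio (illustrative default of 1.0)
    """
    x_l, x_u, y_l, y_u = cell_bounds
    x_mid = 0.5 * (x_l + x_u)
    y_mid = 0.5 * (y_l + y_u)

    def d2_mid(point):
        # Normalized, scaled squared Euclidean distance to the midpoint.
        x_i, y_i = point
        return ((x_mid - x_i) / x_range * chi) ** 2 + ((y_mid - y_i) / y_range) ** 2

    return min(range(len(cell_points)), key=lambda i: d2_mid(cell_points[i]))
```
]]></preformat>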
</sec>
<sec id="Ch1.S2.SS2.SSS4">
  <label>2.2.4</label><title>Grid cell medoid method</title>
      <p id="d2e5469">The grid cell medoid method operates by selecting the medoid component of each grid cell as the surrogate. The medoid member is the component in closest cumulative proximity to all other components of the same cell. Therefore, the main step involves calculating the cumulative squared Euclidean distance of each component from all other components of the grid cell, as follows:

              <disp-formula id="Ch1.E7" content-type="numbered"><label>7</label><mml:math id="M262" display="block"><mml:mrow><mml:mi mathvariant="normal">Σ</mml:mi><mml:msubsup><mml:mi>D</mml:mi><mml:mrow><mml:mtext>med</mml:mtext><mml:mo>,</mml:mo><mml:mi>i</mml:mi></mml:mrow><mml:mn mathvariant="normal">2</mml:mn></mml:msubsup><mml:mo>=</mml:mo><mml:munderover><mml:mo movablelimits="false">∑</mml:mo><mml:mrow><mml:mi>j</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow><mml:mi>m</mml:mi></mml:munderover><mml:msup><mml:mfenced open="[" close="]"><mml:mrow><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mi>j</mml:mi></mml:msub><mml:mo>-</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mtext>range</mml:mtext></mml:msub></mml:mrow></mml:mfrac></mml:mstyle><mml:mo>⋅</mml:mo><mml:mi mathvariant="italic">χ</mml:mi></mml:mrow></mml:mfenced><mml:mn mathvariant="normal">2</mml:mn></mml:msup><mml:mo>+</mml:mo><mml:msup><mml:mfenced close="]" open="["><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mrow><mml:msub><mml:mi>y</mml:mi><mml:mi>j</mml:mi></mml:msub><mml:mo>-</mml:mo><mml:msub><mml:mi>y</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow><mml:mrow><mml:msub><mml:mi>y</mml:mi><mml:mtext>range</mml:mtext></mml:msub></mml:mrow></mml:mfrac></mml:mstyle></mml:mfenced><mml:mn mathvariant="normal">2</mml:mn></mml:msup><mml:mo>.</mml:mo></mml:mrow></mml:math></disp-formula>

            Here, index <inline-formula><mml:math id="M263" display="inline"><mml:mi>j</mml:mi></mml:math></inline-formula> covers the <inline-formula><mml:math id="M264" display="inline"><mml:mrow><mml:mn mathvariant="normal">1</mml:mn><mml:mo>,</mml:mo><mml:mi mathvariant="normal">…</mml:mi><mml:mo>,</mml:mo><mml:mi>m</mml:mi></mml:mrow></mml:math></inline-formula> components of the grid cell, with other variables as defined for Eq. (<xref ref-type="disp-formula" rid="Ch1.E6"/>).</p>
      <p id="d2e5590">The medoid method offers potential advantages over the midpoint method. First, it may provide a better representation by selecting a surrogate that is located close to most other components, which is especially important for a grid with only a few large cells in which the components are unevenly distributed. Second, the medoid method is more robust to outliers, being less influenced than the midpoint method by extreme values or uneven clustering of points within a cell. However, the medoid method is computationally more costly than the midpoint method, especially for cells that contain a large number of components, because the number of pairwise distances grows quadratically with the number of cell members. In practice this is rarely a concern. Figure <xref ref-type="fig" rid="F6"/>b illustrates the grid cell medoid method.</p>
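      <p>The cumulative distance of Eq. (<xref ref-type="disp-formula" rid="Ch1.E7"/>) can be sketched in Python as follows. The function name <monospace>medoid_surrogate</monospace> and its arguments are hypothetical; the double loop is written for clarity rather than speed.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def medoid_surrogate(cell_points, x_range, y_range, chi=1.0):
    """Return the index of the medoid component of one grid cell.

    The medoid is the component with the smallest cumulative squared,
    normalized Euclidean distance to all components of the same cell.
    """
    def cumulative_d2(i):
        x_i, y_i = cell_points[i]
        return sum(
            ((x_j - x_i) / x_range * chi) ** 2 + ((y_j - y_i) / y_range) ** 2
            for x_j, y_j in cell_points
        )

    return min(range(len(cell_points)), key=cumulative_d2)
```
]]></preformat>
      <p>Since squared distances are summed, the component selected by this sketch is also the one closest to the unweighted mean position of the cell members in the scaled coordinates.</p>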
</sec>
<sec id="Ch1.S2.SS2.SSS5">
  <label>2.2.5</label><title>Grid cell mass-weighted medoid method</title>
      <p id="d2e5603">The mass-weighted medoid method prioritizes surrogate selection based on a combination of a component's importance, as measured by its mass concentration, and the cumulative distance from other components of a grid cell, as in the unweighted medoid method. Using such a mass-weighted approach is particularly useful in the case of systems consisting of many components, yet with the majority of the mass contributed by a small minority of molecules. The mass-weighting ensures that the important components are more likely to be selected as surrogates, so that their particular molecular properties are then also more appropriately considered in subsequent equilibrium partitioning computations. The core of this method lies in the calculation of the squared distance between a component and the coordinates of the centre of mass of the cell, <inline-formula><mml:math id="M265" display="inline"><mml:mrow><mml:msubsup><mml:mi>D</mml:mi><mml:mrow><mml:mtext>wmed</mml:mtext><mml:mo>,</mml:mo><mml:mi>i</mml:mi></mml:mrow><mml:mn mathvariant="normal">2</mml:mn></mml:msubsup></mml:mrow></mml:math></inline-formula>, as follows:

              <disp-formula id="Ch1.E8" content-type="numbered"><label>8</label><mml:math id="M266" display="block"><mml:mrow><mml:msubsup><mml:mi>D</mml:mi><mml:mrow><mml:mtext>wmed</mml:mtext><mml:mo>,</mml:mo><mml:mi>i</mml:mi></mml:mrow><mml:mn mathvariant="normal">2</mml:mn></mml:msubsup><mml:mo>=</mml:mo><mml:msup><mml:mfenced close="]" open="["><mml:mrow><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mtext>mc</mml:mtext></mml:msub><mml:mo>-</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mtext>range</mml:mtext></mml:msub></mml:mrow></mml:mfrac></mml:mstyle><mml:mo>⋅</mml:mo><mml:mi mathvariant="italic">χ</mml:mi></mml:mrow></mml:mfenced><mml:mn mathvariant="normal">2</mml:mn></mml:msup><mml:mo>+</mml:mo><mml:msup><mml:mfenced close="]" open="["><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mrow><mml:msub><mml:mi>y</mml:mi><mml:mtext>mc</mml:mtext></mml:msub><mml:mo>-</mml:mo><mml:msub><mml:mi>y</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow><mml:mrow><mml:msub><mml:mi>y</mml:mi><mml:mtext>range</mml:mtext></mml:msub></mml:mrow></mml:mfrac></mml:mstyle></mml:mfenced><mml:mn mathvariant="normal">2</mml:mn></mml:msup><mml:mo>,</mml:mo></mml:mrow></mml:math></disp-formula>

            with

              <disp-formula id="Ch1.E9" content-type="numbered"><label>9</label><mml:math id="M267" display="block"><mml:mtable class="split" rowspacing="0.2ex" displaystyle="true" columnalign="right left"><mml:mtr><mml:mtd/><mml:mtd><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mtext>mc</mml:mtext></mml:msub><mml:mo>=</mml:mo><mml:munderover><mml:mo movablelimits="false">∑</mml:mo><mml:mrow><mml:mi>j</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow><mml:mi>m</mml:mi></mml:munderover><mml:msub><mml:mi>w</mml:mi><mml:mi>j</mml:mi></mml:msub><mml:mo>⋅</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mi>j</mml:mi></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>y</mml:mi><mml:mtext>mc</mml:mtext></mml:msub><mml:mo>=</mml:mo><mml:munderover><mml:mo movablelimits="false">∑</mml:mo><mml:mrow><mml:mi>j</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow><mml:mi>m</mml:mi></mml:munderover><mml:msub><mml:mi>w</mml:mi><mml:mi>j</mml:mi></mml:msub><mml:mo>⋅</mml:mo><mml:msub><mml:mi>y</mml:mi><mml:mi>j</mml:mi></mml:msub><mml:mo>,</mml:mo><mml:mtext mathvariant="normal">and</mml:mtext></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd/><mml:mtd><mml:mrow><mml:msub><mml:mi>w</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mrow><mml:msub><mml:mi>C</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow><mml:mrow><mml:msubsup><mml:mo>∑</mml:mo><mml:mrow><mml:mi>j</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow><mml:mi>m</mml:mi></mml:msubsup><mml:msub><mml:mi>C</mml:mi><mml:mi>j</mml:mi></mml:msub></mml:mrow></mml:mfrac></mml:mstyle><mml:mo>.</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>

            Here, <inline-formula><mml:math id="M268" display="inline"><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mtext>mc</mml:mtext></mml:msub></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M269" display="inline"><mml:mrow><mml:msub><mml:mi>y</mml:mi><mml:mtext>mc</mml:mtext></mml:msub></mml:mrow></mml:math></inline-formula> are the coordinates of the centre of mass of the grid cell under consideration. These coordinates are determined by computing the mass-weighted average coordinates of the <inline-formula><mml:math id="M270" display="inline"><mml:mrow><mml:mi>j</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mo>,</mml:mo><mml:mi mathvariant="normal">…</mml:mi><mml:mo>,</mml:mo><mml:mi>m</mml:mi></mml:mrow></mml:math></inline-formula> grid cell components, as shown by Eq. (<xref ref-type="disp-formula" rid="Ch1.E9"/>), only requiring information about the cell's components of nonzero mass concentration. <inline-formula><mml:math id="M271" display="inline"><mml:mrow><mml:msub><mml:mi>w</mml:mi><mml:mi>j</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M272" display="inline"><mml:mrow><mml:msub><mml:mi>C</mml:mi><mml:mi>j</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> represent a component's grid-cell mass fraction and mass concentration, respectively. Components of large mass concentration relative to others in a grid cell have the effect of “pulling” the centre of mass toward their location, thereby benefiting from a smaller <inline-formula><mml:math id="M273" display="inline"><mml:mrow><mml:msubsup><mml:mi>D</mml:mi><mml:mrow><mml:mtext>wmed</mml:mtext><mml:mo>,</mml:mo><mml:mi>i</mml:mi></mml:mrow><mml:mn mathvariant="normal">2</mml:mn></mml:msubsup></mml:mrow></mml:math></inline-formula>. 
Among the grid cell components of nonzero mass concentration, the component of minimum <inline-formula><mml:math id="M274" display="inline"><mml:mrow><mml:msubsup><mml:mi>D</mml:mi><mml:mrow><mml:mtext>wmed</mml:mtext><mml:mo>,</mml:mo><mml:mi>i</mml:mi></mml:mrow><mml:mn mathvariant="normal">2</mml:mn></mml:msubsup></mml:mrow></mml:math></inline-formula> value is identified as the surrogate. Components of zero mass concentration are excluded from the calculation and from being selected as surrogate.</p>
      <p id="d2e5926">This method strongly favours selecting surrogate components from the subset of the most abundant species in the gas–aerosol system. It thereby offers an improved representation of the dominant components, and the resulting lumped system is likely to reflect bulk aerosol properties more accurately, especially when only a low-resolution grid is applied. To aid in understanding the interplay between mass concentration and spatial distribution in the surrogate selection process, Fig. <xref ref-type="fig" rid="F6"/>c exemplifies how the component of highest mass concentration, which is also closest to the centre of mass within the grid cell, is selected as the surrogate component.</p>
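      <p>Equations (<xref ref-type="disp-formula" rid="Ch1.E8"/>) and (<xref ref-type="disp-formula" rid="Ch1.E9"/>) translate into the short Python sketch below. The function name <monospace>weighted_medoid_surrogate</monospace> and the argument layout are hypothetical, and the sketch assumes that the cell contains at least one component of nonzero mass concentration.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def weighted_medoid_surrogate(cell_points, conc, x_range, y_range, chi=1.0):
    """Return the index of the component nearest the cell's centre of mass.

    cell_points: [(x, y), ...] of the components in one grid cell
    conc:        mass concentrations of those components
    Components of zero mass concentration are excluded from the
    centre-of-mass calculation and from being selected.
    """
    active = [i for i, c in enumerate(conc) if c > 0.0]
    total = sum(conc[i] for i in active)
    # Mass-weighted average coordinates (centre of mass), Eq. (9).
    x_mc = sum(conc[i] / total * cell_points[i][0] for i in active)
    y_mc = sum(conc[i] / total * cell_points[i][1] for i in active)

    def d2_wmed(i):
        # Normalized, scaled squared distance to the centre of mass, Eq. (8).
        x_i, y_i = cell_points[i]
        return ((x_mc - x_i) / x_range * chi) ** 2 + ((y_mc - y_i) / y_range) ** 2

    return min(active, key=d2_wmed)
```
]]></preformat>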
</sec>
<sec id="Ch1.S2.SS2.SSS6">
  <label>2.2.6</label><title><inline-formula><mml:math id="M275" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula>-means-based clustering method</title>
      <p id="d2e5946">The mass-weighted, <inline-formula><mml:math id="M276" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula>-means-based medoid clustering method, for brevity hereafter referred to as the <inline-formula><mml:math id="M277" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula>-means method, is employed to generate a predefined number of centroids (<inline-formula><mml:math id="M278" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula> clusters) in the 2D space. The clustering process is implemented using a variant of the weighted <inline-formula><mml:math id="M279" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula>-means clustering algorithm; specifically, subroutine kmeans_w_01, from the Fortran 90 implementation provided by <xref ref-type="bibr" rid="bib1.bibx10" id="text.61"/>. The Fortran code is based on the theory, algorithm and existing code from the works by <xref ref-type="bibr" rid="bib1.bibx57" id="text.62"/> and <xref ref-type="bibr" rid="bib1.bibx27" id="text.63"/>. This algorithm iteratively reassigns points (here chemical components) to clusters, minimizing the total energy of the system. Refer to <xref ref-type="bibr" rid="bib1.bibx57" id="text.64"/> and <xref ref-type="bibr" rid="bib1.bibx27" id="text.65"/> for a detailed description of the <inline-formula><mml:math id="M280" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula>-means method and its variants.</p>
      <p id="d2e6000">In our approach for surrogate selection with the <inline-formula><mml:math id="M281" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula>-means method, we combine the mass-weighted clustering with a final, distance-based surrogate component selection process. Figure <xref ref-type="fig" rid="F6"/>d visualizes an example cluster's surrogate selection. Specifically, <inline-formula><mml:math id="M282" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula>-means returns the coordinates of a predefined number of cluster centres, which usually do not coincide with any actual component's coordinates. Among each cluster's population, we then select the component of nonzero mass concentration that is located closest to the cluster centre coordinates as the cluster's surrogate. Similar to the mass-weighted medoid approach, the distance of each cluster component to the centre is determined by the squared Euclidean distance based on normalized coordinates. In outcome, this is therefore akin to (but algorithmically not identical with) the method of <inline-formula><mml:math id="M283" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula>-medoids clustering.</p>
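      <p>The final, distance-based selection step can be sketched in Python as follows. This is an illustration under stated assumptions rather than the Fortran code of the framework: the function name <monospace>select_surrogates</monospace>, the argument layout and the example data are hypothetical, and the cluster labels and centres are assumed to have been produced beforehand by a clustering routine.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def select_surrogates(coords, conc, labels, centres):
    """Pick, per cluster, the nonzero-mass component nearest to the centre.

    coords:  list of (x, y) normalized coordinates, one per component
    conc:    list of mass concentrations, one per component
    labels:  list of cluster indices, one per component
    centres: list of (x, y) cluster centre coordinates
    Returns a dict {cluster index: component index}.
    """
    best = {}
    for i, ((x, y), c, k) in enumerate(zip(coords, conc, labels)):
        if c <= 0.0:
            continue  # zero-mass components cannot become surrogates
        cx, cy = centres[k]
        d2 = (x - cx) ** 2 + (y - cy) ** 2  # squared Euclidean distance
        if k not in best or d2 < best[k][0]:
            best[k] = (d2, i)
    return {k: i for k, (d2, i) in best.items()}


# Hypothetical data: component 1 is nearest to centre 0 but has zero mass.
coords = [(0.1, 0.1), (0.2, 0.2), (0.8, 0.8), (0.9, 0.7)]
conc = [1.0, 0.0, 3.0, 0.5]
labels = [0, 0, 1, 1]
centres = [(0.18, 0.18), (0.82, 0.78)]
surrogates = select_surrogates(coords, conc, labels, centres)
```
]]></preformat>
      <p>Because the chosen surrogate is always an actual component, the outcome resembles a medoid selection even though the cluster centres themselves are generally not located at any component's coordinates.</p>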
      <p id="d2e6026">By selecting surrogate species that are typically centrally located within their clusters and of significant mass concentration, this approach promotes a set of surrogate components that represents the overall physicochemical characteristics of the system well. This is particularly valuable in atmospheric chemistry, where the complexity of chemical mechanisms must be reduced while retaining the properties of the most abundant components, since the mass or number concentration of components is a critical factor in determining subsequent partitioning or chemical reaction behaviour.</p>
      <p id="d2e6030">In our implementation, the clustering process begins by setting the desired number of clusters, <inline-formula><mml:math id="M284" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula>. The initial cluster centre coordinates are then set based on the grid cell midpoints (in our code, the grid-based methods are run prior to running <inline-formula><mml:math id="M285" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula>-means). For comparison with the grid-based methods, we typically set the number of clusters equal to the number of populated grid cells. However, the <inline-formula><mml:math id="M286" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula>-means clustering method can also be run independently of the gridded approaches. If the target number of clusters exceeds the number of populated grid cells, the additional initial cluster centres are assigned pseudo-random coordinates within the scaled 2D space. This initialization step ensures a broad distribution of initial cluster centres across the normalized 2D space. Components associated with the special high-volatility surrogate are marked as belonging to a special cluster and are therefore excluded from the <inline-formula><mml:math id="M287" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula>-means clustering.</p>
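      <p>A possible form of this initialization is sketched below in Python. Several details are assumptions made only for illustration: the normalized 2D space is taken to be the unit square, the midpoints of the populated grid cells are used directly as initial centres, and a target smaller than the number of midpoints is rejected because that case is not described here. The function name <monospace>initial_centres</monospace> and the seed handling are likewise hypothetical.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import random


def initial_centres(cell_midpoints, k, seed=0):
    """Build k initial cluster centres (illustrative sketch).

    cell_midpoints: (x, y) midpoints of the populated grid cells
    k:              target number of clusters
    seed:           seed of the pseudo-random generator (assumed option)
    """
    centres = list(cell_midpoints)
    if k < len(centres):
        raise ValueError("k smaller than the number of midpoints supplied")
    rng = random.Random(seed)
    # Fill up with pseudo-random points in the (assumed) unit square.
    while len(centres) < k:
        centres.append((rng.random(), rng.random()))
    return centres


# Hypothetical case: two populated grid cells, four clusters requested.
midpoints = [(0.25, 0.25), (0.75, 0.25)]
start = initial_centres(midpoints, 4)
```
]]></preformat>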
      <p id="d2e6061">An innovation of the weighted <inline-formula><mml:math id="M288" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula>-means variant is the incorporation of, in our case, mass-concentration-based weighting during the algorithm's assignment of components to clusters. When <inline-formula><mml:math id="M289" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula>-means computes each cluster's “energy” and iteratively assigns components to a certain cluster, the weights are factored in. As discussed in Sect. <xref ref-type="sec" rid="Ch1.S3"/>, actual example cases indicate that the mass concentrations of components may range over several orders of magnitude. Therefore, the weighting aids in prioritizing components with relatively high mass concentrations as potential cluster surrogates, ensuring that the clustering process is not solely based on spatial proximity. The algorithm considers the entire dataset simultaneously and can effectively handle cases in which data points are not uniformly distributed in a 2D space. While we favour weighted <inline-formula><mml:math id="M290" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula>-means clustering, if desired, one can assign each component the same weight, thereby returning to the non-weighted <inline-formula><mml:math id="M291" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula>-means method.</p>
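      <p>To make the role of the weights concrete, the following Python sketch implements a generic weighted Lloyd-type iteration. It is not the cited Fortran subroutine, whose details may differ. In this sketch, the weights enter through the weighted centroid update and through the cluster energy, here taken as the weighted sum of squared distances to the assigned centre. All names and example values are hypothetical.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def weighted_kmeans(points, weights, centres, max_iter=100):
    """Generic weighted k-means (Lloyd-type) iteration in 2D."""
    centres = [tuple(c) for c in centres]
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: nearest centre by squared Euclidean distance.
        new_labels = [
            min(
                range(len(centres)),
                key=lambda k: (p[0] - centres[k][0]) ** 2
                + (p[1] - centres[k][1]) ** 2,
            )
            for p in points
        ]
        # Update step: each centre moves to the weighted mean of its members.
        new_centres = []
        for k, old in enumerate(centres):
            members = [
                (p, w)
                for p, w, lab in zip(points, weights, new_labels)
                if lab == k
            ]
            wsum = sum(w for _, w in members)
            if wsum > 0.0:
                new_centres.append(
                    (
                        sum(p[0] * w for p, w in members) / wsum,
                        sum(p[1] * w for p, w in members) / wsum,
                    )
                )
            else:
                new_centres.append(old)  # keep an empty cluster's centre
        if new_labels == labels and new_centres == centres:
            break  # converged: nothing changed in this pass
        labels, centres = new_labels, new_centres
    # Weighted energy: sum of w_i times squared distance to assigned centre.
    energy = sum(
        w * ((p[0] - centres[lab][0]) ** 2 + (p[1] - centres[lab][1]) ** 2)
        for p, w, lab in zip(points, weights, labels)
    )
    return labels, centres, energy


# Hypothetical data: the heavy first point dominates the first centre.
points = [(0.0, 0.0), (0.2, 0.0), (1.0, 1.0), (1.0, 0.8)]
weights = [9.0, 1.0, 1.0, 1.0]
labels, centres, energy = weighted_kmeans(
    points, weights, [(0.1, 0.1), (0.9, 0.9)]
)
```
]]></preformat>
      <p>Setting all weights to the same value reduces this sketch to the non-weighted variant, in which each centre is the plain arithmetic mean of its members.</p>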
      <p id="d2e6094">Overall, the <inline-formula><mml:math id="M292" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula>-means method offers several advantages. The method can identify natural groupings in the data, independent of the arbitrary limits of an imposed grid, potentially leading to more meaningful, unbiased surrogate selection. However, this method is computationally the most costly of the four outlined in this study. Users can adjust the targeted number of <inline-formula><mml:math id="M293" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula>-means clusters to strike a balance between model simplicity and resolution of the original data. We note that the computational cost of this method is relatively insensitive to the number of clusters targeted.</p>
</sec>
</sec>
</sec>
<sec id="Ch1.S3">
  <label>3</label><title>Results and discussion</title>
      <p id="d2e6121">The primary goals of this study were to develop efficient and practical tools for calculating pure-component properties, activity coefficients and the gas–particle partitioning for complex systems containing a large number of organic species. To this end, we implemented and evaluated a 2D lumping framework to reduce system complexity while maintaining an adjustable level of accuracy. Additionally, we sought to compare the effectiveness of different lumping methods. These objectives were pursued to advance our understanding of organic species behaviour and improve computational efficiency in geoscientific modelling. The S2AS pure-component property prediction tool was implemented in Python. The 2D lumping framework was implemented in modern Fortran. The generated computer programs are a main product of this work. Related code and data are provided via code repositories and version-specific archives (see Sect. <italic>Code and data availability</italic>).</p>
<sec id="Ch1.S3.SS1">
  <label>3.1</label><title>Example systems: <inline-formula><mml:math id="M294" display="inline"><mml:mi mathvariant="italic">α</mml:mi></mml:math></inline-formula>-pinene and toluene oxidation products</title>
      <p id="d2e6142">To demonstrate the application of the new chain of software tools, we introduce two example systems showcasing a multitude of chemical components and related properties derived from simulations of (1) an <inline-formula><mml:math id="M295" display="inline"><mml:mi mathvariant="italic">α</mml:mi></mml:math></inline-formula>-pinene ozonolysis system simulated using the MCM (v3.3.1) model and (2) a toluene photo-oxidation system simulated based on a near-explicit GECKO-A mechanism of its gas-phase chemistry. The input parameters used for these simulations are listed in Tables S1 and S2 in the Supplement. The system component structures were either directly output in or subsequently converted to the SMILES format, and the component concentrations were retrieved from a selected output time of the simulations. For the examples shown, the output data at the final time of the respective mechanism simulation were used.</p>
      <p id="d2e6152">The GECKO-A toluene photo-oxidation system comprises an extensive array of <inline-formula><mml:math id="M296" display="inline"><mml:mrow><mml:mo>∼</mml:mo><mml:mn mathvariant="normal">68</mml:mn><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mn mathvariant="normal">000</mml:mn></mml:mrow></mml:math></inline-formula> distinct organic components. Figure <xref ref-type="fig" rid="F7"/> provides a visual representation of the obtained system of oxidation and fragmentation products in the 2D polarity versus volatility space. The complete set of components is shown in Fig. <xref ref-type="fig" rid="F7"/>a (ordered by mass concentration to show the most abundant components on top), while panels (b)–(d) show the application of a <inline-formula><mml:math id="M297" display="inline"><mml:mrow><mml:mn mathvariant="normal">10</mml:mn><mml:mo>×</mml:mo><mml:mn mathvariant="normal">5</mml:mn></mml:mrow></mml:math></inline-formula> volatility <inline-formula><mml:math id="M298" display="inline"><mml:mo>×</mml:mo></mml:math></inline-formula> polarity (here using ACR) grid to select surrogate components by the three grid-based methods. Figure <xref ref-type="fig" rid="F8"/> shows the corresponding data clustered by the <inline-formula><mml:math id="M299" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula>-means method and the pertaining selection of surrogate components. The comparison of the full system and the various surrogate representations visually demonstrates the framework's ability to simplify complex chemical systems. Despite the massive reduction in the number of species, our lumping framework successfully preserves key physicochemical characteristics of the system of relevance for SOA formation predictions.</p>
      <p id="d2e6201">The 2D representation of the <inline-formula><mml:math id="M300" display="inline"><mml:mi mathvariant="italic">α</mml:mi></mml:math></inline-formula>-pinene-derived components is shown in Fig. <xref ref-type="fig" rid="F9"/>, with panel (a) showing the full set of MCM-derived system components. Panels (b)–(d) of Fig. <xref ref-type="fig" rid="F9"/> show the surrogate selections based on a <inline-formula><mml:math id="M301" display="inline"><mml:mrow><mml:mn mathvariant="normal">10</mml:mn><mml:mo>×</mml:mo><mml:mn mathvariant="normal">5</mml:mn></mml:mrow></mml:math></inline-formula> grid. Since several grid cells are empty in this example, only 36 surrogates are determined at the chosen grid resolution. Figure <xref ref-type="fig" rid="F10"/> demonstrates the application of the <inline-formula><mml:math id="M302" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula>-means method to this system when using ACR as the polarity axis. Table S6 summarizes the surrogate properties from this <inline-formula><mml:math id="M303" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula>-means clustering example.</p>

      <fig id="F9" specific-use="star"><label>Figure 9</label><caption><p id="d2e6247">The <inline-formula><mml:math id="M304" display="inline"><mml:mi mathvariant="italic">α</mml:mi></mml:math></inline-formula>-pinene SOA system components shown in the 2D space of activity coefficient ratio versus vapour pressure at 298 <inline-formula><mml:math id="M305" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">K</mml:mi></mml:mrow></mml:math></inline-formula>. <bold>(a)</bold> The full MCM-derived 174 components with an overlaid <inline-formula><mml:math id="M306" display="inline"><mml:mrow><mml:mn mathvariant="normal">10</mml:mn><mml:mo>×</mml:mo><mml:mn mathvariant="normal">5</mml:mn></mml:mrow></mml:math></inline-formula> grid (dotted) that is used by the different selection methods to determine grid cell surrogates. <bold>(b–d)</bold> Surrogates selected by <bold>(b)</bold> the medoid method, <bold>(c)</bold> the midpoint method and <bold>(d)</bold> the weighted medoid method.</p></caption>
          <graphic xlink:href="https://gmd.copernicus.org/articles/19/4601/2026/gmd-19-4601-2026-f09.png"/>

        </fig>

      <fig id="F10" specific-use="star"><label>Figure 10</label><caption><p id="d2e6301">2D space representations of the <inline-formula><mml:math id="M307" display="inline"><mml:mi mathvariant="italic">α</mml:mi></mml:math></inline-formula>-pinene SOA system at 298 <inline-formula><mml:math id="M308" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">K</mml:mi></mml:mrow></mml:math></inline-formula> using the <inline-formula><mml:math id="M309" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula>-means clustering and surrogate selection method. <bold>(a)</bold> 36 clusters are shown with individual cluster members identified by the same colour. The cluster centres are denoted by <inline-formula><mml:math id="M310" display="inline"><mml:mo>×</mml:mo></mml:math></inline-formula> and the corresponding selected cluster surrogates by the <inline-formula><mml:math id="M311" display="inline"><mml:mo>⋄</mml:mo></mml:math></inline-formula> symbols. Data points coloured pink are members of the special high-volatility cluster. <bold>(b)</bold> The lumped mass concentrations of the surrogate components selected by <inline-formula><mml:math id="M312" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula>-means (see Table S6 for related surrogate data).</p></caption>
          <graphic xlink:href="https://gmd.copernicus.org/articles/19/4601/2026/gmd-19-4601-2026-f10.jpg"/>

        </fig>


</sec>
<sec id="Ch1.S3.SS2">
  <label>3.2</label><title>Performance of automated property prediction tools</title>
<sec id="Ch1.S3.SS2.SSS1">
  <label>3.2.1</label><title>Functional group identification (S2AS tool)</title>
      <p id="d2e6377">The S2AS tool demonstrated high accuracy in identifying AIOMFAC functional groups for the <inline-formula><mml:math id="M313" display="inline"><mml:mi mathvariant="italic">α</mml:mi></mml:math></inline-formula>-pinene and toluene oxidation products. The 174 <inline-formula><mml:math id="M314" display="inline"><mml:mi mathvariant="italic">α</mml:mi></mml:math></inline-formula>-pinene-derived products were mapped to AIOMFAC subgroups without needing any exception treatments, while an average of 0.5 exceptions (special SMARTS mappings) per molecule were encountered in case of the <inline-formula><mml:math id="M315" display="inline"><mml:mrow><mml:mo>∼</mml:mo><mml:mn mathvariant="normal">68</mml:mn><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mn mathvariant="normal">000</mml:mn></mml:mrow></mml:math></inline-formula> toluene-derived products. The latter exhibited a higher level of nitrogen-containing functional groups, some of which were responsible for most of the exception cases triggered. Given the systematic handling of exception cases, the S2AS tool was successful in mapping all carbon, oxygen and nitrogen atoms to relevant subgroups. This is particularly noteworthy given the structural complexity of many of the oxidation products, which often contain multiple functional groups, branched functionalized chains and ring structures. The S2AS tool is also more reliable than manual classification in cases involving complex molecular structures, for which human experts may occasionally overlook less prominent functional groups or perform inconsistent exception treatments when an imperfect mapping to the available set of AIOMFAC subgroups is present.</p>
      <p id="d2e6407">The S2AS tool showed high computational efficiency in processing large datasets. It successfully processed the list of 174 <inline-formula><mml:math id="M316" display="inline"><mml:mi mathvariant="italic">α</mml:mi></mml:math></inline-formula>-pinene oxidation product SMILES in less than 0.8 s, and the <inline-formula><mml:math id="M317" display="inline"><mml:mrow><mml:mo>∼</mml:mo><mml:mn mathvariant="normal">68</mml:mn><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mn mathvariant="normal">000</mml:mn></mml:mrow></mml:math></inline-formula> SMILES from the toluene system in <inline-formula><mml:math id="M318" display="inline"><mml:mrow><mml:mo>∼</mml:mo><mml:mn mathvariant="normal">5</mml:mn></mml:mrow></mml:math></inline-formula> min, corresponding to a processing rate of <inline-formula><mml:math id="M319" display="inline"><mml:mrow><mml:mo>∼</mml:mo><mml:mn mathvariant="normal">13</mml:mn><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mn mathvariant="normal">800</mml:mn></mml:mrow></mml:math></inline-formula> SMILES per minute on a single laptop processor core (1 thread, Intel Core i7-10710U CPU). This automatic classification rate represents a tremendous advance over manually assigning AIOMFAC subgroups for each component, which would be an infeasible task for systems containing thousands of components. The S2AS tool's processing time scales approximately linearly with dataset size. Further speed-up via parallelization on multi-core computers is possible, but such enhancements have not been attempted in the current version of the source code.</p>
</sec>
<sec id="Ch1.S3.SS2.SSS2">
  <label>3.2.2</label><title>Vapour pressure estimation</title>
      <p id="d2e6461">The UManSysProp pure-component vapour pressure estimation tool demonstrated robust performance across the wide range of molecular structures present in the <inline-formula><mml:math id="M320" display="inline"><mml:mi mathvariant="italic">α</mml:mi></mml:math></inline-formula>-pinene and toluene oxidation systems. It successfully parsed the list of SMILES and predicted the vapour pressures for all molecules. For the examples shown in this study, we used the output from the EVAPORATION method, yet the seven other pure-component vapour pressure prediction methods available from UManSysProp were also completed successfully.</p>
      <p id="d2e6471">The version of UManSysProp we employed in this work includes a new module we developed to determine a two-parameter temperature dependence parameterization of pure-component vapour pressures using the form of Eq. (<xref ref-type="disp-formula" rid="Ch1.E1"/>) for each of the vapour pressure methods included (not just EVAPORATION). In a first step, a component's vapour pressure is predicted by each method at seven temperatures equally spaced between 260 and 320 <inline-formula><mml:math id="M321" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">K</mml:mi></mml:mrow></mml:math></inline-formula>. In a second step, the two parameters of Eq. (<xref ref-type="disp-formula" rid="Ch1.E1"/>) are fitted to these data for each method. In the case of EVAPORATION, no fitting is required since one can solve for parameters <inline-formula><mml:math id="M322" display="inline"><mml:mrow><mml:msub><mml:mi>A</mml:mi><mml:mi>j</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M323" display="inline"><mml:mrow><mml:msub><mml:mi>B</mml:mi><mml:mi>j</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> using the output from the end points of the temperature range (two equations, two unknowns). If speed is of the essence, the parameter fitting step can also be bypassed for all other methods by solving for the parameters in the same way, usually yielding similar parameter values to those obtained from a more elaborate fit. 
The single-thread processing of the <inline-formula><mml:math id="M324" display="inline"><mml:mrow><mml:mo>∼</mml:mo><mml:mn mathvariant="normal">68</mml:mn><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mn mathvariant="normal">000</mml:mn></mml:mrow></mml:math></inline-formula> toluene-derived product SMILES took 482 s on an Intel Core i7-10710U CPU (one thread) for the vapour pressure data creation plus 83 s for determining the parameters <inline-formula><mml:math id="M325" display="inline"><mml:mrow><mml:msub><mml:mi>A</mml:mi><mml:mi>j</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M326" display="inline"><mml:mrow><mml:msub><mml:mi>B</mml:mi><mml:mi>j</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> pertaining to each method when bypassing the parameter fitting step. This amounts to a SMILES processing rate of <inline-formula><mml:math id="M327" display="inline"><mml:mrow><mml:mo>∼</mml:mo><mml:mn mathvariant="normal">120</mml:mn><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mrow class="unit"><mml:msup><mml:mi mathvariant="normal">s</mml:mi><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:mrow></mml:math></inline-formula> or <inline-formula><mml:math id="M328" display="inline"><mml:mrow><mml:mo>∼</mml:mo><mml:mn mathvariant="normal">7200</mml:mn></mml:mrow></mml:math></inline-formula> SMILES per minute (including the parameter fitting step reduces the rate to <inline-formula><mml:math id="M329" display="inline"><mml:mrow><mml:mo>∼</mml:mo><mml:mn mathvariant="normal">1650</mml:mn></mml:mrow></mml:math></inline-formula> SMILES per minute). As in the case of the S2AS tool, further speed-up is possible by introducing parallel processing (with good scaling potential) in the case of extensive lists of SMILES.</p>
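      <p>The difference between the end-point solve and a fit over all seven temperatures can be illustrated with the following Python sketch. Since the parameterization itself is not reproduced in this section, the sketch assumes a generic form <monospace>y = A + B*g(T)</monospace> that is linear in the two parameters, where <monospace>y</monospace> stands for the logarithm of the vapour pressure and <monospace>g(T) = 1/T</monospace> is only a placeholder; the actual temperature function, the kind of fit and all names used here are assumptions.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def solve_two_point(t_lo, y_lo, t_hi, y_hi, g=lambda t: 1.0 / t):
    """Solve y = A + B*g(T) exactly from the two end points."""
    b = (y_hi - y_lo) / (g(t_hi) - g(t_lo))
    a = y_lo - b * g(t_lo)
    return a, b


def fit_least_squares(temps, ys, g=lambda t: 1.0 / t):
    """Ordinary least-squares fit of y = A + B*g(T) over all points."""
    xs = [g(t) for t in temps]
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    b = sxy / sxx
    return y_mean - b * x_mean, b


# Seven temperatures equally spaced between 260 and 320 K.
temps = [260.0 + 10.0 * i for i in range(7)]
# Synthetic data from hypothetical parameters, following the assumed form.
a_true, b_true = 12.0, -3500.0
ys = [a_true + b_true / t for t in temps]

a2, b2 = solve_two_point(temps[0], ys[0], temps[-1], ys[-1])
a7, b7 = fit_least_squares(temps, ys)
```
]]></preformat>
      <p>When the predicted values follow the assumed form exactly, both routes return the same parameters; for methods with a different intrinsic temperature dependence, the two sets of parameters would differ slightly.</p>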
      <p id="d2e6584">Together, the S2AS and UManSysProp pure-component property prediction tools enable automated and efficient processing of SMILES data for atmospheric and environmental chemistry applications.</p>
</sec>
</sec>
<sec id="Ch1.S3.SS3">
  <label>3.3</label><title>Evaluation metrics for the 2D lumping framework</title>
      <p id="d2e6596">To facilitate an evaluation of the 2D lumping framework in terms of the impacts of polarity axis choices and grid resolutions, we performed 2D lumping at several grid resolutions, each followed by AIOMFAC-based equilibrium gas–particle partitioning computations to generate aerosol properties of interest for a quantitative comparison. Specifically, we calculated the mean absolute percentage error (MAPE) and mean percentage error (MPE) for the resulting SOA mass concentrations as well as the aerosol hygroscopicity parameter <inline-formula><mml:math id="M330" display="inline"><mml:mi mathvariant="italic">κ</mml:mi></mml:math></inline-formula>. Aerosol mass concentration and hygroscopicity are two (among several) insightful characteristics of the (gas–)aerosol partitioning behaviour and water uptake potential. The MAPE and MPE are relative deviation metrics defined as follows: 

                <disp-formula specific-use="gather" content-type="numbered"><mml:math id="M331" display="block"><mml:mtable displaystyle="true"><mml:mlabeledtr id="Ch1.E10"><mml:mtd><mml:mtext>10</mml:mtext></mml:mtd><mml:mtd><mml:mrow><mml:mstyle displaystyle="true" class="stylechange"/><mml:mtext>MAPE</mml:mtext><mml:mo>=</mml:mo><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mn mathvariant="normal">1</mml:mn><mml:mi>n</mml:mi></mml:mfrac></mml:mstyle><mml:munderover><mml:mo movablelimits="false">∑</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow><mml:mi>n</mml:mi></mml:munderover><mml:mfenced open="|" close="|"><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mrow><mml:msub><mml:mi>F</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>-</mml:mo><mml:msub><mml:mi>A</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow><mml:mrow><mml:msub><mml:mi>A</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:mfrac></mml:mstyle></mml:mfenced><mml:mo>×</mml:mo><mml:mn mathvariant="normal">100</mml:mn><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mi mathvariant="italic">%</mml:mi><mml:mo>,</mml:mo></mml:mrow></mml:mtd></mml:mlabeledtr><mml:mlabeledtr id="Ch1.E11"><mml:mtd><mml:mtext>11</mml:mtext></mml:mtd><mml:mtd><mml:mrow><mml:mstyle class="stylechange" displaystyle="true"/><mml:mtext>MPE</mml:mtext><mml:mo>=</mml:mo><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mn mathvariant="normal">1</mml:mn><mml:mi>n</mml:mi></mml:mfrac></mml:mstyle><mml:munderover><mml:mo movablelimits="false">∑</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow><mml:mi>n</mml:mi></mml:munderover><mml:mfenced close=")" open="("><mml:mstyle displaystyle="true"><mml:mfrac 
style="display"><mml:mrow><mml:msub><mml:mi>F</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>-</mml:mo><mml:msub><mml:mi>A</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow><mml:mrow><mml:msub><mml:mi>A</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:mfrac></mml:mstyle></mml:mfenced><mml:mo>×</mml:mo><mml:mn mathvariant="normal">100</mml:mn><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mi mathvariant="italic">%</mml:mi><mml:mo>.</mml:mo></mml:mrow></mml:mtd></mml:mlabeledtr></mml:mtable></mml:math></disp-formula>

          Here, <inline-formula><mml:math id="M332" display="inline"><mml:mrow><mml:msub><mml:mi>A</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> represents the actual (observed or reference) value, <inline-formula><mml:math id="M333" display="inline"><mml:mrow><mml:msub><mml:mi>F</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> the predicted value, and <inline-formula><mml:math id="M334" display="inline"><mml:mi>n</mml:mi></mml:math></inline-formula> is the number of observations. MAPE is scale-independent, making it useful for comparing overall prediction precision across different datasets (especially datasets free of extreme outliers). MPE measures the average bias in the predictions relative to the reference data. By considering the direction of errors, MPE complements MAPE by highlighting any systematic high or low biases in the model's predictions, which can be crucial for understanding the limitations and potential improvements of each method.</p>
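      <p>The two metrics can be computed with a few lines of Python, as sketched below. The function names and the example numbers are hypothetical; the example is chosen so that one over-prediction and one under-prediction of equal relative size cancel in the MPE but not in the MAPE.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def mape(predicted, reference):
    """Mean absolute percentage error, in percent."""
    n = len(reference)
    return 100.0 / n * sum(
        abs((f - a) / a) for f, a in zip(predicted, reference)
    )


def mpe(predicted, reference):
    """Mean percentage error (signed bias), in percent."""
    n = len(reference)
    return 100.0 / n * sum((f - a) / a for f, a in zip(predicted, reference))


# Hypothetical reference (A_i) and predicted (F_i) values:
# relative deviations of +10 % and -10 %.
reference = [10.0, 20.0]
predicted = [11.0, 18.0]
mape_value = mape(predicted, reference)
mpe_value = mpe(predicted, reference)
```
]]></preformat>
      <p>Here the MAPE is 10 %, whereas the MPE is zero, which shows why the two metrics are best reported together: a small MPE alone does not imply small individual deviations.</p>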
      <p id="d2e6765">MAPE and MPE were calculated for a selection of grid resolutions (<inline-formula><mml:math id="M335" display="inline"><mml:mrow><mml:mn mathvariant="normal">4</mml:mn><mml:mo>×</mml:mo><mml:mn mathvariant="normal">2</mml:mn></mml:mrow></mml:math></inline-formula>, <inline-formula><mml:math id="M336" display="inline"><mml:mrow><mml:mn mathvariant="normal">6</mml:mn><mml:mo>×</mml:mo><mml:mn mathvariant="normal">3</mml:mn></mml:mrow></mml:math></inline-formula>, <inline-formula><mml:math id="M337" display="inline"><mml:mrow><mml:mn mathvariant="normal">8</mml:mn><mml:mo>×</mml:mo><mml:mn mathvariant="normal">4</mml:mn></mml:mrow></mml:math></inline-formula>, <inline-formula><mml:math id="M338" display="inline"><mml:mrow><mml:mn mathvariant="normal">10</mml:mn><mml:mo>×</mml:mo><mml:mn mathvariant="normal">5</mml:mn></mml:mrow></mml:math></inline-formula>) and polarity axis metrics for both <inline-formula><mml:math id="M339" display="inline"><mml:mi mathvariant="italic">α</mml:mi></mml:math></inline-formula>-pinene-derived SOA and toluene-derived SOA. In order to calculate the evaluation metrics for the toluene SOA system, for which a full-system calculation was not feasible with the AIOMFAC equilibrium partitioning model, the mass-weighted <inline-formula><mml:math id="M340" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula>-means method with the ACR polarity axis and a higher <inline-formula><mml:math id="M341" display="inline"><mml:mrow><mml:mn mathvariant="normal">25</mml:mn><mml:mo>×</mml:mo><mml:mn mathvariant="normal">10</mml:mn></mml:mrow></mml:math></inline-formula> grid resolution was used as the reference (benchmark case) for all  relative deviation evaluations. 
A validation check was also carried out using the <inline-formula><mml:math id="M342" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula>-means method at a <inline-formula><mml:math id="M343" display="inline"><mml:mrow><mml:mn mathvariant="normal">40</mml:mn><mml:mo>×</mml:mo><mml:mn mathvariant="normal">20</mml:mn></mml:mrow></mml:math></inline-formula> grid resolution (also with ACR as the polarity axis) to verify whether the MAPE and MPE values obtained at different (higher) grid resolutions and associated numbers of <inline-formula><mml:math id="M344" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula>-means clusters were consistent with the reference case. The predicted reference mass concentration at the <inline-formula><mml:math id="M345" display="inline"><mml:mrow><mml:mn mathvariant="normal">25</mml:mn><mml:mo>×</mml:mo><mml:mn mathvariant="normal">10</mml:mn></mml:mrow></mml:math></inline-formula> grid resolution (219 surrogates) agrees with that from the <inline-formula><mml:math id="M346" display="inline"><mml:mrow><mml:mn mathvariant="normal">40</mml:mn><mml:mo>×</mml:mo><mml:mn mathvariant="normal">20</mml:mn></mml:mrow></mml:math></inline-formula> case (612 surrogates) within a MAPE of <inline-formula><mml:math id="M347" display="inline"><mml:mrow><mml:mn mathvariant="normal">0.1</mml:mn><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mi mathvariant="italic">%</mml:mi></mml:mrow></mml:math></inline-formula>, confirming it to be an appropriate reference.</p>
      <p id="d2e6905">Table <xref ref-type="table" rid="T2"/> compares the relative deviations of different surrogate selection methods for predicting SOA mass concentrations using the selected surrogate species from the <inline-formula><mml:math id="M348" display="inline"><mml:mi mathvariant="italic">α</mml:mi></mml:math></inline-formula>-pinene SOA system. The table lists the MAPE and MPE for the four surrogate selection methods (midpoint, medoid, mass-weighted medoid, mass-weighted <inline-formula><mml:math id="M349" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula>-means) combined with three polarity axis options across different grid resolutions. The corresponding absolute SOA mass concentrations and <inline-formula><mml:math id="M350" display="inline"><mml:mi mathvariant="italic">κ</mml:mi></mml:math></inline-formula> values predicted for this system are listed in Table S3. The weighted <inline-formula><mml:math id="M351" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula>-means method generally performs best, with lowest errors across most grid sizes. Table <xref ref-type="table" rid="T3"/> presents similar MAPE and MPE data but for the predictions of the hygroscopicity parameter <inline-formula><mml:math id="M352" display="inline"><mml:mi mathvariant="italic">κ</mml:mi></mml:math></inline-formula> for the same <inline-formula><mml:math id="M353" display="inline"><mml:mi mathvariant="italic">α</mml:mi></mml:math></inline-formula>-pinene SOA system. Again, the weighted <inline-formula><mml:math id="M354" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula>-means method tends to show the smallest errors overall compared to other surrogate selection approaches. A more detailed discussion of different axis choices and impacts follows in the “Analysis of the polarity axes and surrogate selection methods” part.</p>

<table-wrap id="T2" specific-use="star"><label>Table 2</label><caption><p id="d2e6966">Comparison of relative deviations in predicted SOA mass concentrations at 298 <inline-formula><mml:math id="M355" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">K</mml:mi></mml:mrow></mml:math></inline-formula> for the <inline-formula><mml:math id="M356" display="inline"><mml:mi mathvariant="italic">α</mml:mi></mml:math></inline-formula>-pinene-derived SOA system. The MAPE and MPE values are listed for the four different surrogate selection methods combined with three choices for the polarity axis (<inline-formula><mml:math id="M357" display="inline"><mml:mi>y</mml:mi></mml:math></inline-formula> axis) and for several grid/cluster resolutions. The reference values are from the partitioning computation based on the full system (174 organic components). For a comparison of related absolute quantities, see Table S3.</p></caption><oasis:table frame="topbot"><oasis:tgroup cols="10">
     <oasis:colspec colnum="1" colname="col1" align="left"/>
     <oasis:colspec colnum="2" colname="col2" align="left"/>
     <oasis:colspec colnum="3" colname="col3" align="right"/>
     <oasis:colspec colnum="4" colname="col4" align="right" colsep="1"/>
     <oasis:colspec colnum="5" colname="col5" align="right"/>
     <oasis:colspec colnum="6" colname="col6" align="right" colsep="1"/>
     <oasis:colspec colnum="7" colname="col7" align="right"/>
     <oasis:colspec colnum="8" colname="col8" align="right" colsep="1"/>
     <oasis:colspec colnum="9" colname="col9" align="right"/>
     <oasis:colspec colnum="10" colname="col10" align="right"/>
     <oasis:thead>
       <oasis:row>
         <oasis:entry colname="col1">Surrogate selection</oasis:entry>
         <oasis:entry colname="col2"><inline-formula><mml:math id="M358" display="inline"><mml:mi>y</mml:mi></mml:math></inline-formula> axis</oasis:entry>
         <oasis:entry rowsep="1" namest="col3" nameend="col4" align="center" colsep="1"><inline-formula><mml:math id="M359" display="inline"><mml:mrow><mml:mn mathvariant="normal">4</mml:mn><mml:mo>×</mml:mo><mml:mn mathvariant="normal">2</mml:mn></mml:mrow></mml:math></inline-formula></oasis:entry>
         <oasis:entry rowsep="1" namest="col5" nameend="col6" align="center" colsep="1"><inline-formula><mml:math id="M360" display="inline"><mml:mrow><mml:mn mathvariant="normal">6</mml:mn><mml:mo>×</mml:mo><mml:mn mathvariant="normal">3</mml:mn></mml:mrow></mml:math></inline-formula></oasis:entry>
         <oasis:entry rowsep="1" namest="col7" nameend="col8" align="center" colsep="1"><inline-formula><mml:math id="M361" display="inline"><mml:mrow><mml:mn mathvariant="normal">8</mml:mn><mml:mo>×</mml:mo><mml:mn mathvariant="normal">4</mml:mn></mml:mrow></mml:math></inline-formula></oasis:entry>
         <oasis:entry rowsep="1" namest="col9" nameend="col10" align="center"><inline-formula><mml:math id="M362" display="inline"><mml:mrow><mml:mn mathvariant="normal">10</mml:mn><mml:mo>×</mml:mo><mml:mn mathvariant="normal">5</mml:mn></mml:mrow></mml:math></inline-formula></oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1"/>
         <oasis:entry colname="col2"/>
         <oasis:entry colname="col3">MAPE</oasis:entry>
         <oasis:entry colname="col4">MPE</oasis:entry>
         <oasis:entry colname="col5">MAPE</oasis:entry>
         <oasis:entry colname="col6">MPE</oasis:entry>
         <oasis:entry colname="col7">MAPE</oasis:entry>
         <oasis:entry colname="col8">MPE</oasis:entry>
         <oasis:entry colname="col9">MAPE</oasis:entry>
         <oasis:entry colname="col10">MPE</oasis:entry>
       </oasis:row>
     </oasis:thead>
     <oasis:tbody>
       <oasis:row>
         <oasis:entry colname="col1">Midpoint</oasis:entry>
         <oasis:entry colname="col2"><inline-formula><mml:math id="M363" display="inline"><mml:mrow><mml:msub><mml:mi>log⁡</mml:mi><mml:mn mathvariant="normal">10</mml:mn></mml:msub><mml:mfenced close="]" open="["><mml:mrow><mml:msubsup><mml:mi mathvariant="italic">γ</mml:mi><mml:mrow><mml:mi>j</mml:mi><mml:mo>,</mml:mo><mml:mtext>hex</mml:mtext></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mi>x</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:msubsup><mml:mo>/</mml:mo><mml:msubsup><mml:mi mathvariant="italic">γ</mml:mi><mml:mrow><mml:mi>j</mml:mi><mml:mo>,</mml:mo><mml:mi mathvariant="normal">w</mml:mi></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mi>x</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:msubsup></mml:mrow></mml:mfenced></mml:mrow></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col3">13.6 %</oasis:entry>
         <oasis:entry colname="col4">12.6 %</oasis:entry>
         <oasis:entry colname="col5">25.9 %</oasis:entry>
         <oasis:entry colname="col6"><inline-formula><mml:math id="M364" display="inline"><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">25.9</mml:mn></mml:mrow></mml:math></inline-formula> %</oasis:entry>
         <oasis:entry colname="col7">2.1 %</oasis:entry>
         <oasis:entry colname="col8"><inline-formula><mml:math id="M365" display="inline"><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">2.6</mml:mn></mml:mrow></mml:math></inline-formula> %</oasis:entry>
         <oasis:entry colname="col9">20.1 %</oasis:entry>
         <oasis:entry colname="col10">20.1 %</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Medoid</oasis:entry>
         <oasis:entry colname="col2"/>
         <oasis:entry colname="col3">4.1 %</oasis:entry>
         <oasis:entry colname="col4">3.7 %</oasis:entry>
         <oasis:entry colname="col5">24.8 %</oasis:entry>
         <oasis:entry colname="col6"><inline-formula><mml:math id="M366" display="inline"><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">24.8</mml:mn></mml:mrow></mml:math></inline-formula> %</oasis:entry>
         <oasis:entry colname="col7">24.2 %</oasis:entry>
         <oasis:entry colname="col8"><inline-formula><mml:math id="M367" display="inline"><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">24.2</mml:mn></mml:mrow></mml:math></inline-formula> %</oasis:entry>
         <oasis:entry colname="col9">17.6 %</oasis:entry>
         <oasis:entry colname="col10">17.6 %</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Weighted medoid</oasis:entry>
         <oasis:entry colname="col2"/>
         <oasis:entry colname="col3">20.4 %</oasis:entry>
         <oasis:entry colname="col4"><inline-formula><mml:math id="M368" display="inline"><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">20.4</mml:mn></mml:mrow></mml:math></inline-formula> %</oasis:entry>
         <oasis:entry colname="col5">2.2 %</oasis:entry>
         <oasis:entry colname="col6">2.2 %</oasis:entry>
         <oasis:entry colname="col7">3.2 %</oasis:entry>
         <oasis:entry colname="col8">3.2 %</oasis:entry>
         <oasis:entry colname="col9">5.0 %</oasis:entry>
         <oasis:entry colname="col10">5.0 %</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">Weighted <inline-formula><mml:math id="M369" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula>-means</oasis:entry>
         <oasis:entry colname="col2"/>
         <oasis:entry colname="col3">2.0 %</oasis:entry>
         <oasis:entry colname="col4">2.0 %</oasis:entry>
         <oasis:entry colname="col5">1.9 %</oasis:entry>
         <oasis:entry colname="col6">0.2 %</oasis:entry>
         <oasis:entry colname="col7">0.4 %</oasis:entry>
         <oasis:entry colname="col8"><inline-formula><mml:math id="M370" display="inline"><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">0.4</mml:mn></mml:mrow></mml:math></inline-formula> %</oasis:entry>
         <oasis:entry colname="col9">0.1 %</oasis:entry>
         <oasis:entry colname="col10"><inline-formula><mml:math id="M371" display="inline"><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">0.1</mml:mn></mml:mrow></mml:math></inline-formula> %</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Midpoint</oasis:entry>
         <oasis:entry colname="col2"><inline-formula><mml:math id="M372" display="inline"><mml:mrow class="chem"><mml:mi mathvariant="normal">O</mml:mi><mml:mo>:</mml:mo><mml:mi mathvariant="normal">C</mml:mi></mml:mrow></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col3">51.2 %</oasis:entry>
         <oasis:entry colname="col4">51.2 %</oasis:entry>
         <oasis:entry colname="col5">6.6 %</oasis:entry>
         <oasis:entry colname="col6">2.5 %</oasis:entry>
         <oasis:entry colname="col7">17.4 %</oasis:entry>
         <oasis:entry colname="col8">17.4 %</oasis:entry>
         <oasis:entry colname="col9">24.3 %</oasis:entry>
         <oasis:entry colname="col10">24.3 %</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Medoid</oasis:entry>
         <oasis:entry colname="col2"/>
         <oasis:entry colname="col3">16.0 %</oasis:entry>
         <oasis:entry colname="col4"><inline-formula><mml:math id="M373" display="inline"><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">8.8</mml:mn></mml:mrow></mml:math></inline-formula> %</oasis:entry>
         <oasis:entry colname="col5">8.0 %</oasis:entry>
         <oasis:entry colname="col6">8.0 %</oasis:entry>
         <oasis:entry colname="col7">6.9 %</oasis:entry>
         <oasis:entry colname="col8">4.4 %</oasis:entry>
         <oasis:entry colname="col9">39.8 %</oasis:entry>
         <oasis:entry colname="col10">39.8 %</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Weighted medoid</oasis:entry>
         <oasis:entry colname="col2"/>
         <oasis:entry colname="col3">6.1 %</oasis:entry>
         <oasis:entry colname="col4">0.9 %</oasis:entry>
         <oasis:entry colname="col5">18.0 %</oasis:entry>
         <oasis:entry colname="col6">18.0 %</oasis:entry>
         <oasis:entry colname="col7">7.1 %</oasis:entry>
         <oasis:entry colname="col8"><inline-formula><mml:math id="M374" display="inline"><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">0.8</mml:mn></mml:mrow></mml:math></inline-formula> %</oasis:entry>
         <oasis:entry colname="col9">6.6 %</oasis:entry>
         <oasis:entry colname="col10">6.6 %</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">Weighted <inline-formula><mml:math id="M375" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula>-means</oasis:entry>
         <oasis:entry colname="col2"/>
         <oasis:entry colname="col3">6.7 %</oasis:entry>
         <oasis:entry colname="col4">3.9 %</oasis:entry>
         <oasis:entry colname="col5">2.9 %</oasis:entry>
         <oasis:entry colname="col6">2.9 %</oasis:entry>
         <oasis:entry colname="col7">0.9 %</oasis:entry>
         <oasis:entry colname="col8"><inline-formula><mml:math id="M376" display="inline"><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">0.9</mml:mn></mml:mrow></mml:math></inline-formula> %</oasis:entry>
         <oasis:entry colname="col9">0.2 %</oasis:entry>
         <oasis:entry colname="col10"><inline-formula><mml:math id="M377" display="inline"><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">0.2</mml:mn></mml:mrow></mml:math></inline-formula> %</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Midpoint</oasis:entry>
         <oasis:entry colname="col2"><inline-formula><mml:math id="M378" display="inline"><mml:mrow><mml:msub><mml:mover accent="true"><mml:mtext>OS</mml:mtext><mml:mo mathvariant="normal">‾</mml:mo></mml:mover><mml:mrow class="chem"><mml:mi mathvariant="normal">C</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col3">71.4 %</oasis:entry>
         <oasis:entry colname="col4">71.4 %</oasis:entry>
         <oasis:entry colname="col5">9.2 %</oasis:entry>
         <oasis:entry colname="col6">8.4 %</oasis:entry>
         <oasis:entry colname="col7">22.9 %</oasis:entry>
         <oasis:entry colname="col8">22.9 %</oasis:entry>
         <oasis:entry colname="col9">1.5 %</oasis:entry>
         <oasis:entry colname="col10">1.1 %</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Medoid</oasis:entry>
         <oasis:entry colname="col2"/>
         <oasis:entry colname="col3">16.4 %</oasis:entry>
         <oasis:entry colname="col4"><inline-formula><mml:math id="M379" display="inline"><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1.8</mml:mn></mml:mrow></mml:math></inline-formula> %</oasis:entry>
         <oasis:entry colname="col5">25.6 %</oasis:entry>
         <oasis:entry colname="col6"><inline-formula><mml:math id="M380" display="inline"><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">25.6</mml:mn></mml:mrow></mml:math></inline-formula> %</oasis:entry>
         <oasis:entry colname="col7">25.6 %</oasis:entry>
         <oasis:entry colname="col8"><inline-formula><mml:math id="M381" display="inline"><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">25.6</mml:mn></mml:mrow></mml:math></inline-formula> %</oasis:entry>
         <oasis:entry colname="col9">20.3 %</oasis:entry>
         <oasis:entry colname="col10">19.0 %</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Weighted medoid</oasis:entry>
         <oasis:entry colname="col2"/>
         <oasis:entry colname="col3">23.8 %</oasis:entry>
         <oasis:entry colname="col4"><inline-formula><mml:math id="M382" display="inline"><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">23.8</mml:mn></mml:mrow></mml:math></inline-formula> %</oasis:entry>
         <oasis:entry colname="col5">2.2 %</oasis:entry>
         <oasis:entry colname="col6"><inline-formula><mml:math id="M383" display="inline"><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">0.1</mml:mn></mml:mrow></mml:math></inline-formula> %</oasis:entry>
         <oasis:entry colname="col7">1.6 %</oasis:entry>
         <oasis:entry colname="col8"><inline-formula><mml:math id="M384" display="inline"><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">0.8</mml:mn></mml:mrow></mml:math></inline-formula> %</oasis:entry>
         <oasis:entry colname="col9">3.0 %</oasis:entry>
         <oasis:entry colname="col10">3.0 %</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Weighted <inline-formula><mml:math id="M385" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula>-means</oasis:entry>
         <oasis:entry colname="col2"/>
         <oasis:entry colname="col3">1.7 %</oasis:entry>
         <oasis:entry colname="col4">1.0 %</oasis:entry>
         <oasis:entry colname="col5">1.6 %</oasis:entry>
         <oasis:entry colname="col6">0.7 %</oasis:entry>
         <oasis:entry colname="col7">1.2 %</oasis:entry>
         <oasis:entry colname="col8">1.0 %</oasis:entry>
         <oasis:entry colname="col9">0.9 %</oasis:entry>
         <oasis:entry colname="col10">0.6 %</oasis:entry>
       </oasis:row>
     </oasis:tbody>
   </oasis:tgroup></oasis:table></table-wrap>

<table-wrap id="T3" specific-use="star"><label>Table 3</label><caption><p id="d2e7772">Similar to Table <xref ref-type="table" rid="T2"/> but for the MAPE and MPE of the predicted hygroscopicity parameter <inline-formula><mml:math id="M386" display="inline"><mml:mi mathvariant="italic">κ</mml:mi></mml:math></inline-formula> (evaluated at water activities of 85 % and 90 %) relative to the full <inline-formula><mml:math id="M387" display="inline"><mml:mi mathvariant="italic">α</mml:mi></mml:math></inline-formula>-pinene SOA system prediction used as benchmark.</p></caption><oasis:table frame="topbot"><oasis:tgroup cols="10">
     <oasis:colspec colnum="1" colname="col1" align="left"/>
     <oasis:colspec colnum="2" colname="col2" align="left"/>
     <oasis:colspec colnum="3" colname="col3" align="right"/>
     <oasis:colspec colnum="4" colname="col4" align="right" colsep="1"/>
     <oasis:colspec colnum="5" colname="col5" align="right"/>
     <oasis:colspec colnum="6" colname="col6" align="right" colsep="1"/>
     <oasis:colspec colnum="7" colname="col7" align="right"/>
     <oasis:colspec colnum="8" colname="col8" align="right" colsep="1"/>
     <oasis:colspec colnum="9" colname="col9" align="right"/>
     <oasis:colspec colnum="10" colname="col10" align="right"/>
     <oasis:thead>
       <oasis:row>
         <oasis:entry colname="col1">Surrogate selection</oasis:entry>
         <oasis:entry colname="col2"><inline-formula><mml:math id="M388" display="inline"><mml:mi>y</mml:mi></mml:math></inline-formula> axis</oasis:entry>
         <oasis:entry rowsep="1" namest="col3" nameend="col4" align="center" colsep="1"><inline-formula><mml:math id="M389" display="inline"><mml:mrow><mml:mn mathvariant="normal">4</mml:mn><mml:mo>×</mml:mo><mml:mn mathvariant="normal">2</mml:mn></mml:mrow></mml:math></inline-formula></oasis:entry>
         <oasis:entry rowsep="1" namest="col5" nameend="col6" align="center" colsep="1"><inline-formula><mml:math id="M390" display="inline"><mml:mrow><mml:mn mathvariant="normal">6</mml:mn><mml:mo>×</mml:mo><mml:mn mathvariant="normal">3</mml:mn></mml:mrow></mml:math></inline-formula></oasis:entry>
         <oasis:entry rowsep="1" namest="col7" nameend="col8" align="center" colsep="1"><inline-formula><mml:math id="M391" display="inline"><mml:mrow><mml:mn mathvariant="normal">8</mml:mn><mml:mo>×</mml:mo><mml:mn mathvariant="normal">4</mml:mn></mml:mrow></mml:math></inline-formula></oasis:entry>
         <oasis:entry rowsep="1" namest="col9" nameend="col10" align="center"><inline-formula><mml:math id="M392" display="inline"><mml:mrow><mml:mn mathvariant="normal">10</mml:mn><mml:mo>×</mml:mo><mml:mn mathvariant="normal">5</mml:mn></mml:mrow></mml:math></inline-formula></oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1"/>
         <oasis:entry colname="col2"/>
         <oasis:entry colname="col3">MAPE</oasis:entry>
         <oasis:entry colname="col4">MPE</oasis:entry>
         <oasis:entry colname="col5">MAPE</oasis:entry>
         <oasis:entry colname="col6">MPE</oasis:entry>
         <oasis:entry colname="col7">MAPE</oasis:entry>
         <oasis:entry colname="col8">MPE</oasis:entry>
         <oasis:entry colname="col9">MAPE</oasis:entry>
         <oasis:entry colname="col10">MPE</oasis:entry>
       </oasis:row>
     </oasis:thead>
     <oasis:tbody>
       <oasis:row>
         <oasis:entry colname="col1">Midpoint</oasis:entry>
         <oasis:entry colname="col2"><inline-formula><mml:math id="M393" display="inline"><mml:mrow><mml:msub><mml:mi>log⁡</mml:mi><mml:mn mathvariant="normal">10</mml:mn></mml:msub><mml:mfenced close="]" open="["><mml:mrow><mml:msubsup><mml:mi mathvariant="italic">γ</mml:mi><mml:mrow><mml:mi>j</mml:mi><mml:mo>,</mml:mo><mml:mtext>hex</mml:mtext></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mi>x</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:msubsup><mml:mo>/</mml:mo><mml:msubsup><mml:mi mathvariant="italic">γ</mml:mi><mml:mrow><mml:mi>j</mml:mi><mml:mo>,</mml:mo><mml:mi mathvariant="normal">w</mml:mi></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mi>x</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:msubsup></mml:mrow></mml:mfenced></mml:mrow></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col3">3.3 %</oasis:entry>
         <oasis:entry colname="col4"><inline-formula><mml:math id="M394" display="inline"><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">3.3</mml:mn></mml:mrow></mml:math></inline-formula> %</oasis:entry>
         <oasis:entry colname="col5">16.0 %</oasis:entry>
         <oasis:entry colname="col6">16.0 %</oasis:entry>
         <oasis:entry colname="col7">3.5 %</oasis:entry>
         <oasis:entry colname="col8">3.5 %</oasis:entry>
         <oasis:entry colname="col9">9.3 %</oasis:entry>
         <oasis:entry colname="col10"><inline-formula><mml:math id="M395" display="inline"><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">9.3</mml:mn></mml:mrow></mml:math></inline-formula> %</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Medoid</oasis:entry>
         <oasis:entry colname="col2"/>
         <oasis:entry colname="col3">16.1 %</oasis:entry>
         <oasis:entry colname="col4">16.1 %</oasis:entry>
         <oasis:entry colname="col5">11.5 %</oasis:entry>
         <oasis:entry colname="col6">11.5 %</oasis:entry>
         <oasis:entry colname="col7">8.2 %</oasis:entry>
         <oasis:entry colname="col8">8.2 %</oasis:entry>
         <oasis:entry colname="col9">6.5 %</oasis:entry>
         <oasis:entry colname="col10"><inline-formula><mml:math id="M396" display="inline"><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">6.5</mml:mn></mml:mrow></mml:math></inline-formula> %</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Weighted medoid</oasis:entry>
         <oasis:entry colname="col2"/>
         <oasis:entry colname="col3">3.6 %</oasis:entry>
         <oasis:entry colname="col4">3.6 %</oasis:entry>
         <oasis:entry colname="col5">3.0 %</oasis:entry>
         <oasis:entry colname="col6"><inline-formula><mml:math id="M397" display="inline"><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">3.0</mml:mn></mml:mrow></mml:math></inline-formula> %</oasis:entry>
         <oasis:entry colname="col7">3.0 %</oasis:entry>
         <oasis:entry colname="col8"><inline-formula><mml:math id="M398" display="inline"><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">3.0</mml:mn></mml:mrow></mml:math></inline-formula> %</oasis:entry>
         <oasis:entry colname="col9">1.8 %</oasis:entry>
         <oasis:entry colname="col10"><inline-formula><mml:math id="M399" display="inline"><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1.8</mml:mn></mml:mrow></mml:math></inline-formula> %</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">Weighted <inline-formula><mml:math id="M400" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula>-means</oasis:entry>
         <oasis:entry colname="col2"/>
         <oasis:entry colname="col3">3.5 %</oasis:entry>
         <oasis:entry colname="col4"><inline-formula><mml:math id="M401" display="inline"><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">3.5</mml:mn></mml:mrow></mml:math></inline-formula> %</oasis:entry>
         <oasis:entry colname="col5">3.6 %</oasis:entry>
         <oasis:entry colname="col6"><inline-formula><mml:math id="M402" display="inline"><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">3.6</mml:mn></mml:mrow></mml:math></inline-formula> %</oasis:entry>
         <oasis:entry colname="col7">2.1 %</oasis:entry>
         <oasis:entry colname="col8"><inline-formula><mml:math id="M403" display="inline"><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">2.1</mml:mn></mml:mrow></mml:math></inline-formula> %</oasis:entry>
         <oasis:entry colname="col9">0.0 %</oasis:entry>
         <oasis:entry colname="col10"><inline-formula><mml:math id="M404" display="inline"><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">0.0</mml:mn></mml:mrow></mml:math></inline-formula> %</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Midpoint</oasis:entry>
         <oasis:entry colname="col2"><inline-formula><mml:math id="M405" display="inline"><mml:mrow class="chem"><mml:mi mathvariant="normal">O</mml:mi><mml:mo>:</mml:mo><mml:mi mathvariant="normal">C</mml:mi></mml:mrow></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col3">73.3 %</oasis:entry>
         <oasis:entry colname="col4">73.3 %</oasis:entry>
         <oasis:entry colname="col5">13.9 %</oasis:entry>
         <oasis:entry colname="col6">13.9 %</oasis:entry>
         <oasis:entry colname="col7">19.2 %</oasis:entry>
         <oasis:entry colname="col8">19.2 %</oasis:entry>
         <oasis:entry colname="col9">1.5 %</oasis:entry>
         <oasis:entry colname="col10">1.5 %</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Medoid</oasis:entry>
         <oasis:entry colname="col2"/>
         <oasis:entry colname="col3">89.1 %</oasis:entry>
         <oasis:entry colname="col4">89.1 %</oasis:entry>
         <oasis:entry colname="col5">39.8 %</oasis:entry>
         <oasis:entry colname="col6">39.8 %</oasis:entry>
         <oasis:entry colname="col7">36.4 %</oasis:entry>
         <oasis:entry colname="col8">36.4 %</oasis:entry>
         <oasis:entry colname="col9">14.6 %</oasis:entry>
         <oasis:entry colname="col10">14.6 %</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Weighted medoid</oasis:entry>
         <oasis:entry colname="col2"/>
         <oasis:entry colname="col3">11.7 %</oasis:entry>
         <oasis:entry colname="col4">11.7 %</oasis:entry>
         <oasis:entry colname="col5">1.3 %</oasis:entry>
         <oasis:entry colname="col6">0.5 %</oasis:entry>
         <oasis:entry colname="col7">14.4 %</oasis:entry>
         <oasis:entry colname="col8">14.4 %</oasis:entry>
         <oasis:entry colname="col9">15.9 %</oasis:entry>
         <oasis:entry colname="col10">15.9 %</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">Weighted <inline-formula><mml:math id="M406" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula>-means</oasis:entry>
         <oasis:entry colname="col2"/>
         <oasis:entry colname="col3">14.9 %</oasis:entry>
         <oasis:entry colname="col4">14.9 %</oasis:entry>
         <oasis:entry colname="col5">5.2 %</oasis:entry>
         <oasis:entry colname="col6"><inline-formula><mml:math id="M407" display="inline"><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">5.2</mml:mn></mml:mrow></mml:math></inline-formula> %</oasis:entry>
         <oasis:entry colname="col7">2.1 %</oasis:entry>
         <oasis:entry colname="col8"><inline-formula><mml:math id="M408" display="inline"><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">2.1</mml:mn></mml:mrow></mml:math></inline-formula> %</oasis:entry>
         <oasis:entry colname="col9">0.3 %</oasis:entry>
         <oasis:entry colname="col10"><inline-formula><mml:math id="M409" display="inline"><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">0.3</mml:mn></mml:mrow></mml:math></inline-formula> %</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Midpoint</oasis:entry>
         <oasis:entry colname="col2"><inline-formula><mml:math id="M410" display="inline"><mml:mrow><mml:msub><mml:mover accent="true"><mml:mtext>OS</mml:mtext><mml:mo mathvariant="normal">‾</mml:mo></mml:mover><mml:mrow class="chem"><mml:mi mathvariant="normal">C</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col3">35.8 %</oasis:entry>
         <oasis:entry colname="col4">35.8 %</oasis:entry>
         <oasis:entry colname="col5">26.2 %</oasis:entry>
         <oasis:entry colname="col6">26.2 %</oasis:entry>
         <oasis:entry colname="col7">5.9 %</oasis:entry>
         <oasis:entry colname="col8"><inline-formula><mml:math id="M411" display="inline"><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">5.9</mml:mn></mml:mrow></mml:math></inline-formula> %</oasis:entry>
         <oasis:entry colname="col9">5.9 %</oasis:entry>
         <oasis:entry colname="col10"><inline-formula><mml:math id="M412" display="inline"><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">5.9</mml:mn></mml:mrow></mml:math></inline-formula> %</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Medoid</oasis:entry>
         <oasis:entry colname="col2"/>
         <oasis:entry colname="col3">32.8 %</oasis:entry>
         <oasis:entry colname="col4">32.8 %</oasis:entry>
         <oasis:entry colname="col5">1.6 %</oasis:entry>
         <oasis:entry colname="col6">1.6 %</oasis:entry>
         <oasis:entry colname="col7">21.4 %</oasis:entry>
         <oasis:entry colname="col8">21.4 %</oasis:entry>
         <oasis:entry colname="col9">7.5 %</oasis:entry>
         <oasis:entry colname="col10"><inline-formula><mml:math id="M413" display="inline"><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">7.5</mml:mn></mml:mrow></mml:math></inline-formula> %</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Weighted medoid</oasis:entry>
         <oasis:entry colname="col2"/>
         <oasis:entry colname="col3">12.9 %</oasis:entry>
         <oasis:entry colname="col4">12.9 %</oasis:entry>
         <oasis:entry colname="col5">5.0 %</oasis:entry>
         <oasis:entry colname="col6"><inline-formula><mml:math id="M414" display="inline"><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">5.0</mml:mn></mml:mrow></mml:math></inline-formula> %</oasis:entry>
         <oasis:entry colname="col7">5.8 %</oasis:entry>
         <oasis:entry colname="col8"><inline-formula><mml:math id="M415" display="inline"><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">5.8</mml:mn></mml:mrow></mml:math></inline-formula> %</oasis:entry>
         <oasis:entry colname="col9">7.8 %</oasis:entry>
         <oasis:entry colname="col10"><inline-formula><mml:math id="M416" display="inline"><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">7.8</mml:mn></mml:mrow></mml:math></inline-formula> %</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Weighted <inline-formula><mml:math id="M417" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula>-means</oasis:entry>
         <oasis:entry colname="col2"/>
         <oasis:entry colname="col3">8.2 %</oasis:entry>
         <oasis:entry colname="col4"><inline-formula><mml:math id="M418" display="inline"><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">8.2</mml:mn></mml:mrow></mml:math></inline-formula> %</oasis:entry>
         <oasis:entry colname="col5">7.1 %</oasis:entry>
         <oasis:entry colname="col6"><inline-formula><mml:math id="M419" display="inline"><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">7.1</mml:mn></mml:mrow></mml:math></inline-formula> %</oasis:entry>
         <oasis:entry colname="col7">4.5 %</oasis:entry>
         <oasis:entry colname="col8"><inline-formula><mml:math id="M420" display="inline"><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">4.5</mml:mn></mml:mrow></mml:math></inline-formula> %</oasis:entry>
         <oasis:entry colname="col9">2.5 %</oasis:entry>
         <oasis:entry colname="col10"><inline-formula><mml:math id="M421" display="inline"><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">2.5</mml:mn></mml:mrow></mml:math></inline-formula> %</oasis:entry>
       </oasis:row>
     </oasis:tbody>
   </oasis:tgroup></oasis:table></table-wrap>

      <p id="d2e8623">Table <xref ref-type="table" rid="T4"/> provides MAPE and MPE values for the SOA mass concentration predictions for the toluene oxidation system, comparing the same surrogate selection methods and polarity metrics across five grid resolutions. Table <xref ref-type="table" rid="T5"/> shows the corresponding evaluation of the hygroscopicity parameter predictions for the toluene-derived SOA. The corresponding absolute SOA mass concentrations and <inline-formula><mml:math id="M422" display="inline"><mml:mi mathvariant="italic">κ</mml:mi></mml:math></inline-formula> values predicted for this system are listed in Table S4. For the hygroscopicity parameter, deviations from the reference case at lower grid resolutions are generally higher than for the SOA mass predictions, but the mass-weighted <inline-formula><mml:math id="M423" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula>-means method still performs relatively well at most resolutions. The larger variability in <inline-formula><mml:math id="M424" display="inline"><mml:mi mathvariant="italic">κ</mml:mi></mml:math></inline-formula> predictions stems in part from their relatively low absolute values (Table <xref ref-type="table" rid="T7"/>). Additionally, the relatively small modelled SOA mass concentrations contribute to the observed fluctuations in MAPE and MPE, since minor absolute differences can result in large relative errors.</p>
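      <p>The sensitivity of relative metrics to small reference values can be illustrated with a short calculation. In the sketch below, the same absolute deviation is applied to two hypothetical reference values; the function name and all numbers are assumptions chosen for illustration and are not results of this study.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def relative_error_percent(predicted, reference):
    """Signed relative deviation of a prediction from its reference, in percent."""
    return 100.0 * (predicted - reference) / reference


# Hypothetical values chosen only for illustration (not results from this study).
absolute_offset = 0.01            # identical absolute deviation in both cases
reference_values = (0.20, 0.05)   # a larger and a smaller reference value

for ref in reference_values:
    err = relative_error_percent(ref + absolute_offset, ref)
    print(f"reference = {ref:.2f}, relative error = {err:.1f} %")
```
]]></preformat>
      <p>With these illustrative numbers, the identical absolute offset corresponds to a relative error of about 5 % for the larger reference value and about 20 % for the smaller one, which is the kind of amplification that can inflate percentage-based metrics when the reference quantity is small.</p>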

<table-wrap id="T4" specific-use="star"><label>Table 4</label><caption><p id="d2e8657">Comparison of relative deviations in predicted SOA mass concentrations at 298 <inline-formula><mml:math id="M425" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">K</mml:mi></mml:mrow></mml:math></inline-formula> for the toluene-derived SOA system. The MAPE and MPE values are listed for the four different surrogate selection methods combined with three choices for the polarity axis and for several grid/cluster resolutions. The computation with the mass-weighted <inline-formula><mml:math id="M426" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula>-means method using the ACR polarity proxy at <inline-formula><mml:math id="M427" display="inline"><mml:mrow><mml:mn mathvariant="normal">25</mml:mn><mml:mo>×</mml:mo><mml:mn mathvariant="normal">10</mml:mn></mml:mrow></mml:math></inline-formula> resolution (219 surrogate components) is used as reference for MAPE and MPE calculations. For related absolute quantities, see Table S4.</p></caption><oasis:table frame="topbot"><oasis:tgroup cols="12">
     <oasis:colspec colnum="1" colname="col1" align="left"/>
     <oasis:colspec colnum="2" colname="col2" align="left"/>
     <oasis:colspec colnum="3" colname="col3" align="right"/>
     <oasis:colspec colnum="4" colname="col4" align="right" colsep="1"/>
     <oasis:colspec colnum="5" colname="col5" align="right"/>
     <oasis:colspec colnum="6" colname="col6" align="right" colsep="1"/>
     <oasis:colspec colnum="7" colname="col7" align="right"/>
     <oasis:colspec colnum="8" colname="col8" align="right" colsep="1"/>
     <oasis:colspec colnum="9" colname="col9" align="right"/>
     <oasis:colspec colnum="10" colname="col10" align="right" colsep="1"/>
     <oasis:colspec colnum="11" colname="col11" align="right"/>
     <oasis:colspec colnum="12" colname="col12" align="right"/>
     <oasis:thead>
       <oasis:row>
         <oasis:entry colname="col1">Surrogate selection</oasis:entry>
         <oasis:entry colname="col2"><inline-formula><mml:math id="M428" display="inline"><mml:mi>y</mml:mi></mml:math></inline-formula> axis</oasis:entry>
         <oasis:entry rowsep="1" namest="col3" nameend="col4" align="center" colsep="1"><inline-formula><mml:math id="M429" display="inline"><mml:mrow><mml:mn mathvariant="normal">4</mml:mn><mml:mo>×</mml:mo><mml:mn mathvariant="normal">2</mml:mn></mml:mrow></mml:math></inline-formula></oasis:entry>
         <oasis:entry rowsep="1" namest="col5" nameend="col6" align="center" colsep="1"><inline-formula><mml:math id="M430" display="inline"><mml:mrow><mml:mn mathvariant="normal">6</mml:mn><mml:mo>×</mml:mo><mml:mn mathvariant="normal">3</mml:mn></mml:mrow></mml:math></inline-formula></oasis:entry>
         <oasis:entry rowsep="1" namest="col7" nameend="col8" align="center" colsep="1"><inline-formula><mml:math id="M431" display="inline"><mml:mrow><mml:mn mathvariant="normal">8</mml:mn><mml:mo>×</mml:mo><mml:mn mathvariant="normal">4</mml:mn></mml:mrow></mml:math></inline-formula></oasis:entry>
         <oasis:entry rowsep="1" namest="col9" nameend="col10" align="center" colsep="1"><inline-formula><mml:math id="M432" display="inline"><mml:mrow><mml:mn mathvariant="normal">10</mml:mn><mml:mo>×</mml:mo><mml:mn mathvariant="normal">5</mml:mn></mml:mrow></mml:math></inline-formula></oasis:entry>
         <oasis:entry rowsep="1" namest="col11" nameend="col12" align="center"><inline-formula><mml:math id="M433" display="inline"><mml:mrow><mml:mn mathvariant="normal">25</mml:mn><mml:mo>×</mml:mo><mml:mn mathvariant="normal">10</mml:mn></mml:mrow></mml:math></inline-formula></oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1"/>
         <oasis:entry colname="col2"/>
         <oasis:entry colname="col3">MAPE</oasis:entry>
         <oasis:entry colname="col4">MPE</oasis:entry>
         <oasis:entry colname="col5">MAPE</oasis:entry>
         <oasis:entry colname="col6">MPE</oasis:entry>
         <oasis:entry colname="col7">MAPE</oasis:entry>
         <oasis:entry colname="col8">MPE</oasis:entry>
         <oasis:entry colname="col9">MAPE</oasis:entry>
         <oasis:entry colname="col10">MPE</oasis:entry>
         <oasis:entry colname="col11">MAPE</oasis:entry>
         <oasis:entry colname="col12">MPE</oasis:entry>
       </oasis:row>
     </oasis:thead>
     <oasis:tbody>
       <oasis:row>
         <oasis:entry colname="col1">Midpoint</oasis:entry>
         <oasis:entry colname="col2"><inline-formula><mml:math id="M434" display="inline"><mml:mrow><mml:msub><mml:mi>log⁡</mml:mi><mml:mn mathvariant="normal">10</mml:mn></mml:msub><mml:mfenced open="[" close="]"><mml:mrow><mml:msubsup><mml:mi mathvariant="italic">γ</mml:mi><mml:mrow><mml:mi>j</mml:mi><mml:mo>,</mml:mo><mml:mtext>hex</mml:mtext></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mi>x</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:msubsup><mml:mo>/</mml:mo><mml:msubsup><mml:mi mathvariant="italic">γ</mml:mi><mml:mrow><mml:mi>j</mml:mi><mml:mo>,</mml:mo><mml:mi mathvariant="normal">w</mml:mi></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mi>x</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:msubsup></mml:mrow></mml:mfenced></mml:mrow></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col3">5.1 %</oasis:entry>
         <oasis:entry colname="col4">5.1 %</oasis:entry>
         <oasis:entry colname="col5">2.7 %</oasis:entry>
         <oasis:entry colname="col6"><inline-formula><mml:math id="M435" display="inline"><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">2.7</mml:mn></mml:mrow></mml:math></inline-formula> %</oasis:entry>
         <oasis:entry colname="col7">6.8 %</oasis:entry>
         <oasis:entry colname="col8">6.8 %</oasis:entry>
         <oasis:entry colname="col9">1.3 %</oasis:entry>
         <oasis:entry colname="col10">1.3 %</oasis:entry>
         <oasis:entry colname="col11">3.4 %</oasis:entry>
         <oasis:entry colname="col12">3.4 %</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Medoid</oasis:entry>
         <oasis:entry colname="col2"/>
         <oasis:entry colname="col3">8.2 %</oasis:entry>
         <oasis:entry colname="col4">8.2 %</oasis:entry>
         <oasis:entry colname="col5">5.0 %</oasis:entry>
         <oasis:entry colname="col6"><inline-formula><mml:math id="M436" display="inline"><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">5.0</mml:mn></mml:mrow></mml:math></inline-formula> %</oasis:entry>
         <oasis:entry colname="col7">1.9 %</oasis:entry>
         <oasis:entry colname="col8">1.5 %</oasis:entry>
         <oasis:entry colname="col9">1.2 %</oasis:entry>
         <oasis:entry colname="col10">0.6 %</oasis:entry>
         <oasis:entry colname="col11">2.9 %</oasis:entry>
         <oasis:entry colname="col12">2.9 %</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Weighted medoid</oasis:entry>
         <oasis:entry colname="col2"/>
         <oasis:entry colname="col3">12 %</oasis:entry>
         <oasis:entry colname="col4">12 %</oasis:entry>
         <oasis:entry colname="col5">6.4 %</oasis:entry>
         <oasis:entry colname="col6">6.4 %</oasis:entry>
         <oasis:entry colname="col7">2.0 %</oasis:entry>
         <oasis:entry colname="col8">2.0 %</oasis:entry>
         <oasis:entry colname="col9">2.4 %</oasis:entry>
         <oasis:entry colname="col10">2.4 %</oasis:entry>
         <oasis:entry colname="col11">3.6 %</oasis:entry>
         <oasis:entry colname="col12">3.6 %</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">Weighted <inline-formula><mml:math id="M437" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula>-means</oasis:entry>
         <oasis:entry colname="col2"/>
         <oasis:entry colname="col3">6.0 %</oasis:entry>
         <oasis:entry colname="col4">6.0 %</oasis:entry>
         <oasis:entry colname="col5">2.4 %</oasis:entry>
         <oasis:entry colname="col6">2.4 %</oasis:entry>
         <oasis:entry colname="col7">3.9 %</oasis:entry>
         <oasis:entry colname="col8">3.9 %</oasis:entry>
         <oasis:entry colname="col9">4.3 %</oasis:entry>
         <oasis:entry colname="col10">4.3 %</oasis:entry>
         <oasis:entry colname="col11">0.0 %</oasis:entry>
         <oasis:entry colname="col12">0.0 %</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Midpoint</oasis:entry>
         <oasis:entry colname="col2"><inline-formula><mml:math id="M438" display="inline"><mml:mrow class="chem"><mml:mi mathvariant="normal">O</mml:mi><mml:mo>:</mml:mo><mml:mi mathvariant="normal">C</mml:mi></mml:mrow></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col3">13 %</oasis:entry>
         <oasis:entry colname="col4">13 %</oasis:entry>
         <oasis:entry colname="col5">10 %</oasis:entry>
         <oasis:entry colname="col6"><inline-formula><mml:math id="M439" display="inline"><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">10</mml:mn></mml:mrow></mml:math></inline-formula> %</oasis:entry>
         <oasis:entry colname="col7">1.4 %</oasis:entry>
         <oasis:entry colname="col8">1.4 %</oasis:entry>
         <oasis:entry colname="col9">2.1 %</oasis:entry>
         <oasis:entry colname="col10"><inline-formula><mml:math id="M440" display="inline"><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">2.1</mml:mn></mml:mrow></mml:math></inline-formula> %</oasis:entry>
         <oasis:entry colname="col11">2.5 %</oasis:entry>
         <oasis:entry colname="col12"><inline-formula><mml:math id="M441" display="inline"><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">2.5</mml:mn></mml:mrow></mml:math></inline-formula> %</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Medoid</oasis:entry>
         <oasis:entry colname="col2"/>
         <oasis:entry colname="col3">6.7 %</oasis:entry>
         <oasis:entry colname="col4">6.7 %</oasis:entry>
         <oasis:entry colname="col5">1.2 %</oasis:entry>
         <oasis:entry colname="col6"><inline-formula><mml:math id="M442" display="inline"><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">0.3</mml:mn></mml:mrow></mml:math></inline-formula> %</oasis:entry>
         <oasis:entry colname="col7">8.3 %</oasis:entry>
         <oasis:entry colname="col8">8.3 %</oasis:entry>
         <oasis:entry colname="col9">4.7 %</oasis:entry>
         <oasis:entry colname="col10"><inline-formula><mml:math id="M443" display="inline"><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">4.7</mml:mn></mml:mrow></mml:math></inline-formula> %</oasis:entry>
         <oasis:entry colname="col11">2.2 %</oasis:entry>
         <oasis:entry colname="col12"><inline-formula><mml:math id="M444" display="inline"><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">2.2</mml:mn></mml:mrow></mml:math></inline-formula> %</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Weighted medoid</oasis:entry>
         <oasis:entry colname="col2"/>
         <oasis:entry colname="col3">8.9 %</oasis:entry>
         <oasis:entry colname="col4">8.9 %</oasis:entry>
         <oasis:entry colname="col5">2.0 %</oasis:entry>
         <oasis:entry colname="col6">2.0 %</oasis:entry>
         <oasis:entry colname="col7">4.5 %</oasis:entry>
         <oasis:entry colname="col8">4.5 %</oasis:entry>
         <oasis:entry colname="col9">4.4 %</oasis:entry>
         <oasis:entry colname="col10">4.4 %</oasis:entry>
         <oasis:entry colname="col11">0.7 %</oasis:entry>
         <oasis:entry colname="col12"><inline-formula><mml:math id="M445" display="inline"><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">0.7</mml:mn></mml:mrow></mml:math></inline-formula> %</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">Weighted <inline-formula><mml:math id="M446" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula>-means</oasis:entry>
         <oasis:entry colname="col2"/>
         <oasis:entry colname="col3">2.4 %</oasis:entry>
         <oasis:entry colname="col4">2.4 %</oasis:entry>
         <oasis:entry colname="col5">1.3 %</oasis:entry>
         <oasis:entry colname="col6">0.3 %</oasis:entry>
         <oasis:entry colname="col7">4.9 %</oasis:entry>
         <oasis:entry colname="col8"><inline-formula><mml:math id="M447" display="inline"><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">4.9</mml:mn></mml:mrow></mml:math></inline-formula> %</oasis:entry>
         <oasis:entry colname="col9">4.1 %</oasis:entry>
         <oasis:entry colname="col10"><inline-formula><mml:math id="M448" display="inline"><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">4.1</mml:mn></mml:mrow></mml:math></inline-formula> %</oasis:entry>
         <oasis:entry colname="col11">1.8 %</oasis:entry>
         <oasis:entry colname="col12">1.8 %</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Midpoint</oasis:entry>
         <oasis:entry colname="col2"><inline-formula><mml:math id="M449" display="inline"><mml:mrow><mml:msub><mml:mover accent="true"><mml:mtext>OS</mml:mtext><mml:mo mathvariant="normal">‾</mml:mo></mml:mover><mml:mrow class="chem"><mml:mi mathvariant="normal">C</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col3">10 %</oasis:entry>
         <oasis:entry colname="col4">10 %</oasis:entry>
         <oasis:entry colname="col5">6.7 %</oasis:entry>
         <oasis:entry colname="col6"><inline-formula><mml:math id="M450" display="inline"><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">6.7</mml:mn></mml:mrow></mml:math></inline-formula> %</oasis:entry>
         <oasis:entry colname="col7">7.8 %</oasis:entry>
         <oasis:entry colname="col8">7.8 %</oasis:entry>
         <oasis:entry colname="col9">0.8 %</oasis:entry>
         <oasis:entry colname="col10"><inline-formula><mml:math id="M451" display="inline"><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">0.5</mml:mn></mml:mrow></mml:math></inline-formula> %</oasis:entry>
         <oasis:entry colname="col11">5.1 %</oasis:entry>
         <oasis:entry colname="col12">5.1 %</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Medoid</oasis:entry>
         <oasis:entry colname="col2"/>
         <oasis:entry colname="col3">1.8 %</oasis:entry>
         <oasis:entry colname="col4"><inline-formula><mml:math id="M452" display="inline"><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1.8</mml:mn></mml:mrow></mml:math></inline-formula> %</oasis:entry>
         <oasis:entry colname="col5">4.2 %</oasis:entry>
         <oasis:entry colname="col6"><inline-formula><mml:math id="M453" display="inline"><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">4.2</mml:mn></mml:mrow></mml:math></inline-formula> %</oasis:entry>
         <oasis:entry colname="col7">7.6 %</oasis:entry>
         <oasis:entry colname="col8">7.6 %</oasis:entry>
         <oasis:entry colname="col9">0.9 %</oasis:entry>
         <oasis:entry colname="col10">0.4 %</oasis:entry>
         <oasis:entry colname="col11">5.1 %</oasis:entry>
         <oasis:entry colname="col12">5.1 %</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Weighted medoid</oasis:entry>
         <oasis:entry colname="col2"/>
         <oasis:entry colname="col3">11 %</oasis:entry>
         <oasis:entry colname="col4">11 %</oasis:entry>
         <oasis:entry colname="col5">2.0 %</oasis:entry>
         <oasis:entry colname="col6"><inline-formula><mml:math id="M454" display="inline"><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">0.9</mml:mn></mml:mrow></mml:math></inline-formula> %</oasis:entry>
         <oasis:entry colname="col7">1.0 %</oasis:entry>
         <oasis:entry colname="col8"><inline-formula><mml:math id="M455" display="inline"><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">0.9</mml:mn></mml:mrow></mml:math></inline-formula> %</oasis:entry>
         <oasis:entry colname="col9">1.8 %</oasis:entry>
         <oasis:entry colname="col10"><inline-formula><mml:math id="M456" display="inline"><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1.8</mml:mn></mml:mrow></mml:math></inline-formula> %</oasis:entry>
         <oasis:entry colname="col11">6.3 %</oasis:entry>
         <oasis:entry colname="col12">6.3 %</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Weighted <inline-formula><mml:math id="M457" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula>-means</oasis:entry>
         <oasis:entry colname="col2"/>
         <oasis:entry colname="col3">4.7 %</oasis:entry>
         <oasis:entry colname="col4">3.4 %</oasis:entry>
         <oasis:entry colname="col5">2.1 %</oasis:entry>
         <oasis:entry colname="col6">2.1 %</oasis:entry>
         <oasis:entry colname="col7">0.8 %</oasis:entry>
         <oasis:entry colname="col8"><inline-formula><mml:math id="M458" display="inline"><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">0.1</mml:mn></mml:mrow></mml:math></inline-formula> %</oasis:entry>
         <oasis:entry colname="col9">2.9 %</oasis:entry>
         <oasis:entry colname="col10">2.9 %</oasis:entry>
         <oasis:entry colname="col11">2.1 %</oasis:entry>
         <oasis:entry colname="col12">2.1 %</oasis:entry>
       </oasis:row>
     </oasis:tbody>
   </oasis:tgroup></oasis:table></table-wrap>

<table-wrap id="T5" specific-use="star"><label>Table 5</label><caption><p id="d2e9595">Similar to Table <xref ref-type="table" rid="T4"/> but for the MAPE and MPE of the hygroscopicity parameter <inline-formula><mml:math id="M459" display="inline"><mml:mi mathvariant="italic">κ</mml:mi></mml:math></inline-formula> predicted for the toluene-derived SOA (evaluated at water activities of 85 % and 90 %). The mass-concentration-weighted <inline-formula><mml:math id="M460" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula>-means method at <inline-formula><mml:math id="M461" display="inline"><mml:mrow><mml:mn mathvariant="normal">25</mml:mn><mml:mo>×</mml:mo><mml:mn mathvariant="normal">10</mml:mn></mml:mrow></mml:math></inline-formula> resolution is used as the reference case.</p></caption><oasis:table frame="topbot"><oasis:tgroup cols="12">
     <oasis:colspec colnum="1" colname="col1" align="left"/>
     <oasis:colspec colnum="2" colname="col2" align="left"/>
     <oasis:colspec colnum="3" colname="col3" align="right"/>
     <oasis:colspec colnum="4" colname="col4" align="right" colsep="1"/>
     <oasis:colspec colnum="5" colname="col5" align="right"/>
     <oasis:colspec colnum="6" colname="col6" align="right" colsep="1"/>
     <oasis:colspec colnum="7" colname="col7" align="right"/>
     <oasis:colspec colnum="8" colname="col8" align="right" colsep="1"/>
     <oasis:colspec colnum="9" colname="col9" align="right"/>
     <oasis:colspec colnum="10" colname="col10" align="right" colsep="1"/>
     <oasis:colspec colnum="11" colname="col11" align="right"/>
     <oasis:colspec colnum="12" colname="col12" align="right"/>
     <oasis:thead>
       <oasis:row>
         <oasis:entry colname="col1">Surrogate selection</oasis:entry>
         <oasis:entry colname="col2"><inline-formula><mml:math id="M462" display="inline"><mml:mi>y</mml:mi></mml:math></inline-formula> axis</oasis:entry>
         <oasis:entry rowsep="1" namest="col3" nameend="col4" align="center" colsep="1"><inline-formula><mml:math id="M463" display="inline"><mml:mrow><mml:mn mathvariant="normal">4</mml:mn><mml:mo>×</mml:mo><mml:mn mathvariant="normal">2</mml:mn></mml:mrow></mml:math></inline-formula></oasis:entry>
         <oasis:entry rowsep="1" namest="col5" nameend="col6" align="center" colsep="1"><inline-formula><mml:math id="M464" display="inline"><mml:mrow><mml:mn mathvariant="normal">6</mml:mn><mml:mo>×</mml:mo><mml:mn mathvariant="normal">3</mml:mn></mml:mrow></mml:math></inline-formula></oasis:entry>
         <oasis:entry rowsep="1" namest="col7" nameend="col8" align="center" colsep="1"><inline-formula><mml:math id="M465" display="inline"><mml:mrow><mml:mn mathvariant="normal">8</mml:mn><mml:mo>×</mml:mo><mml:mn mathvariant="normal">4</mml:mn></mml:mrow></mml:math></inline-formula></oasis:entry>
         <oasis:entry rowsep="1" namest="col9" nameend="col10" align="center" colsep="1"><inline-formula><mml:math id="M466" display="inline"><mml:mrow><mml:mn mathvariant="normal">10</mml:mn><mml:mo>×</mml:mo><mml:mn mathvariant="normal">5</mml:mn></mml:mrow></mml:math></inline-formula></oasis:entry>
         <oasis:entry rowsep="1" namest="col11" nameend="col12" align="center"><inline-formula><mml:math id="M467" display="inline"><mml:mrow><mml:mn mathvariant="normal">25</mml:mn><mml:mo>×</mml:mo><mml:mn mathvariant="normal">10</mml:mn></mml:mrow></mml:math></inline-formula></oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1"/>
         <oasis:entry colname="col2"/>
         <oasis:entry colname="col3">MAPE</oasis:entry>
         <oasis:entry colname="col4">MPE</oasis:entry>
         <oasis:entry colname="col5">MAPE</oasis:entry>
         <oasis:entry colname="col6">MPE</oasis:entry>
         <oasis:entry colname="col7">MAPE</oasis:entry>
         <oasis:entry colname="col8">MPE</oasis:entry>
         <oasis:entry colname="col9">MAPE</oasis:entry>
         <oasis:entry colname="col10">MPE</oasis:entry>
         <oasis:entry colname="col11">MAPE</oasis:entry>
         <oasis:entry colname="col12">MPE</oasis:entry>
       </oasis:row>
     </oasis:thead>
     <oasis:tbody>
       <oasis:row>
         <oasis:entry colname="col1">Midpoint</oasis:entry>
         <oasis:entry colname="col2"><inline-formula><mml:math id="M468" display="inline"><mml:mrow><mml:msub><mml:mi>log⁡</mml:mi><mml:mn mathvariant="normal">10</mml:mn></mml:msub><mml:mfenced open="[" close="]"><mml:mrow><mml:msubsup><mml:mi mathvariant="italic">γ</mml:mi><mml:mrow><mml:mi>j</mml:mi><mml:mo>,</mml:mo><mml:mtext>hex</mml:mtext></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mi>x</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:msubsup><mml:mo>/</mml:mo><mml:msubsup><mml:mi mathvariant="italic">γ</mml:mi><mml:mrow><mml:mi>j</mml:mi><mml:mo>,</mml:mo><mml:mi mathvariant="normal">w</mml:mi></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mi>x</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:msubsup></mml:mrow></mml:mfenced></mml:mrow></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col3">1.4 %</oasis:entry>
         <oasis:entry colname="col4">0.9 %</oasis:entry>
         <oasis:entry colname="col5">26 %</oasis:entry>
         <oasis:entry colname="col6"><inline-formula><mml:math id="M469" display="inline"><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">26</mml:mn></mml:mrow></mml:math></inline-formula> %</oasis:entry>
         <oasis:entry colname="col7">27 %</oasis:entry>
         <oasis:entry colname="col8">27 %</oasis:entry>
         <oasis:entry colname="col9">21 %</oasis:entry>
         <oasis:entry colname="col10"><inline-formula><mml:math id="M470" display="inline"><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">21</mml:mn></mml:mrow></mml:math></inline-formula> %</oasis:entry>
         <oasis:entry colname="col11">2.9 %</oasis:entry>
         <oasis:entry colname="col12"><inline-formula><mml:math id="M471" display="inline"><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">2.9</mml:mn></mml:mrow></mml:math></inline-formula> %</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Medoid</oasis:entry>
         <oasis:entry colname="col2"/>
         <oasis:entry colname="col3">9.4 %</oasis:entry>
         <oasis:entry colname="col4"><inline-formula><mml:math id="M472" display="inline"><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">9.4</mml:mn></mml:mrow></mml:math></inline-formula> %</oasis:entry>
         <oasis:entry colname="col5">47 %</oasis:entry>
         <oasis:entry colname="col6">47 %</oasis:entry>
         <oasis:entry colname="col7">27 %</oasis:entry>
         <oasis:entry colname="col8">27 %</oasis:entry>
         <oasis:entry colname="col9">6.3 %</oasis:entry>
         <oasis:entry colname="col10"><inline-formula><mml:math id="M473" display="inline"><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">6.3</mml:mn></mml:mrow></mml:math></inline-formula> %</oasis:entry>
         <oasis:entry colname="col11">5.0 %</oasis:entry>
         <oasis:entry colname="col12">5.0 %</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Weighted medoid</oasis:entry>
         <oasis:entry colname="col2"/>
         <oasis:entry colname="col3">10 %</oasis:entry>
         <oasis:entry colname="col4">10 %</oasis:entry>
         <oasis:entry colname="col5">53 %</oasis:entry>
         <oasis:entry colname="col6">53 %</oasis:entry>
         <oasis:entry colname="col7">6.5 %</oasis:entry>
         <oasis:entry colname="col8">6.5 %</oasis:entry>
         <oasis:entry colname="col9">6.3 %</oasis:entry>
         <oasis:entry colname="col10"><inline-formula><mml:math id="M474" display="inline"><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">6.3</mml:mn></mml:mrow></mml:math></inline-formula> %</oasis:entry>
         <oasis:entry colname="col11">9.8 %</oasis:entry>
         <oasis:entry colname="col12"><inline-formula><mml:math id="M475" display="inline"><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">9.8</mml:mn></mml:mrow></mml:math></inline-formula> %</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">Weighted <inline-formula><mml:math id="M476" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula>-means</oasis:entry>
         <oasis:entry colname="col2"/>
         <oasis:entry colname="col3">31 %</oasis:entry>
         <oasis:entry colname="col4"><inline-formula><mml:math id="M477" display="inline"><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">31</mml:mn></mml:mrow></mml:math></inline-formula> %</oasis:entry>
         <oasis:entry colname="col5">8.1 %</oasis:entry>
         <oasis:entry colname="col6"><inline-formula><mml:math id="M478" display="inline"><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">8.1</mml:mn></mml:mrow></mml:math></inline-formula> %</oasis:entry>
         <oasis:entry colname="col7">9.0 %</oasis:entry>
         <oasis:entry colname="col8"><inline-formula><mml:math id="M479" display="inline"><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">9.0</mml:mn></mml:mrow></mml:math></inline-formula> %</oasis:entry>
         <oasis:entry colname="col9">13 %</oasis:entry>
         <oasis:entry colname="col10"><inline-formula><mml:math id="M480" display="inline"><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">13</mml:mn></mml:mrow></mml:math></inline-formula> %</oasis:entry>
         <oasis:entry colname="col11">0.0 %</oasis:entry>
         <oasis:entry colname="col12">0.0 %</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Midpoint</oasis:entry>
         <oasis:entry colname="col2"><inline-formula><mml:math id="M481" display="inline"><mml:mrow class="chem"><mml:mi mathvariant="normal">O</mml:mi><mml:mo>:</mml:mo><mml:mi mathvariant="normal">C</mml:mi></mml:mrow></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col3">143 %</oasis:entry>
         <oasis:entry colname="col4"><inline-formula><mml:math id="M482" display="inline"><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">143</mml:mn></mml:mrow></mml:math></inline-formula> %</oasis:entry>
         <oasis:entry colname="col5">38 %</oasis:entry>
         <oasis:entry colname="col6">38 %</oasis:entry>
         <oasis:entry colname="col7">25 %</oasis:entry>
         <oasis:entry colname="col8"><inline-formula><mml:math id="M483" display="inline"><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">25</mml:mn></mml:mrow></mml:math></inline-formula> %</oasis:entry>
         <oasis:entry colname="col9">7.5 %</oasis:entry>
         <oasis:entry colname="col10">7.5 %</oasis:entry>
         <oasis:entry colname="col11">45 %</oasis:entry>
         <oasis:entry colname="col12"><inline-formula><mml:math id="M484" display="inline"><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">45</mml:mn></mml:mrow></mml:math></inline-formula> %</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Medoid</oasis:entry>
         <oasis:entry colname="col2"/>
         <oasis:entry colname="col3">131 %</oasis:entry>
         <oasis:entry colname="col4">131 %</oasis:entry>
         <oasis:entry colname="col5">38 %</oasis:entry>
         <oasis:entry colname="col6">38 %</oasis:entry>
         <oasis:entry colname="col7">35 %</oasis:entry>
         <oasis:entry colname="col8"><inline-formula><mml:math id="M485" display="inline"><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">35</mml:mn></mml:mrow></mml:math></inline-formula> %</oasis:entry>
         <oasis:entry colname="col9">40 %</oasis:entry>
         <oasis:entry colname="col10">40 %</oasis:entry>
         <oasis:entry colname="col11">18 %</oasis:entry>
         <oasis:entry colname="col12"><inline-formula><mml:math id="M486" display="inline"><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">18</mml:mn></mml:mrow></mml:math></inline-formula> %</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Weighted medoid</oasis:entry>
         <oasis:entry colname="col2"/>
         <oasis:entry colname="col3">46 %</oasis:entry>
         <oasis:entry colname="col4">46 %</oasis:entry>
         <oasis:entry colname="col5">8.5 %</oasis:entry>
         <oasis:entry colname="col6">8.5 %</oasis:entry>
         <oasis:entry colname="col7">51 %</oasis:entry>
         <oasis:entry colname="col8">51 %</oasis:entry>
         <oasis:entry colname="col9">42 %</oasis:entry>
         <oasis:entry colname="col10"><inline-formula><mml:math id="M487" display="inline"><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">42</mml:mn></mml:mrow></mml:math></inline-formula> %</oasis:entry>
         <oasis:entry colname="col11">1.0 %</oasis:entry>
         <oasis:entry colname="col12">1.0 %</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">Weighted <inline-formula><mml:math id="M488" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula>-means</oasis:entry>
         <oasis:entry colname="col2"/>
         <oasis:entry colname="col3">18 %</oasis:entry>
         <oasis:entry colname="col4"><inline-formula><mml:math id="M489" display="inline"><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">18</mml:mn></mml:mrow></mml:math></inline-formula> %</oasis:entry>
         <oasis:entry colname="col5">45 %</oasis:entry>
         <oasis:entry colname="col6">45 %</oasis:entry>
         <oasis:entry colname="col7">26 %</oasis:entry>
         <oasis:entry colname="col8"><inline-formula><mml:math id="M490" display="inline"><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">26</mml:mn></mml:mrow></mml:math></inline-formula> %</oasis:entry>
         <oasis:entry colname="col9">11 %</oasis:entry>
         <oasis:entry colname="col10"><inline-formula><mml:math id="M491" display="inline"><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">11</mml:mn></mml:mrow></mml:math></inline-formula> %</oasis:entry>
         <oasis:entry colname="col11">10 %</oasis:entry>
         <oasis:entry colname="col12"><inline-formula><mml:math id="M492" display="inline"><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">10</mml:mn></mml:mrow></mml:math></inline-formula> %</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Midpoint</oasis:entry>
         <oasis:entry colname="col2"><inline-formula><mml:math id="M493" display="inline"><mml:mrow><mml:msub><mml:mover accent="true"><mml:mtext>OS</mml:mtext><mml:mo mathvariant="normal">‾</mml:mo></mml:mover><mml:mrow class="chem"><mml:mi mathvariant="normal">C</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col3">34 %</oasis:entry>
         <oasis:entry colname="col4"><inline-formula><mml:math id="M494" display="inline"><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">34</mml:mn></mml:mrow></mml:math></inline-formula> %</oasis:entry>
         <oasis:entry colname="col5">49 %</oasis:entry>
         <oasis:entry colname="col6">49 %</oasis:entry>
         <oasis:entry colname="col7">30 %</oasis:entry>
         <oasis:entry colname="col8"><inline-formula><mml:math id="M495" display="inline"><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">30</mml:mn></mml:mrow></mml:math></inline-formula> %</oasis:entry>
         <oasis:entry colname="col9">25 %</oasis:entry>
         <oasis:entry colname="col10">25 %</oasis:entry>
         <oasis:entry colname="col11">12 %</oasis:entry>
         <oasis:entry colname="col12">12 %</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Medoid</oasis:entry>
         <oasis:entry colname="col2"/>
         <oasis:entry colname="col3">24 %</oasis:entry>
         <oasis:entry colname="col4"><inline-formula><mml:math id="M496" display="inline"><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">24</mml:mn></mml:mrow></mml:math></inline-formula> %</oasis:entry>
         <oasis:entry colname="col5">23 %</oasis:entry>
         <oasis:entry colname="col6"><inline-formula><mml:math id="M497" display="inline"><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">23</mml:mn></mml:mrow></mml:math></inline-formula> %</oasis:entry>
         <oasis:entry colname="col7">90 %</oasis:entry>
         <oasis:entry colname="col8">90 %</oasis:entry>
         <oasis:entry colname="col9">42 %</oasis:entry>
         <oasis:entry colname="col10"><inline-formula><mml:math id="M498" display="inline"><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">42</mml:mn></mml:mrow></mml:math></inline-formula> %</oasis:entry>
         <oasis:entry colname="col11">30 %</oasis:entry>
         <oasis:entry colname="col12">30 %</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Weighted medoid</oasis:entry>
         <oasis:entry colname="col2"/>
         <oasis:entry colname="col3">70 %</oasis:entry>
         <oasis:entry colname="col4">70 %</oasis:entry>
         <oasis:entry colname="col5">8.5 %</oasis:entry>
         <oasis:entry colname="col6"><inline-formula><mml:math id="M499" display="inline"><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">8.5</mml:mn></mml:mrow></mml:math></inline-formula> %</oasis:entry>
         <oasis:entry colname="col7">44 %</oasis:entry>
         <oasis:entry colname="col8"><inline-formula><mml:math id="M500" display="inline"><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">44</mml:mn></mml:mrow></mml:math></inline-formula> %</oasis:entry>
         <oasis:entry colname="col9">42 %</oasis:entry>
         <oasis:entry colname="col10"><inline-formula><mml:math id="M501" display="inline"><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">42</mml:mn></mml:mrow></mml:math></inline-formula> %</oasis:entry>
         <oasis:entry colname="col11">14 %</oasis:entry>
         <oasis:entry colname="col12">14 %</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Weighted <inline-formula><mml:math id="M502" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula>-means</oasis:entry>
         <oasis:entry colname="col2"/>
         <oasis:entry colname="col3">70 %</oasis:entry>
         <oasis:entry colname="col4">70 %</oasis:entry>
         <oasis:entry colname="col5">49 %</oasis:entry>
         <oasis:entry colname="col6"><inline-formula><mml:math id="M503" display="inline"><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">49</mml:mn></mml:mrow></mml:math></inline-formula> %</oasis:entry>
         <oasis:entry colname="col7">30 %</oasis:entry>
         <oasis:entry colname="col8"><inline-formula><mml:math id="M504" display="inline"><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">30</mml:mn></mml:mrow></mml:math></inline-formula> %</oasis:entry>
         <oasis:entry colname="col9">25 %</oasis:entry>
         <oasis:entry colname="col10">25 %</oasis:entry>
         <oasis:entry colname="col11">10 %</oasis:entry>
         <oasis:entry colname="col12">10 %</oasis:entry>
       </oasis:row>
     </oasis:tbody>
   </oasis:tgroup></oasis:table></table-wrap>

      <p id="d2e10638">Table <xref ref-type="table" rid="T6"/> compares the predicted aerosol mass concentrations for the toluene and <inline-formula><mml:math id="M505" display="inline"><mml:mi mathvariant="italic">α</mml:mi></mml:math></inline-formula>-pinene SOA systems at different water activities, using the weighted <inline-formula><mml:math id="M506" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula>-means method or, in the case of <inline-formula><mml:math id="M507" display="inline"><mml:mi mathvariant="italic">α</mml:mi></mml:math></inline-formula>-pinene SOA, the full system for the model calculations. In both cases, the predicted SOA mass concentrations are relatively low (<inline-formula><mml:math id="M508" display="inline"><mml:mrow><mml:mo>&lt;</mml:mo><mml:mn mathvariant="normal">2</mml:mn></mml:mrow></mml:math></inline-formula> or <inline-formula><mml:math id="M509" display="inline"><mml:mrow><mml:mo>&lt;</mml:mo><mml:mn mathvariant="normal">4</mml:mn></mml:mrow></mml:math></inline-formula> <inline-formula><mml:math id="M510" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">µ</mml:mi><mml:mi mathvariant="normal">g</mml:mi><mml:mspace linebreak="nobreak" width="0.125em"/><mml:msup><mml:mi mathvariant="normal">m</mml:mi><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">3</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula>, respectively), yet in a realistic range for relatively clean air quality conditions. These low absolute SOA mass concentrations may contribute to the observed fluctuations in MAPE and MPE for different resolutions and polarity axis choices, since minor absolute differences can then result in larger relative deviations.</p>
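      <p>The sensitivity of relative error metrics to low absolute mass concentrations can be illustrated with a minimal Python sketch. It assumes the conventional definitions of MAPE (mean of absolute relative deviations) and MPE (mean of signed relative deviations, predicted minus reference), which may differ in detail from the definitions used for the tables. The constant offset of 0.2 µg m<sup>-3</sup> and the tenfold higher loading are hypothetical values chosen for illustration only; they are not model results. The reference values are the full-system <inline-formula><mml:math display="inline"><mml:mi mathvariant="italic">α</mml:mi></mml:math></inline-formula>-pinene SOA mass concentrations listed in Table <xref ref-type="table" rid="T6"/>.</p>
<preformat preformat-type="code"><![CDATA[
```python
def mape(pred, ref):
    """Mean absolute percentage error in percent (assumed conventional definition)."""
    return 100.0 * sum(abs(p - r) / abs(r) for p, r in zip(pred, ref)) / len(ref)


def mpe(pred, ref):
    """Mean (signed) percentage error in percent; negative means under-prediction."""
    return 100.0 * sum((p - r) / r for p, r in zip(pred, ref)) / len(ref)


# Full-system alpha-pinene SOA mass concentrations from Table 6 (ug m^-3),
# for water activities 0.95, 0.85, 0.75, 0.65, 0.50, 0.40, 0.30.
ref_low = [3.505, 3.232, 2.997, 2.791, 2.524, 2.368, 2.225]

# Hypothetical higher-loading case: the same values scaled by ten (assumption).
ref_high = [10.0 * r for r in ref_low]

# Hypothetical constant under-prediction by a surrogate system (ug m^-3).
offset = -0.2
pred_low = [r + offset for r in ref_low]
pred_high = [r + offset for r in ref_high]

mape_low, mpe_low = mape(pred_low, ref_low), mpe(pred_low, ref_low)
mape_high, mpe_high = mape(pred_high, ref_high), mpe(pred_high, ref_high)

print(f"low loading:  MAPE = {mape_low:.1f} %, MPE = {mpe_low:.1f} %")
print(f"high loading: MAPE = {mape_high:.1f} %, MPE = {mpe_high:.1f} %")
```
]]></preformat>
      <p>In this sketch the same absolute offset produces a relative error about ten times larger at the low loading than at the hypothetical high loading. Because every deviation has the same sign, MPE is the negative of MAPE; when deviations of both signs occur, the magnitude of MPE is smaller than MAPE.</p>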

<table-wrap id="T6"><label>Table 6</label><caption><p id="d2e10707">Comparison of predicted SOA mass concentrations at selected water activities (equilibrium RH) for the toluene and <inline-formula><mml:math id="M511" display="inline"><mml:mi mathvariant="italic">α</mml:mi></mml:math></inline-formula>-pinene SOA systems. For the toluene SOA system, the data are based on ACR as the polarity axis, <inline-formula><mml:math id="M512" display="inline"><mml:mrow><mml:mn mathvariant="normal">25</mml:mn><mml:mo>×</mml:mo><mml:mn mathvariant="normal">10</mml:mn></mml:mrow></mml:math></inline-formula> resolution and <inline-formula><mml:math id="M513" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula>-means surrogate selection. The full set of MCM-derived system components is used for the <inline-formula><mml:math id="M514" display="inline"><mml:mi mathvariant="italic">α</mml:mi></mml:math></inline-formula>-pinene SOA case.</p></caption><oasis:table frame="topbot"><oasis:tgroup cols="3">
     <oasis:colspec colnum="1" colname="col1" align="left"/>
     <oasis:colspec colnum="2" colname="col2" align="center"/>
     <oasis:colspec colnum="3" colname="col3" align="center"/>
     <oasis:thead>
       <oasis:row>
         <oasis:entry colname="col1">Water activity</oasis:entry>
         <oasis:entry colname="col2">Toluene SOA</oasis:entry>
         <oasis:entry colname="col3"><inline-formula><mml:math id="M515" display="inline"><mml:mi mathvariant="italic">α</mml:mi></mml:math></inline-formula>-Pinene SOA</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">(–)</oasis:entry>
         <oasis:entry colname="col2">(<inline-formula><mml:math id="M516" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">µ</mml:mi><mml:mi mathvariant="normal">g</mml:mi><mml:mspace width="0.125em" linebreak="nobreak"/><mml:msup><mml:mi mathvariant="normal">m</mml:mi><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">3</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula>)</oasis:entry>
         <oasis:entry colname="col3">(<inline-formula><mml:math id="M517" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">µ</mml:mi><mml:mi mathvariant="normal">g</mml:mi><mml:mspace width="0.125em" linebreak="nobreak"/><mml:msup><mml:mi mathvariant="normal">m</mml:mi><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">3</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula>)</oasis:entry>
       </oasis:row>
     </oasis:thead>
     <oasis:tbody>
       <oasis:row>
         <oasis:entry colname="col1">0.95</oasis:entry>
         <oasis:entry colname="col2">1.981</oasis:entry>
         <oasis:entry colname="col3">3.505</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">0.85</oasis:entry>
         <oasis:entry colname="col2">1.786</oasis:entry>
         <oasis:entry colname="col3">3.232</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">0.75</oasis:entry>
         <oasis:entry colname="col2">1.640</oasis:entry>
         <oasis:entry colname="col3">2.997</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">0.65</oasis:entry>
         <oasis:entry colname="col2">1.523</oasis:entry>
         <oasis:entry colname="col3">2.791</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">0.50</oasis:entry>
         <oasis:entry colname="col2">1.382</oasis:entry>
         <oasis:entry colname="col3">2.524</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">0.40</oasis:entry>
         <oasis:entry colname="col2">1.305</oasis:entry>
         <oasis:entry colname="col3">2.368</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">0.30</oasis:entry>
         <oasis:entry colname="col2">1.237</oasis:entry>
         <oasis:entry colname="col3">2.225</oasis:entry>
       </oasis:row>
     </oasis:tbody>
   </oasis:tgroup></oasis:table></table-wrap>

      <p id="d2e10913">Table <xref ref-type="table" rid="T7"/> compares predicted <inline-formula><mml:math id="M518" display="inline"><mml:mi mathvariant="italic">κ</mml:mi></mml:math></inline-formula> values for toluene and <inline-formula><mml:math id="M519" display="inline"><mml:mi mathvariant="italic">α</mml:mi></mml:math></inline-formula>-pinene systems at two water activity levels often used in the estimation of diameter growth factors and hygroscopicity parameters from field observations. The <inline-formula><mml:math id="M520" display="inline"><mml:mi mathvariant="italic">α</mml:mi></mml:math></inline-formula>-pinene SOA exhibits higher <inline-formula><mml:math id="M521" display="inline"><mml:mi mathvariant="italic">κ</mml:mi></mml:math></inline-formula> values, indicating greater hygroscopicity than toluene SOA, yet both SOA types are relatively low in hygroscopicity with <inline-formula><mml:math id="M522" display="inline"><mml:mi mathvariant="italic">κ</mml:mi></mml:math></inline-formula> values of less than 0.1, a value often assumed as representative of the organic aerosol fraction in aged tropospheric particles <xref ref-type="bibr" rid="bib1.bibx48" id="paren.66"><named-content content-type="pre">e.g.,</named-content></xref>. As an example, Fig. S2 shows the speciated SOA mass concentrations predicted for the toluene SOA system at different water activities when using surrogate components derived from the mass-weighted medoid method for a <inline-formula><mml:math id="M523" display="inline"><mml:mrow><mml:mn mathvariant="normal">10</mml:mn><mml:mo>×</mml:mo><mml:mn mathvariant="normal">5</mml:mn></mml:mrow></mml:math></inline-formula> grid resolution. The water activity (or equilibrium RH) has only a weak influence on the predicted total SOA mass concentration since the hygroscopicity of the SOA is relatively low, leading to a weak feedback on the partitioning of semivolatile organics due to water uptake.</p>
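      <p>To relate the <inline-formula><mml:math display="inline"><mml:mi mathvariant="italic">κ</mml:mi></mml:math></inline-formula> values in Table <xref ref-type="table" rid="T7"/> to diameter growth factors, the following minimal Python sketch applies the common single-parameter relation GF = (1 + <inline-formula><mml:math display="inline"><mml:mi mathvariant="italic">κ</mml:mi></mml:math></inline-formula> a<sub>w</sub> / (1 - a<sub>w</sub>))<sup>1/3</sup>, which neglects the Kelvin effect and assumes volume additivity. Whether this form matches the exact <inline-formula><mml:math display="inline"><mml:mi mathvariant="italic">κ</mml:mi></mml:math></inline-formula> definition used in this work is an assumption, and the function and variable names are illustrative.</p>
<preformat preformat-type="code"><![CDATA[
```python
def growth_factor(kappa, a_w):
    """Diameter growth factor at water activity a_w (Kelvin effect neglected)."""
    if not 0.0 <= a_w < 1.0:
        raise ValueError("a_w must satisfy 0 <= a_w < 1")
    return (1.0 + kappa * a_w / (1.0 - a_w)) ** (1.0 / 3.0)


def kappa_from_growth_factor(gf, a_w):
    """Inverse relation: kappa from a diameter growth factor at a_w."""
    return (gf ** 3 - 1.0) * (1.0 - a_w) / a_w


# Kappa values as listed in Table 7, keyed by water activity.
kappa_table7 = {
    "toluene SOA": {0.90: 0.038, 0.85: 0.049},
    "alpha-pinene SOA": {0.90: 0.061, 0.85: 0.072},
}

growth_factors = {
    system: {a_w: growth_factor(k, a_w) for a_w, k in values.items()}
    for system, values in kappa_table7.items()
}

for system, values in growth_factors.items():
    for a_w, gf in values.items():
        print(f"{system}: a_w = {a_w:.2f}, GF = {gf:.3f}")
```
]]></preformat>
      <p>Under these assumptions, the tabulated values correspond to growth factors of about 1.10 (toluene SOA) and 1.16 (<inline-formula><mml:math display="inline"><mml:mi mathvariant="italic">α</mml:mi></mml:math></inline-formula>-pinene SOA) at a water activity of 0.90, consistent with the weak water uptake described above.</p>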

<table-wrap id="T7"><label>Table 7</label><caption><p id="d2e10974">Comparison of predicted SOA hygroscopicity parameters for the toluene SOA system (<inline-formula><mml:math id="M524" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="italic">κ</mml:mi><mml:mtext>Tol</mml:mtext></mml:msub></mml:mrow></mml:math></inline-formula>) and the <inline-formula><mml:math id="M525" display="inline"><mml:mi mathvariant="italic">α</mml:mi></mml:math></inline-formula>-pinene SOA system (<inline-formula><mml:math id="M526" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="italic">κ</mml:mi><mml:mrow><mml:mi mathvariant="italic">α</mml:mi><mml:mi mathvariant="normal">P</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula>) at two water activity levels. Surrogate selection and resolutions are as for Table <xref ref-type="table" rid="T6"/>.</p></caption><oasis:table frame="topbot"><oasis:tgroup cols="3">
     <oasis:colspec colnum="1" colname="col1" align="left"/>
     <oasis:colspec colnum="2" colname="col2" align="center"/>
     <oasis:colspec colnum="3" colname="col3" align="center"/>
     <oasis:thead>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">Water activity</oasis:entry>
         <oasis:entry colname="col2"><inline-formula><mml:math id="M527" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="italic">κ</mml:mi><mml:mtext>Tol</mml:mtext></mml:msub></mml:mrow></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col3"><inline-formula><mml:math id="M528" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="italic">κ</mml:mi><mml:mrow><mml:mi mathvariant="italic">α</mml:mi><mml:mi mathvariant="normal">P</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula></oasis:entry>
       </oasis:row>
     </oasis:thead>
     <oasis:tbody>
       <oasis:row>
         <oasis:entry colname="col1">0.90</oasis:entry>
         <oasis:entry colname="col2">0.038</oasis:entry>
         <oasis:entry colname="col3">0.061</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">0.85</oasis:entry>
         <oasis:entry colname="col2">0.049</oasis:entry>
         <oasis:entry colname="col3">0.072</oasis:entry>
       </oasis:row>
     </oasis:tbody>
   </oasis:tgroup></oasis:table></table-wrap>

<sec id="Ch1.S3.SS3.SSSx1" specific-use="unnumbered">
  <title>Analysis of the polarity axes and surrogate selection methods</title>
      <p id="d2e11091">We compare three choices for the polarity axis of our 2D lumping framework: ACR, <inline-formula><mml:math id="M529" display="inline"><mml:mrow class="chem"><mml:mi mathvariant="normal">O</mml:mi><mml:mo>:</mml:mo><mml:mi mathvariant="normal">C</mml:mi></mml:mrow></mml:math></inline-formula> ratio, and <inline-formula><mml:math id="M530" display="inline"><mml:mrow><mml:msub><mml:mover accent="true"><mml:mtext>OS</mml:mtext><mml:mo mathvariant="normal">‾</mml:mo></mml:mover><mml:mrow class="chem"><mml:mi mathvariant="normal">C</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula>. Figure <xref ref-type="fig" rid="F11"/> shows examples of the 2D representations of the <inline-formula><mml:math id="M531" display="inline"><mml:mi mathvariant="italic">α</mml:mi></mml:math></inline-formula>-pinene-derived components when either the <inline-formula><mml:math id="M532" display="inline"><mml:mrow class="chem"><mml:mi mathvariant="normal">O</mml:mi><mml:mo>:</mml:mo><mml:mi mathvariant="normal">C</mml:mi></mml:mrow></mml:math></inline-formula> ratio or the mean carbon oxidation state is selected as the polarity axis. These polarity metrics differ to a surprisingly large degree in how they represent the polarity-related molecular properties, as demonstrated in Fig. <xref ref-type="fig" rid="F12"/> for the components from the toluene SOA system.
While <inline-formula><mml:math id="M533" display="inline"><mml:mrow class="chem"><mml:mi mathvariant="normal">O</mml:mi><mml:mo>:</mml:mo><mml:mi mathvariant="normal">C</mml:mi></mml:mrow></mml:math></inline-formula> ratio and <inline-formula><mml:math id="M534" display="inline"><mml:mrow><mml:msub><mml:mover accent="true"><mml:mtext>OS</mml:mtext><mml:mo mathvariant="normal">‾</mml:mo></mml:mover><mml:mrow class="chem"><mml:mi mathvariant="normal">C</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula> serve as well-established polarity proxies, the ACR captures additional functional-group-level information, resulting in a wide spread of ACR values for a given <inline-formula><mml:math id="M535" display="inline"><mml:mrow class="chem"><mml:mi mathvariant="normal">O</mml:mi><mml:mo>:</mml:mo><mml:mi mathvariant="normal">C</mml:mi></mml:mrow></mml:math></inline-formula> ratio or <inline-formula><mml:math id="M536" display="inline"><mml:mrow><mml:msub><mml:mover accent="true"><mml:mtext>OS</mml:mtext><mml:mo mathvariant="normal">‾</mml:mo></mml:mover><mml:mrow class="chem"><mml:mi mathvariant="normal">C</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula>, e.g. compare the spread at <inline-formula><mml:math id="M537" display="inline"><mml:mrow><mml:msub><mml:mover accent="true"><mml:mtext>OS</mml:mtext><mml:mo mathvariant="normal">‾</mml:mo></mml:mover><mml:mrow class="chem"><mml:mi mathvariant="normal">C</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:math></inline-formula> in Fig. <xref ref-type="fig" rid="F12"/>. 
This demonstrates that compounds with similar <inline-formula><mml:math id="M538" display="inline"><mml:mrow><mml:msub><mml:mover accent="true"><mml:mtext>OS</mml:mtext><mml:mo mathvariant="normal">‾</mml:mo></mml:mover><mml:mrow class="chem"><mml:mi mathvariant="normal">C</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula> can have distinctly different relative affinities for water, as expressed by their predicted ACR.</p>
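      <p>Why purely elemental metrics cannot resolve such differences can be seen from a minimal Python sketch using two isomers. The two compounds are illustrative choices and are not taken from the SOA systems studied here. The mean carbon oxidation state is approximated as 2 O:C - H:C, a simplification that holds only for compounds containing C, H and O and that omits the nitrogen-related contributions noted for Fig. <xref ref-type="fig" rid="F12"/>; the function names are hypothetical.</p>
<preformat preformat-type="code"><![CDATA[
```python
def o_to_c(n_c, n_h, n_o):
    """Elemental O:C ratio."""
    return n_o / n_c


def mean_carbon_oxidation_state(n_c, n_h, n_o):
    """Approximate mean carbon oxidation state, 2 O:C - H:C.

    Simplification for compounds containing only C, H and O; it omits the
    nitrogen-related contributions mentioned for Fig. 12.
    """
    return 2.0 * n_o / n_c - n_h / n_c


# Illustrative isomers (C3H6O2), not taken from the studied SOA systems:
# ((number of C, H, O atoms), main oxygenated functional groups).
compounds = {
    "propanoic acid": ((3, 6, 2), "carboxyl"),
    "hydroxyacetone": ((3, 6, 2), "hydroxyl + ketone"),
}

metrics = {
    name: (o_to_c(*formula), mean_carbon_oxidation_state(*formula))
    for name, (formula, _groups) in compounds.items()
}

for name, (oc, osc) in metrics.items():
    groups = compounds[name][1]
    print(f"{name}: O:C = {oc:.3f}, OSc = {osc:.3f}, groups = {groups}")
```
]]></preformat>
      <p>Both isomers return identical O:C and mean carbon oxidation state values although their functional groups differ. A metric that resolves functional groups, such as the ACR, can in principle separate them, which is consistent with the spread of ACR values at a fixed value of the other two metrics.</p>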

      <fig id="F11" specific-use="star"><label>Figure 11</label><caption><p id="d2e11238">2D space representations of the <inline-formula><mml:math id="M539" display="inline"><mml:mi mathvariant="italic">α</mml:mi></mml:math></inline-formula>-pinene SOA system at 298 <inline-formula><mml:math id="M540" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">K</mml:mi></mml:mrow></mml:math></inline-formula> using <bold>(a, c)</bold> the <inline-formula><mml:math id="M541" display="inline"><mml:mrow class="chem"><mml:mi mathvariant="normal">O</mml:mi><mml:mo>:</mml:mo><mml:mi mathvariant="normal">C</mml:mi></mml:mrow></mml:math></inline-formula> ratio or <bold>(b, d)</bold> the mean carbon oxidation state as polarity axis. A <inline-formula><mml:math id="M542" display="inline"><mml:mrow><mml:mn mathvariant="normal">10</mml:mn><mml:mo>×</mml:mo><mml:mn mathvariant="normal">5</mml:mn></mml:mrow></mml:math></inline-formula> grid is shown in case of the full SOA system <bold>(a, b)</bold> as well as its use for the surrogate selection by the weighted medoid method <bold>(c, d)</bold>.</p></caption>
            <graphic xlink:href="https://gmd.copernicus.org/articles/19/4601/2026/gmd-19-4601-2026-f11.png"/>

          </fig>

      <fig id="F12" specific-use="star"><label>Figure 12</label><caption><p id="d2e11301">Comparison of three polarity axis metrics: <inline-formula><mml:math id="M543" display="inline"><mml:mrow><mml:msub><mml:mover accent="true"><mml:mtext>OS</mml:mtext><mml:mo mathvariant="normal">‾</mml:mo></mml:mover><mml:mrow class="chem"><mml:mi mathvariant="normal">C</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula>, ACR and <inline-formula><mml:math id="M544" display="inline"><mml:mrow class="chem"><mml:mi mathvariant="normal">O</mml:mi><mml:mo>:</mml:mo><mml:mi mathvariant="normal">C</mml:mi></mml:mrow></mml:math></inline-formula> ratio for all components of the toluene SOA system. Both <inline-formula><mml:math id="M545" display="inline"><mml:mrow><mml:msub><mml:mover accent="true"><mml:mtext>OS</mml:mtext><mml:mo mathvariant="normal">‾</mml:mo></mml:mover><mml:mrow class="chem"><mml:mi mathvariant="normal">C</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula> and ACR account for the impact of nitrogen-containing functionalities in a molecule on the respective metric, while the <inline-formula><mml:math id="M546" display="inline"><mml:mrow class="chem"><mml:mi mathvariant="normal">O</mml:mi><mml:mo>:</mml:mo><mml:mi mathvariant="normal">C</mml:mi></mml:mrow></mml:math></inline-formula> ratio does not.</p></caption>
            <graphic xlink:href="https://gmd.copernicus.org/articles/19/4601/2026/gmd-19-4601-2026-f12.png"/>

          </fig>

      <p id="d2e11365">The accuracy of the grid-based and clustering approaches, as measured by MPE and MAPE, varies notably depending on the surrogate selection method and grid resolution chosen; refer to Tables <xref ref-type="table" rid="T2"/>–<xref ref-type="table" rid="T5"/>. This variability underscores the importance of selecting appropriate methods for specific modelling objectives. At high resolutions, e.g. <inline-formula><mml:math id="M547" display="inline"><mml:mrow><mml:mn mathvariant="normal">10</mml:mn><mml:mo>×</mml:mo><mml:mn mathvariant="normal">5</mml:mn></mml:mrow></mml:math></inline-formula>, the polarity axis choice has a modest impact, while it can be substantial at lower resolutions, especially in terms of predicted <inline-formula><mml:math id="M548" display="inline"><mml:mi mathvariant="italic">κ</mml:mi></mml:math></inline-formula> values when using the <inline-formula><mml:math id="M549" display="inline"><mml:mrow class="chem"><mml:mi mathvariant="normal">O</mml:mi><mml:mo>:</mml:mo><mml:mi mathvariant="normal">C</mml:mi></mml:mrow></mml:math></inline-formula> polarity axis paired with one of the non-weighted gridded surrogate selection methods, in which case <inline-formula><mml:math id="M550" display="inline"><mml:mrow><mml:mtext>MAPE</mml:mtext><mml:mo>&gt;</mml:mo><mml:mn mathvariant="normal">50</mml:mn><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mi mathvariant="italic">%</mml:mi></mml:mrow></mml:math></inline-formula> can occur.</p>
      <p id="d2e11419">In the case of the <inline-formula><mml:math id="M551" display="inline"><mml:mi mathvariant="italic">α</mml:mi></mml:math></inline-formula>-pinene system (Tables <xref ref-type="table" rid="T2"/> and <xref ref-type="table" rid="T3"/>), the grid-based methods generally exhibit higher MAPE than the clustering-based <inline-formula><mml:math id="M552" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula>-means approach, especially at lower resolutions. This trend is consistent across the evaluated metrics and resolutions, highlighting the potential limitations of the simpler grid-based techniques in accurately reducing and representing complex chemical systems. Higher resolutions typically yield lower MAPE, especially in the case of the two mass-weighted surrogate selection methods, indicating improved accuracy and making these choices preferable in applications.</p>
      <p id="d2e11440">Overall, the activity coefficient ratio as the polarity metric slightly outperformed the <inline-formula><mml:math id="M553" display="inline"><mml:mrow class="chem"><mml:mi mathvariant="normal">O</mml:mi><mml:mo>:</mml:mo><mml:mi mathvariant="normal">C</mml:mi></mml:mrow></mml:math></inline-formula> ratio and the <inline-formula><mml:math id="M554" display="inline"><mml:mrow><mml:msub><mml:mover accent="true"><mml:mtext>OS</mml:mtext><mml:mo mathvariant="normal">‾</mml:mo></mml:mover><mml:mrow class="chem"><mml:mi mathvariant="normal">C</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula> for the systems studied, demonstrating its ability to represent polarity in the context of subsequent gas–particle partitioning impacts on aerosol mass and hygroscopicity. In the case of the <inline-formula><mml:math id="M555" display="inline"><mml:mi mathvariant="italic">α</mml:mi></mml:math></inline-formula>-pinene system, the mass-weighted <inline-formula><mml:math id="M556" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula>-means approach using ACR achieved low MAPE values, ranging from 2.0 % (<inline-formula><mml:math id="M557" display="inline"><mml:mrow><mml:mn mathvariant="normal">4</mml:mn><mml:mo>×</mml:mo><mml:mn mathvariant="normal">2</mml:mn></mml:mrow></mml:math></inline-formula> resolution) to 0.1 % (<inline-formula><mml:math id="M558" display="inline"><mml:mrow><mml:mn mathvariant="normal">10</mml:mn><mml:mo>×</mml:mo><mml:mn mathvariant="normal">5</mml:mn></mml:mrow></mml:math></inline-formula> resolution).
The toluene system also showed good performance of the mass-weighted <inline-formula><mml:math id="M559" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula>-means approach with ACR, with MAPE values ranging from 6.0 % (<inline-formula><mml:math id="M560" display="inline"><mml:mrow><mml:mn mathvariant="normal">4</mml:mn><mml:mo>×</mml:mo><mml:mn mathvariant="normal">2</mml:mn></mml:mrow></mml:math></inline-formula> resolution) to 4.3 % (<inline-formula><mml:math id="M561" display="inline"><mml:mrow><mml:mn mathvariant="normal">10</mml:mn><mml:mo>×</mml:mo><mml:mn mathvariant="normal">5</mml:mn></mml:mrow></mml:math></inline-formula> resolution).</p>
      <p id="d2e11540">In a comparison of the different surrogate selection methods, the mass-weighted <inline-formula><mml:math id="M562" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula>-means method proved the most consistent and accurate across both systems. This is evident in the case of the <inline-formula><mml:math id="M563" display="inline"><mml:mi mathvariant="italic">α</mml:mi></mml:math></inline-formula>-pinene SOA system when using the <inline-formula><mml:math id="M564" display="inline"><mml:mrow class="chem"><mml:mi mathvariant="normal">O</mml:mi><mml:mo>:</mml:mo><mml:mi mathvariant="normal">C</mml:mi></mml:mrow></mml:math></inline-formula> ratio for polarity. There, the mass-weighted <inline-formula><mml:math id="M565" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula>-means method yielded low MAPE values (0.2 % to 6.7 %) for predicted SOA mass concentrations across the system resolutions tested. In contrast, the midpoint method showed considerable variability, with the MAPE ranging from 6.6 % to 51.2 %. The medoid and weighted medoid methods were also less consistent, with the MAPE reaching up to 39.8 %. In terms of the predicted hygroscopicity parameter of the <inline-formula><mml:math id="M566" display="inline"><mml:mi mathvariant="italic">α</mml:mi></mml:math></inline-formula>-pinene SOA, MAPE values ranged from 0.0 % (high resolution, ACR, <inline-formula><mml:math id="M567" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula>-means method) to 89.1 % (low resolution, <inline-formula><mml:math id="M568" display="inline"><mml:mrow class="chem"><mml:mi mathvariant="normal">O</mml:mi><mml:mo>:</mml:mo><mml:mi mathvariant="normal">C</mml:mi></mml:mrow></mml:math></inline-formula> ratio, medoid method).</p>
      <p id="d2e11603">In the case of the toluene SOA system using the <inline-formula><mml:math id="M569" display="inline"><mml:mrow class="chem"><mml:mi mathvariant="normal">O</mml:mi><mml:mo>:</mml:mo><mml:mi mathvariant="normal">C</mml:mi></mml:mrow></mml:math></inline-formula> ratio, <inline-formula><mml:math id="M570" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula>-means again performed well and consistently across different resolutions in terms of predicted SOA mass concentrations, with MAPE values ranging from 1.3 % to 4.9 %. The other surrogate selection methods showed higher variability, with MAPE values of up to 13 % in the case of the midpoint method at <inline-formula><mml:math id="M571" display="inline"><mml:mrow><mml:mn mathvariant="normal">4</mml:mn><mml:mo>×</mml:mo><mml:mn mathvariant="normal">2</mml:mn></mml:mrow></mml:math></inline-formula> grid resolution. We note, however, that a MAPE of 13 % could still be considered good performance compared to uncertainties in field measurements of mass concentrations and aerosol composition.
The predicted <inline-formula><mml:math id="M572" display="inline"><mml:mi mathvariant="italic">κ</mml:mi></mml:math></inline-formula> values for the toluene system showed a broader range of MAPE values, in the case of the <inline-formula><mml:math id="M573" display="inline"><mml:mrow class="chem"><mml:mi mathvariant="normal">O</mml:mi><mml:mo>:</mml:mo><mml:mi mathvariant="normal">C</mml:mi></mml:mrow></mml:math></inline-formula> ratio axis ranging from 1.0 % (weighted medoid, highest resolution) to 143 % (midpoint method, lowest resolution) and substantial MPE variability (<inline-formula><mml:math id="M574" display="inline"><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">143</mml:mn></mml:mrow></mml:math></inline-formula> % to <inline-formula><mml:math id="M575" display="inline"><mml:mrow><mml:mo>+</mml:mo><mml:mn mathvariant="normal">131</mml:mn></mml:mrow></mml:math></inline-formula> %), indicating significant over- and under-predictions by different methods at the lowest grid resolution studied (<inline-formula><mml:math id="M576" display="inline"><mml:mrow><mml:mn mathvariant="normal">4</mml:mn><mml:mo>×</mml:mo><mml:mn mathvariant="normal">2</mml:mn></mml:mrow></mml:math></inline-formula>).</p>
      <p id="d2e11689">When using the <inline-formula><mml:math id="M577" display="inline"><mml:mrow><mml:msub><mml:mover accent="true"><mml:mtext>OS</mml:mtext><mml:mo mathvariant="normal">‾</mml:mo></mml:mover><mml:mrow class="chem"><mml:mi mathvariant="normal">C</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula> metric, the mass-weighted <inline-formula><mml:math id="M578" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula>-means method again outperformed alternative approaches at most resolutions. In case of the <inline-formula><mml:math id="M579" display="inline"><mml:mi mathvariant="italic">α</mml:mi></mml:math></inline-formula>-pinene SOA system, <inline-formula><mml:math id="M580" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula>-means resulted in MAPE values ranging from 0.9 % to 1.7 % for predicted SOA mass concentrations across various grid resolutions, approximately as accurate as with the ACR polarity axis. The other surrogate selection methods exhibited higher variability, with the midpoint method showing MAPE values as high as 71.4 % for the <inline-formula><mml:math id="M581" display="inline"><mml:mrow><mml:mn mathvariant="normal">4</mml:mn><mml:mo>×</mml:mo><mml:mn mathvariant="normal">2</mml:mn></mml:mrow></mml:math></inline-formula> grid resolution. For the predicted <inline-formula><mml:math id="M582" display="inline"><mml:mi mathvariant="italic">κ</mml:mi></mml:math></inline-formula> values, MAPE ranges from 2.5 % (<inline-formula><mml:math id="M583" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula>-means) to 36 % (midpoint method).</p>
      <p id="d2e11756">In the case of the toluene SOA system, using <inline-formula><mml:math id="M584" display="inline"><mml:mrow><mml:msub><mml:mover accent="true"><mml:mtext>OS</mml:mtext><mml:mo mathvariant="normal">‾</mml:mo></mml:mover><mml:mrow class="chem"><mml:mi mathvariant="normal">C</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula>, the <inline-formula><mml:math id="M585" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula>-means method continued to perform well and consistently, with a MAPE between 0.8 % and 4.7 % for predicted SOA mass concentrations. The other methods showed relatively good performance as well, with a maximum MAPE of 11 % (weighted medoid method, <inline-formula><mml:math id="M586" display="inline"><mml:mrow><mml:mn mathvariant="normal">4</mml:mn><mml:mo>×</mml:mo><mml:mn mathvariant="normal">2</mml:mn></mml:mrow></mml:math></inline-formula> grid resolution). In terms of the predicted SOA <inline-formula><mml:math id="M587" display="inline"><mml:mi mathvariant="italic">κ</mml:mi></mml:math></inline-formula>, MAPE values ranged from 8.5 % to 90 %, and MPE values spanned from <inline-formula><mml:math id="M588" display="inline"><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">49</mml:mn></mml:mrow></mml:math></inline-formula> % (<inline-formula><mml:math id="M589" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula>-means, <inline-formula><mml:math id="M590" display="inline"><mml:mrow><mml:mn mathvariant="normal">6</mml:mn><mml:mo>×</mml:mo><mml:mn mathvariant="normal">3</mml:mn></mml:mrow></mml:math></inline-formula> resolution) to 90 % (medoid, <inline-formula><mml:math id="M591" display="inline"><mml:mrow><mml:mn mathvariant="normal">8</mml:mn><mml:mo>×</mml:mo><mml:mn mathvariant="normal">4</mml:mn></mml:mrow></mml:math></inline-formula> resolution), again showing higher variability for this metric, even when using the mass-weighted <inline-formula><mml:math id="M592" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula>-means method (MAPE from 10 % to 70 %). In comparison, when using ACR as the polarity axis, the <inline-formula><mml:math id="M593" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula>-means method achieved a worst MAPE for <inline-formula><mml:math id="M594" display="inline"><mml:mi mathvariant="italic">κ</mml:mi></mml:math></inline-formula> of 31 % for the <inline-formula><mml:math id="M595" display="inline"><mml:mrow><mml:mn mathvariant="normal">4</mml:mn><mml:mo>×</mml:mo><mml:mn mathvariant="normal">2</mml:mn></mml:mrow></mml:math></inline-formula> resolution – approximately equivalent to the MAPE of 30 % for an <inline-formula><mml:math id="M596" display="inline"><mml:mrow><mml:mn mathvariant="normal">8</mml:mn><mml:mo>×</mml:mo><mml:mn mathvariant="normal">4</mml:mn></mml:mrow></mml:math></inline-formula> resolution when using <inline-formula><mml:math id="M597" display="inline"><mml:mrow><mml:msub><mml:mover accent="true"><mml:mtext>OS</mml:mtext><mml:mo mathvariant="normal">‾</mml:mo></mml:mover><mml:mrow class="chem"><mml:mi mathvariant="normal">C</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula> for polarity. This is also consistent with ACR leading to better <inline-formula><mml:math id="M598" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula>-means performance compared to <inline-formula><mml:math id="M599" display="inline"><mml:mrow><mml:msub><mml:mover accent="true"><mml:mtext>OS</mml:mtext><mml:mo mathvariant="normal">‾</mml:mo></mml:mover><mml:mrow class="chem"><mml:mi mathvariant="normal">C</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula> in the case of the <inline-formula><mml:math id="M600" display="inline"><mml:mi mathvariant="italic">α</mml:mi></mml:math></inline-formula>-pinene SOA system.</p>
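      <p>The two error metrics quoted in this section can be illustrated with a short sketch. It assumes the conventional definitions of the mean absolute percentage error (MAPE) and the signed mean percentage error (MPE), with the sign convention "predicted minus reference"; the definitions given earlier in the article take precedence. The function names and the example values are hypothetical and are not taken from the S2AS or lumping-framework code.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def mape(predicted, reference):
    """Mean absolute percentage error, in percent (assumed definition)."""
    pairs = list(zip(predicted, reference))
    return 100.0 * sum(abs(p - r) / abs(r) for p, r in pairs) / len(pairs)


def mpe(predicted, reference):
    """Mean signed percentage error, in percent (assumed sign convention:
    predicted minus reference, so negative values mean underprediction)."""
    pairs = list(zip(predicted, reference))
    return 100.0 * sum((p - r) / r for p, r in pairs) / len(pairs)


# Hypothetical SOA mass concentrations at three conditions:
# full (reference) system vs. surrogate system.
mass_reference = [2.0, 4.0, 8.0]
mass_surrogate = [1.0, 5.0, 8.0]

print(mape(mass_surrogate, mass_reference))
print(mpe(mass_surrogate, mass_reference))
```
]]></preformat>
      <p>In this made-up example, the under- and overprediction partly cancel in the MPE (about -8.3 %) but not in the MAPE (25 %), which illustrates why reporting both metrics can be informative.</p>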
</sec>
</sec>
</sec>
<sec id="Ch1.S4" sec-type="conclusions">
  <label>4</label><title>Conclusions</title>
      <p id="d2e11942">This study introduces a novel chain of computational tools to advance the prediction of organic aerosol formation, which can be applied to chemical product evolution predictions from near-explicit gas-phase mechanisms. This will allow the community to approach atmospheric chemistry modelling of organics and related aerosol composition and mass concentrations at a highly detailed level, one that is only accessible with the aid of automatic product classification methods. By integrating structure- and system-level tools, our work paves the way for gas–particle partitioning modelling involving large numbers of aerosol components, a challenge that has long hindered highly detailed system studies on SOA formation and composition.</p>
      <p id="d2e11945">The developed suite of automated tools, including the SMILES to AIOMFAC Subgroups tool (S2AS), enhances processing capabilities for complex chemical systems, as demonstrated for compounds resulting from <inline-formula><mml:math id="M601" display="inline"><mml:mi mathvariant="italic">α</mml:mi></mml:math></inline-formula>-pinene ozonolysis or toluene oxidation. The introduction of a scalable 2D framework with grid-based and cluster-based surrogate selection methods represents a major advancement in simplifying complex chemical systems while maintaining controllable accuracy in SOA partitioning predictions and retaining structural information (of the surrogates).</p>
      <p id="d2e11955">The evaluation of the framework's applications to two SOA systems provided information about the effects of polarity axis choices and system resolution. We examined the impact of choosing relatively low to moderate grid resolutions (<inline-formula><mml:math id="M602" display="inline"><mml:mrow><mml:mn mathvariant="normal">4</mml:mn><mml:mo>×</mml:mo><mml:mn mathvariant="normal">2</mml:mn></mml:mrow></mml:math></inline-formula>, <inline-formula><mml:math id="M603" display="inline"><mml:mrow><mml:mn mathvariant="normal">6</mml:mn><mml:mo>×</mml:mo><mml:mn mathvariant="normal">3</mml:mn></mml:mrow></mml:math></inline-formula>, <inline-formula><mml:math id="M604" display="inline"><mml:mrow><mml:mn mathvariant="normal">8</mml:mn><mml:mo>×</mml:mo><mml:mn mathvariant="normal">4</mml:mn></mml:mrow></mml:math></inline-formula>, <inline-formula><mml:math id="M605" display="inline"><mml:mrow><mml:mn mathvariant="normal">10</mml:mn><mml:mo>×</mml:mo><mml:mn mathvariant="normal">5</mml:mn></mml:mrow></mml:math></inline-formula>) in surrogate representations of gas–particle systems. As expected, higher grid resolutions (or a higher number of <inline-formula><mml:math id="M606" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula>-means clusters) generally yield better accuracy at the expense of a higher computational cost for the gas–particle partitioning computation step. By reducing the number of species from hundreds or thousands to just <inline-formula><mml:math id="M607" display="inline"><mml:mrow><mml:mo>∼</mml:mo><mml:mn mathvariant="normal">32</mml:mn></mml:mrow></mml:math></inline-formula> surrogate components, substantial computational efficiency gains were achieved without substantially compromising accuracy in SOA mass predictions. 
Based on the two example systems discussed, we recommend grid resolutions of at least <inline-formula><mml:math id="M608" display="inline"><mml:mrow><mml:mn mathvariant="normal">8</mml:mn><mml:mo>×</mml:mo><mml:mn mathvariant="normal">4</mml:mn></mml:mrow></mml:math></inline-formula> to maintain expected prediction errors within a 10 % threshold. More generally, for reference offline gas–particle partitioning computations in which computational costs are a secondary concern, we recommend choosing the highest grid or cluster resolution feasible for the use case, up to about 200 surrogate components. Beyond 200 surrogate components, the cost versus accuracy trade-off will likely result in diminishing returns in terms of improvements in predicted (aerosol) system properties.</p>
      <p id="d2e12036">Our quantitative analysis highlights the effectiveness of the mass-weighted <inline-formula><mml:math id="M609" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula>-means method. When it is applied with the ACR polarity representation, good performance is achieved even at relatively coarse resolutions of an organic aerosol system. The choice of polarity axis metric impacted the accuracy and consistency of SOA system representation. Compared to the <inline-formula><mml:math id="M610" display="inline"><mml:mrow class="chem"><mml:mi mathvariant="normal">O</mml:mi><mml:mo>:</mml:mo><mml:mi mathvariant="normal">C</mml:mi></mml:mrow></mml:math></inline-formula> ratio and <inline-formula><mml:math id="M611" display="inline"><mml:mrow><mml:msub><mml:mover accent="true"><mml:mtext>OS</mml:mtext><mml:mo mathvariant="normal">‾</mml:mo></mml:mover><mml:mrow class="chem"><mml:mi mathvariant="normal">C</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula> proxies for polarity, the ACR metric is particularly effective since it encodes more information about a component's structure and mixing properties in aqueous SOA. The <inline-formula><mml:math id="M612" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula>-means-based medoid approach paired with the ACR metric emerged as the most effective 2D lumping and surrogate selection method among those evaluated. While it may be possible to improve the grid-based surrogate selection methods, e.g. by introducing variable-resolution grids with higher resolution in the semivolatile range, the strong performance of the <inline-formula><mml:math id="M613" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula>-means method renders such improvements unnecessary.</p>
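      <p>To make the cluster-based surrogate selection more concrete, the following minimal sketch clusters compounds in a normalized 2D polarity–volatility space and, for each cluster, picks the member closest to the mass-weighted centroid as the surrogate. It is an illustration under stated assumptions and not the Fortran implementation of the framework: it uses plain Lloyd iterations, a deterministic start from the heaviest compounds, and hypothetical coordinates and masses. The function name weighted_kmeans_medoids is introduced here for illustration only.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def weighted_kmeans_medoids(points, masses, k, n_iter=100):
    """Mass-weighted Lloyd iterations on 2D points (illustrative sketch).

    Returns the cluster label of every point and, per cluster, the index of
    the member closest to the mass-weighted centroid (the assumed surrogate).
    """
    def sqdist(a, b):
        return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

    # Deterministic start (an assumption): the k heaviest points.
    order = sorted(range(len(points)), key=lambda i: -masses[i])
    centroids = [points[i] for i in order[:k]]

    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: sqdist(p, centroids[c]))
                  for p in points]
        # Update step: mass-weighted mean position of each cluster.
        new_centroids = []
        for c in range(k):
            members = [i for i in range(len(points)) if labels[i] == c]
            total = sum(masses[i] for i in members)
            if total == 0.0:
                new_centroids.append(centroids[c])  # keep empty cluster as is
                continue
            new_centroids.append((
                sum(masses[i] * points[i][0] for i in members) / total,
                sum(masses[i] * points[i][1] for i in members) / total,
            ))
        if new_centroids == centroids:
            break  # converged: centroids no longer move
        centroids = new_centroids

    # Surrogate choice: the cluster member closest to the weighted centroid.
    medoids = {}
    for c in range(k):
        members = [i for i in range(len(points)) if labels[i] == c]
        if members:
            medoids[c] = min(members, key=lambda i: sqdist(points[i], centroids[c]))
    return labels, medoids


# Hypothetical compounds: normalized (polarity, volatility) and mass.
points = [(0.1, 0.1), (0.2, 0.1), (0.15, 0.2), (0.8, 0.9), (0.9, 0.8), (0.85, 0.85)]
masses = [1.0, 3.0, 1.0, 2.0, 2.0, 4.0]

labels, medoids = weighted_kmeans_medoids(points, masses, k=2)
print(labels, medoids)
```
]]></preformat>
      <p>Because the centroid is mass-weighted, the heaviest compound in each group pulls the centroid towards itself and, in this example, ends up being selected as the surrogate.</p>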
      <p id="d2e12088">We envision a few distinct options for future applications of this framework in different kinds of atmospheric chemistry models. (1) Within detailed chemical box or plume models, those that consider a large number of compounds and retain their molecular structure information, the computation of surrogates and subsequent gas–particle partitioning at each desired (output) time step may be the preferred option. (2) Alternatively, based on a separate offline calculation for a specific system, a fixed set of surrogate compounds could be determined with the 2D framework. Subsequently, at each time step, existing and newly formed compounds from the box model's chemical mechanism could be mapped to this conserved, predetermined set of surrogates using the closest normalized Euclidean distance to the various surrogates in the 2D space (similar to Eq. <xref ref-type="disp-formula" rid="Ch1.E6"/>) to determine the surrogate to which a compound's mass will be lumped. (3) In the case of simplified chemical mechanisms, such as those often employed in large-scale chemical transport models, maintaining only a few organic aerosol surrogates or a 1D/2D VBS representation, the application differs since surrogate lumping during simulations is unnecessary. In that case, the 2D framework could serve in systematically generating sets of surrogate components after mechanism simulations (e.g. with GECKO-A) for targeted aerosol precursors (structure-resolved) or aid in generating 2D VBS bin-resolved (structure-agnostic) representations at desired polarity–volatility resolutions. In the latter case, the 2D lumping step may serve in assigning surrogates in the ACR vs. <inline-formula><mml:math id="M614" display="inline"><mml:mrow><mml:mi>p</mml:mi><mml:mi mathvariant="italic">°</mml:mi></mml:mrow></mml:math></inline-formula> space and in translating the resulting surrogate mass concentrations into bin-based mass concentrations, e.g. 
in the <inline-formula><mml:math id="M615" display="inline"><mml:mrow class="chem"><mml:mi mathvariant="normal">O</mml:mi><mml:mo>:</mml:mo><mml:mi mathvariant="normal">C</mml:mi></mml:mrow></mml:math></inline-formula> vs. <inline-formula><mml:math id="M616" display="inline"><mml:mrow><mml:mi>C</mml:mi><mml:mi mathvariant="italic">°</mml:mi></mml:mrow></mml:math></inline-formula> coordinate space. In the case of atmospheric chemistry models that retain the molecular structure information of surrogates, we envision two options for invoking equilibrium gas–particle partitioning calculations during simulations. (i) Applying the gas–particle partitioning calculation offline at specific output times during a simulation while running the gas-phase chemical mechanism as if all material remained in the gas phase (no feedback from partitioning). (ii) Running the 2D lumping framework and the gas–particle partitioning method at every simulation time step, followed by treating the determined fractional surrogate amounts partitioned to the particle phase as partially or fully shielded from further gas-phase chemical reactions. The gas-phase fraction of a surrogate would then be applied to the list of associated compounds, updating their molecular gas-phase concentrations prior to the next chemical reaction step in the simulation. Optionally, reactions in the condensed phase could be treated separately by a distinct mechanism.</p>
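      <p>Option (2) above can be sketched as follows, assuming a fixed surrogate set, a simple min–max normalization of both axes, and hypothetical coordinates and masses. The article's own distance definition (cf. Eq. 6) takes precedence over this simplified form, and the names nearest_surrogate and lump_masses are illustrative only.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import math


def normalize(value, lower, upper):
    """Scale a coordinate to the unit interval using assumed axis bounds."""
    return (value - lower) / (upper - lower)


def nearest_surrogate(compound_xy, surrogates, bounds):
    """Name of the surrogate closest to a compound in normalized 2D space."""
    (pol_lo, pol_hi), (vol_lo, vol_hi) = bounds

    def distance(surrogate_xy):
        d_pol = (normalize(compound_xy[0], pol_lo, pol_hi)
                 - normalize(surrogate_xy[0], pol_lo, pol_hi))
        d_vol = (normalize(compound_xy[1], vol_lo, vol_hi)
                 - normalize(surrogate_xy[1], vol_lo, vol_hi))
        return math.hypot(d_pol, d_vol)

    return min(surrogates, key=lambda name: distance(surrogates[name]))


def lump_masses(compounds, surrogates, bounds):
    """Add each compound's mass to its nearest predetermined surrogate."""
    totals = {name: 0.0 for name in surrogates}
    for polarity, log10_p, mass in compounds:
        name = nearest_surrogate((polarity, log10_p), surrogates, bounds)
        totals[name] += mass
    return totals


# Hypothetical fixed surrogate set: name -> (polarity, log10 of vapour pressure).
surrogates = {"S1": (0.2, -2.0), "S2": (0.8, -8.0)}
# Assumed axis bounds for normalization: (polarity range, volatility range).
bounds = ((0.0, 1.0), (-12.0, 0.0))
# Hypothetical compounds at one time step: (polarity, log10 p, mass concentration).
compounds = [(0.25, -3.0, 1.5), (0.70, -7.0, 2.0), (0.30, -1.0, 0.5)]

lumped = lump_masses(compounds, surrogates, bounds)
print(lumped)
```
]]></preformat>
      <p>Since the surrogate set stays fixed, only this inexpensive mapping has to be repeated at each time step; newly formed compounds are handled in the same way as existing ones.</p>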
      <p id="d2e12125">A computationally effective use of near-explicit gas-phase chemical mechanisms in atmospheric chemistry models often benefits from a tunable reduction in the complexity of the mechanism itself, both in terms of the number of explicit species and the number of reactions covered. Methods such as the GENerator of reduced Organic Aerosol mechanism (GENOA) <xref ref-type="bibr" rid="bib1.bibx64" id="paren.67"/> and the Automated MOdel REduction (AMORE) algorithm based on graph theory <xref ref-type="bibr" rid="bib1.bibx67" id="paren.68"/> serve this purpose. When targeting SOA formation applications, AMORE v2.0 employs a 2D categorization based on the saturation vapour pressures and Henry's law constants of organic components, which is similar to the polarity–volatility space of our 2D framework. Further development of such rule-based mechanism reduction methods may therefore benefit from also considering our 2D framework for potential application in compound classification.</p>
      <p id="d2e12134">In conclusion, the introduced computational framework will aid in bridging the wide gap between detailed, molecular-level reaction and simulation mechanisms and the computationally constrained, much simpler aerosol schemes of regional and large-scale atmospheric models. It provides automated tools and lumping techniques for generating reduced-complexity representations of aerosol properties and gas–particle partitioning inputs in a systematic and objective manner. Furthermore, the scalability of the 2D framework allows researchers to adjust the level of detail based on specific research needs while being mindful of computational constraints. Alongside detailed chemical mechanisms, such as GECKO-A, future work could make use of the introduced tools in simulations of specific environmental chamber studies on aerosol formation from known precursors and to represent related gas–particle systems by a tunable selection of structure-resolved surrogates.</p>
</sec>

      
      </body>
    <back><notes notes-type="codedataavailability"><title>Code and data availability</title>

      <p id="d2e12142">The current Python code of the S2AS model and related documentation are available via an online code repository (<uri>https://github.com/andizuend/S2AS__SMILES_to_AIOMFAC</uri>, last access: 25 May 2026) under the GNU General Public License v3.0. The exact version of the S2AS model (v1.0) applied to produce the results used in this article is archived on Zenodo under <ext-link xlink:href="https://doi.org/10.5281/zenodo.18968164" ext-link-type="DOI">10.5281/zenodo.18968164</ext-link> <xref ref-type="bibr" rid="bib1.bibx3" id="paren.69"/>. The current Fortran code of the 2D polarity–volatility framework as well as an associated plotting program and documentation are available via an online repository  (<uri>https://github.com/andizuend/2D_Polarity_Volatility_lumping</uri>, last access: 25 May 2026) under the GNU General Public License v3.0. The exact version of this framework (v1.0) applied to produce the results used in this article is archived on Zenodo under <ext-link xlink:href="https://doi.org/10.5281/zenodo.18968224" ext-link-type="DOI">10.5281/zenodo.18968224</ext-link> <xref ref-type="bibr" rid="bib1.bibx2" id="paren.70"/>. The UManSysProp code (v1.0) by <xref ref-type="bibr" rid="bib1.bibx59" id="text.71"/> is available via an online code repository (<uri>https://github.com/loftytopping/UManSysProp_public</uri>, last access: 25 May 2026). The specific version of the used UManSysProp code, including the adaptations for temperature-dependent pure-component vapour pressure parameterizations used in this work, is archived on Zenodo under <ext-link xlink:href="https://doi.org/10.5281/zenodo.17172675" ext-link-type="DOI">10.5281/zenodo.17172675</ext-link> <xref ref-type="bibr" rid="bib1.bibx74" id="paren.72"/>. 
The Master Chemical Mechanism (v3.3.1) <xref ref-type="bibr" rid="bib1.bibx29 bib1.bibx50 bib1.bibx30" id="paren.73"/> and the related AtChem online box model are available online via <uri>https://mcm.york.ac.uk/MCM/</uri> (last access: 18 September 2025). Predicted SOA mass concentrations and hygroscopicity parameters for various surrogate methods and polarity metrics used in this article are summarized in the electronic Supplement. The data underlying the shown figures and tables, as well as related output from the property prediction tools and the 2D lumping framework, are archived on Zenodo under <ext-link xlink:href="https://doi.org/10.5281/zenodo.17088390" ext-link-type="DOI">10.5281/zenodo.17088390</ext-link> <xref ref-type="bibr" rid="bib1.bibx4" id="paren.74"/>.</p>
  </notes><app-group>
        <supplementary-material position="anchor"><p id="d2e12189">The supplement related to this article is available online at <inline-supplementary-material xlink:href="https://doi.org/10.5194/gmd-19-4601-2026-supplement" xlink:title="pdf">https://doi.org/10.5194/gmd-19-4601-2026-supplement</inline-supplementary-material>.</p></supplementary-material>
        </app-group><notes notes-type="authorcontribution"><title>Author contributions</title>

      <p id="d2e12198">DAA and AZ conceptualized the project. DAA and AZ developed the code of the S2AS property prediction tool and the lumping framework. DAA carried out the MCM simulations. DHB carried out the GECKO-A simulations. DHB modified the vapour pressure estimation tool. DAA carried out the lumping framework data analysis with input by AZ. DAA and AZ co-wrote the manuscript with contributions by DHB.</p>
  </notes><notes notes-type="competinginterests"><title>Competing interests</title>

      <p id="d2e12204">The contact author has declared that none of the authors has any competing interests.</p>
  </notes><notes notes-type="disclaimer"><title>Disclaimer</title>

      <p id="d2e12210">Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. The authors bear the ultimate responsibility for providing appropriate place names. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.</p>
  </notes><ack><title>Acknowledgements</title><p id="d2e12216">We extend our thanks to Bernard Aumont for providing output from a GECKO-A simulation, supporting initial tests of our software tools.</p></ack><notes notes-type="financialsupport"><title>Financial support</title>

      <p id="d2e12221">This research has been supported by the Natural Sciences and Engineering Research Council of Canada (grant nos. RGPIN/04315-2014 and RGPIN-2021-02688) and the government of Canada through the Federal Department of Environment and Climate Change Canada (grant no. GCXE26S058).</p>
  </notes><notes notes-type="reviewstatement"><title>Review statement</title>

      <p id="d2e12227">This paper was edited by Christoph Knote and reviewed by two anonymous referees.</p>
  </notes><ref-list>
    <title>References</title>

      <ref id="bib1.bibx1"><label>Allen et al.(2016)Allen, Pon, Greiner, and Wishart</label><mixed-citation>Allen, F., Pon, A., Greiner, R., and Wishart, D.: Computational prediction of electron ionization mass spectra to assist in GC/MS compound identification, Anal. Chem., 88, 7689–7697, 2016.</mixed-citation></ref>
      <ref id="bib1.bibx2"><label>Amaladhasan and Zuend(2026a)</label><mixed-citation>Amaladhasan, D. A. and Zuend, A.: 2D polarity–volatility lumping framework, Zenodo [code], <ext-link xlink:href="https://doi.org/10.5281/zenodo.18968224" ext-link-type="DOI">10.5281/zenodo.18968224</ext-link>, 2026a.</mixed-citation></ref>
      <ref id="bib1.bibx3"><label>Amaladhasan and Zuend(2026b)</label><mixed-citation>Amaladhasan, D. A. and Zuend, A.: SMILES to AIOMFAC subgroups (S2AS) tool, Zenodo [code], <ext-link xlink:href="https://doi.org/10.5281/zenodo.18968164" ext-link-type="DOI">10.5281/zenodo.18968164</ext-link>, 2026b.</mixed-citation></ref>
      <ref id="bib1.bibx4"><label>Amaladhasan et al.(2026)Amaladhasan, Zuend, and Hassan-Barthaux</label><mixed-citation>Amaladhasan, D. A., Zuend, A., and Hassan-Barthaux, D.: Alpha-pinene and Toluene SOA system data used in Amaladhasan et al. for 2D lumping, Zenodo [data set], <ext-link xlink:href="https://doi.org/10.5281/zenodo.17088390" ext-link-type="DOI">10.5281/zenodo.17088390</ext-link>, 2026.</mixed-citation></ref>
      <ref id="bib1.bibx5"><label>Armeli et al.(2023)Armeli, Peters, and Koop</label><mixed-citation>Armeli, G., Peters, J.-H., and Koop, T.: Machine-Learning-Based Prediction of the Glass Transition Temperature of Organic Compounds Using Experimental Data, ACS Omega, 8, 12298–12309, <ext-link xlink:href="https://doi.org/10.1021/acsomega.2c08146" ext-link-type="DOI">10.1021/acsomega.2c08146</ext-link>, 2023.</mixed-citation></ref>
      <ref id="bib1.bibx6"><label>Aumont et al.(2005)Aumont, Szopa, and Madronich</label><mixed-citation>Aumont, B., Szopa, S., and Madronich, S.: Modelling the evolution of organic carbon during its gas-phase tropospheric oxidation: development of an explicit model based on a self generating approach, Atmos. Chem. Phys., 5, 2497–2517, <ext-link xlink:href="https://doi.org/10.5194/acp-5-2497-2005" ext-link-type="DOI">10.5194/acp-5-2497-2005</ext-link>, 2005.</mixed-citation></ref>
      <ref id="bib1.bibx7"><label>Barley and McFiggans(2010)</label><mixed-citation>Barley, M. H. and McFiggans, G.: The critical assessment of vapour pressure estimation methods for use in modelling the formation of atmospheric organic aerosol, Atmos. Chem. Phys., 10, 749–767, <ext-link xlink:href="https://doi.org/10.5194/acp-10-749-2010" ext-link-type="DOI">10.5194/acp-10-749-2010</ext-link>, 2010.</mixed-citation></ref>
      <ref id="bib1.bibx8"><label>Bertram et al.(2011)Bertram, Martin, Hanna, Smith, Bodsworth, Chen, Kuwata, Liu, You, and Zorn</label><mixed-citation>Bertram, A. K., Martin, S. T., Hanna, S. J., Smith, M. L., Bodsworth, A., Chen, Q., Kuwata, M., Liu, A., You, Y., and Zorn, S. R.: Predicting the relative humidities of liquid-liquid phase separation, efflorescence, and deliquescence of mixed particles of ammonium sulfate, organic material, and water using the organic-to-sulfate mass ratio of the particle and the oxygen-to-carbon elemental ratio of the organic component, Atmos. Chem. Phys., 11, 10995–11006, <ext-link xlink:href="https://doi.org/10.5194/acp-11-10995-2011" ext-link-type="DOI">10.5194/acp-11-10995-2011</ext-link>, 2011.</mixed-citation></ref>
      <ref id="bib1.bibx9"><label>Bilde et al.(2015)Bilde, Barsanti, Booth, Cappa, Donahue, Emanuelsson, McFiggans, Krieger, Marcolli, Topping, Ziemann, Barley, Clegg, Dennis-Smither, Hallquist, Hallquist, Khlystov, Kulmala, Mogensen, Percival, Pope, Reid, Ribeiro da Silva, Rosenoern, Salo, Soonsin, Yli-Juuti, Prisle, Pagels, Rarey, Zardini, and Riipinen</label><mixed-citation>Bilde, M., Barsanti, K., Booth, M., Cappa, C. D., Donahue, N. M., Emanuelsson, E. U., McFiggans, G., Krieger, U. K., Marcolli, C., Topping, D., Ziemann, P., Barley, M., Clegg, S., Dennis-Smither, B., Hallquist, M., Hallquist, A. M., Khlystov, A., Kulmala, M., Mogensen, D., Percival, C. J., Pope, F., Reid, J. P., Ribeiro da Silva, M. A. V., Rosenoern, T., Salo, K., Soonsin, V. P., Yli-Juuti, T., Prisle, N. L., Pagels, J., Rarey, J., Zardini, A. A., and Riipinen, I.: Saturation Vapor Pressures and Transition Enthalpies of Low-Volatility Organic Molecules of Atmospheric Relevance: From Dicarboxylic Acids to Complex Mixtures, Chem. Rev., 115, 4115–4156, <ext-link xlink:href="https://doi.org/10.1021/cr5005502" ext-link-type="DOI">10.1021/cr5005502</ext-link>, 2015.</mixed-citation></ref>
      <ref id="bib1.bibx10"><label>Burkardt(2008)</label><mixed-citation>Burkardt, J.: ASA058 the K-Means Problem, <uri>https://people.math.sc.edu/Burkardt/f_src/asa058/asa058.html</uri> (last access: 8 August 2025), 2008.</mixed-citation></ref>
      <ref id="bib1.bibx11"><label>Byun et al.(1999)Byun, Young, and Odman</label><mixed-citation>Byun, D. W., Young, J., and Odman, M. T.: Governing Equations and Computational Structure of the Community Multiscale Air Quality (CMAQ) Chemical Transport Model, Chap. 6, Science Algorithms of the EPA Models-3 Community Multiscale Air Quality (CMAQ) Modeling System, 6-1–6-41, <uri>https://www.cmascenter.org/cmaq/science_documentation/pdf/ch06.pdf</uri> (last access: 25 May 2026), 1999.</mixed-citation></ref>
      <ref id="bib1.bibx12"><label>Chang and Pankow(2006)</label><mixed-citation>Chang, E. I. and Pankow, J. F.: Prediction of activity coefficients in liquid aerosol particles containing organic compounds, dissolved inorganic salts, and water – Part 2: Consideration of phase separation effects by an X-UNIFAC model, Atmos. Environ., 40, 6422–6436, <ext-link xlink:href="https://doi.org/10.1016/j.atmosenv.2006.04.031" ext-link-type="DOI">10.1016/j.atmosenv.2006.04.031</ext-link>, 2006.</mixed-citation></ref>
      <ref id="bib1.bibx13"><label>Chang and Pankow(2010)</label><mixed-citation>Chang, E. I. and Pankow, J. F.: Organic particulate matter formation at varying relative humidity using surrogate secondary and primary organic compounds with activity corrections in the condensed phase obtained using a method based on the Wilson equation, Atmos. Chem. Phys., 10, 5475–5490, <ext-link xlink:href="https://doi.org/10.5194/acp-10-5475-2010" ext-link-type="DOI">10.5194/acp-10-5475-2010</ext-link>, 2010.</mixed-citation></ref>
      <ref id="bib1.bibx14"><label>Compernolle et al.(2011)Compernolle, Ceulemans, and Müller</label><mixed-citation>Compernolle, S., Ceulemans, K., and Müller, J.-F.: EVAPORATION: a new vapour pressure estimation method for organic molecules including non-additivity and intramolecular interactions, Atmos. Chem. Phys., 11, 9431–9450, <ext-link xlink:href="https://doi.org/10.5194/acp-11-9431-2011" ext-link-type="DOI">10.5194/acp-11-9431-2011</ext-link>, 2011.</mixed-citation></ref>
      <ref id="bib1.bibx15"><label>DeRieux et al.(2018)DeRieux, Li, Lin, Laskin, Laskin, Bertram, Nizkorodov, and Shiraiwa</label><mixed-citation>DeRieux, W.-S. W., Li, Y., Lin, P., Laskin, J., Laskin, A., Bertram, A. K., Nizkorodov, S. A., and Shiraiwa, M.: Predicting the glass transition temperature and viscosity of secondary organic material using molecular composition, Atmos. Chem. Phys., 18, 6331–6351, <ext-link xlink:href="https://doi.org/10.5194/acp-18-6331-2018" ext-link-type="DOI">10.5194/acp-18-6331-2018</ext-link>, 2018.</mixed-citation></ref>
      <ref id="bib1.bibx16"><label>Donahue et al.(2006)Donahue, Robinson, Stanier, and Pandis</label><mixed-citation>Donahue, N. M., Robinson, A. L., Stanier, C. O., and Pandis, S. N.: Coupled Partitioning, Dilution, and Chemical Aging of Semivolatile Organics, Environ. Sci. Technol., 40, 2635–2643, <ext-link xlink:href="https://doi.org/10.1021/es052297c" ext-link-type="DOI">10.1021/es052297c</ext-link>, 2006.</mixed-citation></ref>
      <ref id="bib1.bibx17"><label>Donahue et al.(2011)Donahue, Epstein, Pandis, and Robinson</label><mixed-citation>Donahue, N. M., Epstein, S. A., Pandis, S. N., and Robinson, A. L.: A two-dimensional volatility basis set: 1. organic-aerosol mixing thermodynamics, Atmos. Chem. Phys., 11, 3303–3318, <ext-link xlink:href="https://doi.org/10.5194/acp-11-3303-2011" ext-link-type="DOI">10.5194/acp-11-3303-2011</ext-link>, 2011.</mixed-citation></ref>
      <ref id="bib1.bibx18"><label>Donahue et al.(2012)Donahue, Kroll, Pandis, and Robinson</label><mixed-citation>Donahue, N. M., Kroll, J. H., Pandis, S. N., and Robinson, A. L.: A two-dimensional volatility basis set – Part 2: Diagnostics of organic-aerosol evolution, Atmos. Chem. Phys., 12, 615–634, <ext-link xlink:href="https://doi.org/10.5194/acp-12-615-2012" ext-link-type="DOI">10.5194/acp-12-615-2012</ext-link>, 2012.</mixed-citation></ref>
      <ref id="bib1.bibx19"><label>Ehrlich and Rarey(2012)</label><mixed-citation>Ehrlich, H.-C. and Rarey, M.: Systematic benchmark of substructure search in molecular graphs – From Ullmann to VF2, J. Cheminformatics, 4, 1–17, 2012.</mixed-citation></ref>
      <ref id="bib1.bibx20"><label>Erdakos and Pankow(2004)</label><mixed-citation>Erdakos, G. B. and Pankow, J. F.: Gas/particle partitioning of neutral and ionizing compounds to single- and multi-phase aerosol particles. 2. Phase separation in liquid particulate matter containing both polar and low-polarity organic compounds, Atmos. Environ., 38, 1005–1013, <ext-link xlink:href="https://doi.org/10.1016/j.atmosenv.2003.10.038" ext-link-type="DOI">10.1016/j.atmosenv.2003.10.038</ext-link>, 2004.</mixed-citation></ref>
      <ref id="bib1.bibx21"><label>Fredenslund et al.(1975)Fredenslund, Jones, and Prausnitz</label><mixed-citation>Fredenslund, A., Jones, R. L., and Prausnitz, J. M.: Group-Contribution Estimation of Activity Coefficients in Nonideal Liquid Mixtures, AIChE J., 21, 1086–1099, 1975.</mixed-citation></ref>
      <ref id="bib1.bibx22"><label>Galeazzo and Shiraiwa(2022)</label><mixed-citation>Galeazzo, T. and Shiraiwa, M.: Predicting glass transition temperature and melting point of organic compounds via machine learning and molecular embeddings, Environmental Science: Atmospheres, 2, 362–374, <ext-link xlink:href="https://doi.org/10.1039/D1EA00090J" ext-link-type="DOI">10.1039/D1EA00090J</ext-link>, 2022.</mixed-citation></ref>
      <ref id="bib1.bibx23"><label>Girolami(1994)</label><mixed-citation>Girolami, G. S.: A Simple “Back of the Envelope” Method for Estimating the Densities and Molecular Volumes of Liquids and Solids, J. Chem. Educ., 71, 962, <ext-link xlink:href="https://doi.org/10.1021/ed071p962" ext-link-type="DOI">10.1021/ed071p962</ext-link>, 1994.</mixed-citation></ref>
      <ref id="bib1.bibx24"><label>Griffin et al.(2003)Griffin, Nguyen, Dabdub, and Seinfeld</label><mixed-citation>Griffin, R. J., Nguyen, K., Dabdub, D., and Seinfeld, J. H.: A Coupled Hydrophobic-Hydrophilic Model for Predicting Secondary Organic Aerosol Formation, J. Atmos. Chem., 44, 171–190, 2003.</mixed-citation></ref>
      <ref id="bib1.bibx25"><label>Hallquist et al.(2009)Hallquist, Wenger, Baltensperger, Rudich, Simpson, Claeys, Dommen, Donahue, George, Goldstein, Hamilton, Herrmann, Hoffmann, Iinuma, Jang, Jenkin, Jimenez, Kiendler-Scharr, Maenhaut, McFiggans, Mentel, Monod, Prevot, Seinfeld, Surratt, Szmigielski, and Wildt</label><mixed-citation>Hallquist, M., Wenger, J. C., Baltensperger, U., Rudich, Y., Simpson, D., Claeys, M., Dommen, J., Donahue, N. M., George, C., Goldstein, A. H., Hamilton, J. F., Herrmann, H., Hoffmann, T., Iinuma, Y., Jang, M., Jenkin, M. E., Jimenez, J. L., Kiendler-Scharr, A., Maenhaut, W., McFiggans, G., Mentel, Th. F., Monod, A., Prévôt, A. S. H., Seinfeld, J. H., Surratt, J. D., Szmigielski, R., and Wildt, J.: The formation, properties and impact of secondary organic aerosol: current and emerging issues, Atmos. Chem. Phys., 9, 5155–5236, <ext-link xlink:href="https://doi.org/10.5194/acp-9-5155-2009" ext-link-type="DOI">10.5194/acp-9-5155-2009</ext-link>, 2009.</mixed-citation></ref>
      <ref id="bib1.bibx26"><label>Hansen et al.(1991)Hansen, Rasmussen, Fredenslund, Schiller, and Gmehling</label><mixed-citation>Hansen, H. K., Rasmussen, P., Fredenslund, A., Schiller, M., and Gmehling, J.: Vapor–liquid equilibria by UNIFAC group contribution. 5. Revision and extension, Ind. Eng. Chem. Res., 30, 2352–2355, <ext-link xlink:href="https://doi.org/10.1021/ie00058a017" ext-link-type="DOI">10.1021/ie00058a017</ext-link>, 1991.</mixed-citation></ref>
      <ref id="bib1.bibx27"><label>Hartigan and Wong(1979)</label><mixed-citation>Hartigan, J. A. and Wong, M. A.: Algorithm AS 136: A K-Means Clustering Algorithm, J. Roy. Stat. Soc. C-App., 28, 100–108, <ext-link xlink:href="https://doi.org/10.2307/2346830" ext-link-type="DOI">10.2307/2346830</ext-link>, 1979.</mixed-citation></ref>
      <ref id="bib1.bibx28"><label>Huang et al.(2021)Huang, Mahrt, Xu, Shiraiwa, Zuend, and Bertram</label><mixed-citation>Huang, Y., Mahrt, F., Xu, S., Shiraiwa, M., Zuend, A., and Bertram, A. K.: Coexistence of three liquid phases in individual atmospheric aerosol particles, P. Natl. Acad. Sci. USA, 118, e2102512118, <ext-link xlink:href="https://doi.org/10.1073/pnas.2102512118" ext-link-type="DOI">10.1073/pnas.2102512118</ext-link>, 2021.</mixed-citation></ref>
      <ref id="bib1.bibx29"><label>Jenkin et al.(1997)Jenkin, Saunders, and Pilling</label><mixed-citation>Jenkin, M. E., Saunders, S. M., and Pilling, M. J.: The tropospheric degradation of volatile organic compounds: A protocol for mechanism development, Atmos. Environ., 31, 81–104, <ext-link xlink:href="https://doi.org/10.1016/S1352-2310(96)00105-7" ext-link-type="DOI">10.1016/S1352-2310(96)00105-7</ext-link>, 1997.</mixed-citation></ref>
      <ref id="bib1.bibx30"><label>Jenkin et al.(2003)Jenkin, Saunders, Wagner, and Pilling</label><mixed-citation>Jenkin, M. E., Saunders, S. M., Wagner, V., and Pilling, M. J.: Protocol for the development of the Master Chemical Mechanism, MCM v3 (Part B): tropospheric degradation of aromatic volatile organic compounds, Atmos. Chem. Phys., 3, 181–193, <ext-link xlink:href="https://doi.org/10.5194/acp-3-181-2003" ext-link-type="DOI">10.5194/acp-3-181-2003</ext-link>, 2003.</mixed-citation></ref>
      <ref id="bib1.bibx31"><label>Jimenez et al.(2009)Jimenez, Canagaratna, Donahue, Prevot, Zhang, Kroll, DeCarlo, Allan, Coe, Ng, Aiken, Docherty, Ulbrich, Grieshop, Robinson, Duplissy, Smith, Wilson, Lanz, Hueglin, Sun, Tian, Laaksonen, Raatikainen, Rautiainen, Vaattovaara, Ehn, Kulmala, Tomlinson, Collins, Cubison, Dunlea, Huffman, Onasch, Alfarra, Williams, Bower, Kondo, Schneider, Drewnick, Borrmann, Weimer, Demerjian, Salcedo, Cottrell, Griffin, Takami, Miyoshi, Hatakeyama, Shimono, Sun, Zhang, Dzepina, Kimmel, Sueper, Jayne, Herndon, Trimborn, Williams, Wood, Middlebrook, Kolb, Baltensperger, and Worsnop</label><mixed-citation>Jimenez, J. L., Canagaratna, M. R., Donahue, N. M., Prevot, A. S. H., Zhang, Q., Kroll, J. H., DeCarlo, P. F., Allan, J. D., Coe, H., Ng, N. L., Aiken, A. C., Docherty, K. S., Ulbrich, I. M., Grieshop, A. P., Robinson, A. L., Duplissy, J., Smith, J. D., Wilson, K. R., Lanz, V. A., Hueglin, C., Sun, Y. L., Tian, J., Laaksonen, A., Raatikainen, T., Rautiainen, J., Vaattovaara, P., Ehn, M., Kulmala, M., Tomlinson, J. M., Collins, D. R., Cubison, M. J., Dunlea, E. J., Huffman, J. A., Onasch, T. B., Alfarra, M. R., Williams, P. I., Bower, K., Kondo, Y., Schneider, J., Drewnick, F., Borrmann, S., Weimer, S., Demerjian, K., Salcedo, D., Cottrell, L., Griffin, R., Takami, A., Miyoshi, T., Hatakeyama, S., Shimono, A., Sun, J. Y., Zhang, Y. M., Dzepina, K., Kimmel, J. R., Sueper, D., Jayne, J. T., Herndon, S. C., Trimborn, A. M., Williams, L. R., Wood, E. C., Middlebrook, A. M., Kolb, C. E., Baltensperger, U., and Worsnop, D. R.: Evolution of Organic Aerosols in the Atmosphere, Science, 326, 1525–1529, <ext-link xlink:href="https://doi.org/10.1126/science.1180353" ext-link-type="DOI">10.1126/science.1180353</ext-link>, 2009.</mixed-citation></ref>
      <ref id="bib1.bibx32"><label>Kamlet et al.(1988)Kamlet, Doherty, Abraham, Marcus, and Taft</label><mixed-citation>Kamlet, M. J., Doherty, R. M., Abraham, M. H., Marcus, Y., and Taft, R. W.: Linear solvation energy relationship. 46. An improved equation for correlation and prediction of octanol/water partition coefficients of organic nonelectrolytes (including strong hydrogen bond donor solutes), J. Phys. Chem., 92, 5244–5255, <ext-link xlink:href="https://doi.org/10.1021/j100329a035" ext-link-type="DOI">10.1021/j100329a035</ext-link>, 1988.</mixed-citation></ref>
      <ref id="bib1.bibx33"><label>Kroll et al.(2011)Kroll, Donahue, Jimenez, Kessler, Canagaratna, Wilson, Altieri, Mazzoleni, Wozniak, Bluhm, Mysak, Smith, Kolb, and Worsnop</label><mixed-citation>Kroll, J. H., Donahue, N. M., Jimenez, J. L., Kessler, S. H., Canagaratna, M. R., Wilson, K. R., Altieri, K. E., Mazzoleni, L. R., Wozniak, A. S., Bluhm, H., Mysak, E. R., Smith, J. D., Kolb, C. E., and Worsnop, D. R.: Carbon oxidation state as a metric for describing the chemistry of atmospheric organic aerosol, Nat. Chem., 3, 133–139, <ext-link xlink:href="https://doi.org/10.1038/nchem.948" ext-link-type="DOI">10.1038/nchem.948</ext-link>, 2011.</mixed-citation></ref>
      <ref id="bib1.bibx34"><label>Landrum(2013)</label><mixed-citation>Landrum, G.: RDKit: A software suite for cheminformatics, computational chemistry, and predictive modeling, <uri>https://www.rdkit.org</uri> (last access: 25 May 2026), 2013.</mixed-citation></ref>
      <ref id="bib1.bibx35"><label>Lannuque et al.(2021)Lannuque, D’Anna, Couvidat, Valorso, and Sartelet</label><mixed-citation>Lannuque, V., D’Anna, B., Couvidat, F., Valorso, R., and Sartelet, K.: Improvement in Modeling of OH and <inline-formula><mml:math id="M617" display="inline"><mml:mrow class="chem"><mml:msub><mml:mi mathvariant="normal">HO</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula> Radical Concentrations during Toluene and Xylene Oxidation with RACM2 Using MCM/GECKO-A, Atmosphere-Basel, 12, <ext-link xlink:href="https://doi.org/10.3390/atmos12060732" ext-link-type="DOI">10.3390/atmos12060732</ext-link>, 2021.</mixed-citation></ref>
      <ref id="bib1.bibx36"><label>Marcolli and Peter(2005)</label><mixed-citation>Marcolli, C. and Peter, Th.: Water activity in polyol/water systems: new UNIFAC parameterization, Atmos. Chem. Phys., 5, 1545–1555, <ext-link xlink:href="https://doi.org/10.5194/acp-5-1545-2005" ext-link-type="DOI">10.5194/acp-5-1545-2005</ext-link>, 2005.</mixed-citation></ref>
      <ref id="bib1.bibx37"><label>Mouchel-Vallon et al.(2020)Mouchel-Vallon, Lee-Taylor, Hodzic, Artaxo, Aumont, Camredon, Gurarie, Jimenez, Lenschow, Martin et al.</label><mixed-citation>Mouchel-Vallon, C., Lee-Taylor, J., Hodzic, A., Artaxo, P., Aumont, B., Camredon, M., Gurarie, D., Jimenez, J.-L., Lenschow, D. H., Martin, S. T., Nascimento, J., Orlando, J. J., Palm, B. B., Shilling, J. E., Shrivastava, M., and Madronich, S.: Exploration of oxidative chemistry and secondary organic aerosol formation in the Amazon during the wet season: explicit modeling of the Manaus urban plume with GECKO-A, Atmos. Chem. Phys., 20, 5995–6014, <ext-link xlink:href="https://doi.org/10.5194/acp-20-5995-2020" ext-link-type="DOI">10.5194/acp-20-5995-2020</ext-link>, 2020.</mixed-citation></ref>
      <ref id="bib1.bibx38"><label>Nannoolal et al.(2008)Nannoolal, Rarey, and Ramjugernath</label><mixed-citation>Nannoolal, Y., Rarey, J., and Ramjugernath, D.: Estimation of pure component properties: Part 3. Estimation of the vapor pressure of non-electrolyte organic compounds via group contributions and group interactions, Fluid Phase Equilibr., 269, 117–133, 2008.</mixed-citation></ref>
      <ref id="bib1.bibx39"><label>O'Boyle et al.(2011)O'Boyle Jr., Humphrey, Pollack, Hawver, and Story</label><mixed-citation>O'Boyle Jr., E. H., Humphrey, R. H., Pollack, J. M., Hawver, T. H., and Story, P. A.: The relation between emotional intelligence and job performance: A meta-analysis, J. Organ. Behav., 32, 788–818, 2011.</mixed-citation></ref>
      <ref id="bib1.bibx40"><label>OEChem(2012)</label><mixed-citation>OEChem: OpenEye Scientific Software, Inc., Santa Fe, NM, USA, <uri>https://www.eyesopen.com/oechem-tk</uri> (last access: 25 May 2026), 2012.</mixed-citation></ref>
      <ref id="bib1.bibx41"><label>O'Meara et al.(2014)O'Meara, Booth, Barley, Topping, and McFiggans</label><mixed-citation>O'Meara, S., Booth, A. M., Barley, M. H., Topping, D., and McFiggans, G.: An assessment of vapour pressure estimation methods, Phys. Chem. Chem. Phys., 16, 19453–19469, <ext-link xlink:href="https://doi.org/10.1039/C4CP00857J" ext-link-type="DOI">10.1039/C4CP00857J</ext-link>, 2014.</mixed-citation></ref>
      <ref id="bib1.bibx42"><label>Pankow(2003)</label><mixed-citation>Pankow, J. F.: Gas/particle partitioning of neutral and ionizing compounds to single and multi-phase aerosol particles. 1. Unified modeling framework, Atmos. Environ., 37, 3323–3333, <ext-link xlink:href="https://doi.org/10.1016/S1352-2310(03)00346-7" ext-link-type="DOI">10.1016/S1352-2310(03)00346-7</ext-link>, 2003.</mixed-citation></ref>
      <ref id="bib1.bibx43"><label>Pankow and Asher(2008)</label><mixed-citation>Pankow, J. F. and Asher, W. E.: SIMPOL.1: a simple group contribution method for predicting vapor pressures and enthalpies of vaporization of multifunctional organic compounds, Atmos. Chem. Phys., 8, 2773–2796, <ext-link xlink:href="https://doi.org/10.5194/acp-8-2773-2008" ext-link-type="DOI">10.5194/acp-8-2773-2008</ext-link>, 2008.</mixed-citation></ref>
      <ref id="bib1.bibx44"><label>Pankow and Barsanti(2009)</label><mixed-citation>Pankow, J. F. and Barsanti, K. C.: The carbon number-polarity grid: A means to manage the complexity of the mix of organic compounds when modeling atmospheric organic particulate matter, Atmos. Environ., 43, 2829–2835, <ext-link xlink:href="https://doi.org/10.1016/j.atmosenv.2008.12.050" ext-link-type="DOI">10.1016/j.atmosenv.2008.12.050</ext-link>, 2009.</mixed-citation></ref>
      <ref id="bib1.bibx45"><label>Pankow and Chang(2008)</label><mixed-citation>Pankow, J. F. and Chang, E. I.: Variation in the sensitivity of predicted levels of atmospheric organic particulate matter (OPM), Environ. Sci. Technol., 42, 7321–7329, 2008.</mixed-citation></ref>
      <ref id="bib1.bibx46"><label>Pavlov et al.(2011)Pavlov, Rybalkin, Karulin, Kozhevnikov, Savelyev, and Churinov</label><mixed-citation>Pavlov, D., Rybalkin, M., Karulin, B., Kozhevnikov, M., Savelyev, A., and Churinov, A.: Indigo: universal cheminformatics API, J. Cheminformatics, 3, <ext-link xlink:href="https://doi.org/10.1186/1758-2946-3-S1-P4" ext-link-type="DOI">10.1186/1758-2946-3-S1-P4</ext-link>, 2011.</mixed-citation></ref>
      <ref id="bib1.bibx47"><label>Pun et al.(2002)Pun, Griffin, Seigneur, and Seinfeld</label><mixed-citation>Pun, B. K. L., Griffin, R. J., Seigneur, C., and Seinfeld, J. H.: Secondary organic aerosol – 2. Thermodynamic model for gas/particle partitioning of molecular constituents, J. Geophys. Res.-Atmos., 107, <ext-link xlink:href="https://doi.org/10.1029/2001JD000542" ext-link-type="DOI">10.1029/2001JD000542</ext-link>, 2002.</mixed-citation></ref>
      <ref id="bib1.bibx48"><label>Rastak et al.(2017)Rastak, Pajunoja, Acosta Navarro, Ma, Song, Partridge, Kirkevåg, Leong, Hu, Taylor, Lambe, Cerully, Bougiatioti, Liu, Krejci, Petäjä, Percival, Davidovits, Worsnop, Ekman, Nenes, Martin, Jimenez, Collins, Topping, Bertram, Zuend, Virtanen, and Riipinen</label><mixed-citation>Rastak, N., Pajunoja, A., Acosta Navarro, J. C., Ma, J., Song, M., Partridge, D. G., Kirkevåg, A., Leong, Y., Hu, W. W., Taylor, N. F., Lambe, A., Cerully, K., Bougiatioti, A., Liu, P., Krejci, R., Petäjä, T., Percival, C., Davidovits, P., Worsnop, D. R., Ekman, A. M. L., Nenes, A., Martin, S., Jimenez, J. L., Collins, D. R., Topping, D. O., Bertram, A. K., Zuend, A., Virtanen, A., and Riipinen, I.: Microphysical explanation of the RH-dependent water affinity of biogenic organic aerosol and its importance for climate, Geophys. Res. Lett., 44, 5167–5177, <ext-link xlink:href="https://doi.org/10.1002/2017GL073056" ext-link-type="DOI">10.1002/2017GL073056</ext-link>, 2017.</mixed-citation></ref>
      <ref id="bib1.bibx49"><label>Ruggeri et al.(2016)Ruggeri, Bernhard, Henderson, and Takahama</label><mixed-citation>Ruggeri, G., Bernhard, F. A., Henderson, B. H., and Takahama, S.: Model–measurement comparison of functional group abundance in <inline-formula><mml:math id="M618" display="inline"><mml:mi mathvariant="italic">α</mml:mi></mml:math></inline-formula>-pinene and 1,3,5-trimethylbenzene secondary organic aerosol formation, Atmos. Chem. Phys., 16, 8729–8747, <ext-link xlink:href="https://doi.org/10.5194/acp-16-8729-2016" ext-link-type="DOI">10.5194/acp-16-8729-2016</ext-link>, 2016.</mixed-citation></ref>
      <ref id="bib1.bibx50"><label>Saunders et al.(2003)Saunders, Jenkin, Derwent, and Pilling</label><mixed-citation>Saunders, S. M., Jenkin, M. E., Derwent, R. G., and Pilling, M. J.: Protocol for the development of the Master Chemical Mechanism, MCM v3 (Part A): tropospheric degradation of non-aromatic volatile organic compounds, Atmos. Chem. Phys., 3, 161–180, <ext-link xlink:href="https://doi.org/10.5194/acp-3-161-2003" ext-link-type="DOI">10.5194/acp-3-161-2003</ext-link>, 2003.</mixed-citation></ref>
      <ref id="bib1.bibx51"><label>Schervish and Shiraiwa(2023)</label><mixed-citation>Schervish, M. and Shiraiwa, M.: Impact of phase state and non-ideal mixing on equilibration timescales of secondary organic aerosol partitioning, Atmos. Chem. Phys., 23, 221–233, <ext-link xlink:href="https://doi.org/10.5194/acp-23-221-2023" ext-link-type="DOI">10.5194/acp-23-221-2023</ext-link>, 2023.</mixed-citation></ref>
      <ref id="bib1.bibx52"><label>Schmedding and Zuend(2025)</label><mixed-citation>Schmedding, R. and Zuend, A.: The role of interfacial tension in the size-dependent phase separation of atmospheric aerosol particles, Atmos. Chem. Phys., 25, 327–346, <ext-link xlink:href="https://doi.org/10.5194/acp-25-327-2025" ext-link-type="DOI">10.5194/acp-25-327-2025</ext-link>, 2025.</mixed-citation></ref>
      <ref id="bib1.bibx53"><label>Schmedding et al.(2025)Schmedding, Franssen, and Zuend</label><mixed-citation>Schmedding, R., Franssen, M., and Zuend, A.: A Machine Learning Approach for Predicting the Pure-Component Surface Tension of Atmospherically Relevant Organic Compounds, ACS ES&amp;T Air, <ext-link xlink:href="https://doi.org/10.1021/acsestair.4c00291" ext-link-type="DOI">10.1021/acsestair.4c00291</ext-link>, 2025.</mixed-citation></ref>
      <ref id="bib1.bibx54"><label>Semeniuk and Dastoor(2020)</label><mixed-citation>Semeniuk, K. and Dastoor, A.: Current State of Atmospheric Aerosol Thermodynamics and Mass Transfer Modeling: A Review, Atmosphere-Basel, 11, <ext-link xlink:href="https://doi.org/10.3390/atmos11020156" ext-link-type="DOI">10.3390/atmos11020156</ext-link>, 2020.</mixed-citation></ref>
      <ref id="bib1.bibx55"><label>Smith et al.(2011)Smith, Kuwata, and Martin</label><mixed-citation>Smith, M. L., Kuwata, M., and Martin, S. T.: Secondary Organic Material Produced by the Dark Ozonolysis of <inline-formula><mml:math id="M619" display="inline"><mml:mi mathvariant="italic">α</mml:mi></mml:math></inline-formula>-Pinene Minimally Affects the Deliquescence and Efflorescence of Ammonium Sulfate, Aerosol Sci. Tech., 45, 244–261, <ext-link xlink:href="https://doi.org/10.1080/02786826.2010.532178" ext-link-type="DOI">10.1080/02786826.2010.532178</ext-link>, 2011.</mixed-citation></ref>
      <ref id="bib1.bibx56"><label>Sommers et al.(2022)Sommers, Stroud, Adam, O'Brien, Brook, Hayden, Lee, Li, Liggio, Mihele, Mittermeier, Stevens, Wolde, Zuend, and Hayes</label><mixed-citation>Sommers, J. M., Stroud, C. A., Adam, M. G., O'Brien, J., Brook, J. R., Hayden, K., Lee, A. K. Y., Li, K., Liggio, J., Mihele, C., Mittermeier, R. L., Stevens, R. G., Wolde, M., Zuend, A., and Hayes, P. L.: Evaluating SOA formation from different sources of semi- and intermediate-volatility organic compounds from the Athabasca oil sands, Environmental Science: Atmospheres, 2, 469–490, <ext-link xlink:href="https://doi.org/10.1039/D1EA00053E" ext-link-type="DOI">10.1039/D1EA00053E</ext-link>, 2022.</mixed-citation></ref>
      <ref id="bib1.bibx57"><label>Sparks(1973)</label><mixed-citation>Sparks, D. N.: Algorithm AS 58: Euclidean Cluster Analysis, J. Roy. Stat. Soc. C-App., 22, 126–130, <ext-link xlink:href="https://doi.org/10.2307/2346321" ext-link-type="DOI">10.2307/2346321</ext-link>, 1973.</mixed-citation></ref>
      <ref id="bib1.bibx58"><label>Steinbeck et al.(2003)Steinbeck, Han, Kuhn, Horlacher, Luttmann, and Willighagen</label><mixed-citation>Steinbeck, C., Han, Y., Kuhn, S., Horlacher, O., Luttmann, E., and Willighagen, E.: The Chemistry Development Kit (CDK): An open-source Java library for chemo- and bioinformatics, J. Chem. Inf. Comp. Sci., 43, 493–500, 2003.</mixed-citation></ref>
      <ref id="bib1.bibx59"><label>Topping et al.(2016)Topping, Barley, Bane, Higham, Aumont, Dingle, and McFiggans</label><mixed-citation>Topping, D., Barley, M., Bane, M. K., Higham, N., Aumont, B., Dingle, N., and McFiggans, G.: UManSysProp v1.0: an online and open-source facility for molecular property prediction and atmospheric aerosol calculations, Geosci. Model Dev., 9, 899–914, <ext-link xlink:href="https://doi.org/10.5194/gmd-9-899-2016" ext-link-type="DOI">10.5194/gmd-9-899-2016</ext-link>, 2016.</mixed-citation></ref>
      <ref id="bib1.bibx60"><label>Topping and Bane(2022)</label><mixed-citation>Topping, D. L. and Bane, M. (Eds.): Introduction to aerosol modelling: From theory to code, John Wiley &amp; Sons, <ext-link xlink:href="https://doi.org/10.1002/9781119625728" ext-link-type="DOI">10.1002/9781119625728</ext-link>, 2022.</mixed-citation></ref>
      <ref id="bib1.bibx61"><label>Topping et al.(2007)Topping, McFiggans, Kiss, Varga, Facchini, Decesari, and Mircea</label><mixed-citation>Topping, D. O., McFiggans, G. B., Kiss, G., Varga, Z., Facchini, M. C., Decesari, S., and Mircea, M.: Surface tensions of multi-component mixed inorganic/organic aqueous systems of atmospheric significance: measurements, model predictions and importance for cloud activation predictions, Atmos. Chem. Phys., 7, 2371–2398, <ext-link xlink:href="https://doi.org/10.5194/acp-7-2371-2007" ext-link-type="DOI">10.5194/acp-7-2371-2007</ext-link>, 2007.</mixed-citation></ref>
      <ref id="bib1.bibx62"><label>Toropov et al.(2008)Toropov, Rasulev, Leszczynska, and Leszczynski</label><mixed-citation>Toropov, A. A., Rasulev, B. F., Leszczynska, D., and Leszczynski, J.: Multiplicative SMILES-based optimal descriptors: QSPR modeling of fullerene C60 solubility in organic solvents, Chem. Phys. Lett., 457, 332–336, 2008.</mixed-citation></ref>
      <ref id="bib1.bibx63"><label>Tulet et al.(2006)Tulet, Grini, Griffin, and Petitcol</label><mixed-citation>Tulet, P., Grini, A., Griffin, R. J., and Petitcol, S.: ORILAM-SOA: A computationally efficient model for predicting secondary organic aerosols in three-dimensional atmospheric models, J. Geophys. Res., 111, <ext-link xlink:href="https://doi.org/10.1029/2006JD007152" ext-link-type="DOI">10.1029/2006JD007152</ext-link>, 2006.</mixed-citation></ref>
      <ref id="bib1.bibx64"><label>Wang et al.(2022)Wang, Couvidat, and Sartelet</label><mixed-citation>Wang, Z., Couvidat, F., and Sartelet, K.: GENerator of reduced Organic Aerosol mechanism (GENOA v1.0): an automatic generation tool of semi-explicit mechanisms, Geosci. Model Dev., 15, 8957–8982, <ext-link xlink:href="https://doi.org/10.5194/gmd-15-8957-2022" ext-link-type="DOI">10.5194/gmd-15-8957-2022</ext-link>, 2022.</mixed-citation></ref>
      <ref id="bib1.bibx65"><label>Weininger(1988)</label><mixed-citation>Weininger, D.: SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules, J. Chem. Inf. Comp. Sci., 28, 31–36, <ext-link xlink:href="https://doi.org/10.1021/ci00057a005" ext-link-type="DOI">10.1021/ci00057a005</ext-link>, 1988.</mixed-citation></ref>
      <ref id="bib1.bibx66"><label>Wienke and Gmehling(1998)</label><mixed-citation>Wienke, G. and Gmehling, J.: Prediction of octanol–water partition coefficients, Henry coefficients and water solubilities using UNIFAC, Toxicol. Environ. Chem., 65, 57–86, <ext-link xlink:href="https://doi.org/10.1080/02772249809358557" ext-link-type="DOI">10.1080/02772249809358557</ext-link>, 1998.</mixed-citation></ref>
      <ref id="bib1.bibx67"><label>Wiser et al.(2025)Wiser, Sen, Wang, Lee-Taylor, Barsanti, Orlando, Westervelt, Henze, Fiore, Berman, Carter, and McNeill</label><mixed-citation>Wiser, F., Sen, S., Wang, Z., Lee-Taylor, J., Barsanti, K. C., Orlando, J., Westervelt, D. M., Henze, D. K., Fiore, A. M., Berman, A., Carter, R., and McNeill, V. F.: A graph theory-based algorithm for the reduction of atmospheric chemical mechanisms, PNAS Nexus, 4, 11, <ext-link xlink:href="https://doi.org/10.1093/pnasnexus/pgaf273" ext-link-type="DOI">10.1093/pnasnexus/pgaf273</ext-link>, 2025.</mixed-citation></ref>
      <ref id="bib1.bibx68"><label>Zhang et al.(2024)Zhang, Zuend, Top, Surdu, El Haddad, Slowik, Prevot, and Bell</label><mixed-citation>Zhang, J., Zuend, A., Top, J., Surdu, M., El Haddad, I., Slowik, J. G., Prevot, A. S. H., and Bell, D. M.: Estimation of the Volatility and Apparent Activity Coefficient of Levoglucosan in Wood-Burning Organic Aerosols, Environ. Sci. Tech. Let., 11, 1214–1219, <ext-link xlink:href="https://doi.org/10.1021/acs.estlett.4c00608" ext-link-type="DOI">10.1021/acs.estlett.4c00608</ext-link>, 2024.</mixed-citation></ref>
      <ref id="bib1.bibx69"><label>Zuend and Seinfeld(2012)</label><mixed-citation>Zuend, A. and Seinfeld, J. H.: Modeling the gas-particle partitioning of secondary organic aerosol: the importance of liquid-liquid phase separation, Atmos. Chem. Phys., 12, 3857–3882, <ext-link xlink:href="https://doi.org/10.5194/acp-12-3857-2012" ext-link-type="DOI">10.5194/acp-12-3857-2012</ext-link>, 2012.</mixed-citation></ref>
      <ref id="bib1.bibx70"><label>Zuend and Seinfeld(2013)</label><mixed-citation>Zuend, A. and Seinfeld, J. H.: A practical method for the calculation of liquid-liquid equilibria in multicomponent organic-water-electrolyte systems using physicochemical constraints, Fluid Phase Equilibr., 337, 201–213, <ext-link xlink:href="https://doi.org/10.1016/j.fluid.2012.09.034" ext-link-type="DOI">10.1016/j.fluid.2012.09.034</ext-link>, 2013.</mixed-citation></ref>
      <ref id="bib1.bibx71"><label>Zuend et al.(2008)Zuend, Marcolli, Luo, and Peter</label><mixed-citation>Zuend, A., Marcolli, C., Luo, B. P., and Peter, T.: A thermodynamic model of mixed organic-inorganic aerosols to predict activity coefficients, Atmos. Chem. Phys., 8, 4559–4593, <ext-link xlink:href="https://doi.org/10.5194/acp-8-4559-2008" ext-link-type="DOI">10.5194/acp-8-4559-2008</ext-link>, 2008.</mixed-citation></ref>
      <ref id="bib1.bibx72"><label>Zuend et al.(2010)Zuend, Marcolli, Peter, and Seinfeld</label><mixed-citation>Zuend, A., Marcolli, C., Peter, T., and Seinfeld, J. H.: Computation of liquid-liquid equilibria and phase stabilities: implications for RH-dependent gas/particle partitioning of organic-inorganic aerosols, Atmos. Chem. Phys., 10, 7795–7820, <ext-link xlink:href="https://doi.org/10.5194/acp-10-7795-2010" ext-link-type="DOI">10.5194/acp-10-7795-2010</ext-link>, 2010.</mixed-citation></ref>
      <ref id="bib1.bibx73"><label>Zuend et al.(2011)Zuend, Marcolli, Booth, Lienhard, Soonsin, Krieger, Topping, McFiggans, Peter, and Seinfeld</label><mixed-citation>Zuend, A., Marcolli, C., Booth, A. M., Lienhard, D. M., Soonsin, V., Krieger, U. K., Topping, D. O., McFiggans, G., Peter, T., and Seinfeld, J. H.: New and extended parameterization of the thermodynamic model AIOMFAC: calculation of activity coefficients for organic-inorganic mixtures containing carboxyl, hydroxyl, carbonyl, ether, ester, alkenyl, alkyl, and aromatic functional groups, Atmos. Chem. Phys., 11, 9155–9206, <ext-link xlink:href="https://doi.org/10.5194/acp-11-9155-2011" ext-link-type="DOI">10.5194/acp-11-9155-2011</ext-link>, 2011.</mixed-citation></ref>
      <ref id="bib1.bibx74"><label>Zuend et al.(2025)Zuend, Hassan-Barthaux, and Amaladhasan</label><mixed-citation>Zuend, A., Hassan-Barthaux, D., and Amaladhasan, D. A.: SMILES_to_sat_vapour_pressure, Zenodo [code], <ext-link xlink:href="https://doi.org/10.5281/zenodo.17172675" ext-link-type="DOI">10.5281/zenodo.17172675</ext-link>, 2025.</mixed-citation></ref>

  </ref-list></back>
    <!--<article-title-html>S2AS v1.0 and 2D polarity–volatility lumping framework  v1.0: automated compound classification and scalable  lumping for organic aerosol modelling</article-title-html>
<abstract-html/>
<ref-html id="bib1.bib1"><label>Allen et al.(2016)Allen, Pon, Greiner, and Wishart</label><mixed-citation>
      
Allen, F., Pon, A., Greiner, R., and Wishart, D.:
Computational prediction of electron ionization mass spectra to assist in GC/MS compound identification, Anal. Chem., 88, 7689–7697, 2016.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib2"><label>Amaladhasan and Zuend(2026a)</label><mixed-citation>
      
Amaladhasan, D. A. and Zuend, A.: 2D polarity–volatility lumping framework, Zenodo [code], <a href="https://doi.org/10.5281/zenodo.18968224" target="_blank">https://doi.org/10.5281/zenodo.18968224</a>, 2026a.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib3"><label>Amaladhasan and Zuend(2026b)</label><mixed-citation>
      
Amaladhasan, D. A. and Zuend, A.: SMILES to AIOMFAC subgroups (S2AS) tool, Zenodo [code], <a href="https://doi.org/10.5281/zenodo.18968164" target="_blank">https://doi.org/10.5281/zenodo.18968164</a>, 2026b.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib4"><label>Amaladhasan et al.(2026)Amaladhasan, Zuend, and Hassan-Barthaux</label><mixed-citation>
      
Amaladhasan, D. A., Zuend, A., and Hassan-Barthaux, D.: Alpha-pinene and Toluene SOA system data used in Amaladhasan et al. for 2D lumping, Zenodo [data set], <a href="https://doi.org/10.5281/zenodo.17088390" target="_blank">https://doi.org/10.5281/zenodo.17088390</a>, 2026.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib5"><label>Armeli et al.(2023)Armeli, Peters, and Koop</label><mixed-citation>
      
Armeli, G., Peters, J.-H., and Koop, T.:
Machine-Learning-Based Prediction of the Glass Transition Temperature of Organic Compounds Using Experimental Data, ACS Omega, 8, 12298–12309, <a href="https://doi.org/10.1021/acsomega.2c08146" target="_blank">https://doi.org/10.1021/acsomega.2c08146</a>, 2023.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib6"><label>Aumont et al.(2005)Aumont, Szopa, and Madronich</label><mixed-citation>
      
Aumont, B., Szopa, S., and Madronich, S.:
Modelling the evolution of organic carbon during its gas-phase tropospheric oxidation: development of an explicit model based on a self generating approach, Atmos. Chem. Phys., 5, 2497–2517, <a href="https://doi.org/10.5194/acp-5-2497-2005" target="_blank">https://doi.org/10.5194/acp-5-2497-2005</a>, 2005.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib7"><label>Barley and McFiggans(2010)</label><mixed-citation>
      
Barley, M. H. and McFiggans, G.:
The critical assessment of vapour pressure estimation methods for use in modelling the formation of atmospheric organic aerosol, Atmos. Chem. Phys., 10, 749–767, <a href="https://doi.org/10.5194/acp-10-749-2010" target="_blank">https://doi.org/10.5194/acp-10-749-2010</a>, 2010.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib8"><label>Bertram et al.(2011)Bertram, Martin, Hanna, Smith, Bodsworth, Chen, Kuwata, Liu, You, and Zorn</label><mixed-citation>
      
Bertram, A. K., Martin, S. T., Hanna, S. J., Smith, M. L., Bodsworth, A., Chen, Q., Kuwata, M., Liu, A., You, Y., and Zorn, S. R.:
Predicting the relative humidities of liquid-liquid phase separation, efflorescence, and deliquescence of mixed particles of ammonium sulfate, organic material, and water using the organic-to-sulfate mass ratio of the particle and the oxygen-to-carbon elemental ratio of the organic component, Atmos. Chem. Phys., 11, 10995–11006, <a href="https://doi.org/10.5194/acp-11-10995-2011" target="_blank">https://doi.org/10.5194/acp-11-10995-2011</a>, 2011.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib9"><label>Bilde et al.(2015)Bilde, Barsanti, Booth, Cappa, Donahue, Emanuelsson, McFiggans, Krieger, Marcolli, Topping, Ziemann, Barley, Clegg, Dennis-Smither, Hallquist, Hallquist, Khlystov, Kulmala, Mogensen, Percival, Pope, Reid, Ribeiro da Silva, Rosenoern, Salo, Soonsin, Yli-Juuti, Prisle, Pagels, Rarey, Zardini, and Riipinen</label><mixed-citation>
      
Bilde, M., Barsanti, K., Booth, M., Cappa, C. D., Donahue, N. M., Emanuelsson, E. U., McFiggans, G., Krieger, U. K., Marcolli, C., Topping, D., Ziemann, P., Barley, M., Clegg, S., Dennis-Smither, B., Hallquist, M., Hallquist, A. M., Khlystov, A., Kulmala, M., Mogensen, D., Percival, C. J., Pope, F., Reid, J. P., Ribeiro da Silva, M. A. V., Rosenoern, T., Salo, K., Soonsin, V. P., Yli-Juuti, T., Prisle, N. L., Pagels, J., Rarey, J., Zardini, A. A., and Riipinen, I.:
Saturation Vapor Pressures and Transition Enthalpies of Low-Volatility Organic Molecules of Atmospheric Relevance: From Dicarboxylic Acids to Complex Mixtures, Chem. Rev., 115, 4115–4156, <a href="https://doi.org/10.1021/cr5005502" target="_blank">https://doi.org/10.1021/cr5005502</a>, 2015.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib10"><label>Burkardt(2008)</label><mixed-citation>
      
Burkardt, J.:
ASA058 the K-Means Problem, <a href="https://people.math.sc.edu/Burkardt/f_src/asa058/asa058.html" target="_blank"/> (last access: 8 August 2025), 2008.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib11"><label>Byun et al.(1999)Byun, Young, and Odman</label><mixed-citation>
      
Byun, D. W., Young, J., and Odman, M. T.: Governing Equations and Computational Structure of the Community Multiscale Air Quality (CMAQ) Chemical Transport Model, Chap. 6, Science Algorithms of the EPA Models-3 Community Multiscale Air Quality (CMAQ) Modeling System, 6-1–6-41, <a href="https://www.cmascenter.org/cmaq/science_documentation/pdf/ch06.pdf" target="_blank"/> (last access: 25 May 2026), 1999.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib12"><label>Chang and Pankow(2006)</label><mixed-citation>
      
Chang, E. I. and Pankow, J. F.:
Prediction of activity coefficients in liquid aerosol particles containing organic compounds, dissolved inorganic salts, and water – Part 2: Consideration of phase separation effects by an X-UNIFAC model, Atmos. Environ., 40, 6422–6436, <a href="https://doi.org/10.1016/j.atmosenv.2006.04.031" target="_blank">https://doi.org/10.1016/j.atmosenv.2006.04.031</a>, 2006.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib13"><label>Chang and Pankow(2010)</label><mixed-citation>
      
Chang, E. I. and Pankow, J. F.:
Organic particulate matter formation at varying relative humidity using surrogate secondary and primary organic compounds with activity corrections in the condensed phase obtained using a method based on the Wilson equation, Atmos. Chem. Phys., 10, 5475–5490, <a href="https://doi.org/10.5194/acp-10-5475-2010" target="_blank">https://doi.org/10.5194/acp-10-5475-2010</a>, 2010.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib14"><label>Compernolle et al.(2011)Compernolle, Ceulemans, and Müller</label><mixed-citation>
      
Compernolle, S., Ceulemans, K., and Müller, J.-F.:
EVAPORATION: a new vapour pressure estimation methodfor organic molecules including non-additivity and intramolecular interactions, Atmos. Chem. Phys., 11, 9431–9450, <a href="https://doi.org/10.5194/acp-11-9431-2011" target="_blank">https://doi.org/10.5194/acp-11-9431-2011</a>, 2011.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib15"><label>DeRieux et al.(2018)DeRieux, Li, Lin, Laskin, Laskin, Bertram, Nizkorodov, and Shiraiwa</label><mixed-citation>
      
DeRieux, W.-S. W., Li, Y., Lin, P., Laskin, J., Laskin, A., Bertram, A. K., Nizkorodov, S. A., and Shiraiwa, M.:
Predicting the glass transition temperature and viscosity of secondary organic material using molecular composition, Atmos. Chem. Phys., 18, 6331–6351, <a href="https://doi.org/10.5194/acp-18-6331-2018" target="_blank">https://doi.org/10.5194/acp-18-6331-2018</a>, 2018.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib16"><label>Donahue et al.(2006)Donahue, Robinson, Stanier, and Pandis</label><mixed-citation>
      
Donahue, N. M., Robinson, A. L., Stanier, C. O., and Pandis, S. N.:
Coupled Partitioning, Dilution, and Chemical Aging of Semivolatile Organics, Environ. Sci. Technol., 40, 2635–2643, <a href="https://doi.org/10.1021/es052297c" target="_blank">https://doi.org/10.1021/es052297c</a>, 2006.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib17"><label>Donahue et al.(2011)Donahue, Epstein, Pandis, and Robinson</label><mixed-citation>
      
Donahue, N. M., Epstein, S. A., Pandis, S. N., and Robinson, A. L.:
A two-dimensional volatility basis set: 1. organic-aerosol mixing thermodynamics, Atmos. Chem. Phys., 11, 3303–3318, <a href="https://doi.org/10.5194/acp-11-3303-2011" target="_blank">https://doi.org/10.5194/acp-11-3303-2011</a>, 2011.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib18"><label>Donahue et al.(2012)Donahue, Kroll, Pandis, and Robinson</label><mixed-citation>
      
Donahue, N. M., Kroll, J. H., Pandis, S. N., and Robinson, A. L.:
A two-dimensional volatility basis set – Part 2: Diagnostics of organic-aerosol evolution, Atmos. Chem. Phys., 12, 615–634, <a href="https://doi.org/10.5194/acp-12-615-2012" target="_blank">https://doi.org/10.5194/acp-12-615-2012</a>, 2012.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib19"><label>Ehrlich and Rarey(2012)</label><mixed-citation>
      
Ehrlich, H.-C. and Rarey, M.:
Systematic benchmark of substructure search in molecular graphs – From Ullmann to VF2, J. Cheminformatics, 4, 1–17, 2012.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib20"><label>Erdakos and Pankow(2004)</label><mixed-citation>
      
Erdakos, G. B. and Pankow, J. F.:
Gas/particle partitioning of neutral and ionizing compounds to single- and multi-phase aerosol particles. 2. Phase separation in liquid particulate matter containing both polar and low-polarity organic compounds, Atmos. Environ., 38, 1005–1013, <a href="https://doi.org/10.1016/j.atmosenv.2003.10.038" target="_blank">https://doi.org/10.1016/j.atmosenv.2003.10.038</a>, 2004.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib21"><label>Fredenslund et al.(1975)Fredenslund, Jones, and Prausnitz</label><mixed-citation>
      
Fredenslund, A., Jones, R. L., and Prausnitz, J. M.:
Group-Contribution Estimation of Activity Coefficients in Nonideal Liquid Mixtures, AIChE J., 21, 1086–1099, 1975.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib22"><label>Galeazzo and Shiraiwa(2022)</label><mixed-citation>
      
Galeazzo, T. and Shiraiwa, M.:
Predicting glass transition temperature and melting point of organic compounds via machine learning and molecular embeddings, Environmental Science: Atmospheres, 2, 362–374, <a href="https://doi.org/10.1039/D1EA00090J" target="_blank">https://doi.org/10.1039/D1EA00090J</a>, 2022.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib23"><label>Girolami(1994)</label><mixed-citation>
      
Girolami, G. S.:
A Simple “Back of the Envelope” Method for Estimating the Densities and Molecular Volumes of Liquids and Solids, J. Chem. Educ., 71, 962, <a href="https://doi.org/10.1021/ed071p962" target="_blank">https://doi.org/10.1021/ed071p962</a>, 1994.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib24"><label>Griffin et al.(2003)Griffin, Nguyen, Dabdub, and Seinfeld</label><mixed-citation>
      
Griffin, R. J., Nguyen, K., Dabdub, D., and Seinfeld, J. H.:
A Coupled Hydrophobic-Hydrophilic Model for Predicting Secondary Organic Aerosol Formation, J. Atmos. Chem., 44, 171–190, 2003.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib25"><label>Hallquist et al.(2009)Hallquist, Wenger, Baltensperger, Rudich, Simpson, Claeys, Dommen, Donahue, George, Goldstein, Hamilton, Herrmann, Hoffmann, Iinuma, Jang, Jenkin, Jimenez, Kiendler-Scharr, Maenhaut, McFiggans, Mentel, Monod, Prevot, Seinfeld, Surratt, Szmigielski, and Wildt</label><mixed-citation>
      
Hallquist, M., Wenger, J. C., Baltensperger, U., Rudich, Y., Simpson, D., Claeys, M., Dommen, J., Donahue, N. M., George, C., Goldstein, A. H., Hamilton, J. F., Herrmann, H., Hoffmann, T., Iinuma, Y., Jang, M., Jenkin, M. E., Jimenez, J. L., Kiendler-Scharr, A., Maenhaut, W., McFiggans, G., Mentel, Th. F., Monod, A., Prévôt, A. S. H., Seinfeld, J. H., Surratt, J. D., Szmigielski, R., and Wildt, J.:
The formation, properties and impact of secondary organic aerosol: current and emerging issues, Atmos. Chem. Phys., 9, 5155–5236, <a href="https://doi.org/10.5194/acp-9-5155-2009" target="_blank">https://doi.org/10.5194/acp-9-5155-2009</a>, 2009.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib26"><label>Hansen et al.(1991)Hansen, Rasmussen, Fredenslund, Schiller, and Gmehling</label><mixed-citation>
      
Hansen, H. K., Rasmussen, P., Fredenslund, A., Schiller, M., and Gmehling, J.:
Vapor–liquid equilibria by UNIFAC group contribution. 5. Revision and extension, Ind. Eng. Chem. Res., 30, 2352–2355, <a href="https://doi.org/10.1021/ie00058a017" target="_blank">https://doi.org/10.1021/ie00058a017</a>, 1991.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib27"><label>Hartigan and Wong(1979)</label><mixed-citation>
      
Hartigan, J. A. and Wong, M. A.:
Algorithm AS 136: A K-Means Clustering Algorithm, J. Roy. Stat. Soc. C-App., 28, 100–108, <a href="https://doi.org/10.2307/2346830" target="_blank">https://doi.org/10.2307/2346830</a>, 1979.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib28"><label>Huang et al.(2021)Huang, Mahrt, Xu, Shiraiwa, Zuend, and Bertram</label><mixed-citation>
      
Huang, Y., Mahrt, F., Xu, S., Shiraiwa, M., Zuend, A., and Bertram, A. K.:
Coexistence of three liquid phases in individual atmospheric aerosol particles, P. Natl. Acad. Sci. USA, 118, e2102512118, <a href="https://doi.org/10.1073/pnas.2102512118" target="_blank">https://doi.org/10.1073/pnas.2102512118</a>, 2021.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib29"><label>Jenkin et al.(1997)Jenkin, Saunders, and Pilling</label><mixed-citation>
      
Jenkin, M. E., Saunders, S. M., and Pilling, M. J.:
The tropospheric degradation of volatile organic compounds: A protocol for mechanism development, Atmos. Environ., 31, 81–104, <a href="https://doi.org/10.1016/S1352-2310(96)00105-7" target="_blank">https://doi.org/10.1016/S1352-2310(96)00105-7</a>, 1997.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib30"><label>Jenkin et al.(2003)Jenkin, Saunders, Wagner, and Pilling</label><mixed-citation>
      
Jenkin, M. E., Saunders, S. M., Wagner, V., and Pilling, M. J.:
Protocol for the development of the Master Chemical Mechanism, MCM v3 (Part B): tropospheric degradation of aromatic volatile organic compounds, Atmos. Chem. Phys., 3, 181–193, <a href="https://doi.org/10.5194/acp-3-181-2003" target="_blank">https://doi.org/10.5194/acp-3-181-2003</a>, 2003.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib31"><label>Jimenez et al.(2009)Jimenez, Canagaratna, Donahue, Prevot, Zhang, Kroll, DeCarlo, Allan, Coe, Ng, Aiken, Docherty, Ulbrich, Grieshop, Robinson, Duplissy, Smith, Wilson, Lanz, Hueglin, Sun, Tian, Laaksonen, Raatikainen, Rautiainen, Vaattovaara, Ehn, Kulmala, Tomlinson, Collins, Cubison, Dunlea, Huffman, Onasch, Alfarra, Williams, Bower, Kondo, Schneider, Drewnick, Borrmann, Weimer, Demerjian, Salcedo, Cottrell, Griffin, Takami, Miyoshi, Hatakeyama, Shimono, Sun, Zhang, Dzepina, Kimmel, Sueper, Jayne, Herndon, Trimborn, Williams, Wood, Middlebrook, Kolb, Baltensperger, and Worsnop</label><mixed-citation>
      
Jimenez, J. L., Canagaratna, M. R., Donahue, N. M., Prevot, A. S. H., Zhang, Q., Kroll, J. H., DeCarlo, P. F., Allan, J. D., Coe, H., Ng, N. L., Aiken, A. C., Docherty, K. S., Ulbrich, I. M., Grieshop, A. P., Robinson, A. L., Duplissy, J., Smith, J. D., Wilson, K. R., Lanz, V. A., Hueglin, C., Sun, Y. L., Tian, J., Laaksonen, A., Raatikainen, T., Rautiainen, J., Vaattovaara, P., Ehn, M., Kulmala, M., Tomlinson, J. M., Collins, D. R., Cubison, M. J., Dunlea, E. J., Huffman, J. A., Onasch, T. B., Alfarra, M. R., Williams, P. I., Bower, K., Kondo, Y., Schneider, J., Drewnick, F., Borrmann, S., Weimer, S., Demerjian, K., Salcedo, D., Cottrell, L., Griffin, R., Takami, A., Miyoshi, T., Hatakeyama, S., Shimono, A., Sun, J. Y., Zhang, Y. M., Dzepina, K., Kimmel, J. R., Sueper, D., Jayne, J. T., Herndon, S. C., Trimborn, A. M., Williams, L. R., Wood, E. C., Middlebrook, A. M., Kolb, C. E., Baltensperger, U., and Worsnop, D. R.:
Evolution of Organic Aerosols in the Atmosphere, Science, 326, 1525–1529, <a href="https://doi.org/10.1126/science.1180353" target="_blank">https://doi.org/10.1126/science.1180353</a>, 2009.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib32"><label>Kamlet et al.(1988)Kamlet, Doherty, Abraham, Marcus, and Taft</label><mixed-citation>
      
Kamlet, M. J., Doherty, R. M., Abraham, M. H., Marcus, Y., and Taft, R. W.:
Linear solvation energy relationship. 46. An improved equation for correlation and prediction of octanol/water partition coefficients of organic nonelectrolytes (including strong hydrogen bond donor solutes), J. Phys. Chem., 92, 5244–5255, <a href="https://doi.org/10.1021/j100329a035" target="_blank">https://doi.org/10.1021/j100329a035</a>, 1988.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib33"><label>Kroll et al.(2011)Kroll, Donahue, Jimenez, Kessler, Canagaratna, Wilson, Altieri, Mazzoleni, Wozniak, Bluhm, Mysak, Smith, Kolb, and Worsnop</label><mixed-citation>
      
Kroll, J. H., Donahue, N. M., Jimenez, J. L., Kessler, S. H., Canagaratna, M. R., Wilson, K. R., Altieri, K. E., Mazzoleni, L. R., Wozniak, A. S., Bluhm, H., Mysak, E. R., Smith, J. D., Kolb, C. E., and Worsnop, D. R.:
Carbon oxidation state as a metric for describing the chemistry of atmospheric organic aerosol, Nat. Chem., 3, 133–139, <a href="https://doi.org/10.1038/nchem.948" target="_blank">https://doi.org/10.1038/nchem.948</a>, 2011.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib34"><label>Landrum(2013)</label><mixed-citation>
      
Landrum, G.: RDKit: A software suite for cheminformatics, computational chemistry, and predictive modeling, <a href="https://www.rdkit.org" target="_blank">https://www.rdkit.org</a> (last access: 25 May 2026), 2013.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib35"><label>Lannuque et al.(2021)Lannuque, D’Anna, Couvidat, Valorso, and Sartelet</label><mixed-citation>
      
Lannuque, V., D’Anna, B., Couvidat, F., Valorso, R., and Sartelet, K.:
Improvement in Modeling of OH and HO<sub>2</sub> Radical Concentrations during Toluene and Xylene Oxidation with RACM2 Using MCM/GECKO-A, Atmosphere-Basel, 12, <a href="https://doi.org/10.3390/atmos12060732" target="_blank">https://doi.org/10.3390/atmos12060732</a>, 2021.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib36"><label>Marcolli and Peter(2005)</label><mixed-citation>
      
Marcolli, C. and Peter, Th.:
Water activity in polyol/water systems: new UNIFAC parameterization, Atmos. Chem. Phys., 5, 1545–1555, <a href="https://doi.org/10.5194/acp-5-1545-2005" target="_blank">https://doi.org/10.5194/acp-5-1545-2005</a>, 2005.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib37"><label>Mouchel-Vallon et al.(2020)Mouchel-Vallon, Lee-Taylor, Hodzic, Artaxo, Aumont, Camredon, Gurarie, Jimenez, Lenschow, Martin et al.</label><mixed-citation>
      
Mouchel-Vallon, C., Lee-Taylor, J., Hodzic, A., Artaxo, P., Aumont, B., Camredon, M., Gurarie, D., Jimenez, J.-L., Lenschow, D. H., Martin, S. T., Nascimento, J., Orlando, J. J., Palm, B. B., Shilling, J. E., Shrivastava, M., and Madronich, S.:
Exploration of oxidative chemistry and secondary organic aerosol formation in the Amazon during the wet season: explicit modeling of the Manaus urban plume with GECKO-A, Atmos. Chem. Phys., 20, 5995–6014, <a href="https://doi.org/10.5194/acp-20-5995-2020" target="_blank">https://doi.org/10.5194/acp-20-5995-2020</a>, 2020.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib38"><label>Nannoolal et al.(2008)Nannoolal, Rarey, and Ramjugernath</label><mixed-citation>
      
Nannoolal, Y., Rarey, J., and Ramjugernath, D.:
Estimation of pure component properties: Part 3. Estimation of the vapor pressure of non-electrolyte organic compounds via group contributions and group interactions, Fluid Phase Equilibr., 269, 117–133, 2008.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib39"><label>O'Boyle et al.(2011)O'Boyle Jr., Humphrey, Pollack, Hawver, and Story</label><mixed-citation>
      
O'Boyle Jr., E. H., Humphrey, R. H., Pollack, J. M., Hawver, T. H., and Story, P. A.:
The relation between emotional intelligence and job performance: A meta-analysis, J. Organ. Behav., 32, 788–818, 2011.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib40"><label>OEChem(2012)</label><mixed-citation>
      
OEChem: OpenEye Scientific Software, Inc., Santa Fe, NM, USA, <a href="https://www.eyesopen.com/oechem-tk" target="_blank">https://www.eyesopen.com/oechem-tk</a> (last access: 25 May 2026), 2012.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib41"><label>O'Meara et al.(2014)O'Meara, Booth, Barley, Topping, and McFiggans</label><mixed-citation>
      
O'Meara, S., Booth, A. M., Barley, M. H., Topping, D., and McFiggans, G.:
An assessment of vapour pressure estimation methods, Phys. Chem. Chem. Phys., 16, 19453–19469, <a href="https://doi.org/10.1039/C4CP00857J" target="_blank">https://doi.org/10.1039/C4CP00857J</a>, 2014.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib42"><label>Pankow(2003)</label><mixed-citation>
      
Pankow, J. F.:
Gas/particle partitioning of neutral and ionizing compounds to single and multi-phase aerosol particles. 1. Unified modeling framework, Atmos. Environ., 37, 3323–3333, <a href="https://doi.org/10.1016/S1352-2310(03)00346-7" target="_blank">https://doi.org/10.1016/S1352-2310(03)00346-7</a>, 2003.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib43"><label>Pankow and Asher(2008)</label><mixed-citation>
      
Pankow, J. F. and Asher, W. E.:
SIMPOL.1: a simple group contribution method for predicting vapor pressures and enthalpies of vaporization of multifunctional organic compounds, Atmos. Chem. Phys., 8, 2773–2796, <a href="https://doi.org/10.5194/acp-8-2773-2008" target="_blank">https://doi.org/10.5194/acp-8-2773-2008</a>, 2008.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib44"><label>Pankow and Barsanti(2009)</label><mixed-citation>
      
Pankow, J. F. and Barsanti, K. C.:
The carbon number-polarity grid: A means to manage the complexity of the mix of organic compounds when modeling atmospheric organic particulate matter, Atmos. Environ., 43, 2829–2835, <a href="https://doi.org/10.1016/j.atmosenv.2008.12.050" target="_blank">https://doi.org/10.1016/j.atmosenv.2008.12.050</a>, 2009.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib45"><label>Pankow and Chang(2008)</label><mixed-citation>
      
Pankow, J. F. and Chang, E. I.:
Variation in the sensitivity of predicted levels of atmospheric organic particulate matter (OPM), Environ. Sci. Technol., 42, 7321–7329, 2008.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib46"><label>Pavlov et al.(2011)Pavlov, Rybalkin, Karulin, Kozhevnikov, Savelyev, and Churinov</label><mixed-citation>
      
Pavlov, D., Rybalkin, M., Karulin, B., Kozhevnikov, M., Savelyev, A., and Churinov, A.: Indigo: universal cheminformatics API, J. Cheminformatics, 3, <a href="https://doi.org/10.1186/1758-2946-3-S1-P4" target="_blank">https://doi.org/10.1186/1758-2946-3-S1-P4</a>, 2011.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib47"><label>Pun et al.(2002)Pun, Griffin, Seigneur, and Seinfeld</label><mixed-citation>
      
Pun, B. K. L., Griffin, R. J., Seigneur, C., and Seinfeld, J. H.:
Secondary organic aerosol – 2. Thermodynamic model for gas/particle partitioning of molecular constituents, J. Geophys. Res.-Atmos., 107, <a href="https://doi.org/10.1029/2001JD000542" target="_blank">https://doi.org/10.1029/2001JD000542</a>, 2002.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib48"><label>Rastak et al.(2017)Rastak, Pajunoja, Acosta Navarro, Ma, Song, Partridge, Kirkevåg, Leong, Hu, Taylor, Lambe, Cerully, Bougiatioti, Liu, Krejci, Petäjä, Percival, Davidovits, Worsnop, Ekman, Nenes, Martin, Jimenez, Collins, Topping, Bertram, Zuend, Virtanen, and Riipinen</label><mixed-citation>
      
Rastak, N., Pajunoja, A., Acosta Navarro, J. C., Ma, J., Song, M., Partridge, D. G., Kirkevåg, A., Leong, Y., Hu, W. W., Taylor, N. F., Lambe, A., Cerully, K., Bougiatioti, A., Liu, P., Krejci, R., Petäjä, T., Percival, C., Davidovits, P., Worsnop, D. R., Ekman, A. M. L., Nenes, A., Martin, S., Jimenez, J. L., Collins, D. R., Topping, D. O., Bertram, A. K., Zuend, A., Virtanen, A., and Riipinen, I.:
Microphysical explanation of the RH-dependent water affinity of biogenic organic aerosol and its importance for climate, Geophys. Res. Lett., 44, 5167–5177, <a href="https://doi.org/10.1002/2017GL073056" target="_blank">https://doi.org/10.1002/2017GL073056</a>, 2017.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib49"><label>Ruggeri et al.(2016)Ruggeri, Bernhard, Henderson, and Takahama</label><mixed-citation>
      
Ruggeri, G., Bernhard, F. A., Henderson, B. H., and Takahama, S.:
Model–measurement comparison of functional group abundance in <i>α</i>-pinene and 1,3,5-trimethylbenzene secondary organic aerosol formation, Atmos. Chem. Phys., 16, 8729–8747, <a href="https://doi.org/10.5194/acp-16-8729-2016" target="_blank">https://doi.org/10.5194/acp-16-8729-2016</a>, 2016.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib50"><label>Saunders et al.(2003)Saunders, Jenkin, Derwent, and Pilling</label><mixed-citation>
      
Saunders, S. M., Jenkin, M. E., Derwent, R. G., and Pilling, M. J.:
Protocol for the development of the Master Chemical Mechanism, MCM v3 (Part A): tropospheric degradation of non-aromatic volatile organic compounds, Atmos. Chem. Phys., 3, 161–180, <a href="https://doi.org/10.5194/acp-3-161-2003" target="_blank">https://doi.org/10.5194/acp-3-161-2003</a>, 2003.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib51"><label>Schervish and Shiraiwa(2023)</label><mixed-citation>
      
Schervish, M. and Shiraiwa, M.:
Impact of phase state and non-ideal mixing on equilibration timescales of secondary organic aerosol partitioning, Atmos. Chem. Phys., 23, 221–233, <a href="https://doi.org/10.5194/acp-23-221-2023" target="_blank">https://doi.org/10.5194/acp-23-221-2023</a>, 2023.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib52"><label>Schmedding and Zuend(2025)</label><mixed-citation>
      
Schmedding, R. and Zuend, A.:
The role of interfacial tension in the size-dependent phase separation of atmospheric aerosol particles, Atmos. Chem. Phys., 25, 327–346, <a href="https://doi.org/10.5194/acp-25-327-2025" target="_blank">https://doi.org/10.5194/acp-25-327-2025</a>, 2025.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib53"><label>Schmedding et al.(2025)Schmedding, Franssen, and Zuend</label><mixed-citation>
      
Schmedding, R., Franssen, M., and Zuend, A.:
A Machine Learning Approach for Predicting the Pure-Component Surface Tension of Atmospherically Relevant Organic Compounds, ACS ES&amp;T Air, <a href="https://doi.org/10.1021/acsestair.4c00291" target="_blank">https://doi.org/10.1021/acsestair.4c00291</a>, 2025.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib54"><label>Semeniuk and Dastoor(2020)</label><mixed-citation>
      
Semeniuk, K. and Dastoor, A.:
Current State of Atmospheric Aerosol Thermodynamics and Mass Transfer Modeling: A Review, Atmosphere-Basel, 11, <a href="https://doi.org/10.3390/atmos11020156" target="_blank">https://doi.org/10.3390/atmos11020156</a>, 2020.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib55"><label>Smith et al.(2011)Smith, Kuwata, and Martin</label><mixed-citation>
      
Smith, M. L., Kuwata, M., and Martin, S. T.:
Secondary Organic Material Produced by the Dark Ozonolysis of <i>α</i>-Pinene Minimally Affects the Deliquescence and Efflorescence of Ammonium Sulfate, Aerosol Sci. Tech., 45, 244–261, <a href="https://doi.org/10.1080/02786826.2010.532178" target="_blank">https://doi.org/10.1080/02786826.2010.532178</a>, 2011.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib56"><label>Sommers et al.(2022)Sommers, Stroud, Adam, O'Brien, Brook, Hayden, Lee, Li, Liggio, Mihele, Mittermeier, Stevens, Wolde, Zuend, and Hayes</label><mixed-citation>
      
Sommers, J. M., Stroud, C. A., Adam, M. G., O'Brien, J., Brook, J. R., Hayden, K., Lee, A. K. Y., Li, K., Liggio, J., Mihele, C., Mittermeier, R. L., Stevens, R. G., Wolde, M., Zuend, A., and Hayes, P. L.:
Evaluating SOA formation from different sources of semi- and intermediate-volatility organic compounds from the Athabasca oil sands, Environmental Science: Atmospheres, 2, 469–490, <a href="https://doi.org/10.1039/D1EA00053E" target="_blank">https://doi.org/10.1039/D1EA00053E</a>, 2022.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib57"><label>Sparks(1973)</label><mixed-citation>
      
Sparks, D. N.:
Algorithm AS 58: Euclidean Cluster Analysis, J. Roy. Stat. Soc. C-App., 22, 126–130, <a href="https://doi.org/10.2307/2346321" target="_blank">https://doi.org/10.2307/2346321</a>, 1973.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib58"><label>Steinbeck et al.(2003)Steinbeck, Han, Kuhn, Horlacher, Luttmann, and Willighagen</label><mixed-citation>
      
Steinbeck, C., Han, Y., Kuhn, S., Horlacher, O., Luttmann, E., and Willighagen, E.:
The Chemistry Development Kit (CDK): An open-source Java library for chemo- and bioinformatics, J. Chem. Inf. Comp. Sci., 43, 493–500, 2003.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib59"><label>Topping et al.(2016)Topping, Barley, Bane, Higham, Aumont, Dingle, and McFiggans</label><mixed-citation>
      
Topping, D., Barley, M., Bane, M. K., Higham, N., Aumont, B., Dingle, N., and McFiggans, G.:
UManSysProp v1.0: an online and open-source facility for molecular property prediction and atmospheric aerosol calculations, Geosci. Model Dev., 9, 899–914, <a href="https://doi.org/10.5194/gmd-9-899-2016" target="_blank">https://doi.org/10.5194/gmd-9-899-2016</a>, 2016.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib60"><label>Topping and Bane(2022)</label><mixed-citation>
      
Topping, D. L. and Bane, M. (Eds.): Introduction to aerosol modelling: From theory to code, John Wiley &amp; Sons, <a href="https://doi.org/10.1002/9781119625728" target="_blank">https://doi.org/10.1002/9781119625728</a>, 2022.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib61"><label>Topping et al.(2007)Topping, McFiggans, Kiss, Varga, Facchini, Decesari, and Mircea</label><mixed-citation>
      
Topping, D. O., McFiggans, G. B., Kiss, G., Varga, Z., Facchini, M. C., Decesari, S., and Mircea, M.:
Surface tensions of multi-component mixed inorganic/organic aqueous systems of atmospheric significance: measurements, model predictions and importance for cloud activation predictions, Atmos. Chem. Phys., 7, 2371–2398, <a href="https://doi.org/10.5194/acp-7-2371-2007" target="_blank">https://doi.org/10.5194/acp-7-2371-2007</a>, 2007.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib62"><label>Toropov et al.(2008)Toropov, Rasulev, Leszczynska, and Leszczynski</label><mixed-citation>
      
Toropov, A. A., Rasulev, B. F., Leszczynska, D., and Leszczynski, J.:
Multiplicative SMILES-based optimal descriptors: QSPR modeling of fullerene C60 solubility in organic solvents, Chem. Phys. Lett., 457, 332–336, 2008.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib63"><label>Tulet et al.(2006)Tulet, Grini, Griffin, and Petitcol</label><mixed-citation>
      
Tulet, P., Grini, A., Griffin, R. J., and Petitcol, S.: ORILAM-SOA: A computationally efficient model for predicting secondary organic aerosols in three-dimensional atmospheric models, J. Geophys. Res., 111, <a href="https://doi.org/10.1029/2006JD007152" target="_blank">https://doi.org/10.1029/2006JD007152</a>, 2006.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib64"><label>Wang et al.(2022)Wang, Couvidat, and Sartelet</label><mixed-citation>
      
Wang, Z., Couvidat, F., and Sartelet, K.:
GENerator of reduced Organic Aerosol mechanism (GENOA v1.0): an automatic generation tool of semi-explicit mechanisms, Geosci. Model Dev., 15, 8957–8982, <a href="https://doi.org/10.5194/gmd-15-8957-2022" target="_blank">https://doi.org/10.5194/gmd-15-8957-2022</a>, 2022.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib65"><label>Weininger(1988)</label><mixed-citation>
      
Weininger, D.:
SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules, J. Chem. Inf. Comp. Sci., 28, 31–36, <a href="https://doi.org/10.1021/ci00057a005" target="_blank">https://doi.org/10.1021/ci00057a005</a>, 1988.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib66"><label>Wienke and Gmehling(1998)</label><mixed-citation>
      
Wienke, G. and Gmehling, J.:
Prediction of octanol–water partition coefficients, Henry coefficients and water solubilities using UNIFAC, Toxicol. Environ. Chem., 65, 57–86, <a href="https://doi.org/10.1080/02772249809358557" target="_blank">https://doi.org/10.1080/02772249809358557</a>, 1998.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib67"><label>Wiser et al.(2025)Wiser, Sen, Wang, Lee-Taylor, Barsanti, Orlando, Westervelt, Henze, Fiore, Berman, Carter, and McNeill</label><mixed-citation>
      
Wiser, F., Sen, S., Wang, Z., Lee-Taylor, J., Barsanti, K. C., Orlando, J., Westervelt, D. M., Henze, D. K., Fiore, A. M., Berman, A., Carter, R., and McNeill, V. F.:
A graph theory-based algorithm for the reduction of atmospheric chemical mechanisms, PNAS Nexus, 4, 11, <a href="https://doi.org/10.1093/pnasnexus/pgaf273" target="_blank">https://doi.org/10.1093/pnasnexus/pgaf273</a>, 2025.


    </mixed-citation></ref-html>
<ref-html id="bib1.bib68"><label>Zhang et al.(2024)Zhang, Zuend, Top, Surdu, El Haddad, Slowik, Prevot, and Bell</label><mixed-citation>
      
Zhang, J., Zuend, A., Top, J., Surdu, M., El Haddad, I., Slowik, J. G., Prevot, A. S. H., and Bell, D. M.:
Estimation of the Volatility and Apparent Activity Coefficient of Levoglucosan in Wood-Burning Organic Aerosols, Environ. Sci. Tech. Let., 11, 1214–1219, <a href="https://doi.org/10.1021/acs.estlett.4c00608" target="_blank">https://doi.org/10.1021/acs.estlett.4c00608</a>, 2024.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib69"><label>Zuend and Seinfeld(2012)</label><mixed-citation>
      
Zuend, A. and Seinfeld, J. H.:
Modeling the gas-particle partitioning of secondary organic aerosol: the importance of liquid-liquid phase separation, Atmos. Chem. Phys., 12, 3857–3882, <a href="https://doi.org/10.5194/acp-12-3857-2012" target="_blank">https://doi.org/10.5194/acp-12-3857-2012</a>, 2012.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib70"><label>Zuend and Seinfeld(2013)</label><mixed-citation>
      
Zuend, A. and Seinfeld, J. H.:
A practical method for the calculation of liquid-liquid equilibria in multicomponent organic-water-electrolyte systems using physicochemical constraints, Fluid Phase Equilibr., 337, 201–213, <a href="https://doi.org/10.1016/j.fluid.2012.09.034" target="_blank">https://doi.org/10.1016/j.fluid.2012.09.034</a>, 2013.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib71"><label>Zuend et al.(2008)Zuend, Marcolli, Luo, and Peter</label><mixed-citation>
      
Zuend, A., Marcolli, C., Luo, B. P., and Peter, T.:
A thermodynamic model of mixed organic-inorganic aerosols to predict activity coefficients, Atmos. Chem. Phys., 8, 4559–4593, <a href="https://doi.org/10.5194/acp-8-4559-2008" target="_blank">https://doi.org/10.5194/acp-8-4559-2008</a>, 2008.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib72"><label>Zuend et al.(2010)Zuend, Marcolli, Peter, and Seinfeld</label><mixed-citation>
      
Zuend, A., Marcolli, C., Peter, T., and Seinfeld, J. H.:
Computation of liquid-liquid equilibria and phase stabilities: implications for RH-dependent gas/particle partitioning of organic-inorganic aerosols, Atmos. Chem. Phys., 10, 7795–7820, <a href="https://doi.org/10.5194/acp-10-7795-2010" target="_blank">https://doi.org/10.5194/acp-10-7795-2010</a>, 2010.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib73"><label>Zuend et al.(2011)Zuend, Marcolli, Booth, Lienhard, Soonsin, Krieger, Topping, McFiggans, Peter, and Seinfeld</label><mixed-citation>
      
Zuend, A., Marcolli, C., Booth, A. M., Lienhard, D. M., Soonsin, V., Krieger, U. K., Topping, D. O., McFiggans, G., Peter, T., and Seinfeld, J. H.:
New and extended parameterization of the thermodynamic model AIOMFAC: calculation of activity coefficients for organic-inorganic mixtures containing carboxyl, hydroxyl, carbonyl, ether, ester, alkenyl, alkyl, and aromatic functional groups, Atmos. Chem. Phys., 11, 9155–9206, <a href="https://doi.org/10.5194/acp-11-9155-2011" target="_blank">https://doi.org/10.5194/acp-11-9155-2011</a>, 2011.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib74"><label>Zuend et al.(2025)Zuend, Hassan-Barthaux, and Amaladhasan</label><mixed-citation>
      
Zuend, A., Hassan-Barthaux, D., and Amaladhasan, D. A.: SMILES_to_sat_vapour_pressure, Zenodo [code], <a href="https://doi.org/10.5281/zenodo.17172675" target="_blank">https://doi.org/10.5281/zenodo.17172675</a>, 2025.

    </mixed-citation></ref-html>--></article>
