<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing with OASIS Tables v3.0 20080202//EN" "https://jats.nlm.nih.gov/nlm-dtd/publishing/3.0/journalpub-oasis3.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:oasis="http://docs.oasis-open.org/ns/oasis-exchange/table" xml:lang="en" dtd-version="3.0" article-type="research-article">
  <front>
    <journal-meta><journal-id journal-id-type="publisher">GMD</journal-id><journal-title-group>
    <journal-title>Geoscientific Model Development</journal-title>
    <abbrev-journal-title abbrev-type="publisher">GMD</abbrev-journal-title><abbrev-journal-title abbrev-type="nlm-ta">Geosci. Model Dev.</abbrev-journal-title>
  </journal-title-group><issn pub-type="epub">1991-9603</issn><publisher>
    <publisher-name>Copernicus Publications</publisher-name>
    <publisher-loc>Göttingen, Germany</publisher-loc>
  </publisher></journal-meta>
    <article-meta>
      <article-id pub-id-type="doi">10.5194/gmd-19-4319-2026</article-id><title-group><article-title>Approximating the universal thermal climate index using sparse regression with orthogonal polynomials</article-title><alt-title>Approximating the UTCI using sparse regression with orthogonal polynomials</alt-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author" corresp="yes" rid="aff1">
          <name><surname>Roman</surname><given-names>Sabin</given-names></name>
          <email>sabin.roman@ijs.si</email>
        </contrib>
        <contrib contrib-type="author" corresp="no" rid="aff2 aff1">
          <name><surname>Todorovski</surname><given-names>Ljupčo</given-names></name>
          
        </contrib>
        <contrib contrib-type="author" corresp="no" rid="aff1">
          <name><surname>Džeroski</surname><given-names>Sašo</given-names></name>
          
        </contrib>
        <contrib contrib-type="author" corresp="no" rid="aff2">
          <name><surname>Skok</surname><given-names>Gregor</given-names></name>
          
        </contrib>
        <aff id="aff1"><label>1</label><institution>Department of Knowledge Technologies, Jožef Stefan Institute, Jamova 39, 1000 Ljubljana, Slovenia</institution>
        </aff>
        <aff id="aff2"><label>2</label><institution>Faculty of Mathematics and Physics, University of Ljubljana, Jadranska ulica 19, 1000 Ljubljana, Slovenia</institution>
        </aff>
      </contrib-group>
      <author-notes><corresp id="corr1">Sabin Roman (sabin.roman@ijs.si)</corresp></author-notes><pub-date><day>21</day><month>May</month><year>2026</year></pub-date>
      
      <volume>19</volume>
      <issue>10</issue>
      <fpage>4319</fpage><lpage>4330</lpage>
      <history>
        <date date-type="received"><day>12</day><month>November</month><year>2025</year></date>
           <date date-type="rev-request"><day>6</day><month>January</month><year>2026</year></date>
           <date date-type="rev-recd"><day>16</day><month>April</month><year>2026</year></date>
           <date date-type="accepted"><day>20</day><month>April</month><year>2026</year></date>
      </history>
      <permissions>
        <copyright-statement>Copyright: © 2026 Sabin Roman et al.</copyright-statement>
        <copyright-year>2026</copyright-year>
      <license license-type="open-access"><license-p>This work is licensed under the Creative Commons Attribution 4.0 International License. To view a copy of this licence, visit <ext-link ext-link-type="uri" xlink:href="https://creativecommons.org/licenses/by/4.0/">https://creativecommons.org/licenses/by/4.0/</ext-link></license-p></license></permissions><self-uri xlink:href="https://gmd.copernicus.org/articles/19/4319/2026/gmd-19-4319-2026.html">This article is available from https://gmd.copernicus.org/articles/19/4319/2026/gmd-19-4319-2026.html</self-uri><self-uri xlink:href="https://gmd.copernicus.org/articles/19/4319/2026/gmd-19-4319-2026.pdf">The full text article is available as a PDF file from https://gmd.copernicus.org/articles/19/4319/2026/gmd-19-4319-2026.pdf</self-uri>
      <abstract><title>Abstract</title>

      <p id="d2e116">The Universal Thermal Climate Index (UTCI) is a measure of thermal comfort that quantifies how humans experience environmental conditions. Due to its robustness and versatility as a bioclimatic indicator, it has been extensively employed across a wide range of studies in bioclimatology and is increasingly used as an operational measure of outdoor thermal comfort. However, calculating the UTCI value from the relevant environmental parameters is not straightforward, which is why a 6th-degree polynomial approximation has become the standard way to calculate UTCI values. Although this polynomial approximation is computationally efficient, its error can be substantial. The goal of this study was to develop an improved version of the polynomial approximation: one that retains comparable computational efficiency but is more robust in terms of numerical stability and substantially more accurate, particularly in reducing the frequency of larger errors. This goal was achieved using sparse orthogonal regression, namely sparse regression with an orthogonal polynomial basis, which not only substantially reduces the average errors (i.e., the mean error, the mean absolute error, and the root mean square error) but also drastically reduces the frequency of large errors. By leveraging Legendre polynomial bases, approximation models could be constructed that efficiently populate a Pareto front of accuracy versus complexity and exhibit stable, hierarchical coefficient structures across varying model capacities. Training the new approximation models on only 20 % of the data and testing them on the remaining 80 % demonstrates successful generalization, and the results are also robust under bootstrapping. 
The decomposition effectively approximates the UTCI as a Fourier-like expansion in an orthogonal basis, yielding results near the theoretical optimum in the <inline-formula><mml:math id="M1" display="inline"><mml:mrow><mml:msub><mml:mi>L</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula> (least squares) sense.</p>
  </abstract>
    
<funding-group>
<award-group id="gs1">
<funding-source>HORIZON EUROPE Marie Sklodowska-Curie Actions</funding-source>
<award-id>101081355</award-id>
</award-group>
<award-group id="gs2">
<funding-source>The Slovenian Research and Innovation Agency</funding-source>
<award-id>P1-0188</award-id>
</award-group>
</funding-group>
</article-meta>
  </front>
<body>
      

<sec id="Ch1.S1" sec-type="intro">
  <label>1</label><title>Introduction</title>
      <p id="d2e139">The Universal Thermal Climate Index (UTCI) is a measure of thermal comfort that quantifies how humans experience environmental conditions. It is derived from an advanced thermo-physiological model <xref ref-type="bibr" rid="bib1.bibx25" id="paren.1"/> and expressed in units of temperature. The index accounts for multiple factors, including air temperature, humidity, wind speed, radiation, and clothing insulation <xref ref-type="bibr" rid="bib1.bibx14" id="paren.2"/>. A notable advantage of the UTCI compared to many other bioclimatic indices is its ability to represent thermal conditions in terms that are applicable to human strain under a wide range of climatic conditions (e.g., for both hot and cold conditions, <xref ref-type="bibr" rid="bib1.bibx7" id="altparen.3"/>). Based on the UTCI value, the environmental conditions can be classified into one of ten thermal stress categories <xref ref-type="bibr" rid="bib1.bibx14" id="paren.4"/>, ranging from Extreme heat stress (<inline-formula><mml:math id="M2" display="inline"><mml:mrow><mml:mtext>UTCI</mml:mtext><mml:mo>&gt;</mml:mo><mml:mn mathvariant="normal">46</mml:mn></mml:mrow></mml:math></inline-formula> <inline-formula><mml:math id="M3" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">°</mml:mi><mml:mi mathvariant="normal">C</mml:mi></mml:mrow></mml:math></inline-formula>) to Extreme cold stress (<inline-formula><mml:math id="M4" display="inline"><mml:mrow><mml:mtext>UTCI</mml:mtext><mml:mo>&lt;</mml:mo><mml:mo>-</mml:mo><mml:mn mathvariant="normal">40</mml:mn></mml:mrow></mml:math></inline-formula> <inline-formula><mml:math id="M5" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">°</mml:mi><mml:mi mathvariant="normal">C</mml:mi></mml:mrow></mml:math></inline-formula>).</p>
      <p id="d2e201">Owing to its robustness and versatility as a bioclimatic indicator, the UTCI has been extensively employed across a wide range of studies in bioclimatology and related scientific disciplines. Its applications encompass diverse research areas, including the assessment of regional and local bioclimate characteristics, the study of urban bioclimate, recreation, tourism, and sports, epidemiological and health-related research, as well as the assessment and forecasting of bioclimatic changes <xref ref-type="bibr" rid="bib1.bibx6" id="paren.5"/>. The UTCI has also seen growing adoption across numerous countries as a standardized measure of outdoor thermal comfort and is increasingly integrated into routine operational meteorological forecasts. For example, within Europe, UTCI is used operationally in the Czech Republic, Italy, Poland, Portugal, and Slovenia <xref ref-type="bibr" rid="bib1.bibx17 bib1.bibx22" id="paren.6"/>.</p>
      <p id="d2e210">At the same time, calculating the UTCI value from the relevant environmental parameters is nominally not straightforward. Namely, the UTCI is based on the Fiala multi-node model of human thermoregulation <xref ref-type="bibr" rid="bib1.bibx20" id="paren.7"/>. However, running the complete Fiala model is computationally expensive and requires expert knowledge to operate the complex simulation software <xref ref-type="bibr" rid="bib1.bibx14" id="paren.8"/>. This is the reason the authors of <xref ref-type="bibr" rid="bib1.bibx14" id="text.9"/> provided two simplified approximate procedures for calculating the UTCI values that could be used in operational settings. The first approximation is based on a 4-dimensional look-up table of <inline-formula><mml:math id="M6" display="inline"><mml:mrow><mml:mn mathvariant="normal">104</mml:mn><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mn mathvariant="normal">643</mml:mn></mml:mrow></mml:math></inline-formula> accurate pre-calculated UTCI values that cover a wide range of relevant combinations of the meteorological parameters. Using this look-up table, interpolation from nearby data points can be used to determine approximate UTCI values for intermediate values of meteorological parameters. The second approximation is based on a 6th-degree regression polynomial with 210 coefficients.</p>
      <p id="d2e233">Each approximation has its benefits and weaknesses. The look-up table approach is more accurate, but storing the tabulated values and searching for neighboring data points complicates its implementation and results in a longer execution time than the polynomial approach <xref ref-type="bibr" rid="bib1.bibx12" id="paren.10"/>. In contrast, the polynomial approximation is less accurate but computationally faster and substantially easier to implement in various programming languages and computational environments, as it relies only on basic arithmetic operations and does not require storing the tabulated values. The motivation for improving the polynomial approximation is not simply a matter of storage, however, since the size of the look-up table is modest in modern computational settings. Rather, an improved polynomial approximation remains attractive for several practical reasons: <list list-type="custom"><list-item><label>i.</label>
      <p id="d2e241">It is fully self-contained and does not depend on external tabulated data, which facilitates reproducibility and makes redistribution and integration into open-source software and operational tools more straightforward;</p></list-item><list-item><label>ii.</label>
      <p id="d2e245">It is computationally more efficient than look-up-table-based interpolation, which has been reported to be slower by roughly three orders of magnitude <xref ref-type="bibr" rid="bib1.bibx14" id="paren.11"/>, an important consideration in large-scale applications such as numerical weather prediction and climate reanalysis;</p></list-item><list-item><label>iii.</label>
      <p id="d2e252">It is simpler to implement and port across programming languages and computational environments, including constrained, embedded, or legacy systems, because it requires only basic arithmetic operations and avoids the additional logic needed for multidimensional interpolation, data handling, and neighborhood search;</p></list-item><list-item><label>iv.</label>
      <p id="d2e256">It provides a direct, continuous, and analytically defined mapping over the domain of validity, whereas the look-up table still requires interpolation, and in some cases extrapolation, for environmental states not explicitly represented in the tabulated values;</p></list-item><list-item><label>v.</label>
      <p id="d2e260">Its predictive behavior on unseen data can be assessed directly through a train–test evaluation framework; in the present case, training on 20 % of the dataset and testing on the remaining 80 % still yields very good predictive performance, indicating strong generalization.</p></list-item></list> For these reasons, the polynomial approximation is best viewed not as a universal replacement for the look-up-table approach, but as a complementary alternative that is particularly useful in applications where speed, portability, reproducibility, and ease of deployment are important.</p>
      <p id="d2e265">Due to its simplicity and computational efficiency, the polynomial approximation has become the standard way of calculating UTCI values. It has been incorporated into various bioclimatic software packages and libraries (e.g., the Bioklima software <xref ref-type="bibr" rid="bib1.bibx5" id="paren.12"/>, the Thermofeel Python library <xref ref-type="bibr" rid="bib1.bibx11" id="paren.13"/>, and the pyThermalComfort Python library <xref ref-type="bibr" rid="bib1.bibx42" id="paren.14"/>), as well as numerical weather prediction and reanalysis systems (e.g., the ALADIN model <xref ref-type="bibr" rid="bib1.bibx43" id="paren.15"/>, and the ERA5 reanalysis <xref ref-type="bibr" rid="bib1.bibx18" id="paren.16"/>). However, the error of the polynomial approximation can be substantial. For example, when evaluated on the aforementioned look-up table of accurate UTCI values, the root mean square error is about 1.1 <inline-formula><mml:math id="M7" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">°</mml:mi><mml:mi mathvariant="normal">C</mml:mi></mml:mrow></mml:math></inline-formula>, the frequency of absolute errors larger than 2 <inline-formula><mml:math id="M8" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">°</mml:mi><mml:mi mathvariant="normal">C</mml:mi></mml:mrow></mml:math></inline-formula> is about 8 %, and the frequency of absolute errors larger than 3 <inline-formula><mml:math id="M9" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">°</mml:mi><mml:mi mathvariant="normal">C</mml:mi></mml:mrow></mml:math></inline-formula> is about 2 %. 
This is problematic because an error of a few degrees Celsius increases the likelihood of misclassifying the thermal stress category; some categories span only a 6 <inline-formula><mml:math id="M10" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">°</mml:mi><mml:mi mathvariant="normal">C</mml:mi></mml:mrow></mml:math></inline-formula> interval.</p>
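      <p>The error measures quoted above (mean error, mean absolute error, root mean square error, and the frequency of absolute errors above a threshold) can be sketched as follows. This is a minimal illustration rather than the evaluation code used in the study: the function name <monospace>error_summary</monospace> and the toy numbers are hypothetical, and all values are assumed to be in degrees Celsius.</p>
<preformat>
```python
import math

def error_summary(approx, reference, thresholds=(2.0, 3.0)):
    """Summarise approximation errors (all values in degrees Celsius)."""
    errors = [a - r for a, r in zip(approx, reference)]
    n = len(errors)
    mean_error = sum(errors) / n
    mean_abs_error = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    # Share of points whose absolute error exceeds each threshold.
    exceed = {t: sum(abs(e) > t for e in errors) / n for t in thresholds}
    return mean_error, mean_abs_error, rmse, exceed

# Toy numbers for illustration only; they are not taken from the UTCI dataset.
reference = [10.0, 20.0, 30.0, 40.0]
approx = [10.5, 19.0, 33.5, 40.0]
me, mae, rmse, exceed = error_summary(approx, reference)
print(me, mae, rmse, exceed)
```
</preformat>
      <p>Reporting the exceedance frequencies alongside the averages matters here because a small root mean square error can still hide a tail of large errors, and it is those large errors that push a value into the wrong thermal stress category.</p>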
      <p id="d2e324">The goal of this study is to develop an improved version of the polynomial approximation – one that has comparable computational complexity to the existing approximation but is more robust in terms of numerical stability and substantially more accurate, particularly in reducing the frequency of larger errors. To achieve this goal, symbolic and sparse regression techniques are used as tools for interpretable and efficient function approximation. We fit the UTCI offset using sparse regression on an orthogonal Legendre polynomial basis. To emphasize this key feature and distinguish it from standard sparse regression on monomials, we refer to this approach as sparse orthogonal regression.</p>
      <p id="d2e327">We also note that the aim was not to derive an approximation that was as accurate as possible. For example, a sufficiently complex neural-network-based model would likely provide more accurate estimates of the UTCI values. However, such a model would also require the use of machine-learning libraries, as well as suitable Graphics Processing Units, to function efficiently. This means that its implementation in various programming languages and computational environments would be substantially more difficult. On the other hand, replacing an existing polynomial approximation with a new one is fairly straightforward, meaning that implementing the new approximation into existing bioclimatic software packages/libraries and numerical weather prediction systems would be relatively easy.</p>
</sec>
<sec id="Ch1.S2">
  <label>2</label><title>Methods</title>
      <p id="d2e338">Formally, the UTCI is defined as <xref ref-type="bibr" rid="bib1.bibx14" id="paren.17"/>

          <disp-formula id="Ch1.E1" content-type="numbered"><label>1</label><mml:math id="M11" display="block"><mml:mrow><mml:mtext>UTCI</mml:mtext><mml:mo>=</mml:mo><mml:mtext>Ta</mml:mtext><mml:mo>+</mml:mo><mml:mtext>Offset</mml:mtext><mml:mo>(</mml:mo><mml:mtext>Ta</mml:mtext><mml:mo>,</mml:mo><mml:mtext>va</mml:mtext><mml:mo>,</mml:mo><mml:mtext>Tr</mml:mtext><mml:mo>,</mml:mo><mml:mtext>rH</mml:mtext><mml:mtext> or </mml:mtext><mml:mtext>pa</mml:mtext><mml:mo>)</mml:mo><mml:mo>,</mml:mo></mml:mrow></mml:math></disp-formula>

        where Ta is the air temperature and the Offset is the physiologically equivalent temperature difference, representing how the other environmental factors modify the thermal stress on the human body. The Offset function represents the deviation of the UTCI from the actual air temperature and depends on Ta, wind speed at 10 m (va), mean radiant temperature (Tr), which accounts for the effect of all incoming radiation, and humidity, which can be represented by either relative humidity (rH) or water vapor pressure (pa).</p>
      <p id="d2e385">The dataset provided by <xref ref-type="bibr" rid="bib1.bibx14" id="text.18"/> contains accurate values of the Offset function covering a wide range of environmental states. The variables and their ranges are included in Table <xref ref-type="table" rid="T1"/>. The intervals of the environmental variables also represent the domain where the sixth-degree polynomial regression approximation is considered valid <xref ref-type="bibr" rid="bib1.bibx14" id="paren.19"/>. Using the approximation for conditions outside of these intervals can lead to large errors and unrealistic values of the Offset function and should be avoided <xref ref-type="bibr" rid="bib1.bibx12" id="paren.20"/>.</p>

<table-wrap id="T1" specific-use="star"><label>Table 1</label><caption><p id="d2e402">Description of variables used in this study, following <xref ref-type="bibr" rid="bib1.bibx14" id="text.21"/>. The normalized ranges map each variable to <inline-formula><mml:math id="M12" display="inline"><mml:mrow><mml:mo>[</mml:mo><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mo>,</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mo>]</mml:mo></mml:mrow></mml:math></inline-formula>, with respect to the interval of validity, suitable for use with Legendre polynomial bases. Although water vapor pressure (pa) is not used directly as an input for the new approximation, it can be computed from air temperature (Ta) and relative humidity (rH), and its effect is therefore accounted for through the inclusion of rH.</p></caption><oasis:table frame="topbot"><oasis:tgroup cols="4">
     <oasis:colspec colnum="1" colname="col1" align="left"/>
     <oasis:colspec colnum="2" colname="col2" align="left"/>
     <oasis:colspec colnum="3" colname="col3" align="left"/>
     <oasis:colspec colnum="4" colname="col4" align="left"/>
     <oasis:thead>
       <oasis:row>
         <oasis:entry colname="col1">Variable</oasis:entry>
         <oasis:entry colname="col2">Description</oasis:entry>
         <oasis:entry colname="col3">Valid range</oasis:entry>
         <oasis:entry colname="col4">Normalized</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">name</oasis:entry>
         <oasis:entry colname="col2"/>
         <oasis:entry colname="col3"/>
         <oasis:entry colname="col4">range</oasis:entry>
       </oasis:row>
     </oasis:thead>
     <oasis:tbody>
       <oasis:row>
         <oasis:entry colname="col1">Ta</oasis:entry>
         <oasis:entry colname="col2">Air temperature</oasis:entry>
         <oasis:entry colname="col3"><inline-formula><mml:math id="M13" display="inline"><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">50</mml:mn></mml:mrow></mml:math></inline-formula> to <inline-formula><mml:math id="M14" display="inline"><mml:mrow><mml:mo>+</mml:mo><mml:mn mathvariant="normal">50</mml:mn></mml:mrow></mml:math></inline-formula> <inline-formula><mml:math id="M15" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">°</mml:mi><mml:mi mathvariant="normal">C</mml:mi></mml:mrow></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col4"><inline-formula><mml:math id="M16" display="inline"><mml:mrow><mml:mo>[</mml:mo><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mo>,</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mo>]</mml:mo></mml:mrow></mml:math></inline-formula></oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">va</oasis:entry>
         <oasis:entry colname="col2">Wind speed at 10 m</oasis:entry>
         <oasis:entry colname="col3"><inline-formula><mml:math id="M17" display="inline"><mml:mn mathvariant="normal">0.5</mml:mn></mml:math></inline-formula> to <inline-formula><mml:math id="M18" display="inline"><mml:mn mathvariant="normal">30.3</mml:mn></mml:math></inline-formula> <inline-formula><mml:math id="M19" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">m</mml:mi><mml:mspace linebreak="nobreak" width="0.125em"/><mml:msup><mml:mi mathvariant="normal">s</mml:mi><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col4"><inline-formula><mml:math id="M20" display="inline"><mml:mrow><mml:mo>[</mml:mo><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mo>,</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mo>]</mml:mo></mml:mrow></mml:math></inline-formula></oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1"><inline-formula><mml:math id="M21" display="inline"><mml:mrow><mml:mtext>Tr</mml:mtext><mml:mo>-</mml:mo><mml:mtext>Ta</mml:mtext></mml:mrow></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col2">Mean radiant–air temperature difference</oasis:entry>
         <oasis:entry colname="col3"><inline-formula><mml:math id="M22" display="inline"><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">30</mml:mn></mml:mrow></mml:math></inline-formula> to <inline-formula><mml:math id="M23" display="inline"><mml:mrow><mml:mo>+</mml:mo><mml:mn mathvariant="normal">70</mml:mn></mml:mrow></mml:math></inline-formula> <inline-formula><mml:math id="M24" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">°</mml:mi><mml:mi mathvariant="normal">C</mml:mi></mml:mrow></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col4"><inline-formula><mml:math id="M25" display="inline"><mml:mrow><mml:mo>[</mml:mo><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mo>,</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mo>]</mml:mo></mml:mrow></mml:math></inline-formula></oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">rH</oasis:entry>
         <oasis:entry colname="col2">Relative humidity</oasis:entry>
         <oasis:entry colname="col3"><inline-formula><mml:math id="M26" display="inline"><mml:mn mathvariant="normal">5</mml:mn></mml:math></inline-formula> % to <inline-formula><mml:math id="M27" display="inline"><mml:mn mathvariant="normal">100</mml:mn></mml:math></inline-formula> %</oasis:entry>
         <oasis:entry colname="col4"><inline-formula><mml:math id="M28" display="inline"><mml:mrow><mml:mo>[</mml:mo><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mo>,</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mo>]</mml:mo></mml:mrow></mml:math></inline-formula></oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">pa</oasis:entry>
         <oasis:entry colname="col2">Water vapor pressure</oasis:entry>
         <oasis:entry colname="col3"><inline-formula><mml:math id="M29" display="inline"><mml:mn mathvariant="normal">0</mml:mn></mml:math></inline-formula> to <inline-formula><mml:math id="M30" display="inline"><mml:mn mathvariant="normal">5</mml:mn></mml:math></inline-formula> kPa</oasis:entry>
         <oasis:entry colname="col4">Not used</oasis:entry>
       </oasis:row>
     </oasis:tbody>
   </oasis:tgroup></oasis:table></table-wrap>
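      <p>The normalization in Table 1 can be illustrated with a short sketch. It assumes a simple affine rescaling of each valid range onto the interval from -1 to 1, which is the natural choice for Legendre polynomials but is not spelled out in the text; the dictionary keys and the function name <monospace>normalize</monospace> are hypothetical.</p>
<preformat>
```python
# Valid ranges copied from Table 1; the dictionary keys are illustrative names.
VALID_RANGES = {
    "Ta": (-50.0, 50.0),           # air temperature, deg C
    "va": (0.5, 30.3),             # wind speed at 10 m, m/s
    "Tr_minus_Ta": (-30.0, 70.0),  # mean radiant minus air temperature, deg C
    "rH": (5.0, 100.0),            # relative humidity, %
}

def normalize(name, value):
    """Affinely map a value from its valid range onto [-1, 1] (assumed mapping)."""
    lo, hi = VALID_RANGES[name]
    if value > hi or lo > value:
        # Outside the interval of validity the approximation should not be used.
        raise ValueError(f"{name}={value} is outside the valid range [{lo}, {hi}]")
    return 2.0 * (value - lo) / (hi - lo) - 1.0

print(normalize("Ta", 0.0))            # midpoint of the range
print(normalize("rH", 5.0))            # lower bound
print(normalize("va", 30.3))           # upper bound
print(normalize("Tr_minus_Ta", 20.0))  # midpoint of the range
```
</preformat>
      <p>Rejecting out-of-range inputs, rather than silently extrapolating, reflects the warning above that using the approximation outside the valid intervals can produce unrealistic Offset values.</p>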

      <p id="d2e735">Figure <xref ref-type="fig" rid="F1"/>a shows how the UTCI Offset varies with the environmental variables. Instead of the relative humidity (rH), the water vapor pressure (pa), which is a nonlinear function of rH and the air temperature (Ta), can be used. However, the two variables have different distributions (see Fig. <xref ref-type="fig" rid="F1"/>b), which affects how well approximations of the UTCI generalize, as discussed below.</p>
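      <p>The nonlinear dependence of pa on rH and Ta can be sketched with a Magnus-type formula for the saturation vapor pressure. This formula and its coefficients are an assumption made for illustration; the operational UTCI procedure may use a different saturation vapor pressure expression, and the function names are hypothetical.</p>
<preformat>
```python
import math

def saturation_vapor_pressure_kpa(ta_c):
    """Magnus-type estimate over water, in kPa (assumed formula, not the UTCI one)."""
    return 0.61094 * math.exp(17.625 * ta_c / (ta_c + 243.04))

def vapor_pressure_kpa(ta_c, rh_percent):
    """Water vapor pressure from air temperature (deg C) and relative humidity (%)."""
    return rh_percent / 100.0 * saturation_vapor_pressure_kpa(ta_c)

# Even at saturation, pa stays very small at low air temperatures.
for ta in (-50.0, -25.0, 0.0, 25.0):
    print(ta, vapor_pressure_kpa(ta, 100.0))
```
</preformat>
      <p>Because the saturation vapor pressure grows roughly exponentially with temperature, relative humidities that are spread evenly over their range map to pa values crowded near zero at low Ta. This is one plausible reading of the peaked pa distribution described for Fig. 1b.</p>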

      <fig id="F1" specific-use="star"><label>Figure 1</label><caption><p id="d2e744"><bold>(a)</bold> 3D plot of UTCI Offset <xref ref-type="bibr" rid="bib1.bibx14" id="paren.22"/> at 5 % relative humidity, showing how wind speed (va), air temperature (Ta), and mean radiant temperature difference (<inline-formula><mml:math id="M31" display="inline"><mml:mrow><mml:mtext>Tr</mml:mtext><mml:mo>-</mml:mo><mml:mtext>Ta</mml:mtext></mml:mrow></mml:math></inline-formula>) combine to influence thermal stress. Color indicates the UTCI Offset magnitude across these environmental dimensions. <bold>(b)</bold> The different distributions of the water vapor pressure and relative humidity in the computed Offset dataset <xref ref-type="bibr" rid="bib1.bibx14" id="paren.23"/>. The water vapor pressure is strongly peaked at zero, while the relative humidity is uniform across its range.</p></caption>
        <graphic xlink:href="https://gmd.copernicus.org/articles/19/4319/2026/gmd-19-4319-2026-f01.png"/>

      </fig>

      <p id="d2e776">Equation discovery aims to learn interpretable mathematical expressions, either differential or algebraic equations, from measurements of the variables of a given observed system <xref ref-type="bibr" rid="bib1.bibx44" id="paren.24"/>. Positioned at the intersection of symbolic machine learning and system identification, it is becoming increasingly relevant in environmental and climate science, where data-driven yet transparent models are essential <xref ref-type="bibr" rid="bib1.bibx38 bib1.bibx30" id="paren.25"/>. Traditional modeling approaches rely on expert-derived formulations <xref ref-type="bibr" rid="bib1.bibx27 bib1.bibx28 bib1.bibx31 bib1.bibx32 bib1.bibx33" id="paren.26"/>, but the growing complexity and volume of climate data call for automated alternatives. Symbolic regression, which iteratively combines mathematical operators and variables to fit data, forms the core of equation discovery <xref ref-type="bibr" rid="bib1.bibx10 bib1.bibx46 bib1.bibx19" id="paren.27"/>. Most methods employ evolutionary or other (e.g., enumerative) search strategies to explore the space of candidate equations <xref ref-type="bibr" rid="bib1.bibx39 bib1.bibx41 bib1.bibx23" id="paren.28"/>.</p>
      <p id="d2e794">Recent advances integrate probabilistic grammars to incorporate prior knowledge and constrain the search to physically meaningful expressions <xref ref-type="bibr" rid="bib1.bibx8 bib1.bibx9 bib1.bibx24" id="paren.29"/>. This structured approach improves both model interpretability and search efficiency, especially in domains governed by established scientific principles. Equation discovery has been applied to various environmental systems <xref ref-type="bibr" rid="bib1.bibx1 bib1.bibx4 bib1.bibx3 bib1.bibx2" id="paren.30"/>, including ecosystem dynamics <xref ref-type="bibr" rid="bib1.bibx21 bib1.bibx16 bib1.bibx34 bib1.bibx35 bib1.bibx40" id="paren.31"/>. In these settings, it can match or even surpass expert-built models while simultaneously revealing new relationships <xref ref-type="bibr" rid="bib1.bibx45 bib1.bibx47" id="paren.32"/>. Its ability to generate compact, interpretable, and physically plausible models makes it especially suitable for climate applications, where model transparency and adherence to physical principles are vital.</p>
      <p id="d2e809">As already mentioned, the errors of the sixth-degree regression polynomial from <xref ref-type="bibr" rid="bib1.bibx14" id="text.33"/> can be substantial. Figure <xref ref-type="fig" rid="F2"/>a shows the approximation error at 5 % relative humidity, while Fig. <xref ref-type="fig" rid="F2"/>b displays a histogram of the errors, revealing a normal distribution centered at zero, indicating minimal bias. We aim to improve upon this standard approximation using equation discovery and sparse regression methods by utilizing the accurate Offset dataset provided by <xref ref-type="bibr" rid="bib1.bibx14" id="text.34"/>.</p>

      <fig id="F2" specific-use="star"><label>Figure 2</label><caption><p id="d2e825">The error of the standard polynomial UTCI approximation <xref ref-type="bibr" rid="bib1.bibx14" id="paren.35"/> for relative humidity of 5 %. <bold>(a)</bold> The difference between the standard UTCI approximation and the accurate values of the Offset function. <bold>(b)</bold> Histogram of the differences showing a normal distribution centered at zero.</p></caption>
        <graphic xlink:href="https://gmd.copernicus.org/articles/19/4319/2026/gmd-19-4319-2026-f02.png"/>

      </fig>

      <p id="d2e843">Sparse machine learning models aim to construct parsimonious predictive functions by enforcing zero-valued coefficients in high-dimensional parameter spaces, thereby performing implicit feature selection <xref ref-type="bibr" rid="bib1.bibx15" id="paren.36"/>. This sparsity promotes interpretability, reduces overfitting, and improves computational tractability, especially when the number of candidate predictors is large or when strong correlations exist among inputs. Sparse regression <xref ref-type="bibr" rid="bib1.bibx15" id="paren.37"/>, a key instantiation of this paradigm, extends linear regression with an <inline-formula><mml:math id="M32" display="inline"><mml:mrow><mml:msub><mml:mi>L</mml:mi><mml:mn mathvariant="normal">1</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula>-norm regularization term – most notably in the Lasso <xref ref-type="bibr" rid="bib1.bibx26" id="paren.38"><named-content content-type="pre">Least Absolute Shrinkage and Selection Operator,</named-content></xref> – to penalize unnecessary parameters and induce a compact representation.</p>
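      <p>As a minimal sketch of how an L1 penalty produces exact zeros, the following code solves a small Lasso problem by cyclic coordinate descent with soft-thresholding. It is a generic textbook scheme written in plain Python, not the solver used in this study; the objective scaling, the function names, and the toy data are assumptions made for illustration.</p>
<preformat>
```python
def soft_threshold(rho, lam):
    """Shrink rho toward zero by lam; values within [-lam, lam] become 0."""
    if rho > lam:
        return rho - lam
    if -lam > rho:
        return rho + lam
    return 0.0

def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimise (1/(2n)) * ||y - Xw||^2 + lam * ||w||_1 by cyclic updates."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0  # correlation of column j with the residual ignoring term j
            z = 0.0    # squared norm of column j
            for i in range(n):
                pred_without_j = sum(X[i][k] * w[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
                z += X[i][j] ** 2
            w[j] = soft_threshold(rho / n, lam) / (z / n)
    return w

# Toy design with mutually orthogonal columns (illustrative data, not UTCI values).
X = [[1.0, 1.0, 1.0], [1.0, -1.0, -1.0], [-1.0, 1.0, -1.0], [-1.0, -1.0, 1.0]]
y = [2.05, 1.95, -1.95, -2.05]  # equals 2.0 * column 0 + 0.05 * column 1
w = lasso_coordinate_descent(X, y, lam=0.1)
print(w)
```
</preformat>
      <p>In this toy case the weak contribution of the second column is set exactly to zero, while the dominant coefficient is shrunk from 2.0 to about 1.9. With orthogonal columns each coefficient is simply the soft-thresholded least-squares coefficient, so the selection of one term does not disturb the others.</p>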
      <p id="d2e868">In this work, we employ sparse regression to identify compact, interpretable models of the UTCI, emphasizing its suitability for high-dimensional input spaces with redundant or weakly relevant features. While sparse modeling is well established in statistical learning, its application to orthogonal polynomial bases – particularly in the context of bioclimatic indices – remains unexplored. By leveraging the structure of orthogonal polynomials, we obtain improved numerical stability and additive expansions that facilitate coefficient interpretability. To our knowledge, this is the first application of sparse regression using orthogonal bases to approximate the UTCI, addressing both predictive accuracy and model parsimony. Our results show that this approach surpasses the standard sixth-degree polynomial approximation in both accuracy and efficiency.</p>
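      <p>As a minimal illustration of how an L1 penalty drives coefficients exactly to zero, the following sketch implements the Lasso by cyclic coordinate descent with soft-thresholding. It is not the implementation used in this work: the synthetic data, the function names <monospace>soft_threshold</monospace> and <monospace>lasso_cd</monospace>, and the penalty value <monospace>lam=0.1</monospace> are assumptions chosen for illustration only.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import numpy as np


def soft_threshold(z, t):
    """Shrink z toward zero by t; inputs with |z| <= t become exactly zero."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def lasso_cd(X, y, lam, n_iter=200):
    """Minimise (1/2n)*||y - X w||^2 + lam*||w||_1 by cyclic coordinate descent."""
    n, d = X.shape
    w = np.zeros(d)
    col_sq = (X ** 2).sum(axis=0) / n  # per-feature curvature of the quadratic term
    for _ in range(n_iter):
        for j in range(d):
            # Residual with the current contribution of feature j removed.
            r = y - X @ w + X[:, j] * w[j]
            rho = X[:, j] @ r / n
            w[j] = soft_threshold(rho, lam) / col_sq[j]
    return w


# Synthetic example (assumption): 10 candidate features, only 2 are informative.
rng = np.random.default_rng(0)
X = rng.standard_normal((400, 10))
true_w = np.zeros(10)
true_w[[0, 3]] = [2.0, -1.5]
y = X @ true_w + 0.01 * rng.standard_normal(400)

w = lasso_cd(X, y, lam=0.1)
active = np.flatnonzero(w)  # indices of the retained (non-zero) coefficients
print("active features:", active)
print("coefficients:", np.round(w[active], 3))
```
]]></preformat>
      <p>In this toy setting only the two informative features stay active, and their coefficients are shrunk slightly toward zero; increasing <monospace>lam</monospace> removes more terms at the cost of a larger fitting error.</p>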
</sec>
<sec id="Ch1.S3">
  <label>3</label><title>Results and discussion</title>
      <p id="d2e879">Table <xref ref-type="table" rid="T2"/> presents a detailed comparison of model performance across a range of polynomial degrees for both standard (non-sparse) linear regression and sparse regression techniques, evaluated in the context of approximating the UTCI. The standard approximation <xref ref-type="bibr" rid="bib1.bibx14" id="paren.39"/> is a sixth-degree regression polynomial model with four variables, consisting of 210 terms and achieving a root mean squared loss of <inline-formula><mml:math id="M33" display="inline"><mml:mn mathvariant="normal">1.12</mml:mn></mml:math></inline-formula> <inline-formula><mml:math id="M34" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">°</mml:mi><mml:mi mathvariant="normal">C</mml:mi></mml:mrow></mml:math></inline-formula>. This serves as the benchmark to be matched or improved upon. Note that the standard approximation does not use the relative humidity (rH) directly but the water vapor pressure (pa), which can be derived from rH and the air temperature (Ta). As noted above, the relative humidity is well represented across its entire range in the dataset (see Fig. <xref ref-type="fig" rid="F1"/>b), whereas the water vapor pressure is strongly peaked close to zero. Optimization with pa as an independent variable (instead of rH) is thus poorly conditioned and leads to instability in the regression coefficients, in both simple and sparse regression. While using pa instead of rH can achieve better accuracy (lower loss), it comes at the price of losing parameter consistency across optimizations with different polynomial degrees. For this reason, we report our results using rH instead of pa (see Table <xref ref-type="table" rid="T1"/>).</p>

<table-wrap id="T2" specific-use="star"><label>Table 2</label><caption><p id="d2e912">Root mean squared train loss [<inline-formula><mml:math id="M35" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">°</mml:mi><mml:mi mathvariant="normal">C</mml:mi></mml:mrow></mml:math></inline-formula>], test loss [<inline-formula><mml:math id="M36" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">°</mml:mi><mml:mi mathvariant="normal">C</mml:mi></mml:mrow></mml:math></inline-formula>] and the number of parameters (shown in parentheses) in approximating the UTCI Offset. The baseline reference, labeled as “Standard”, corresponds to the sixth-degree regression polynomial model with four variables <xref ref-type="bibr" rid="bib1.bibx14" id="paren.40"/>. Unless otherwise stated, the test loss equals the train loss. Where two loss values are reported (train loss on top and test loss below), they indicate a notable train-test discrepancy, typically suggesting overfitting. Training is done with 20 % of the data and testing is performed with the remaining 80 %. Results are robust under bootstrapping. Bold indicates the model selected as the final UTCI approximation.</p></caption><oasis:table frame="topbot"><oasis:tgroup cols="8">
     <oasis:colspec colnum="1" colname="col1" align="left"/>
     <oasis:colspec colnum="2" colname="col2" align="right"/>
     <oasis:colspec colnum="3" colname="col3" align="right"/>
     <oasis:colspec colnum="4" colname="col4" align="right"/>
     <oasis:colspec colnum="5" colname="col5" align="right"/>
     <oasis:colspec colnum="6" colname="col6" align="right"/>
     <oasis:colspec colnum="7" colname="col7" align="right"/>
     <oasis:colspec colnum="8" colname="col8" align="right"/>
     <oasis:thead>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">Method</oasis:entry>
         <oasis:entry colname="col2"/>
         <oasis:entry colname="col3"/>
         <oasis:entry colname="col4"/>
         <oasis:entry colname="col5"/>
         <oasis:entry colname="col6"/>
         <oasis:entry colname="col7"/>
         <oasis:entry colname="col8"/>
       </oasis:row>
     </oasis:thead>
     <oasis:tbody>
       <oasis:row>
         <oasis:entry colname="col1">Standard</oasis:entry>
         <oasis:entry namest="col2" nameend="col8" align="center">1.12 </oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1"/>
         <oasis:entry namest="col2" nameend="col8" align="center">(210) </oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1"/>
         <oasis:entry rowsep="1" namest="col2" nameend="col8" align="center">Polynomial degree </oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1"/>
         <oasis:entry colname="col2">4th</oasis:entry>
         <oasis:entry colname="col3">6th</oasis:entry>
         <oasis:entry colname="col4">8th</oasis:entry>
         <oasis:entry colname="col5">10th</oasis:entry>
         <oasis:entry colname="col6">12th</oasis:entry>
         <oasis:entry colname="col7">14th</oasis:entry>
         <oasis:entry colname="col8">16th</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Linear</oasis:entry>
         <oasis:entry colname="col2">2.1</oasis:entry>
         <oasis:entry colname="col3">1.3</oasis:entry>
         <oasis:entry colname="col4">0.92</oasis:entry>
         <oasis:entry colname="col5">Train: 0.67</oasis:entry>
         <oasis:entry colname="col6">0.54</oasis:entry>
         <oasis:entry colname="col7">0.44</oasis:entry>
         <oasis:entry colname="col8">0.36</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">regression</oasis:entry>
         <oasis:entry colname="col2">(70)</oasis:entry>
         <oasis:entry colname="col3">(210)</oasis:entry>
         <oasis:entry colname="col4">(495)</oasis:entry>
         <oasis:entry colname="col5">Test: 0.71</oasis:entry>
         <oasis:entry colname="col6">0.62</oasis:entry>
         <oasis:entry colname="col7">0.66</oasis:entry>
         <oasis:entry colname="col8">1.74</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1"/>
         <oasis:entry colname="col2"/>
         <oasis:entry colname="col3"/>
         <oasis:entry colname="col4"/>
         <oasis:entry colname="col5">(1001)</oasis:entry>
         <oasis:entry colname="col6">(1820)</oasis:entry>
         <oasis:entry colname="col7">(3060)</oasis:entry>
         <oasis:entry colname="col8">(4845)</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Sparse</oasis:entry>
         <oasis:entry colname="col2">2.1</oasis:entry>
         <oasis:entry colname="col3">1.38</oasis:entry>
         <oasis:entry colname="col4">1.03</oasis:entry>
         <oasis:entry colname="col5"><bold>0.88</bold></oasis:entry>
         <oasis:entry colname="col6">0.69</oasis:entry>
         <oasis:entry colname="col7">0.63</oasis:entry>
         <oasis:entry colname="col8">0.6</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">orthogonal</oasis:entry>
         <oasis:entry colname="col2">(65)</oasis:entry>
         <oasis:entry colname="col3">(124)</oasis:entry>
         <oasis:entry colname="col4">(176)</oasis:entry>
         <oasis:entry colname="col5">(209)</oasis:entry>
         <oasis:entry colname="col6">(355)</oasis:entry>
         <oasis:entry colname="col7">(400)</oasis:entry>
         <oasis:entry colname="col8">(424)</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">regression</oasis:entry>
         <oasis:entry colname="col2"/>
         <oasis:entry colname="col3"/>
         <oasis:entry colname="col4"/>
         <oasis:entry colname="col5"/>
         <oasis:entry colname="col6"/>
         <oasis:entry colname="col7"/>
         <oasis:entry colname="col8"/>
       </oasis:row>
     </oasis:tbody>
   </oasis:tgroup></oasis:table></table-wrap>

      <p id="d2e1198">The regression methods are applied to polynomial basis expansions of increasing degree and evaluated on the basis of root mean squared test loss and number of active parameters. Unlike many studies in the literature, where models are trained on the majority of the data and evaluated on a relatively small test set, our approach inverts this paradigm: training is conducted on only 20 % of the available data, while performance is assessed on the remaining 80 %. Despite this stringent evaluation setting, most models achieve comparable performance on the training and test sets, indicating good generalization; the exceptions are the high-degree linear regression models in Table <xref ref-type="table" rid="T2"/>, for which the test loss exceeds the train loss. This stability is further supported by bootstrapping: the reported metrics (train/test loss and number of parameters) and the selected features show minimal variance when model training and evaluation are repeated on multiple random resamplings (bootstrapped subsets) of the data. The results are therefore not sensitive to a specific data split.</p>
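      <p>The evaluation protocol described above can be sketched as follows. This is a hedged, one-dimensional stand-in rather than the code used in this work: the target function, the noise level, the degree, the number of repetitions and the helper names <monospace>rmse</monospace> and <monospace>one_split</monospace> are all assumptions, and the resampling is implemented here simply as repeated random 20/80 partitions.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import numpy as np
from numpy.polynomial import legendre as L


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def one_split(x, y, rng, degree=8, train_fraction=0.2):
    """Fit a Legendre series on a random 20 % subset, evaluate on the other 80 %."""
    idx = rng.permutation(x.size)
    n_train = int(train_fraction * x.size)
    train, test = idx[:n_train], idx[n_train:]
    coef, *_ = np.linalg.lstsq(L.legvander(x[train], degree), y[train], rcond=None)
    return (rmse(y[train], L.legval(x[train], coef)),
            rmse(y[test], L.legval(x[test], coef)))


# Synthetic stand-in data (assumption): smooth target plus small Gaussian noise.
rng = np.random.default_rng(1)
x = rng.uniform(-1.0, 1.0, 5000)
y = np.exp(x) * np.sin(3.0 * x) + 0.05 * rng.standard_normal(x.size)

# Repeat the split many times to see how much the losses vary between resamplings.
losses = np.array([one_split(x, y, rng) for _ in range(50)])
train_mean, test_mean = losses.mean(axis=0)
train_std, test_std = losses.std(axis=0)
print(f"train RMSE: {train_mean:.4f} +/- {train_std:.4f}")
print(f"test  RMSE: {test_mean:.4f} +/- {test_std:.4f}")
```
]]></preformat>
      <p>A small spread across repetitions and a small gap between the mean train and test losses are the two quantities that the robustness statement above refers to.</p>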
      <p id="d2e1202">To make the fitted model class explicit, let <inline-formula><mml:math id="M37" display="inline"><mml:mrow><mml:msub><mml:mover accent="true"><mml:mi>T</mml:mi><mml:mo mathvariant="normal" stretchy="false">̃</mml:mo></mml:mover><mml:mi mathvariant="normal">a</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>, <inline-formula><mml:math id="M38" display="inline"><mml:mrow><mml:msub><mml:mover accent="true"><mml:mi>v</mml:mi><mml:mo stretchy="false" mathvariant="normal">̃</mml:mo></mml:mover><mml:mi mathvariant="normal">a</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>, <inline-formula><mml:math id="M39" display="inline"><mml:mover accent="true"><mml:mrow><mml:mi mathvariant="normal">Δ</mml:mi><mml:msub><mml:mi>T</mml:mi><mml:mi mathvariant="normal">r</mml:mi></mml:msub></mml:mrow><mml:mo stretchy="true" mathvariant="normal">̃</mml:mo></mml:mover></mml:math></inline-formula>, and <inline-formula><mml:math id="M40" display="inline"><mml:mover accent="true"><mml:mtext>rH</mml:mtext><mml:mo mathvariant="normal" stretchy="true">̃</mml:mo></mml:mover></mml:math></inline-formula> denote the normalized versions of <inline-formula><mml:math id="M41" display="inline"><mml:mrow><mml:msub><mml:mi>T</mml:mi><mml:mi mathvariant="normal">a</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>, <inline-formula><mml:math id="M42" display="inline"><mml:mrow><mml:msub><mml:mi>v</mml:mi><mml:mi mathvariant="normal">a</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>, <inline-formula><mml:math id="M43" display="inline"><mml:mrow><mml:msub><mml:mi>T</mml:mi><mml:mi mathvariant="normal">r</mml:mi></mml:msub><mml:mo>-</mml:mo><mml:msub><mml:mi>T</mml:mi><mml:mi mathvariant="normal">a</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>, and rH, respectively, each mapped to the interval <inline-formula><mml:math id="M44" display="inline"><mml:mrow><mml:mo>[</mml:mo><mml:mo>-</mml:mo><mml:mn 
mathvariant="normal">1</mml:mn><mml:mo>,</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mo>]</mml:mo></mml:mrow></mml:math></inline-formula> according to the ranges in Table <xref ref-type="table" rid="T1"/>. In this formulation, relative humidity is retained as an input variable in order to account for the effect of water vapor. The approximation of the UTCI offset can then be written in the general form

          <disp-formula id="Ch1.E2" content-type="numbered"><label>2</label><mml:math id="M45" display="block"><mml:mrow><mml:mover accent="true"><mml:mtext>Offset</mml:mtext><mml:mo mathvariant="normal" stretchy="true">^</mml:mo></mml:mover><mml:mo>(</mml:mo><mml:msub><mml:mi>T</mml:mi><mml:mi mathvariant="normal">a</mml:mi></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>v</mml:mi><mml:mi mathvariant="normal">a</mml:mi></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>T</mml:mi><mml:mi mathvariant="normal">r</mml:mi></mml:msub><mml:mo>-</mml:mo><mml:msub><mml:mi>T</mml:mi><mml:mi mathvariant="normal">a</mml:mi></mml:msub><mml:mo>,</mml:mo><mml:mtext>rH</mml:mtext><mml:mo>)</mml:mo><mml:mo>=</mml:mo><mml:munder><mml:mo movablelimits="false">∑</mml:mo><mml:mrow><mml:mi mathvariant="bold-italic">α</mml:mi><mml:mo>∈</mml:mo><mml:msub><mml:mi mathvariant="script">A</mml:mi><mml:mi>p</mml:mi></mml:msub></mml:mrow></mml:munder><mml:msub><mml:mi>c</mml:mi><mml:mi mathvariant="bold-italic">α</mml:mi></mml:msub><mml:munderover><mml:mo movablelimits="false">∏</mml:mo><mml:mrow><mml:mi>j</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow><mml:mn mathvariant="normal">4</mml:mn></mml:munderover><mml:msub><mml:mi>P</mml:mi><mml:mrow><mml:msub><mml:mi mathvariant="italic">α</mml:mi><mml:mi>j</mml:mi></mml:msub></mml:mrow></mml:msub><mml:mo>(</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mi>j</mml:mi></mml:msub><mml:mo>)</mml:mo><mml:mo>,</mml:mo></mml:mrow></mml:math></disp-formula>

        
        where <inline-formula><mml:math id="M46" display="inline"><mml:mrow><mml:msub><mml:mi>P</mml:mi><mml:mi>n</mml:mi></mml:msub><mml:mo>(</mml:mo><mml:mo>⋅</mml:mo><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> denotes the Legendre polynomial of degree <inline-formula><mml:math id="M47" display="inline"><mml:mi>n</mml:mi></mml:math></inline-formula>, <inline-formula><mml:math id="M48" display="inline"><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mn mathvariant="normal">1</mml:mn></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mn mathvariant="normal">3</mml:mn></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mn mathvariant="normal">4</mml:mn></mml:msub><mml:mo>)</mml:mo><mml:mo>=</mml:mo><mml:mo>(</mml:mo><mml:msub><mml:mover accent="true"><mml:mi>T</mml:mi><mml:mo mathvariant="normal" stretchy="false">̃</mml:mo></mml:mover><mml:mi mathvariant="normal">a</mml:mi></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mover accent="true"><mml:mi>v</mml:mi><mml:mo mathvariant="normal" stretchy="false">̃</mml:mo></mml:mover><mml:mi mathvariant="normal">a</mml:mi></mml:msub><mml:mo>,</mml:mo><mml:mover accent="true"><mml:mrow><mml:mi mathvariant="normal">Δ</mml:mi><mml:msub><mml:mi>T</mml:mi><mml:mi mathvariant="normal">r</mml:mi></mml:msub></mml:mrow><mml:mo stretchy="true" mathvariant="normal">̃</mml:mo></mml:mover><mml:mo>,</mml:mo><mml:mover accent="true"><mml:mtext>rH</mml:mtext><mml:mo stretchy="true" mathvariant="normal">̃</mml:mo></mml:mover><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula>, <inline-formula><mml:math id="M49" display="inline"><mml:mrow><mml:mi mathvariant="bold-italic">α</mml:mi><mml:mo>=</mml:mo><mml:mo>(</mml:mo><mml:msub><mml:mi mathvariant="italic">α</mml:mi><mml:mn mathvariant="normal">1</mml:mn></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi mathvariant="italic">α</mml:mi><mml:mn 
mathvariant="normal">2</mml:mn></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi mathvariant="italic">α</mml:mi><mml:mn mathvariant="normal">3</mml:mn></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi mathvariant="italic">α</mml:mi><mml:mn mathvariant="normal">4</mml:mn></mml:msub><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> is a multi-index, and

          <disp-formula id="Ch1.E3" content-type="numbered"><label>3</label><mml:math id="M50" display="block"><mml:mrow><mml:msub><mml:mi mathvariant="script">A</mml:mi><mml:mi>p</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mfenced open="{" close="}"><mml:mrow><mml:mi mathvariant="bold-italic">α</mml:mi><mml:mo>∈</mml:mo><mml:msubsup><mml:mi mathvariant="double-struck">N</mml:mi><mml:mn mathvariant="normal">0</mml:mn><mml:mn mathvariant="normal">4</mml:mn></mml:msubsup><mml:mo>:</mml:mo><mml:msub><mml:mi mathvariant="italic">α</mml:mi><mml:mn mathvariant="normal">1</mml:mn></mml:msub><mml:mo>+</mml:mo><mml:msub><mml:mi mathvariant="italic">α</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub><mml:mo>+</mml:mo><mml:msub><mml:mi mathvariant="italic">α</mml:mi><mml:mn mathvariant="normal">3</mml:mn></mml:msub><mml:mo>+</mml:mo><mml:msub><mml:mi mathvariant="italic">α</mml:mi><mml:mn mathvariant="normal">4</mml:mn></mml:msub><mml:mo>≤</mml:mo><mml:mi>p</mml:mi></mml:mrow></mml:mfenced></mml:mrow></mml:math></disp-formula>

        is the set of all basis terms up to total polynomial degree <inline-formula><mml:math id="M51" display="inline"><mml:mi>p</mml:mi></mml:math></inline-formula>. Thus, the model is a linear combination of products of Legendre polynomials in the four normalized environmental variables. For a given maximum degree <inline-formula><mml:math id="M52" display="inline"><mml:mi>p</mml:mi></mml:math></inline-formula>, the full candidate basis contains <inline-formula><mml:math id="M53" display="inline"><mml:mfenced close=")" open="("><mml:mfrac linethickness="0"><mml:mrow><mml:mi>p</mml:mi><mml:mo>+</mml:mo><mml:mn mathvariant="normal">4</mml:mn></mml:mrow><mml:mn mathvariant="normal">4</mml:mn></mml:mfrac></mml:mfenced></mml:math></inline-formula> terms, which yields the sequence <inline-formula><mml:math id="M54" display="inline"><mml:mrow><mml:mn mathvariant="normal">70</mml:mn><mml:mo>,</mml:mo><mml:mn mathvariant="normal">210</mml:mn><mml:mo>,</mml:mo><mml:mn mathvariant="normal">495</mml:mn><mml:mo>,</mml:mo><mml:mi mathvariant="normal">…</mml:mi></mml:mrow></mml:math></inline-formula> reported in Table <xref ref-type="table" rid="T2"/> for degrees <inline-formula><mml:math id="M55" display="inline"><mml:mrow><mml:mn mathvariant="normal">4</mml:mn><mml:mo>,</mml:mo><mml:mn mathvariant="normal">6</mml:mn><mml:mo>,</mml:mo><mml:mn mathvariant="normal">8</mml:mn><mml:mo>,</mml:mo><mml:mi mathvariant="normal">…</mml:mi></mml:mrow></mml:math></inline-formula>. Sparse orthogonal regression restricts this expansion by retaining only a subset of the candidate terms,

          <disp-formula id="Ch1.E4" content-type="numbered"><label>4</label><mml:math id="M56" display="block"><mml:mrow><mml:mover accent="true"><mml:mtext>Offset</mml:mtext><mml:mo stretchy="true" mathvariant="normal">^</mml:mo></mml:mover><mml:mo>(</mml:mo><mml:msub><mml:mi>T</mml:mi><mml:mi mathvariant="normal">a</mml:mi></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>v</mml:mi><mml:mi mathvariant="normal">a</mml:mi></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>T</mml:mi><mml:mi mathvariant="normal">r</mml:mi></mml:msub><mml:mo>-</mml:mo><mml:msub><mml:mi>T</mml:mi><mml:mi mathvariant="normal">a</mml:mi></mml:msub><mml:mo>,</mml:mo><mml:mtext>rH</mml:mtext><mml:mo>)</mml:mo><mml:mo>=</mml:mo><mml:munder><mml:mo movablelimits="false">∑</mml:mo><mml:mrow><mml:mi mathvariant="bold-italic">α</mml:mi><mml:mo>∈</mml:mo><mml:msub><mml:mi>S</mml:mi><mml:mi>p</mml:mi></mml:msub></mml:mrow></mml:munder><mml:msub><mml:mi>c</mml:mi><mml:mi mathvariant="bold-italic">α</mml:mi></mml:msub><mml:munderover><mml:mo movablelimits="false">∏</mml:mo><mml:mrow><mml:mi>j</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow><mml:mn mathvariant="normal">4</mml:mn></mml:munderover><mml:msub><mml:mi>P</mml:mi><mml:mrow><mml:msub><mml:mi mathvariant="italic">α</mml:mi><mml:mi>j</mml:mi></mml:msub></mml:mrow></mml:msub><mml:mo>(</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mi>j</mml:mi></mml:msub><mml:mo>)</mml:mo><mml:mo>,</mml:mo></mml:mrow></mml:math></disp-formula>

        where <inline-formula><mml:math id="M57" display="inline"><mml:mrow><mml:msub><mml:mi>S</mml:mi><mml:mi>p</mml:mi></mml:msub><mml:mo>⊆</mml:mo><mml:msub><mml:mi mathvariant="script">A</mml:mi><mml:mi>p</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> is selected by the Lasso regularization. The number of active parameters therefore depends on two factors: the maximum polynomial degree, which determines the size of the candidate pool, and the regularization strength, which determines how many of those candidate terms are retained in the final model. This is the reason why the number of parameters changes across polynomial degrees and also along the Pareto fronts shown in Fig. <xref ref-type="fig" rid="F3"/>. In this sense, the approximation can be viewed as a Fourier-like decomposition in an orthogonal polynomial basis, where lower-order terms capture the dominant structure of the UTCI offset and higher-order terms provide progressively finer corrections. A key advantage of the orthogonal basis is that it yields order-by-order consistency, see Fig. <xref ref-type="fig" rid="F4"/>: when higher-degree terms are introduced, the coefficients associated with lower-order structure remain much more stable than in regressions based on ordinary monomials.</p>
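      <p>The candidate basis defined above can be enumerated directly, which also reproduces the term counts quoted for the full basis. The sketch below is illustrative only; the helper names <monospace>to_unit_interval</monospace>, <monospace>multi_indices</monospace> and <monospace>design_matrix</monospace> are hypothetical, the demonstration inputs are random numbers, and in practice the bounds passed to <monospace>to_unit_interval</monospace> would be the variable ranges of Table 1.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from itertools import product
from math import comb

import numpy as np
from numpy.polynomial import legendre as L


def to_unit_interval(x, lo, hi):
    """Map values from [lo, hi] linearly onto [-1, 1]."""
    return 2.0 * (np.asarray(x, dtype=float) - lo) / (hi - lo) - 1.0


def multi_indices(p, dim=4):
    """All multi-indices alpha in N_0^dim with alpha_1 + ... + alpha_dim <= p."""
    return [a for a in product(range(p + 1), repeat=dim) if sum(a) <= p]


def design_matrix(X, p):
    """One column per alpha: the product over j of P_{alpha_j}(x_j).

    X has shape (n, dim) with every column already scaled to [-1, 1].
    """
    n, dim = X.shape
    # V[j][:, k] holds the Legendre polynomial P_k evaluated at variable j.
    V = [L.legvander(X[:, j], p) for j in range(dim)]
    alphas = multi_indices(p, dim)
    cols = [np.prod([V[j][:, a[j]] for j in range(dim)], axis=0) for a in alphas]
    return np.column_stack(cols), alphas


degrees = (4, 6, 8, 10, 12, 14, 16)
sizes = [len(multi_indices(p)) for p in degrees]
print(sizes)  # sizes of the full candidate basis for each degree

# Random demonstration inputs (assumption), already inside [-1, 1].
X_demo = np.random.default_rng(0).uniform(-1.0, 1.0, size=(5, 4))
A_demo, alphas_demo = design_matrix(X_demo, 4)
print(A_demo.shape)
```
]]></preformat>
      <p>A sparse fit then amounts to applying an L1-penalized regression to the columns of this matrix and keeping the multi-indices whose coefficients are non-zero.</p>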

      <fig id="F3"><label>Figure 3</label><caption><p id="d2e1822">Loss versus number of parameters for different polynomial degrees. The regularization parameter was varied in the lasso regression to yield a Pareto front in model accuracy and complexity for each degree.</p></caption>
        <graphic xlink:href="https://gmd.copernicus.org/articles/19/4319/2026/gmd-19-4319-2026-f03.png"/>

      </fig>

      <fig id="F4" specific-use="star"><label>Figure 4</label><caption><p id="d2e1833">Parameters (or polynomial coefficients) and how they change for different polynomial degrees for <bold>(a)</bold> simple regression and <bold>(b)</bold> sparse regression (using Legendre basis). <bold>(c)</bold> Sorted sparse-regression coefficients (Legendre basis) versus parameter index on a logarithmic <inline-formula><mml:math id="M58" display="inline"><mml:mi>x</mml:mi></mml:math></inline-formula> axis show a clear, Fourier-like decay with order – approximately <inline-formula><mml:math id="M59" display="inline"><mml:mrow><mml:mn mathvariant="normal">1</mml:mn><mml:mo>/</mml:mo><mml:mi>n</mml:mi></mml:mrow></mml:math></inline-formula> – that is stable across model capacities (degrees 4, 8, 12, 16), indicating a hierarchical structure where lower-order terms dominate and higher-order terms provide incremental refinement.</p></caption>
        <graphic xlink:href="https://gmd.copernicus.org/articles/19/4319/2026/gmd-19-4319-2026-f04.png"/>

      </fig>

      <p id="d2e1870">Linear regression without any sparsity constraints improves with increasing degree only up to a point: the test loss decreases from 2.1 at degree 4 to 0.62 at degree 12, but then rises to 0.66 and 1.74 <inline-formula><mml:math id="M60" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">°</mml:mi><mml:mi mathvariant="normal">C</mml:mi></mml:mrow></mml:math></inline-formula> at degrees 14 and 16, respectively, while the train loss keeps decreasing (Table <xref ref-type="table" rid="T2"/>). The gain in accuracy also comes with a dramatic increase in the number of parameters, which exceeds 1800 coefficients by degree 12. The growing discrepancy between test and train losses at higher degrees (e.g., a test loss of 0.62 vs. a train loss of 0.54 at degree 12) indicates overfitting, despite the improved predictive accuracy. The resulting models are also substantially more complex, raising concerns regarding interpretation and generalization. Sparse regression with standard polynomial bases shows similar performance at low degrees but fails to converge beyond the 6th degree. This indicates that enforcing sparsity in a poorly conditioned basis becomes increasingly difficult as model complexity grows.</p>
      <p id="d2e1883">In contrast, sparse regression using an orthogonal Legendre basis (or sparse orthogonal regression) exhibits superior stability and accuracy across all degrees. It outperforms the baseline 6th-degree polynomial fit from degree 8 onward, achieving a test loss of 0.88 <inline-formula><mml:math id="M61" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">°</mml:mi><mml:mi mathvariant="normal">C</mml:mi></mml:mrow></mml:math></inline-formula> at degree 10 with only 209 parameters – almost the same count as the original benchmark model, but with improved generalization. As the degree increases to 16, the loss decreases further to 0.60 <inline-formula><mml:math id="M62" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">°</mml:mi><mml:mi mathvariant="normal">C</mml:mi></mml:mrow></mml:math></inline-formula> using 424 parameters – a fraction of the 4845 used by the corresponding standard regression model. The orthogonality of the Legendre basis likely contributes to better numerical conditioning, facilitating sparse model discovery even at high degrees. These results emphasize the importance of basis selection and regularization strategy in symbolic regression tasks. Sparse methods, when combined with well-structured bases like Legendre polynomials, offer a promising path toward accurate, compact, and interpretable models in high-dimensional settings.</p>
      <p id="d2e1907">Furthermore, optimization of nonlinear objective functions using gradient-based algorithms can be computationally intensive, especially in high-dimensional spaces where convergence is slow and local minima may hinder performance. In contrast, the regression-based approach proposed in this article – particularly through sparse regression with orthogonal polynomials – offers significantly faster computation. By framing the problem as a structured regression task rather than a nonlinear optimization, the method avoids costly iterative procedures and scales efficiently with dimensionality, making it highly suitable for rapid modeling of complex environmental indices like the UTCI.</p>
      <p id="d2e1910">Figure <xref ref-type="fig" rid="F3"/> illustrates the relationship between model complexity (measured by the number of parameters) and prediction accuracy (log-scaled loss) for sparse regression models using Legendre polynomial bases of varying degrees. Each curve corresponds to a fixed polynomial degree, ranging from 4 to 16, with points reflecting models of increasing complexity obtained through regularization. A clear trend is observed: for a given polynomial degree, increasing the number of parameters generally results in improved model accuracy (i.e., lower loss). However, diminishing returns set in, and the rate of improvement flattens. More notably, the envelope formed by the lowest loss at each level of complexity across all degrees traces an emergent Pareto front <xref ref-type="bibr" rid="bib1.bibx36" id="paren.41"/>. This front captures the trade-off between model simplicity and predictive performance.</p>
      <p id="d2e1918">Higher-degree models (e.g., degrees 12–16) dominate this frontier at higher parameter counts, offering better loss with only marginal increases in complexity. In contrast, lower-degree models saturate quickly, highlighting their limited expressivity. The Pareto front thus reflects the optimal set of models that balance accuracy and sparsity, guiding model selection under complexity constraints. The use of Legendre polynomials ensures numerical stability and encourages efficient basis representations, which supports the recovery of compact yet accurate models in this sparse regression setting.</p>
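      <p>The notion of a Pareto front in the accuracy–complexity plane can be made concrete with a short sketch. The function name <monospace>pareto_front</monospace> is hypothetical, and the input uses only the (number of parameters, test loss) pairs listed in Table 2, not the full regularization sweep behind Fig. 3, so it illustrates the selection rule rather than reproducing the figure.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def pareto_front(models):
    """Return the (n_params, loss) pairs that no other pair beats on both criteria.

    After sorting by parameter count, a model is non-dominated only if its loss
    is strictly lower than that of every smaller model seen so far.
    """
    front = []
    for n_params, loss in sorted(set(models)):
        if not front or loss < front[-1][1]:
            front.append((n_params, loss))
    return front


# (number of parameters, test loss in deg C) taken from Table 2.
sparse_orthogonal = [(65, 2.1), (124, 1.38), (176, 1.03), (209, 0.88),
                     (355, 0.69), (400, 0.63), (424, 0.6)]
linear_regression = [(70, 2.1), (210, 1.3), (495, 0.92), (1001, 0.71),
                     (1820, 0.62), (3060, 0.66), (4845, 1.74)]

front = pareto_front(sparse_orthogonal + linear_regression)
for n_params, loss in front:
    print(f"{n_params:5d} parameters -> {loss:.2f}")
```
]]></preformat>
      <p>For these tabulated values every point on the front belongs to the sparse orthogonal models; each linear regression model is matched or beaten by a sparse model with fewer parameters.</p>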
      <p id="d2e1921">In Fig. <xref ref-type="fig" rid="F4"/>a and b we visualize the behavior of regression coefficients obtained from simple regression and sparse regression with orthogonal Legendre polynomials. Both plots use a logarithmic <inline-formula><mml:math id="M63" display="inline"><mml:mi>x</mml:mi></mml:math></inline-formula> axis to indicate the parameter index and reveal how coefficients evolve as higher-degree polynomial terms are introduced. In Fig. <xref ref-type="fig" rid="F4"/>a, each line corresponds to simple regression solutions using polynomial bases of increasing degree. The <inline-formula><mml:math id="M64" display="inline"><mml:mi>x</mml:mi></mml:math></inline-formula> axis denotes the index of polynomial terms (sorted or sequential), while the <inline-formula><mml:math id="M65" display="inline"><mml:mi>y</mml:mi></mml:math></inline-formula> axis shows the corresponding coefficient values. A key observation is that the coefficients of lower-degree terms (left side of the plot) are not stable across model orders. As higher-degree terms are added, previously estimated lower-order coefficients shift significantly, often changing sign and magnitude.</p>
      <p id="d2e1949">Figure <xref ref-type="fig" rid="F4"/>b presents coefficient values for sparse regression using Legendre polynomials, with colors indicating contributions from different polynomial degrees. Here, a contrasting pattern emerges: coefficients associated with lower-degree terms remain stable as higher-degree terms are added. New coefficients primarily emerge in the higher-order region of the <inline-formula><mml:math id="M66" display="inline"><mml:mi>x</mml:mi></mml:math></inline-formula> axis, without disturbing the existing ones. This stability results from the orthogonality of the Legendre basis, which decorrelates the polynomial terms and enables additive refinement without re-tuning existing coefficients.</p>
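      <p>The stabilizing effect of an orthogonal basis can be reproduced in a small one-dimensional experiment. The sketch below is an assumption-laden illustration, not the four-variable sparse model of this work: it uses ordinary least squares, an arbitrary target function, and a hypothetical helper <monospace>low_order_shift</monospace> that measures how much the coefficients up to degree 4 move when the fit is extended to degree 8.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import numpy as np
from numpy.polynomial import legendre as L
from numpy.polynomial import polynomial as P

# Stand-in target on a dense uniform grid in [-1, 1] (assumption).
x = np.linspace(-1.0, 1.0, 2001)
y = np.abs(x) + 0.5 * x


def fit(vander, degree):
    """Least-squares coefficients in the basis generated by `vander`."""
    coef, *_ = np.linalg.lstsq(vander(x, degree), y, rcond=None)
    return coef


def low_order_shift(vander, low=4, high=8):
    """Largest change in the first low+1 coefficients when the degree grows."""
    return float(np.max(np.abs(fit(vander, high)[: low + 1] - fit(vander, low))))


shift_monomial = low_order_shift(P.polyvander)  # basis 1, x, x^2, ...
shift_legendre = low_order_shift(L.legvander)   # basis P_0, P_1, P_2, ...
print(f"monomial basis: {shift_monomial:.3g}")
print(f"Legendre basis: {shift_legendre:.3g}")
```
]]></preformat>
      <p>On a dense uniform grid the Legendre columns are nearly orthogonal, so adding higher-degree terms barely changes the existing coefficients, whereas the monomial coefficients are re-estimated substantially.</p>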
      <p id="d2e1961">The contrast between Fig. <xref ref-type="fig" rid="F4"/>a and b underscores the advantage of orthogonal polynomial bases in sparse regression. Simple regression results in unstable, entangled coefficient estimates that shift with basis expansion, complicating interpretability and reuse. Sparse regression with ordinary polynomial bases fails to converge for higher degrees. In contrast, combining sparsity with orthogonal polynomials yields stable, hierarchical models where lower-order structure is preserved and higher-order terms incrementally enrich the representation. This behavior is particularly valuable for symbolic regression and interpretable modeling, where each term ideally reflects a distinct, meaningful contribution to the model output.</p>
      <p id="d2e1967">In Fourier analysis, the magnitude of coefficients typically decays as <inline-formula><mml:math id="M67" display="inline"><mml:mrow><mml:mn mathvariant="normal">1</mml:mn><mml:mo>/</mml:mo><mml:mi>n</mml:mi></mml:mrow></mml:math></inline-formula> (where <inline-formula><mml:math id="M68" display="inline"><mml:mi>n</mml:mi></mml:math></inline-formula> is the order of the term) for functions of bounded variation <xref ref-type="bibr" rid="bib1.bibx37" id="paren.42"/> – a class that includes many naturally occurring signals and is a reasonable assumption for observational data. This decay reflects the fact that higher-order (or higher-frequency) components contribute less to the overall structure of such functions. A similar trend is observed in sparse regression using orthogonal polynomial bases, see Fig. <xref ref-type="fig" rid="F4"/>c. When coefficients are sorted by magnitude, they exhibit a clear decreasing pattern, analogous to the Fourier case, with lower-order terms capturing the dominant structure and higher-order terms refining the approximation in a controlled manner.</p>
      <p id="d2e1994">This suggests that through the use of sparse regression with an orthogonal polynomial basis, we have achieved a Fourier-like decomposition of the UTCI Offset in the Legendre basis (instead of the trigonometric one). This has a number of theoretical advantages: due to the orthogonality of the basis functions, the decomposition minimizes the <inline-formula><mml:math id="M69" display="inline"><mml:mrow><mml:msub><mml:mi>L</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula> distance (least squares) between approximation and function, guaranteeing the best possible polynomial fit for a given model complexity <xref ref-type="bibr" rid="bib1.bibx37" id="paren.43"/>. Additionally, the coefficients are uncorrelated and hierarchically structured, ensuring that lower-order components remain stable as higher-order terms are added – enhancing both interpretability and numerical robustness.</p>
      <p id="d2e2011">Based on the analysis results and one of the initial goals (that the new approximation should have comparable computational complexity to the existing one), we selected the sparse regression model based on tenth-degree Legendre polynomials as the most suitable approximation. The final version of the new polynomial, which has 209 coefficients, was calculated using the whole dataset of tabulated values.</p>
      <p id="d2e2014">Figure <xref ref-type="fig" rid="F5"/>a shows the spatial distribution of the Offset errors for the new approximation at a fixed relative humidity of 5 %. The errors are small and smoothly varying, indicating good agreement across the input space. Figure <xref ref-type="fig" rid="F5"/>b presents a comparison of error histograms for both the standard and new approximations. The sparse-model-based approximation produces a narrower, more sharply peaked distribution centered at zero, highlighting a reduction in error variance and suggesting better generalization. Figure <xref ref-type="fig" rid="F5"/>c shows the cumulative distribution of absolute errors for the two approximations. The curve for the new approximation rises more steeply and reaches higher cumulative values at lower error thresholds, indicating that a larger proportion of predictions fall within smaller error margins.</p>

      <fig id="F5" specific-use="star"><label>Figure 5</label><caption><p id="d2e2025"><bold>(a)</bold> Spatial distribution of the UTCI Offset error (approximation minus reference) for the new sparse-model-based approximation at a fixed relative humidity of 5 %, showing small, smoothly varying discrepancies. <bold>(b)</bold> Comparison of error histograms for the standard UTCI approximation and the new approximation based on the tenth-degree Legendre polynomials. <bold>(c)</bold> Cumulative distributions of the absolute errors of the two approximations.</p></caption>
        <graphic xlink:href="https://gmd.copernicus.org/articles/19/4319/2026/gmd-19-4319-2026-f05.png"/>

      </fig>

      <p id="d2e2043">Table <xref ref-type="table" rid="T3"/> summarizes the most relevant properties of the two approximations. The results show a clear improvement in accuracy: the new approximation not only substantially reduces the average errors (i.e., the mean error, the mean absolute error, and the root mean square error) but also drastically reduces the frequency of large deviations compared to the standard approximation. For example, the frequency of absolute errors larger than <inline-formula><mml:math id="M70" display="inline"><mml:mn mathvariant="normal">2</mml:mn></mml:math></inline-formula> <inline-formula><mml:math id="M71" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">°</mml:mi><mml:mi mathvariant="normal">C</mml:mi></mml:mrow></mml:math></inline-formula> is halved from 8 % to 4 %, the frequency of errors larger than <inline-formula><mml:math id="M72" display="inline"><mml:mn mathvariant="normal">3</mml:mn></mml:math></inline-formula> <inline-formula><mml:math id="M73" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">°</mml:mi><mml:mi mathvariant="normal">C</mml:mi></mml:mrow></mml:math></inline-formula> reduces from 2 % to 0.5 %, while the frequency of errors larger than <inline-formula><mml:math id="M74" display="inline"><mml:mn mathvariant="normal">4</mml:mn></mml:math></inline-formula> <inline-formula><mml:math id="M75" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">°</mml:mi><mml:mi mathvariant="normal">C</mml:mi></mml:mrow></mml:math></inline-formula> reduces from 0.3 % to 0.01 %. These results clearly show the added benefits of the new approximation and confirm that the sparse regression approach can achieve comparable or improved predictive accuracy while maintaining interpretability and model parsimony.</p>
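      <p>The summary statistics discussed above can be computed along the following lines. This is a minimal sketch, not the evaluation code of the study: the function name error_summary and the dictionary keys are hypothetical, the sample arrays are made up for illustration, and the error is taken as approximation minus reference, following the caption of Fig. 5.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import numpy as np


def error_summary(approximation, reference, thresholds=(2.0, 3.0, 4.0, 5.0)):
    """Summarize errors (approximation minus reference) in degrees Celsius."""
    error = np.asarray(approximation, dtype=float) - np.asarray(reference, dtype=float)
    abs_error = np.abs(error)
    summary = {
        "mean_error": float(error.mean()),
        "mean_absolute_error": float(abs_error.mean()),
        "root_mean_square_error": float(np.sqrt(np.mean(error**2))),
    }
    for threshold in thresholds:
        # Share of points (in percent) whose absolute error exceeds the threshold.
        key = f"freq_abs_error_gt_{threshold:g}"
        summary[key] = float(100.0 * np.mean(abs_error > threshold))
    return summary


# Made-up values for illustration only, not data from the study.
reference = np.array([1.0, 1.0, 5.5, 4.0])
approximation = np.array([1.0, 2.0, 3.0, 10.0])
summary = error_summary(approximation, reference)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```
]]></preformat>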

<table-wrap id="T3" specific-use="star"><label>Table 3</label><caption><p id="d2e2103">Comparison of properties of the standard <xref ref-type="bibr" rid="bib1.bibx14" id="paren.44"/> and new polynomial approximations of the UTCI Offset function. The values outside the parentheses reflect the evaluation of the approximations on the full dataset of <inline-formula><mml:math id="M76" display="inline"><mml:mrow><mml:mn mathvariant="normal">104</mml:mn><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mn mathvariant="normal">643</mml:mn></mml:mrow></mml:math></inline-formula> accurate Offset values provided by <xref ref-type="bibr" rid="bib1.bibx14" id="text.45"/>. The values in parentheses reflect the evaluation using the independent dataset of 1000 accurate UTCI values <xref ref-type="bibr" rid="bib1.bibx13" id="paren.46"/>, which were not used during the development of the new approximation. Both approximations are valid only for the intervals of environmental variables available in the full dataset (Table <xref ref-type="table" rid="T1"/>).</p></caption><oasis:table frame="topbot"><oasis:tgroup cols="3">
     <oasis:colspec colnum="1" colname="col1" align="left"/>
     <oasis:colspec colnum="2" colname="col2" align="left"/>
     <oasis:colspec colnum="3" colname="col3" align="left"/>
     <oasis:thead>
       <oasis:row>
         <oasis:entry colname="col1"/>
         <oasis:entry colname="col2">Standard</oasis:entry>
         <oasis:entry colname="col3">New</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1"/>
         <oasis:entry colname="col2">approximation</oasis:entry>
         <oasis:entry colname="col3">approximation</oasis:entry>
       </oasis:row>
     </oasis:thead>
     <oasis:tbody>
       <oasis:row>
         <oasis:entry colname="col1">Polynomial degree</oasis:entry>
         <oasis:entry colname="col2">6th</oasis:entry>
         <oasis:entry colname="col3">10th</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Basis functions</oasis:entry>
         <oasis:entry colname="col2">monomials</oasis:entry>
         <oasis:entry colname="col3">Legendre</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Number of coefficients</oasis:entry>
         <oasis:entry colname="col2">210</oasis:entry>
         <oasis:entry colname="col3">209</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Mean Error</oasis:entry>
         <oasis:entry colname="col2"><inline-formula><mml:math id="M77" display="inline"><mml:mrow><mml:mn mathvariant="normal">1.7</mml:mn><mml:mo>×</mml:mo><mml:msup><mml:mn mathvariant="normal">10</mml:mn><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">3</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula> <inline-formula><mml:math id="M78" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">°</mml:mi><mml:mi mathvariant="normal">C</mml:mi></mml:mrow></mml:math></inline-formula> (0.35 <inline-formula><mml:math id="M79" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">°</mml:mi><mml:mi mathvariant="normal">C</mml:mi></mml:mrow></mml:math></inline-formula>)</oasis:entry>
         <oasis:entry colname="col3"><inline-formula><mml:math id="M80" display="inline"><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">2.7</mml:mn><mml:mo>×</mml:mo><mml:msup><mml:mn mathvariant="normal">10</mml:mn><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">15</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula> <inline-formula><mml:math id="M81" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">°</mml:mi><mml:mi mathvariant="normal">C</mml:mi></mml:mrow></mml:math></inline-formula> (0.22 <inline-formula><mml:math id="M82" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">°</mml:mi><mml:mi mathvariant="normal">C</mml:mi></mml:mrow></mml:math></inline-formula>)</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Mean Absolute Error</oasis:entry>
         <oasis:entry colname="col2">0.81 <inline-formula><mml:math id="M83" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">°</mml:mi><mml:mi mathvariant="normal">C</mml:mi></mml:mrow></mml:math></inline-formula> (1.33 <inline-formula><mml:math id="M84" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">°</mml:mi><mml:mi mathvariant="normal">C</mml:mi></mml:mrow></mml:math></inline-formula>)</oasis:entry>
         <oasis:entry colname="col3">0.64 <inline-formula><mml:math id="M85" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">°</mml:mi><mml:mi mathvariant="normal">C</mml:mi></mml:mrow></mml:math></inline-formula> (0.71 <inline-formula><mml:math id="M86" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">°</mml:mi><mml:mi mathvariant="normal">C</mml:mi></mml:mrow></mml:math></inline-formula>)</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Root Mean Square Error</oasis:entry>
         <oasis:entry colname="col2">1.17 <inline-formula><mml:math id="M87" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">°</mml:mi><mml:mi mathvariant="normal">C</mml:mi></mml:mrow></mml:math></inline-formula> (2.77 <inline-formula><mml:math id="M88" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">°</mml:mi><mml:mi mathvariant="normal">C</mml:mi></mml:mrow></mml:math></inline-formula>)</oasis:entry>
         <oasis:entry colname="col3">0.88 <inline-formula><mml:math id="M89" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">°</mml:mi><mml:mi mathvariant="normal">C</mml:mi></mml:mrow></mml:math></inline-formula> (0.96 <inline-formula><mml:math id="M90" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">°</mml:mi><mml:mi mathvariant="normal">C</mml:mi></mml:mrow></mml:math></inline-formula>)</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Freq. of abs. errors larger than 2 <inline-formula><mml:math id="M91" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">°</mml:mi><mml:mi mathvariant="normal">C</mml:mi></mml:mrow></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col2">8.4 % (15.5 %)</oasis:entry>
         <oasis:entry colname="col3">4.2 % (5.0 %)</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Freq. of abs. errors larger than 3 <inline-formula><mml:math id="M92" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">°</mml:mi><mml:mi mathvariant="normal">C</mml:mi></mml:mrow></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col2">2.2 % (6.3 %)</oasis:entry>
         <oasis:entry colname="col3">0.50 % (0.60 %)</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Freq. of abs. errors larger than 4 <inline-formula><mml:math id="M93" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">°</mml:mi><mml:mi mathvariant="normal">C</mml:mi></mml:mrow></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col2">0.34 % (3.8 %)</oasis:entry>
         <oasis:entry colname="col3">0.011 % (0.10 %)</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Freq. of abs. errors larger than 5 <inline-formula><mml:math id="M94" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">°</mml:mi><mml:mi mathvariant="normal">C</mml:mi></mml:mrow></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col2">0.038 % (3.3 %)</oasis:entry>
         <oasis:entry colname="col3">0.00096 % (0 %)</oasis:entry>
       </oasis:row>
     </oasis:tbody>
   </oasis:tgroup></oasis:table></table-wrap>

      <p id="d2e2480">We also evaluated the new approximation on the independent dataset of 1000 accurate UTCI values, which were not used during the development of the approximation. This dataset was prepared by the authors of the <xref ref-type="bibr" rid="bib1.bibx14" id="text.47"/> paper and is freely available in a Zenodo repository <xref ref-type="bibr" rid="bib1.bibx13" id="paren.48"/>. As with the evaluation on the full dataset, the evaluation on the independent dataset shows a substantial reduction of the mean errors and a drastic reduction in the frequency of large errors compared to the standard approximation (Table <xref ref-type="table" rid="T3"/>).</p>
      <p id="d2e2491">Since the new approximation was determined using the full dataset of accurate Offset values <xref ref-type="bibr" rid="bib1.bibx14" id="paren.49"/>, it is, like the standard approximation, valid only for the intervals of environmental variables available in this dataset (Table <xref ref-type="table" rid="T1"/>). Using the approximation for conditions outside these intervals can lead to large errors or unrealistic results and should be avoided.</p>
</sec>
<sec id="Ch1.S4" sec-type="conclusions">
  <label>4</label><title>Conclusions</title>
      <p id="d2e2507">The goal of this study was to develop an improved version of the polynomial approximation – one that would have comparable computational complexity to the existing approximation but would be more robust in terms of numerical stability and substantially more accurate, particularly in reducing the frequency of larger errors. This goal was successfully achieved using sparse regression with an orthogonal polynomial basis.</p>
      <p id="d2e2510">Sparse regression methods, such as LASSO, helped reduce overfitting and improve interpretability. As we have shown, the choice of basis functions is crucial: orthogonal polynomials such as Legendre polynomials offer better numerical stability and conditioning than monomials. They enable hierarchical models in which higher-order terms do not affect lower-order estimates, making them especially useful in sparse, interpretable models. Empirical results support these theoretical advantages.</p>
      <p id="d2e2513">Using sparse regression with an orthogonal polynomial basis (or sparse orthogonal regression), we have: <list list-type="custom"><list-item><label>a.</label>
      <p id="d2e2518">Achieved substantially better accuracy – compared to the standard approximation, the new approximation not only substantially reduces the average errors (i.e., the mean error, the mean absolute error, and the root mean square error) but also drastically reduces the frequency of large errors.</p></list-item><list-item><label>b.</label>
      <p id="d2e2522">Retained a comparable computational complexity – the number of coefficients is almost the same for both approximations (210 for the standard and 209 for the new one).</p></list-item><list-item><label>c.</label>
      <p id="d2e2526">Found a Pareto front for different model complexities – loss curves reveal that sparse models with orthogonal bases efficiently populate a Pareto front, balancing complexity and accuracy.</p></list-item><list-item><label>d.</label>
      <p id="d2e2530">Determined coefficients consistent over models with different capacities – coefficient plots for models built on orthogonal bases show the progressive inclusion of higher-order components without disrupting lower-order structure, in contrast to models using simple regression and ordinary polynomials.</p></list-item><list-item><label>e.</label>
      <p id="d2e2534">Achieved successful generalization – models trained on only 20 % of the data and tested on the remaining 80 % generalize well. The results are also robust under bootstrapping.</p></list-item><list-item><label>f.</label>
      <p id="d2e2538">Essentially decomposed the UTCI into a Fourier-like expansion in a Legendre-polynomial basis, with coefficient magnitudes decaying as expected. Thus, we are arguably close to the theoretically optimal result for a robust approximation in the <inline-formula><mml:math id="M95" display="inline"><mml:mrow><mml:msub><mml:mi>L</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula> metric (i.e., in the least-squares sense).</p></list-item></list></p>
      <p id="d2e2552">Sparse orthogonal regression provides an effective framework for constructing accurate and numerically stable polynomial approximations of the UTCI. Our main contribution is therefore not methodological novelty in sparse regression itself, but the use of an orthogonal polynomial basis as a practical approximation strategy with favorable numerical properties, including order-by-order consistency and stable low-order truncations. In addition, the results obtained from random train–test splits, together with their robustness under bootstrapping, show that using only 20 % of the data for training is not a requirement of the method, but a deliberately stringent test of generalization. The comparable performance on the remaining 80 % of the data indicates that the approach remains accurate, robust, and efficient even under a severe limitation in the number of training data points, while remaining well suited for practical applications that require portability and ease of implementation.</p>
      <p id="d2e2556">We have also prepared an easy-to-use Python function for the new approximation (please refer to the Code and data availability section on how to obtain the code). The code relies only on basic mathematical operations, which makes it easy to adapt to other programming languages, such as Fortran or C++. We also implemented a check to see if the environmental state falls within the domain of validity of the approximation. If this is not the case, the code produces a warning that the resulting UTCI values could have large errors or be unrealistic.</p>
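      <p>A domain-of-validity check of this kind might look as follows. This is only a sketch under stated assumptions and not the released implementation: the function name check_domain, the variable names, and the numerical bounds are hypothetical placeholders, and the actual intervals should be taken from Table 1.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import warnings

# Hypothetical placeholder bounds (assumption), NOT the intervals from the paper.
# Replace them with the intervals of the environmental variables listed in Table 1.
EXAMPLE_BOUNDS = {
    "air_temperature": (-10.0, 40.0),  # degrees Celsius, placeholder
    "wind_speed": (0.0, 20.0),         # metres per second, placeholder
}


def check_domain(state, bounds):
    """Return the names of variables outside their interval and warn if any are found."""
    outside = [
        name
        for name, (lower, upper) in bounds.items()
        if not lower <= state[name] <= upper
    ]
    if outside:
        warnings.warn(
            "Input outside the domain of validity for: "
            + ", ".join(outside)
            + "; the resulting UTCI value may have large errors or be unrealistic.",
            UserWarning,
            stacklevel=2,
        )
    return outside


# A state inside the placeholder bounds produces no warning.
inside = check_domain({"air_temperature": 25.0, "wind_speed": 3.0}, EXAMPLE_BOUNDS)
print(inside)  # empty list, no warning
```
]]></preformat>
      <p>Returning the list of offending variables in addition to warning lets a caller decide whether to mask, clip, or discard the affected values.</p>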
</sec>

      
      </body>
    <back><notes notes-type="codedataavailability"><title>Code and data availability</title>

      <p id="d2e2564">The code used to calculate the new UTCI approximation, generate the reported model comparisons, and reproduce the analysis, tables, and figures presented in this paper is archived on Zenodo (<ext-link xlink:href="https://doi.org/10.5281/zenodo.16880382" ext-link-type="DOI">10.5281/zenodo.16880382</ext-link>, <xref ref-type="bibr" rid="bib1.bibx29" id="altparen.50"/>). The archive includes reproducibility instructions and the required Python environment specification.</p>

      <p id="d2e2573">The offset data used for fitting and evaluating the approximation are the supplementary material of <xref ref-type="bibr" rid="bib1.bibx14" id="text.51"/>, available from the publisher as electronic supplementary material and downloaded automatically by the reproduction code. The independent UTCI test data used for additional validation are publicly available on Zenodo (<ext-link xlink:href="https://doi.org/10.5281/zenodo.5503967" ext-link-type="DOI">10.5281/zenodo.5503967</ext-link>, <xref ref-type="bibr" rid="bib1.bibx13" id="altparen.52"/>). No additional non-public data were used.</p>
  </notes><notes notes-type="authorcontribution"><title>Author contributions</title>

      <p id="d2e2588">SR – Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Software, Validation, Visualization, Writing (original draft preparation); GS – Conceptualization, Resources, Validation, Software, Writing (review and editing); LT – Conceptualization, Methodology, Project administration, Supervision, Validation, Writing (review and editing); SD – Conceptualization, Funding acquisition, Project administration, Resources, Supervision, Writing (review and editing).</p>
  </notes><notes notes-type="competinginterests"><title>Competing interests</title>

      <p id="d2e2594">The contact author has declared that none of the authors has any competing interests.</p>
  </notes><notes notes-type="disclaimer"><title>Disclaimer</title>

      <p id="d2e2600">Co-funded by the European Union. Views and opinions expressed are however those of the author(s) only and do not necessarily reflect those of the European Union or European Research Executive Agency. Neither the European Union nor the granting authority can be held responsible for them. Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. The authors bear the ultimate responsibility for providing appropriate place names. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.</p>
  </notes><ack><title>Acknowledgements</title><p id="d2e2609">The authors appreciate the fruitful discussions within the SHED discussion group with Jure Brence, Nina Omejc, Sebastian Mežnar, and Boštjan Gec.</p></ack><notes notes-type="financialsupport"><title>Financial support</title>

      <p id="d2e2614">This publication is supported by the European Union's Horizon Europe research and innovation programme under the Marie Skłodowska-Curie Postdoctoral Fellowship Programme, SMASH co-funded under the grant agreement No. 101081355. The operation (SMASH project) is co-funded by the Republic of Slovenia and the European Union from the European Regional Development Fund. The authors acknowledge the financial support of the Slovenian Research Agency via the Gravity project AI for Science, GC-0001, and of the Slovenian Research and Innovation Agency (research core funding No. P1-0188).</p>
  </notes><notes notes-type="reviewstatement"><title>Review statement</title>

      <p id="d2e2620">This paper was edited by Ting Sun and reviewed by two anonymous referees.</p>
  </notes><ref-list>
    <title>References</title>

      <ref id="bib1.bibx1"><label>Atanasova et al.(2006a)Atanasova, Recknagel, Todorovski, Džeroski, and Kompare</label><mixed-citation>Atanasova, N., Recknagel, F., Todorovski, L., Džeroski, S., and Kompare, B.: Computational assemblage of ordinary differential equations for chlorophyll-<inline-formula><mml:math id="M96" display="inline"><mml:mi>a</mml:mi></mml:math></inline-formula> using a lake process equation library and measured data of Lake Kasumigaura, Ecological Informatics: Scope, Techniques and Applications, 409–427, <ext-link xlink:href="https://doi.org/10.1007/3-540-28426-5_20" ext-link-type="DOI">10.1007/3-540-28426-5_20</ext-link>, 2006a.</mixed-citation></ref>
      <ref id="bib1.bibx2"><label>Atanasova et al.(2006b)Atanasova, Todorovski, Džeroski, Remec, Recknagel, and Kompare</label><mixed-citation> Atanasova, N., Todorovski, L., Džeroski, S., Remec, Š. R., Recknagel, F., and Kompare, B.: Automated modelling of a food web in lake Bled using measured data and a library of domain knowledge, Ecol. Model., 194, 37–48, 2006b.</mixed-citation></ref>
      <ref id="bib1.bibx3"><label>Atanasova et al.(2008)Atanasova, Todorovski, Džeroski, and Kompare</label><mixed-citation> Atanasova, N., Todorovski, L., Džeroski, S., and Kompare, B.: Application of automated model discovery from data and expert knowledge to a real-world domain: Lake Glumsø, Ecol. Model., 212, 92–98, 2008.</mixed-citation></ref>
      <ref id="bib1.bibx4"><label>Atanasova et al.(2011)Atanasova, Džeroski, Kompare, Todorovski, and Gal</label><mixed-citation> Atanasova, N., Džeroski, S., Kompare, B., Todorovski, L., and Gal, G.: Automated discovery of a model for dinoflagellate dynamics, Environ. Modell. Softw., 26, 658–668, 2011.</mixed-citation></ref>
      <ref id="bib1.bibx5"><label>Błażejczyk(2025)</label><mixed-citation>Błażejczyk, K.: BioKlima – Universal tool for bioclimatic and thermophysiological studies, <uri>https://www.igipz.pan.pl/bioklima-crd.html</uri>, last access: 10 October 2025.</mixed-citation></ref>
      <ref id="bib1.bibx6"><label>Błażejczyk and Kuchcik(2021)</label><mixed-citation>Błażejczyk, K. and Kuchcik, M.: UTCI applications in practice (methodological questions), Geographia Polonica, 94, <ext-link xlink:href="https://doi.org/10.7163/GPol.0198" ext-link-type="DOI">10.7163/GPol.0198</ext-link>, 2021.</mixed-citation></ref>
      <ref id="bib1.bibx7"><label>Blazejczyk et al.(2012)Blazejczyk, Epstein, Jendritzky, Staiger, and Tinz</label><mixed-citation>Blazejczyk, K., Epstein, Y., Jendritzky, G., Staiger, H., and Tinz, B.: Comparison of UTCI to selected thermal indices, Int. J. Biometeorol., 56, 515–535, <ext-link xlink:href="https://doi.org/10.1007/s00484-011-0453-2" ext-link-type="DOI">10.1007/s00484-011-0453-2</ext-link>, 2012.</mixed-citation></ref>
      <ref id="bib1.bibx8"><label>Brence et al.(2020)Brence, Todorovski, and Džeroski</label><mixed-citation>Brence, J., Todorovski, L., and Džeroski, S.: Probabilistic grammars for equation discovery, arXiv [preprint], <ext-link xlink:href="https://doi.org/10.48550/arXiv.2012.00428" ext-link-type="DOI">10.48550/arXiv.2012.00428</ext-link>, 2020.</mixed-citation></ref>
      <ref id="bib1.bibx9"><label>Brence et al.(2023)Brence, Džeroski, and Todorovski</label><mixed-citation> Brence, J., Džeroski, S., and Todorovski, L.: Dimensionally-consistent equation discovery through probabilistic attribute grammars, Inform. Sciences, 632, 742–756, 2023.</mixed-citation></ref>
      <ref id="bib1.bibx10"><label>Bridewell et al.(2005)Bridewell, Asadi, Langley, and Todorovski</label><mixed-citation> Bridewell, W., Asadi, N. B., Langley, P., and Todorovski, L.: Reducing overfitting in process model induction, in: Proceedings of the 22nd International Conference on Machine Learning, 81–88, ISBN 1-55860-486-3, 2005.</mixed-citation></ref>
      <ref id="bib1.bibx11"><label>Brimicombe et al.(2022)Brimicombe, Napoli, Quintino, Pappenberger, Cornforth, and Cloke</label><mixed-citation>Brimicombe, C., Napoli, C. D., Quintino, T., Pappenberger, F., Cornforth, R., and Cloke, H. L.: Thermofeel: A python thermal comfort indices library, SoftwareX, 18, 101005, <ext-link xlink:href="https://doi.org/10.1016/j.softx.2022.101005" ext-link-type="DOI">10.1016/j.softx.2022.101005</ext-link>, 2022.</mixed-citation></ref>
      <ref id="bib1.bibx12"><label>Bröde(2021a)</label><mixed-citation>Bröde, P.: Issues in UTCI Calculation from a Decade's Experience, Springer International Publishing, Cham, 13–21, <ext-link xlink:href="https://doi.org/10.1007/978-3-030-76716-7_2" ext-link-type="DOI">10.1007/978-3-030-76716-7_2</ext-link>, 2021a.</mixed-citation></ref>
      <ref id="bib1.bibx13"><label>Bröde(2021b)</label><mixed-citation>Bröde, P.: UTCI-Test-Data, Zenodo [data set], <ext-link xlink:href="https://doi.org/10.5281/zenodo.5503967" ext-link-type="DOI">10.5281/zenodo.5503967</ext-link>, 2021b.</mixed-citation></ref>
      <ref id="bib1.bibx14"><label>Bröde et al.(2012)Bröde, Fiala, Błażejczyk, Holmér, Jendritzky, Kampmann, Tinz, and Havenith</label><mixed-citation>Bröde, P., Fiala, D., Błażejczyk, K., Holmér, I., Jendritzky, G., Kampmann, B., Tinz, B., and Havenith, G.: Deriving the operational procedure for the Universal Thermal Climate Index (UTCI), Int. J. Biometeorol., 56, 481–494, <ext-link xlink:href="https://doi.org/10.1007/s00484-011-0454-1" ext-link-type="DOI">10.1007/s00484-011-0454-1</ext-link>, 2012.</mixed-citation></ref>
      <ref id="bib1.bibx15"><label>Brunton et al.(2016)Brunton, Proctor, and Kutz</label><mixed-citation> Brunton, S. L., Proctor, J. L., and Kutz, J. N.: Discovering governing equations from data by sparse identification of nonlinear dynamical systems, P. Natl. Acad. Sci. USA, 113, 3932–3937, 2016.</mixed-citation></ref>
      <ref id="bib1.bibx16"><label>Čerepnalkoski et al.(2012)Čerepnalkoski, Taškova, Todorovski, Atanasova, and Džeroski</label><mixed-citation> Čerepnalkoski, D., Taškova, K., Todorovski, L., Atanasova, N., and Džeroski, S.: The influence of parameter fitting methods on model structure selection in automated modeling of aquatic ecosystems, Ecol. Model., 245, 136–165, 2012.</mixed-citation></ref>
      <ref id="bib1.bibx17"><label>Di Napoli et al.(2021a)Di Napoli, Messeri, Novák, Rio, Wieczorek, Morabito, Silva, Crisci, and Pappenberger</label><mixed-citation>Di Napoli, C., Messeri, A., Novák, M., Rio, J., Wieczorek, J., Morabito, M., Silva, P., Crisci, A., and Pappenberger, F.: The Universal Thermal Climate Index as an Operational Forecasting Tool of Human Biometeorological Conditions in Europe, in: Applications of the Universal Thermal Climate Index UTCI in Biometeorology, Springer International Publishing, Cham, 193–208, <ext-link xlink:href="https://doi.org/10.1007/978-3-030-76716-7_10" ext-link-type="DOI">10.1007/978-3-030-76716-7_10</ext-link>, 2021a.</mixed-citation></ref>
      <ref id="bib1.bibx18"><label>Di Napoli et al.(2021b)Di Napoli, Barnard, Prudhomme, Cloke, and Pappenberger</label><mixed-citation>Di Napoli, C., Barnard, C., Prudhomme, C., Cloke, H. L., and Pappenberger, F.: ERA5-HEAT: A global gridded historical dataset of human thermal comfort indices from climate reanalysis, Geosci. Data J., 8, <ext-link xlink:href="https://doi.org/10.1002/gdj3.102" ext-link-type="DOI">10.1002/gdj3.102</ext-link>, 2021b.</mixed-citation></ref>
      <ref id="bib1.bibx19"><label>Džeroski et al.(2007)Džeroski, Langley, and Todorovski</label><mixed-citation>Džeroski, S., Langley, P., and Todorovski, L.: Computational discovery of scientific knowledge, in: Computational discovery of scientific knowledge: Introduction, techniques, and applications in environmental and life sciences, Springer, 1–14, <ext-link xlink:href="https://doi.org/10.1007/978-3-540-73920-3_1" ext-link-type="DOI">10.1007/978-3-540-73920-3_1</ext-link>, 2007.</mixed-citation></ref>
      <ref id="bib1.bibx20"><label>Fiala et al.(2012)Fiala, Havenith, Bröde, Kampmann, and Jendritzky</label><mixed-citation>Fiala, D., Havenith, G., Bröde, P., Kampmann, B., and Jendritzky, G.: UTCI-Fiala multi-node model of human heat transfer and temperature regulation, Int. J. Biometeorol., 56, 429–441, <ext-link xlink:href="https://doi.org/10.1007/s00484-011-0424-7" ext-link-type="DOI">10.1007/s00484-011-0424-7</ext-link>, 2012.</mixed-citation></ref>
      <ref id="bib1.bibx21"><label>Jeraj et al.(2006)Jeraj, Džeroski, Todorovski, and Debeljak</label><mixed-citation> Jeraj, M., Džeroski, S., Todorovski, L., and Debeljak, M.: Application of machine learning methods to palaeoecological data, Ecol. Model., 191, 159–169, 2006.</mixed-citation></ref>
      <ref id="bib1.bibx22"><label>Kuzmanović et al.(2024)Kuzmanović, Banko, and Skok</label><mixed-citation>Kuzmanović, D., Banko, J., and Skok, G.: Improving the operational forecasts of outdoor Universal Thermal Climate Index with post-processing, Int. J. Biometeorol., 68, 965–977, <ext-link xlink:href="https://doi.org/10.1007/s00484-024-02640-6" ext-link-type="DOI">10.1007/s00484-024-02640-6</ext-link>, 2024.</mixed-citation></ref>
      <ref id="bib1.bibx23"><label>Mežnar et al.(2023)Mežnar, Džeroski, and Todorovski</label><mixed-citation>Mežnar, S., Džeroski, S., and Todorovski, L.: Efficient generator of mathematical expressions for symbolic regression, Mach. Learn., 112, 4563–4596, 2023.</mixed-citation></ref>
      <ref id="bib1.bibx24"><label>Omejc et al.(2024)Omejc, Gec, Brence, Todorovski, and Džeroski</label><mixed-citation>Omejc, N., Gec, B., Brence, J., Todorovski, L., and Džeroski, S.: Probabilistic grammars for modeling dynamical systems from coarse, noisy, and partial data, Mach. Learn., 113, 7689–7721, 2024.</mixed-citation></ref>
      <ref id="bib1.bibx25"><label>Pappenberger et al.(2015)Pappenberger, Jendritzky, Staiger, Dutra, Di Giuseppe, Richardson, and Cloke</label><mixed-citation>Pappenberger, F., Jendritzky, G., Staiger, H., Dutra, E., Di Giuseppe, F., Richardson, D. S., and Cloke, H. L.: Global forecasting of thermal health hazards: the skill of probabilistic predictions of the Universal Thermal Climate Index (UTCI), Int. J. Biometeorol., 59, 311–323, <ext-link xlink:href="https://doi.org/10.1007/s00484-014-0843-3" ext-link-type="DOI">10.1007/s00484-014-0843-3</ext-link>, 2015.</mixed-citation></ref>
      <ref id="bib1.bibx26"><label>Reid et al.(2016)Reid, Tibshirani, and Friedman</label><mixed-citation>Reid, S., Tibshirani, R., and Friedman, J.: A study of error variance estimation in lasso regression, Stat. Sinica, 26, 35–67, <ext-link xlink:href="https://doi.org/10.5705/ss.2014.042" ext-link-type="DOI">10.5705/ss.2014.042</ext-link>, 2016.</mixed-citation></ref>
      <ref id="bib1.bibx27"><label>Roman(2021)</label><mixed-citation>Roman, S.: Historical dynamics of the Chinese dynasties, Heliyon, 7, e07293, <ext-link xlink:href="https://doi.org/10.1016/j.heliyon.2021.e07293" ext-link-type="DOI">10.1016/j.heliyon.2021.e07293</ext-link>, 2021.</mixed-citation></ref>
      <ref id="bib1.bibx28"><label>Roman(2023)</label><mixed-citation>Roman, S.: Theories and models: Understanding and Predicting Societal Collapse, in: The Era of Global Risk: An Introduction to Existential Risk Studies, Open Book Publishers, 27–54, <ext-link xlink:href="https://doi.org/10.11647/obp.0336.02" ext-link-type="DOI">10.11647/obp.0336.02</ext-link>, 2023.</mixed-citation></ref>
      <ref id="bib1.bibx29"><label>Roman(2025a)</label><mixed-citation>Roman, S.: Code for Approximating the universal thermal climate index (UTCI) using sparse regression with orthogonal polynomials, Zenodo [code], <ext-link xlink:href="https://doi.org/10.5281/zenodo.16880382" ext-link-type="DOI">10.5281/zenodo.16880382</ext-link>, 2025a.</mixed-citation></ref>
      <ref id="bib1.bibx30"><label>Roman(2025b)</label><mixed-citation>Roman, S.: Maximum Entropy Models for Unimodal Time Series: Case Studies of Universe 25 and St. Matthew Island, in: International Conference on Discovery Science, Springer, 32–44, <ext-link xlink:href="https://doi.org/10.1007/978-3-032-05461-6_3" ext-link-type="DOI">10.1007/978-3-032-05461-6_3</ext-link>, 2025b.</mixed-citation></ref>
      <ref id="bib1.bibx31"><label>Roman and Bertolotti(2022)</label><mixed-citation>Roman, S. and Bertolotti, F.: A master equation for power laws, Roy. Soc. Open Sci., 9, 220531, <ext-link xlink:href="https://doi.org/10.1098/rsos.220531" ext-link-type="DOI">10.1098/rsos.220531</ext-link>, 2022.</mixed-citation></ref>
      <ref id="bib1.bibx32"><label>Roman and Bertolotti(2023)</label><mixed-citation>Roman, S. and Bertolotti, F.: Global history, the emergence of chaos and inducing sustainability in networks of socio-ecological systems, PLoS ONE, 18, e0293391, <ext-link xlink:href="https://doi.org/10.1371/journal.pone.0293391" ext-link-type="DOI">10.1371/journal.pone.0293391</ext-link>, 2023.</mixed-citation></ref>
      <ref id="bib1.bibx33"><label>Roman and Palmer(2019)</label><mixed-citation>Roman, S. and Palmer, E.: The Growth and Decline of the Western Roman Empire: Quantifying the Dynamics of Army Size, Territory, and Coinage, Cliodynamics, 10, 76–98, <ext-link xlink:href="https://doi.org/10.21237/C7clio10243683" ext-link-type="DOI">10.21237/C7clio10243683</ext-link>, 2019.</mixed-citation></ref>
      <ref id="bib1.bibx34"><label>Simidjievski et al.(2015)Simidjievski, Todorovski, and Džeroski</label><mixed-citation>Simidjievski, N., Todorovski, L., and Džeroski, S.: Learning ensembles of population dynamics models and their application to modelling aquatic ecosystems, Ecol. Model., 306, 305–317, 2015.</mixed-citation></ref>
      <ref id="bib1.bibx35"><label>Simidjievski et al.(2016)Simidjievski, Todorovski, and Džeroski</label><mixed-citation>Simidjievski, N., Todorovski, L., and Džeroski, S.: Modeling dynamic systems with efficient ensembles of process-based models, PLoS ONE, 11, e0153507, <ext-link xlink:href="https://doi.org/10.1371/journal.pone.0153507" ext-link-type="DOI">10.1371/journal.pone.0153507</ext-link>, 2016.</mixed-citation></ref>
      <ref id="bib1.bibx36"><label>Smits and Kotanchek(2005)</label><mixed-citation>Smits, G. F. and Kotanchek, M.: Pareto-front exploitation in symbolic regression, in: Genetic programming theory and practice II, Springer, 283–299, <ext-link xlink:href="https://doi.org/10.1007/0-387-23254-0_17" ext-link-type="DOI">10.1007/0-387-23254-0_17</ext-link>, 2005.</mixed-citation></ref>
      <ref id="bib1.bibx37"><label>Stein and Shakarchi(2011)</label><mixed-citation>Stein, E. M. and Shakarchi, R.: Fourier analysis: an introduction, vol. 1, Princeton University Press, ISBN 978-0-691-11384-5, 2011.</mixed-citation></ref>
      <ref id="bib1.bibx38"><label>Steinmann et al.(2025)Steinmann, Verstegen, Van Voorn, Roman, and Ligtenberg</label><mixed-citation>Steinmann, P., Verstegen, J., Van Voorn, G., Roman, S., and Ligtenberg, A.: Scenario search: finding diverse, plausible and comprehensive scenario sets for complex systems, Socio-Environmental Systems Modelling, 7, 18823–18823, 2025.</mixed-citation></ref>
      <ref id="bib1.bibx39"><label>Tanevski et al.(2016a)Tanevski, Todorovski, and Džeroski</label><mixed-citation>Tanevski, J., Todorovski, L., and Džeroski, S.: Learning stochastic process-based models of dynamical systems from knowledge and data, BMC Syst. Biol., 10, 1–17, 2016a.</mixed-citation></ref>
      <ref id="bib1.bibx40"><label>Tanevski et al.(2016b)Tanevski, Todorovski, and Džeroski</label><mixed-citation>Tanevski, J., Todorovski, L., and Džeroski, S.: Process-based design of dynamical biological systems, Sci. Rep.-UK, 6, 34107, <ext-link xlink:href="https://doi.org/10.1038/srep34107" ext-link-type="DOI">10.1038/srep34107</ext-link>, 2016b.</mixed-citation></ref>
      <ref id="bib1.bibx41"><label>Tanevski et al.(2020)Tanevski, Todorovski, and Džeroski</label><mixed-citation>Tanevski, J., Todorovski, L., and Džeroski, S.: Combinatorial search for selecting the structure of models of dynamical systems with equation discovery, Eng. Appl. Artif. Intel., 89, 103423, <ext-link xlink:href="https://doi.org/10.1016/j.engappai.2019.103423" ext-link-type="DOI">10.1016/j.engappai.2019.103423</ext-link>, 2020.</mixed-citation></ref>
      <ref id="bib1.bibx42"><label>Tartarini and Schiavon(2020)</label><mixed-citation>Tartarini, F. and Schiavon, S.: pythermalcomfort: A Python package for thermal comfort research, SoftwareX, 12, 100578, <ext-link xlink:href="https://doi.org/10.1016/j.softx.2020.100578" ext-link-type="DOI">10.1016/j.softx.2020.100578</ext-link>, 2020.</mixed-citation></ref>
      <ref id="bib1.bibx43"><label>Termonia et al.(2018)Termonia, Fischer, Bazile, Bouyssel, Brozkova, Bénard, Bochenek, Degrauwe, Derková, Khatib, Hamdi, Mašek, Pottier, Pristov, Seity, Smolikova, Španiel, Tudor, Wang, and Joly</label><mixed-citation>Termonia, P., Fischer, C., Bazile, E., Bouyssel, F., Brožková, R., Bénard, P., Bochenek, B., Degrauwe, D., Derková, M., El Khatib, R., Hamdi, R., Mašek, J., Pottier, P., Pristov, N., Seity, Y., Smolíková, P., Španiel, O., Tudor, M., Wang, Y., Wittmann, C., and Joly, A.: The ALADIN System and its canonical model configurations AROME CY41T1 and ALARO CY40T1, Geosci. Model Dev., 11, 257–281, <ext-link xlink:href="https://doi.org/10.5194/gmd-11-257-2018" ext-link-type="DOI">10.5194/gmd-11-257-2018</ext-link>, 2018.</mixed-citation></ref>
      <ref id="bib1.bibx44"><label>Todorovski and Džeroski(1997)</label><mixed-citation>Todorovski, L. and Džeroski, S.: Declarative bias in equation discovery, in: Proceedings of the International Conference on Machine Learning, 376–384, ISBN 1-55860-486-3, 1997.</mixed-citation></ref>
      <ref id="bib1.bibx45"><label>Todorovski and Džeroski(2001)</label><mixed-citation>Todorovski, L. and Džeroski, S.: Theory revision in equation discovery, in: International Conference on Discovery Science, Springer, 389–400, <ext-link xlink:href="https://doi.org/10.1007/3-540-45650-3_33" ext-link-type="DOI">10.1007/3-540-45650-3_33</ext-link>, 2001.</mixed-citation></ref>
      <ref id="bib1.bibx46"><label>Todorovski and Džeroski(2006)</label><mixed-citation>Todorovski, L. and Džeroski, S.: Integrating knowledge-driven and data-driven approaches to modeling, Ecol. Model., 194, 3–13, 2006.</mixed-citation></ref>
      <ref id="bib1.bibx47"><label>Todorovski et al.(1998)Todorovski, Džeroski, and Kompare</label><mixed-citation>Todorovski, L., Džeroski, S., and Kompare, B.: Modelling and prediction of phytoplankton growth with equation discovery, Ecol. Model., 113, 71–81, 1998.</mixed-citation></ref>

  </ref-list></back>
</article>
