<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing with OASIS Tables v3.0 20080202//EN" "journalpub-oasis3.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:oasis="http://docs.oasis-open.org/ns/oasis-exchange/table" xml:lang="en" dtd-version="3.0" article-type="research-article">
  <front>
    <journal-meta><journal-id journal-id-type="publisher">GMD</journal-id><journal-title-group>
    <journal-title>Geoscientific Model Development</journal-title>
    <abbrev-journal-title abbrev-type="publisher">GMD</abbrev-journal-title><abbrev-journal-title abbrev-type="nlm-ta">Geosci. Model Dev.</abbrev-journal-title>
  </journal-title-group><issn pub-type="epub">1991-9603</issn><publisher>
    <publisher-name>Copernicus Publications</publisher-name>
    <publisher-loc>Göttingen, Germany</publisher-loc>
  </publisher></journal-meta>
    <article-meta>
      <article-id pub-id-type="doi">10.5194/gmd-14-7425-2021</article-id><title-group><article-title>Robustness of neural network emulations of radiative transfer<?xmltex \hack{\break}?>
parameterizations in a state-of-the-art general circulation model</article-title><alt-title>Robustness of NN emulations of radiative transfer parameterizations</alt-title>
      </title-group><?xmltex \runningtitle{Robustness of NN emulations of radiative transfer parameterizations}?><?xmltex \runningauthor{A.~Belochitski~and~V.~Krasnopolsky}?>
      <contrib-group>
        <contrib contrib-type="author" corresp="yes" rid="aff1 aff2">
          <name><surname>Belochitski</surname><given-names>Alexei</given-names></name>
          <email>alexei.a.belochitski@noaa.gov</email>
        <ext-link>https://orcid.org/0000-0003-0205-1621</ext-link></contrib>
        <contrib contrib-type="author" corresp="no" rid="aff2">
          <name><surname>Krasnopolsky</surname><given-names>Vladimir</given-names></name>
          
        </contrib>
        <aff id="aff1"><label>1</label><institution>IMSG, Rockville, MD 20852, USA</institution>
        </aff>
        <aff id="aff2"><label>2</label><institution>NOAA/NWS/NCEP/EMC, College Park, MD 20740, USA</institution>
        </aff>
      </contrib-group>
      <author-notes><corresp id="corr1">Alexei Belochitski (alexei.a.belochitski@noaa.gov)</corresp></author-notes><pub-date><day>6</day><month>December</month><year>2021</year></pub-date>
      
      <volume>14</volume>
      <issue>12</issue>
      <fpage>7425</fpage><lpage>7437</lpage>
      <history>
        <date date-type="received"><day>7</day><month>April</month><year>2021</year></date>
           <date date-type="rev-request"><day>31</day><month>May</month><year>2021</year></date>
           <date date-type="rev-recd"><day>28</day><month>September</month><year>2021</year></date>
           <date date-type="accepted"><day>24</day><month>October</month><year>2021</year></date>
      </history>
      <permissions>
        <copyright-statement>Copyright: © 2021 Alexei Belochitski</copyright-statement>
        <copyright-year>2021</copyright-year>
      <license license-type="open-access"><license-p>This work is licensed under the Creative Commons Attribution 4.0 International License. To view a copy of this licence, visit <ext-link ext-link-type="uri" xlink:href="https://creativecommons.org/licenses/by/4.0/">https://creativecommons.org/licenses/by/4.0/</ext-link></license-p></license></permissions><self-uri xlink:href="https://gmd.copernicus.org/articles/14/7425/2021/gmd-14-7425-2021.html">This article is available from https://gmd.copernicus.org/articles/14/7425/2021/gmd-14-7425-2021.html</self-uri><self-uri xlink:href="https://gmd.copernicus.org/articles/14/7425/2021/gmd-14-7425-2021.pdf">The full text article is available as a PDF file from https://gmd.copernicus.org/articles/14/7425/2021/gmd-14-7425-2021.pdf</self-uri>
      <abstract><title>Abstract</title>

      <p id="d1e98">The ability of machine-learning-based (ML-based) model
components to generalize to the previously unseen inputs and its impact on
the stability of the models that use these components have been receiving a lot
of recent attention, especially in the context of ML-based
parameterizations. At the same time, ML-based emulators of existing
physically based parameterizations can be stable, accurate, and fast when
used in the model they were specifically designed for. In this work we show
that shallow-neural-network-based emulators of radiative transfer
parameterizations developed almost a decade ago for a state-of-the-art
general circulation model (GCM) are robust with respect to the substantial
structural and parametric change in the host model: when used in two 7-month-long experiments with a new GCM, they remain stable and generate
realistic output. We concentrate on the stability aspect of the emulators'
performance and discuss features of neural network architecture and training
set design potentially contributing to the robustness of ML-based model
components.</p>
  </abstract>
    </article-meta>
  </front>
<body>
      

<sec id="Ch1.S1" sec-type="intro">
  <label>1</label><title>Introduction</title>
      <p id="d1e110">One of the main difficulties in developing and implementing high-resolution
environmental models is the complexity of the physical processes involved. For
example, the calculation of radiative transfer in a general circulation model (GCM) often takes a
significant part of the total model run time. From the standpoint of basic
physics, radiative transfer is well understood. Very accurate but
computationally complex benchmark models exist (Oreopoulos et al., 2012) that
demonstrate excellent agreement with observations (Turner et al., 2004).
Parameterizations of radiative transfer seek a compromise between accuracy
and computational performance. Arguably, the biggest simplification they
make is treatment of radiative transfer as a 1-D as opposed to a 3-D process
(independent column approximation, ICA): both solar, or shortwave (SW )radiation, and
terrestrial, or longwave (LW) radiation, are considered to flow within the
local column of the model, up and down the local vertical (two-stream
approximation) but not between columns. This approximation works well at
spatial resolutions characteristic of general circulation models of the
atmosphere (Marshak and Davis, 2005). To integrate over the spectrum of
radiation, parameterizations split it into several broad bands and a number
of representative spectral intervals that are treated monochromatically (Fu
and Liou, 1992). State-of-the-art parameterizations can reproduce benchmark
calculations to a high degree of accuracy even with these simplifications, but they still require substantial computational expense.</p>
      <p id="d1e113">Radiative transfer parameterizations supply their host model with broadband
fluxes and heating rates, which are obtained by integration over time,
space, and frequency. Therefore, a trade-off between accuracy and
computational expense can be found in how finely these dimensions are
discretized (Hogan et al., 2017).</p>
      <p id="d1e116"><list list-type="bullet">
          <list-item>

      <p id="d1e121">Discretization in time – all GCMs update their radiative heating and cooling rates less frequently than
the rest of the model fields. For example, the National Centers for
Environmental Prediction (NCEP) Global Forecast System (GFS) v16 general
circulation model (GCM) in its operational configuration updates its
radiative fields once per model hour, while updates to temperature,
moisture, and most cloud properties due to unresolved<?pagebreak page7426?> physics processes
happen every 150 model seconds or 24 times per single radiation call.
Updates due to dynamical processes happen even more frequently: every 12.5 s (Kain et al., 2020). This approximation is good for slowly changing
fields of certain radiatively active gases but is less justified for
small-scale clouds with lifetimes of an hour or less.</p>
          </list-item>
          <list-item>

      <p id="d1e127">Discretization in space – some GCMs calculate radiative fields on a coarser spatial grid and
interpolate them onto a finer grid used for the rest of the model variables.
For example, the radiation grid in the European Centre for Medium-Range Weather
Forecasts (ECMWF) Integrated Forecast System (IFS) v43R3 in the ensemble
mode is 6.25 times coarser than the physics grid (Hogan et al., 2017). This
may cause 2 m temperature errors in areas of surface heterogeneity, e.g., coasts (Hogan and Bozzo, 2015).</p>
          </list-item>
          <list-item>

      <p id="d1e133">Discretization and sampling in frequency space – the Rapid Radiative Transfer Model (RRTMG), a parameterization of radiative
transfer for GCMs used in NCEP GFS and ECMWF IFS, utilizes 14 bands in the
shortwave (Mlawer et al., 1997), while the parameterization used in the United
Kingdom Met Office Unified Model utilizes 6 (Edwards and Slingo, 1996).
Monte Carlo spectral integration (Pincus and Stevens, 2009) performs
integration over only a part of the radiative spectrum randomly chosen at each point in time and space, allowing us to increase the temporal and spatial resolution of radiation calculations. Monte Carlo integration of the
independent column approximation (McICA) (Pincus et al., 2003) integrates
over the entire spectrum but samples subgrid-scale (SGS) cloud properties
in a random, unbiased manner in each grid column in time and space instead
of integrating over them.</p>
          </list-item>
        </list></p>
      <p id="d1e138">All of the methods for improving computational efficiency of radiative
transfer parameterizations outlined above are either numerical and/or
statistical in nature. In recent years there has been a substantial increase
in interest in adding machine learning (ML) techniques to the arsenal of
these methods (Boukabara et al., 2019). It has been accomplished in at least two different ways: (1)
as an emulation technique for accelerating calculations of radiative
transfer parameterizations or their components and (2) as a tool for the development
of new parameterizations based on data simulated by more sophisticated
models and/or reanalysis.</p>
      <p id="d1e142">An ML-based emulator of a model physics parameterization is a functional
imitation of this parameterization in the sense that the results of model
calculations with the original parameterization and with its ML emulator are
so close to each other by a metric appropriate for an application at hand as to
be identical for the practical purposes. From the mathematical point of
view, model physics and individual parameterizations are mappings,
          <disp-formula id="Ch1.E1" content-type="numbered"><label>1</label><mml:math id="M1" display="block"><mml:mrow><mml:mi mathvariant="bold-italic">Y</mml:mi><mml:mo>=</mml:mo><mml:mi>M</mml:mi><mml:mfenced open="(" close=")"><mml:mi>X</mml:mi></mml:mfenced><mml:mo>;</mml:mo><mml:mi mathvariant="bold-italic">X</mml:mi><mml:mo>∈</mml:mo><mml:msup><mml:mi mathvariant="normal">ℜ</mml:mi><mml:mi>n</mml:mi></mml:msup><mml:mo>,</mml:mo><mml:mi mathvariant="normal">and</mml:mi><mml:mo>,</mml:mo><mml:mspace width="0.25em" linebreak="nobreak"/><mml:mi mathvariant="bold-italic">Y</mml:mi><mml:mo>∈</mml:mo><mml:msup><mml:mi mathvariant="normal">ℜ</mml:mi><mml:mi>m</mml:mi></mml:msup><mml:mo>,</mml:mo></mml:mrow></mml:math></disp-formula>
        where <inline-formula><mml:math id="M2" display="inline"><mml:mi>n</mml:mi></mml:math></inline-formula> and <inline-formula><mml:math id="M3" display="inline"><mml:mi>m</mml:mi></mml:math></inline-formula> are the dimensionalities of the input and output vector spaces,
respectively. Therefore, emulating existing parameterizations using ML
techniques is a mapping approximation problem. In practice, this mapping can
be defined by a set of its input and output vectors that is obtained by
running the original model with the parameterization that is to be emulated and saving inputs and outputs of this parameterization with a frequency and
spatiotemporal coverage sufficient to comprehensively cover the domain and range
of the mapping. These data are then used for the emulator training. This
approach allows us to achieve a very high accuracy of approximation because model
output, unlike empirical data, is neither noisy nor sparse.</p>
      <p id="d1e204">The domain and range of the mapping are defined not only by the
parameterization that is being emulated but by the entirety of the
atmospheric model environment: the dynamical core, the suite of physical
parameterizations, and the set of configuration parameters for both. Once
any of these components and/or parameters are modified, the set of possible
model states is altered as well, possibly now including states that were
absent in the emulator's training data set.</p>
      <p id="d1e207">How accurately should the emulator approximate the original mapping?
Unbiased, random, uncorrelated errors in radiative heating rates with
magnitudes as large as the net cooling rate do not statistically affect
the forecast skill of an atmospheric model (Pincus et al., 2003). From the
physical standpoint this can be understood in the following way: random
small local heating rate errors in the bulk of the atmosphere lead to local
small-scale instabilities that are mixed away by the flow; however, there is no such mechanism for the surface variables, such as skin temperature, and
errors in surface fluxes can be more consequential (Pincus and Stevens,
2013). Therefore, it may be useful to think of the above as necessary
conditions on an approximation error of an ML emulator of a radiative transfer
parameterization for it to be a successful functional imitation of the
original scheme.</p>
      <p id="d1e210">Developing a stable and robust neural-network-based (NN-based) emulator is a multifaceted problem
that requires deep understanding of multiple technical aspects of the
training process and details of NN architecture. Many techniques for the stabilization of hybrid statistical–deterministic models have been
developed. Compound parameterization has been proposed for climate and
weather modeling applications for which an additional NN is trained to predict
errors of the NN emulator, and, if the predicted error is above a certain
threshold, compound parameterization falls back to calling the original
physically based scheme (Krasnopolsky et al., 2008b). Stability theory was
used to identify the causes and conditions for instabilities in ML
parameterizations of moist convection when coupled to an idealized linear model
of atmospheric dynamics (Brenowitz et al., 2020). An NN optimization via
random search over hyperparameter space<?pagebreak page7427?> resulted in considerable
improvements in the stability of subgrid physics emulators in the
super-parameterized Community Atmospheric Model version 3.0 (Ott et al.,
2020). A coupled online learning approach was proposed whereby a
high-resolution simulation is nudged to the output of a parallel
lower-resolution hybrid model run and the ML component of the latter is
retrained to emulate tendencies of the former, helping to eliminate biases
and unstable feedback loops (Rasp, 2020). The random forest approach was
successfully used to build a stable ML parameterization of convection (Yuval
and O'Gorman, 2020). Physical constraints were used to achieve the stability of
hybrid models (e.g., Yuval et al., 2021; Kashinath et al., 2021).</p>
      <p id="d1e213">In this work we present robust and stable shallow-NN-based emulators of
radiative transfer parameterizations. We explore how much of a change in the
model's phase space (as well as the original parameterization's domain and
range) a statistical model like the NN can tolerate. We will approach this
question by installing shallow-NN-based emulators of LW and SW RRTMG
developed in 2011 for the NCEP Climate Forecast System (CFS) (Krasnopolsky et al., 2010) into the new
version 16 of NCEP GFS that became operational in March of 2021. Given the
scope of changes in the host model (described in Sect. 3), we do not
expect results of parallel runs to be identical; therefore, we will mostly
concentrate on the stability aspect of the emulators' performance.</p>
      <p id="d1e216">In Sect. 2, we briefly describe design aspects of these and other
emulators of radiative transfer parameterizations reported in the literature so
far. In Sect. 3, we outline major differences between the 2011 version of
CFS and the GFS v16, and we describe numerical experiments with SW and LW
emulators developed for the 2011 version of CFS (Krasnopolsky et al., 2010)
and incorporated into GFS v16. Results of these experiments are examined in
Sect. 4. Section 5 discusses aspects of neural network architecture and
training set design potentially contributing to the stability of ML-based model
components. Conclusions are formulated in Sect. 6.</p>
</sec>
<sec id="Ch1.S2">
  <label>2</label><title>Survey on technical aspects of existing ML emulators of radiative
transfer parameterizations</title>
      <p id="d1e227">NeuroFlux, a shallow-neural-network-based LW radiative transfer
parameterization developed at ECMWF, was in part an emulator and in part a
new ML-based parameterization (Chevallier et al., 1998, 2000). It consisted
of multiple NNs, each utilizing a hyperbolic tangent as an
activation function (AF) but using a varying number of neurons in the
single hidden layer: two NNs were used to generate vertical profiles of upwelling and downwelling clear-sky LW fluxes per each vertical
layer, and a battery of NNs, two per each vertical layer of the host model,
was used to compute profiles of upwelling and downwelling fluxes due to black-body
cloud on a given layer, with overall fluxes calculated using the multilayer
gray-body model. The training set for clear-sky NNs contained 6000 cloudless
profiles from global ECMWF short-range forecasts; 1 d of
3-hourly data per month of a single year were utilized. From this set,
multiple training sets for cloudy-sky NNs were derived, each containing 6000 profiles as well: a cloud with the emissivity of unity was artificially
introduced on a given vertical layer, and radiative transfer parametrization
was used in the offline mode to calculate resulting radiative fields.
NeuroFlux was accurate and about 1 order of magnitude as fast as the
original parameterization in a model with 31 vertical layers. It was
used operationally within the ECMWF four-dimensional variational data
assimilation system (Janiskova et al., 2002). However, in model
configurations with 60 vertical layers and above, NeuroFlux could not
maintain the balance between speed-up and accuracy (Morcrette et al., 2008).</p>
      <p id="d1e230">The approach based on pure emulation of existing LW and SW radiative
transfer parameterizations using NNs has been pursued at the NCEP Environmental Modeling Center (Krasnopolsky et al., 2008a, 2010, 2012; Belochitski et al.,
2011). In this approach, two shallow NNs with hyperbolic tangent activation
functions, one for LW and the other for SW radiative transfer, generate
heating rate profiles as well as surface and top-of-the-atmosphere radiative
fluxes, replacing the entirety of respective RRTMG LW and SW
parameterizations. It was not only radiative transfer solvers that were emulated but
also the calculations of gas and cloud optical properties (aerosol optical
properties were prescribed from climatology). Two different pairs of
emulators were designed for two different applications: climate simulation
and medium-range weather forecast, each differing in the training set
design. The database for the former application was generated by running the NCEP CFS, a state-of-the-art fully coupled climate
model, for 17 years (1990–2006) and saving instantaneous inputs and outputs
of RRTMG every 3 h for 1 d on the 1st and the 15th of each
month to sample diurnal and annual cycles, as well as decadal variability
and states introduced by time-varying greenhouse gases and aerosols. From this database, 300 global snapshots were randomly chosen and consequently
split into three independent sets for training, testing, and validation,
each containing about 200 000 input/output records (Krasnopolsky et al.,
2010). The data set for the medium-range forecast application was obtained
from a total of 24 10 d NCEP GFS forecasts initialized on the 1st and the 15th of
each month of 2010, with each forecast saving instantaneous 3-hourly
data. Independent data sets were obtained following the same procedure as
for the climate application (Krasnopolsky et al., 2012).</p>
      <p id="d1e233">The dimensionality of data sets and NN input vectors for both applications was
reduced in the following manner: some input profiles (e.g., pressure) that
are highly correlated in the vertical were sampled on every other level
without a decrease in approximation accuracy; some inputs that are uniformly
constant above a certain level (water vapor) or below a certain level (ozone)
were excluded from the<?pagebreak page7428?> training set on these levels; inputs that are given
by prescribed monthly climatological lookup tables (e.g., trace gases,
tropospheric aerosols) were replaced by latitude and periodic functions of
longitude and month number; inputs given by prescribed monthly time series
(e.g., carbon dioxide, stratospheric aerosols) were replaced by the year
number and periodic function of month number. No reduction in dimensionality
was applied to outputs.</p>
      <p id="d1e236">A very high accuracy and up to 2 orders of magnitude increase in speed
compared to the original parameterization for both NCEP CFS and GFS full
radiation have been achieved for model configurations with 64 vertical
levels. The systematic errors introduced by NN emulations of full model
radiation were negligible and did not accumulate during the decadal model
simulation. The random errors of NN emulations were also small. Almost
identical results have been obtained for the parallel multi-decadal climate
runs of the models using the NN and the original parameterization as well as in
the limited testing in the medium-range forecasting mode. Regression trees
were explored as an alternative to NNs and were found to be nearly as
accurate in a 10-year-long climate run while requiring much more computer
memory due to the fact that the entire training data set has to be stored in
memory during model integration (Belochitski et al., 2011).</p>
      <p id="d1e240">Using the approach developed at NCEP, an emulator of RRTMG consisting of a
single shallow NN that replaces both LW and SW parameterizations at once was
developed at the Korean Meteorological Agency for the short-range weather
forecast model Korea Local Analysis and Prediction System in an idealized
configuration with 39 vertical layers (Roh and Song, 2020). Inputs and
outputs to RRTMG were saved on each 3 s time step of a 6 h long
simulation of a squall line, and about 270 000 input/output pairs were
randomly chosen from this data set to create training, validation, and
testing sets. Dimensionality reduction was performed by removing constant
inputs. Several activation functions were tested (tanh, sigmoid, softsign,
arctan, linear), with hyperbolic tangent providing the best overall accuracy of
approximation. The emulator was 2 orders of magnitude as fast as the
original parameterization and was stable in a 6 h long simulation.</p>
      <p id="d1e243">Two dense, fully connected, feed-forward deep-NN-based emulators with three
hidden layers, one emulator per parametrization, were developed for the LW and
SW components of RRTMG Parallel (RRTMG-P) (Pincus et al., 2019) for the Department of Energy's super-parameterized
Energy Exascale Earth System Model (SP-E3SM) (Pal et al., 2019). In SP-E3SM,
radiative transfer parameterizations act in individual columns of a 2-D
cloud-resolving model with 31 vertical levels embedded into columns of the
host GCM. The calculation of cloud and aerosol optical properties was not
emulated; instead, original RRTMG-P subroutines were used. Inputs and
outputs of radiative parameterizations were saved at every time step of a
year-long model run, with 9 % of these data randomly chosen to form a data
set of 12 000 000 input/output records for the LW and of 6 000 000 input/output
records for the SW emulator training and validation. In total, 90 % of the data in these
sets was used for training and 10 % for validation and testing. No
additional dimensionality reduction was performed. Sigmoid AF was chosen as
it was found to provide slightly better training convergence than the
hyperbolic tangent. The emulator was 1 order of magnitude faster than the
original parametrization and was stable in a year-long run.</p>
      <p id="d1e246">A number of ML-based radiative transfer parameterizations or their
components have been developed but, to our knowledge, have not yet been
tested in an online setting or in interactive coupling to an atmospheric
model. Among them are deep-NN-based parameterizations of gas optical
properties for RRTMG-P (Ukkonen et al., 2020; Veerman et al., 2021) and a SW
radiative transfer parameterization based on convolutional deep neural
networks (Lagerquist et al., 2021).</p>
</sec>
<sec id="Ch1.S3">
  <label>3</label><title>Design of numerical experiments with GFS v16</title>
      <p id="d1e257">GFS v16 differs from the 2011 version of the atmospheric component of NCEP
CFS in a number of ways, the most relevant of which are summarized in Table 1.</p>

<?xmltex \floatpos{t}?><table-wrap id="Ch1.T1" specific-use="star"><?xmltex \currentcnt{1}?><label>Table 1</label><caption><p id="d1e263">Differences between the atmospheric component of 2011 NCEP CFS and
the 2021 version of NCEP GFS.</p></caption><oasis:table frame="topbot"><oasis:tgroup cols="3">
     <oasis:colspec colnum="1" colname="col1" align="left"/>
     <oasis:colspec colnum="2" colname="col2" align="center"/>
     <oasis:colspec colnum="3" colname="col3" align="center"/>
     <oasis:thead>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1"/>
         <oasis:entry colname="col2">CFS 2011</oasis:entry>
         <oasis:entry colname="col3">GFS 2021</oasis:entry>
       </oasis:row>
     </oasis:thead>
     <oasis:tbody>
       <oasis:row>
         <oasis:entry colname="col1">Dynamical core</oasis:entry>
         <oasis:entry colname="col2">Spectral Eulerian</oasis:entry>
         <oasis:entry colname="col3">Finite-volume cubed sphere</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Horizontal resolution</oasis:entry>
         <oasis:entry colname="col2">T126 (<inline-formula><mml:math id="M4" display="inline"><mml:mo lspace="0mm">∼</mml:mo></mml:math></inline-formula> 100 km)</oasis:entry>
         <oasis:entry colname="col3">C768 (<inline-formula><mml:math id="M5" display="inline"><mml:mo lspace="0mm">∼</mml:mo></mml:math></inline-formula> 13 km)</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Vertical res. and coordinate</oasis:entry>
         <oasis:entry colname="col2">64 levels, hybrid sigma-<inline-formula><mml:math id="M6" display="inline"><mml:mi>p</mml:mi></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col3">127 levels, hybrid sigma-<inline-formula><mml:math id="M7" display="inline"><mml:mi>p</mml:mi></mml:math></inline-formula></oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Physics grid</oasis:entry>
         <oasis:entry colname="col2">Gaussian</oasis:entry>
         <oasis:entry colname="col3">Cubed sphere</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Radiation</oasis:entry>
         <oasis:entry colname="col2">RRTMG v2.3</oasis:entry>
         <oasis:entry colname="col3">RRTMG LW v4.82, SW v3.8</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Microphysics</oasis:entry>
         <oasis:entry colname="col2">Zhao–Carr, single moment, two species,</oasis:entry>
         <oasis:entry colname="col3">GFDL, single moment, five species,</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1"/>
         <oasis:entry colname="col2">one prognostic variable</oasis:entry>
         <oasis:entry colname="col3">five prognostic variables</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Planetary boundary layer</oasis:entry>
         <oasis:entry colname="col2"><inline-formula><mml:math id="M8" display="inline"><mml:mi>K</mml:mi></mml:math></inline-formula> profile</oasis:entry>
         <oasis:entry colname="col3">Hybrid TKE-EDMF</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Middle atm. <inline-formula><mml:math id="M9" display="inline"><mml:mrow class="chem"><mml:msub><mml:mi mathvariant="normal">H</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub><mml:mi mathvariant="normal">O</mml:mi></mml:mrow></mml:math></inline-formula> photochemistry</oasis:entry>
         <oasis:entry colname="col2">None</oasis:entry>
         <oasis:entry colname="col3">Climatological</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1"><inline-formula><mml:math id="M10" display="inline"><mml:mrow class="chem"><mml:msub><mml:mi mathvariant="normal">O</mml:mi><mml:mn mathvariant="normal">3</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula> photochemistry</oasis:entry>
         <oasis:entry rowsep="1" colname="col2">None</oasis:entry>
         <oasis:entry rowsep="1" colname="col3">Climatological</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Stratospheric aerosols</oasis:entry>
         <oasis:entry namest="col2" nameend="col3">Time-dependent, prescribed </oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Tropospheric aerosols</oasis:entry>
         <oasis:entry namest="col2" nameend="col3">Climatological </oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1"><inline-formula><mml:math id="M11" display="inline"><mml:mrow class="chem"><mml:msub><mml:mi mathvariant="normal">CO</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula></oasis:entry>
         <oasis:entry namest="col2" nameend="col3">Time-dependent, prescribed </oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Trace gases</oasis:entry>
         <oasis:entry namest="col2" nameend="col3">Climatological </oasis:entry>
       </oasis:row>
     </oasis:tbody>
   </oasis:tgroup></oasis:table></table-wrap>

      <p id="d1e515">From the standpoint of the implementation of radiative transfer emulators
developed in 2011 into the modern generation of GFS, the most consequential
change in the model is the near doubling of the number of vertical layers
because it has a direct impact on the size of the input layer of the
NN-based emulator. Therefore, we reconfigure GFS v16 to run with 64 layers
in the vertical.</p>
      <p id="d1e519">Another consequential change in the model appears to be the replacement of the
Zhao–Carr microphysics (Zhao and Carr, 1997) with the GFDL scheme (Lin et al., 1983; Chen and Lin, 2011; Zhou et al., 2019). Using the latter in
combination with 2011 RRTMG emulators resulted in unphysical values of
outgoing LW radiation at the top of the atmosphere (TOA) (not shown).
A potential explanation is that the change in microphysical parameterization
leads to an increase in the number of the model's prognostic variables. Both
the spectral and the finite-volume dynamical cores include zonal and
meridional wind components, pressure, temperature, water vapor, and ozone
mixing ratios as prognostic variables. The Zhao–Carr microphysics add only one
more prognostic to this list: the mixing ratio of total cloud condensate
(defined as the sum of cloud water and cloud ice mixing ratios). The GFDL
microphysics add six prognostic variables: cloud water, cloud ice, rain,
snow, and graupel mixing ratios, as well as cloud fraction. The near
doubling of the number of prognostic variables from 7 to 12 leads to the
proportional increase in the dimensionality of the physical phase space of the
model. As a result, the set of possible model states in GFS v16 is very
different, from a mathematical standpoint, than in the 2011 CFS. Even though
the vector of inputs to the LW parameterization remains the same in the new
model, it is obtained by mapping from a very different<?pagebreak page7429?> mathematical object,
potentially increasing the probability that a given input vector lies
outside  the NN's original training data set domain. For our experiments,
we replaced the GFDL microphysical parametrization with the Zhao–Carr
scheme.</p>
      <p id="d1e522">The new hybrid TKE-EDMF planetary boundary layer (PBL) parameterization (Han
and Bretherton, 2019) also introduces a new prognostic variable, subgrid-scale turbulent kinetic energy, that was absent in the 2011 version of CFS.
Even though we did not see adverse effects stemming from the use of the new
PBL scheme in preliminary testing, we replaced it with the original
<inline-formula><mml:math id="M12" display="inline"><mml:mi>K</mml:mi></mml:math></inline-formula> profile and/or EDMF scheme (Han and Pan, 2011) out of  caution.</p>
      <p id="d1e532">Concentrations of radiatively active gases are important inputs to radiative
transfer schemes and, more generally, are important parameters of the Earth
system. From the standpoint of emulator training, a change in these parameters
leads to a change in phase space of the host model, potentially
necessitating retraining of the emulator to ensure its accuracy and
stability. <inline-formula><mml:math id="M13" display="inline"><mml:mrow class="chem"><mml:msub><mml:mi mathvariant="normal">CO</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula> concentration values used during training of 2011
emulators ranged from 350 to 380 ppmv between the years 1990 and 2006,
respectively. In our current experiments spanning 2018, the <inline-formula><mml:math id="M14" display="inline"><mml:mrow class="chem"><mml:msub><mml:mi mathvariant="normal">CO</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula> concentration
was about 409 ppmv, or about 10 % higher on average than in the training
set.</p>
      <p id="d1e557">There were incremental updates and parametric changes to all other
components of the suite of physical parameterizations, which are too numerous to be
listed here; in addition, the model's software infrastructure was completely
overhauled, including a new modeling framework based on the Earth System Modeling Framework (ESMF), a coupler of the
dynamical core to the physics package, an input/output system, and workflow
scripts (for more detail, see the document “GFS/GDAS Changes Since 1991” at
<uri>https://www.emc.ncep.noaa.gov/gmb/STATS/html/model_changes.html</uri>, last access: 27 November 2021.).</p>
      <p id="d1e563">For experiments presented in this paper, we configure GFS v16 to run at C96
horizontal resolution (<inline-formula><mml:math id="M15" display="inline"><mml:mo lspace="0mm">∼</mml:mo></mml:math></inline-formula> 100 km) to reduce the computational
expense of the model. This configuration will be referred to as GFS in the
following discussion and was used in control runs. We then replaced both
modern versions of LW and SW RRTMG parameterizations in GFS with radiative
transfer emulators developed in Krasnopolsky et al. (2010). This version
of the model will be referred to as hybrid deterministic–statistical GFS, or
HGFS. Two 7-month-long runs were performed with each model configuration:
one initialized on 1 January 2018 and the other one on 1 July 2018, both using 2018
values of radiative forcings, with the instantaneous output saved 3-hourly.
Sea surface temperatures (SSTs) in GFS forecasts are initialized from
analysis and exponentially relax to climatology on a 90 d timescale as
forecast progresses. The first 30 d of each of the two 7-month-long runs were
discarded, and the remaining 6 months of data in each experiment were combined
into a single data set mimicking a 12-month-long run forced by
climatological SSTs.</p>
</sec>
<sec id="Ch1.S4">
  <label>4</label><title>Results</title>

      <?xmltex \floatpos{t}?><fig id="Ch1.F1" specific-use="star"><?xmltex \currentcnt{1}?><?xmltex \def\figurename{Figure}?><label>Figure 1</label><caption><p id="d1e583">Zonal and time mean over 12 months' worth of model output covering
1 February 2018–1 February 2019 for <bold>(a)</bold> longwave heating rate (K d<inline-formula><mml:math id="M16" display="inline"><mml:msup><mml:mi/><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msup></mml:math></inline-formula>) and <bold>(b)</bold> shortwave
heating rate (K d<inline-formula><mml:math id="M17" display="inline"><mml:msup><mml:mi/><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msup></mml:math></inline-formula>). Upper row – results produced by HGFS; middle – results by
GFS; lower row: the difference (HGFS <inline-formula><mml:math id="M18" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula> GFS). The vertical coordinate
shows the model level number.</p></caption>
        <?xmltex \igopts{width=483.69685pt}?><graphic xlink:href="https://gmd.copernicus.org/articles/14/7425/2021/gmd-14-7425-2021-f01.png"/>

      </fig>

      <?pagebreak page7430?><p id="d1e629">Figure 1 shows zonal and time mean over 12 months' worth of model output,
covering the period of 1 February 2018–1 February 2019 for LW (left panel) and SW (right
panel) heating rates. Global biases are small for both heating rates and
constitute about 2 %–3 % of the global mean value. A decrease in LW radiative
cooling at the top of the tropical and subtropical boundary layer is compensated for
by the corresponding decrease in SW radiative heating and consistent with
a decrease in low cloud cover in these areas (not shown). Biases in the
stratopause may be related to the new parameterizations of <inline-formula><mml:math id="M19" display="inline"><mml:mrow class="chem"><mml:msub><mml:mi mathvariant="normal">O</mml:mi><mml:mn mathvariant="normal">3</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula> and
<inline-formula><mml:math id="M20" display="inline"><mml:mrow class="chem"><mml:msub><mml:mi mathvariant="normal">H</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub><mml:mi mathvariant="normal">O</mml:mi></mml:mrow></mml:math></inline-formula> photochemistry that were not present in the 2011 version of the
model.
<?xmltex \hack{\newpage}?></p>

      <?xmltex \floatpos{t}?><fig id="Ch1.F2" specific-use="star"><?xmltex \currentcnt{2}?><?xmltex \def\figurename{Figure}?><label>Figure 2</label><caption><p id="d1e659">Time mean over 12 months' worth of model output covering
1 February 2018–1 February 2019 for <bold>(a)</bold> outgoing LW radiation at the TOA, <bold>(b)</bold> outgoing SW
radiation at the TOA, and <bold>(c)</bold> downwelling SW radiation at the surface. Upper row
– results produced by HGFS; middle – results by GFS; lower row: the
difference (HGFS <inline-formula><mml:math id="M21" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula> GFS).</p></caption>
        <?xmltex \igopts{width=483.69685pt}?><graphic xlink:href="https://gmd.copernicus.org/articles/14/7425/2021/gmd-14-7425-2021-f02.png"/>

      </fig>

      <p id="d1e685">Figure 2a shows outgoing longwave radiation (OLR) at TOA, and
panel (b) shows outgoing SW radiation (OSR) at TOA. Global biases are below
1 % of the global time mean; however, local biases are more pronounced.
A decrease in OLR and increase in OSR over the Maritime Continent are
consistent with an increase in high cloud cover in the region (not shown).
An increase in OLR and decrease in OSR in the subtropical areas off western
coasts of continents are consistent with a decrease in stratocumulus cloud
cover (not shown). These changes in cloud cover are also consistent with an
increase in downwelling SW at the surface in the stratocumulus regions and
a decrease over the Maritime Continent, as shown in Fig. 2c,
with global time mean biases being about 0.2 % of the global average.</p>

      <?xmltex \floatpos{t}?><fig id="Ch1.F3" specific-use="star"><?xmltex \currentcnt{3}?><?xmltex \def\figurename{Figure}?><label>Figure 3</label><caption><p id="d1e690">Time means over 12 months' worth of model output covering
1 February 2018–1 February 2019 for <bold>(a)</bold> upwelling SW radiation at the surface, <bold>(b)</bold> upwelling LW radiation at the surface, and <bold>(c)</bold> downwelling LW radiation at the
surface. Upper row – results produced by HGFS; middle – results by GFS; lower row: the difference (HGFS <inline-formula><mml:math id="M22" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula> GFS).</p></caption>
        <?xmltex \igopts{width=483.69685pt}?><graphic xlink:href="https://gmd.copernicus.org/articles/14/7425/2021/gmd-14-7425-2021-f03.png"/>

      </fig>

      <p id="d1e715">Figure 3a shows upwelling SW radiation flux at the surface. The global mean
negative bias is almost 5 % of the global average value, with negative
biases prevalent over continents and extratropical oceans and positive
biases over tropical oceans. Upwelling LW at the surface (Fig. 3b) is
biased high by about 0.5 % of the global mean value, with positive biases
over most of the continents, polar areas, and most of tropical oceans and
negative biases in the midlatitude oceans, northern Canada, and Alaska, as
well as the Barents and Norwegian seas. Downwelling LW at the surface (Fig. 3b) is biased low globally by approximately 0.5 %. Table 2 summarizes
time and global mean biases for the heating rates and radiative fluxes
predicted by the emulators.</p>
      <p id="d1e718">Figure 4 shows time series of a 10 d running mean of globally averaged LW
and SW fluxes at the surface and the TOA generated by HGFS (black curves)
and GFS (green curves) for the last 6 months (1 February–1 August) of a 7-month-long run initialized on 1 January 2018. Time series of the same quantities for the
run initialized on 1 July 2018 exhibit similar properties and are therefore not
shown. Magnitudes and signs of biases of each emulator-predicted variable
are consistent with their time and globally averaged<?pagebreak page7431?> values shown in Table 2. The HGFS run captures the seasonal cycle and  amplitude of seasonal and
sub-seasonal variability reasonably well. As to be expected from long-term
free-running experiments with a GCM, details of individual weather systems
differ between the two runs even when considered through the lens of a
10 d running mean. This is manifested most starkly in shortwave fluxes
leaving the atmosphere, outgoing SW at TOA (Fig. 4b), and downwelling SW at
the surface (Fig. 4d), which are very sensitive to the instantaneous cloud
distributions.</p>

      <?xmltex \floatpos{t}?><fig id="Ch1.F4" specific-use="star"><?xmltex \currentcnt{4}?><?xmltex \def\figurename{Figure}?><label>Figure 4</label><caption><p id="d1e723">Time series of a running 10 d mean covering 1 February–1 August 2018
for <bold>(a)</bold> outgoing LW at TOA, <bold>(b)</bold> outgoing SW at TOA, <bold>(c)</bold> downwelling LW radiation at the surface, <bold>(d)</bold> downwelling SW radiation at the surface, <bold>(e)</bold> upwelling LW radiation at the surface, and <bold>(f)</bold> upwelling SW radiation at the
surface. Black curves – results produced by HGFS; green – results by GFS.</p></caption>
        <?xmltex \igopts{width=483.69685pt}?><graphic xlink:href="https://gmd.copernicus.org/articles/14/7425/2021/gmd-14-7425-2021-f04.png"/>

      </fig>

<?xmltex \floatpos{t}?><table-wrap id="Ch1.T2" specific-use="star"><?xmltex \currentcnt{2}?><label>Table 2</label><caption><p id="d1e755">Time mean and global mean biases in LW heating rate (LWHR), SW heating rate (SWHR), and upwelling (Up) and downwelling (Dn) radiative fluxes at TOA and at the surface (Sfc) over 12 months' worth of model output covering 1 February 2018–1 February 2019.</p></caption><oasis:table frame="topbot"><oasis:tgroup cols="9">
     <oasis:colspec colnum="1" colname="col1" align="left"/>
     <oasis:colspec colnum="2" colname="col2" align="left"/>
     <oasis:colspec colnum="3" colname="col3" align="left"/>
     <oasis:colspec colnum="4" colname="col4" align="left"/>
     <oasis:colspec colnum="5" colname="col5" align="left"/>
     <oasis:colspec colnum="6" colname="col6" align="left"/>
     <oasis:colspec colnum="7" colname="col7" align="left"/>
     <oasis:colspec colnum="8" colname="col8" align="left"/>
     <oasis:colspec colnum="9" colname="col9" align="left"/>
     <oasis:thead>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">Variable</oasis:entry>
         <oasis:entry colname="col2">LWHR</oasis:entry>
         <oasis:entry colname="col3">SWHR</oasis:entry>
         <oasis:entry colname="col4">OLR</oasis:entry>
         <oasis:entry colname="col5">OSR</oasis:entry>
         <oasis:entry colname="col6">LW Up Sfc</oasis:entry>
         <oasis:entry colname="col7">LW Dn Sfc</oasis:entry>
         <oasis:entry colname="col8">SW Up Sfc</oasis:entry>
         <oasis:entry colname="col9">SW Dn Sfc</oasis:entry>
       </oasis:row>
     </oasis:thead>
     <oasis:tbody>
       <oasis:row>
         <oasis:entry colname="col1">Bias</oasis:entry>
         <oasis:entry colname="col2"><inline-formula><mml:math id="M23" display="inline"><mml:mrow><mml:mn mathvariant="normal">5.56</mml:mn><mml:mi>e</mml:mi><mml:mo>-</mml:mo><mml:mn mathvariant="normal">2</mml:mn></mml:mrow></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col3"><inline-formula><mml:math id="M24" display="inline"><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">3.62</mml:mn><mml:mi>e</mml:mi><mml:mo>-</mml:mo><mml:mn mathvariant="normal">2</mml:mn></mml:mrow></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col4">1.36</oasis:entry>
         <oasis:entry colname="col5">0.49</oasis:entry>
         <oasis:entry colname="col6"><inline-formula><mml:math id="M25" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula>0.59</oasis:entry>
         <oasis:entry colname="col7"><inline-formula><mml:math id="M26" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula>1.55</oasis:entry>
         <oasis:entry colname="col8">1.86</oasis:entry>
         <oasis:entry colname="col9"><inline-formula><mml:math id="M27" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula>0.59</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1"/>
         <oasis:entry colname="col2">K d<inline-formula><mml:math id="M28" display="inline"><mml:msup><mml:mi/><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msup></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col3">K d<inline-formula><mml:math id="M29" display="inline"><mml:msup><mml:mi/><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msup></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col4">Wm<inline-formula><mml:math id="M30" display="inline"><mml:msup><mml:mi/><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">2</mml:mn></mml:mrow></mml:msup></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col5">Wm<inline-formula><mml:math id="M31" display="inline"><mml:msup><mml:mi/><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">2</mml:mn></mml:mrow></mml:msup></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col6">Wm<inline-formula><mml:math id="M32" display="inline"><mml:msup><mml:mi/><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">2</mml:mn></mml:mrow></mml:msup></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col7">Wm<inline-formula><mml:math id="M33" display="inline"><mml:msup><mml:mi/><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">2</mml:mn></mml:mrow></mml:msup></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col8">Wm<inline-formula><mml:math id="M34" display="inline"><mml:msup><mml:mi/><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">2</mml:mn></mml:mrow></mml:msup></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col9">Wm<inline-formula><mml:math id="M35" display="inline"><mml:msup><mml:mi/><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">2</mml:mn></mml:mrow></mml:msup></mml:math></inline-formula></oasis:entry>
       </oasis:row>
     </oasis:tbody>
   </oasis:tgroup></oasis:table></table-wrap>

</sec>
<sec id="Ch1.S5">
  <label>5</label><title>Discussion</title>
      <p id="d1e1022">What could be the factors contributing to the stability of the emulators
presented in this paper? In the following, we highlight and discuss aspects
of the machine learning technique choice (shallow vs. deep neural network,
activation function selection) and training set design that distinguish the
emulators developed in Krasnopolsky et al. (2010).</p>
<sec id="Ch1.S5.SS1">
  <label>5.1</label><title>Shallow vs. deep neural networks: complexity and nonlinearity</title>
      <p id="d1e1032">Application of shallow NNs (SNNs) to the problem of mapping approximation
has thorough theoretical support. The universal approximation theorem proves
that an SNN is a generic and universal tool for approximating any continuous
or almost continuous mappings under very broad assumptions and for a wide
class of activation functions (e.g., Hornik et al., 1990; Hornik, 1991).
Similarly broad results for deep NNs (DNNs) do not exist  yet (Vapnik,
2019); however, specific combinations of DNN architectures and activation
functions have theoretical support (e.g., Leshno et al., 1993; Lu et al.,
2017; Elbrachter et al., 2021). Until there is a universal theory, it has been
suggested to consider DNN a heuristic approach since, in general, from a
theoretical point of view,<?pagebreak page7432?> a deep network cannot guarantee a solution of any
selection problem that  constitutes a complete learning problem (Vapnik,
2019). These considerations are important to keep in mind when selecting NN
architecture for the emulation of model physics or their components.</p>
      <p id="d1e1035">Next, we compare some properties of DNNs and SNNs to further emphasize their
differences and to point out some properties of DNNs that may lead to
instabilities in deterministic models coupled to DNN-based model components.</p>
      <?pagebreak page7433?><p id="d1e1038">To avoid overfitting and instability, the complexity and nonlinearity of the
approximating and/or emulating NN should not exceed the complexity and nonlinearity of
the mapping to be approximated. A measure of the SNN complexity can be
written as (see below for explanation)
            <disp-formula id="Ch1.E2" content-type="numbered"><label>2</label><mml:math id="M36" display="block"><mml:mrow><mml:msub><mml:mi>C</mml:mi><mml:mi mathvariant="normal">SNN</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mi>k</mml:mi><mml:mo>⋅</mml:mo><mml:mfenced close=")" open="("><mml:mrow><mml:mi>n</mml:mi><mml:mo>+</mml:mo><mml:mi>m</mml:mi><mml:mo>+</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:mfenced><mml:mo>+</mml:mo><mml:mi>m</mml:mi><mml:mo>,</mml:mo></mml:mrow></mml:math></disp-formula>
          where <inline-formula><mml:math id="M37" display="inline"><mml:mi>n</mml:mi></mml:math></inline-formula> and <inline-formula><mml:math id="M38" display="inline"><mml:mi>m</mml:mi></mml:math></inline-formula> are the numbers of SNN inputs and outputs, and <inline-formula><mml:math id="M39" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula> is the
number of neurons in a single hidden layer. The complexity of the SNN
(Eq. 2) increases linearly with the number of neurons in the hidden
layer, <inline-formula><mml:math id="M40" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula>. For given numbers of inputs and outputs there is only one SNN
architecture or configuration with a specified complexity <inline-formula><mml:math id="M41" display="inline"><mml:mrow><mml:msub><mml:mi>C</mml:mi><mml:mi mathvariant="normal">SNN</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>.
<?xmltex \hack{\newpage}?>
For the DNN complexity, a similar measure of complexity can be written as
(again, see below for explanation)
            <disp-formula id="Ch1.E3" content-type="numbered"><label>3</label><mml:math id="M42" display="block"><mml:mrow><mml:msub><mml:mi>C</mml:mi><mml:mi mathvariant="normal">DNN</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:msubsup><mml:mo>∑</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">0</mml:mn></mml:mrow><mml:mi>K</mml:mi></mml:msubsup><mml:msub><mml:mi>k</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>+</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msub><mml:mfenced open="(" close=")"><mml:mrow><mml:msub><mml:mi>k</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>+</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:mfenced><mml:mo>,</mml:mo></mml:mrow></mml:math></disp-formula>
          where <inline-formula><mml:math id="M43" display="inline"><mml:mrow><mml:msub><mml:mi>k</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> is the number of neurons in the layer <inline-formula><mml:math id="M44" display="inline"><mml:mi>i</mml:mi></mml:math></inline-formula> (<inline-formula><mml:math id="M45" display="inline"><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">0</mml:mn></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M46" display="inline"><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mi>K</mml:mi></mml:mrow></mml:math></inline-formula> correspond
to the input and output layers, respectively). The complexity of the DNN
(Eq. 3) increases geometrically with the increasing number of layers,
<inline-formula><mml:math id="M47" display="inline"><mml:mi>K</mml:mi></mml:math></inline-formula>.</p>
      <p id="d1e1216">Both <inline-formula><mml:math id="M48" display="inline"><mml:mrow><mml:msub><mml:mi>C</mml:mi><mml:mi mathvariant="normal">SNN</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M49" display="inline"><mml:mrow><mml:msub><mml:mi>C</mml:mi><mml:mi mathvariant="normal">DNN</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> are simply the numbers of parameters of the NN
that are trained or fit during SNN and/or DNN training. While there is a one-to-one correspondence between the SNN complexity, <inline-formula><mml:math id="M50" display="inline"><mml:mrow><mml:msub><mml:mi>C</mml:mi><mml:mi mathvariant="normal">SNN</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>, and the SNN
architecture, given the fixed number of neurons in the input and output
layers, correspondence between the DNN complexity, <inline-formula><mml:math id="M51" display="inline"><mml:mrow><mml:msub><mml:mi>C</mml:mi><mml:mi mathvariant="normal">DNN</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>, and the DNN
architecture is multivalued: many different DNN architectures and/or configurations have the same complexity <inline-formula><mml:math id="M52" display="inline"><mml:mrow><mml:msub><mml:mi>C</mml:mi><mml:mi mathvariant="normal">DNN</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> given the same size of input and output
layers. Overall, controlling the complexity of DNNs is more difficult than
controlling the complexity of an SNN.</p>
      <p id="d1e1275">For an SNN given by the expression
            <disp-formula id="Ch1.E4" content-type="numbered"><label>4</label><mml:math id="M53" display="block"><mml:mrow><mml:msub><mml:mi>y</mml:mi><mml:mi>j</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:msubsup><mml:mi>b</mml:mi><mml:mi>j</mml:mi><mml:mn mathvariant="normal">1</mml:mn></mml:msubsup><mml:mo>+</mml:mo><mml:munderover><mml:mo movablelimits="false">∑</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow><mml:mi>k</mml:mi></mml:munderover><mml:msubsup><mml:mi>a</mml:mi><mml:mrow><mml:mi>j</mml:mi><mml:mi>i</mml:mi></mml:mrow><mml:mn mathvariant="normal">1</mml:mn></mml:msubsup><mml:mo>⋅</mml:mo><mml:msub><mml:mi>t</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>,</mml:mo><mml:mi>j</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mo>,</mml:mo><mml:mi mathvariant="normal">⋯</mml:mi><mml:mo>,</mml:mo><mml:mi>m</mml:mi><mml:mo>,</mml:mo></mml:mrow></mml:math></disp-formula>
          where <inline-formula><mml:math id="M54" display="inline"><mml:mi>n</mml:mi></mml:math></inline-formula>, <inline-formula><mml:math id="M55" display="inline"><mml:mi>m</mml:mi></mml:math></inline-formula>, and <inline-formula><mml:math id="M56" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula> are the same as in Eq. (2), and nonlinearity increases
arithmetically or linearly with the addition of each new hidden neuron,
<inline-formula><mml:math id="M57" display="inline"><mml:mrow><mml:msub><mml:mi>t</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mi mathvariant="italic">ϕ</mml:mi><mml:mfenced open="(" close=")"><mml:mrow><mml:msubsup><mml:mi>b</mml:mi><mml:mi>i</mml:mi><mml:mn mathvariant="normal">0</mml:mn></mml:msubsup><mml:mo>+</mml:mo><mml:msubsup><mml:mo>∑</mml:mo><mml:mrow><mml:mi>s</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow><mml:mi>n</mml:mi></mml:msubsup><mml:msubsup><mml:mi>a</mml:mi><mml:mi mathvariant="normal">is</mml:mi><mml:mn mathvariant="normal">0</mml:mn></mml:msubsup><mml:mo>⋅</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mi>s</mml:mi></mml:msub></mml:mrow></mml:mfenced></mml:mrow></mml:math></inline-formula>, to the single hidden layer of the NN.</p>
      <p id="d1e1421">For a DNN, symbolically written as
            <disp-formula id="Ch1.E5" content-type="numbered"><label>5</label><mml:math id="M58" display="block"><mml:mrow><?xmltex \hack{\hbox\bgroup\fontsize{9.5}{9.5}\selectfont$\displaystyle}?><mml:mtable rowspacing="0.2ex" class="split" displaystyle="true" columnalign="right left"><mml:mtr><mml:mtd><mml:mrow><mml:mi mathvariant="bold-italic">Y</mml:mi></mml:mrow></mml:mtd><mml:mtd><mml:mrow><mml:mo>=</mml:mo><mml:msup><mml:mi mathvariant="bold-italic">X</mml:mi><mml:mrow><mml:mi>n</mml:mi><mml:mo>+</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msup><mml:mo>=</mml:mo><mml:msup><mml:mi mathvariant="bold-italic">B</mml:mi><mml:mi>n</mml:mi></mml:msup><mml:mo>+</mml:mo><mml:msup><mml:mi mathvariant="bold-italic">A</mml:mi><mml:mi>n</mml:mi></mml:msup><mml:mo>⋅</mml:mo><mml:mi mathvariant="italic">ϕ</mml:mi><mml:mfenced close="" open="("><mml:mrow><mml:msup><mml:mi mathvariant="bold-italic">B</mml:mi><mml:mrow><mml:mi>n</mml:mi><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msup><mml:mo>+</mml:mo><mml:msup><mml:mi mathvariant="bold-italic">A</mml:mi><mml:mrow><mml:mi>n</mml:mi><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:mfenced></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd/><mml:mtd><mml:mrow><mml:mfenced open="" close=")"><mml:mrow><mml:mo>⋅</mml:mo><mml:mi mathvariant="italic">ϕ</mml:mi><mml:mfenced open="(" close=")"><mml:mrow><mml:msup><mml:mi mathvariant="bold-italic">B</mml:mi><mml:mrow><mml:mi>n</mml:mi><mml:mo>-</mml:mo><mml:mn mathvariant="normal">2</mml:mn></mml:mrow></mml:msup><mml:mo>+</mml:mo><mml:msup><mml:mi mathvariant="bold-italic">A</mml:mi><mml:mrow><mml:mi>n</mml:mi><mml:mo>-</mml:mo><mml:mn mathvariant="normal">2</mml:mn></mml:mrow></mml:msup><mml:mo>⋅</mml:mo><mml:mi mathvariant="italic">ϕ</mml:mi><mml:mfenced close=")" open="("><mml:mrow><mml:msup><mml:mi mathvariant="bold-italic">B</mml:mi><mml:mrow><mml:mi>n</mml:mi><mml:mo>-</mml:mo><mml:mn mathvariant="normal">3</mml:mn></mml:mrow></mml:msup><mml:mo>+</mml:mo><mml:mi mathvariant="normal">⋯</mml:mi><mml:mi mathvariant="italic">ϕ</mml:mi><mml:mfenced open="(" close=")"><mml:mrow><mml:msup><mml:mi mathvariant="bold-italic">B</mml:mi><mml:mn mathvariant="normal">0</mml:mn></mml:msup><mml:mo>+</mml:mo><mml:msup><mml:mi mathvariant="bold-italic">A</mml:mi><mml:mn mathvariant="normal">0</mml:mn></mml:msup><mml:mo>⋅</mml:mo><mml:mi mathvariant="bold-italic">X</mml:mi></mml:mrow></mml:mfenced></mml:mrow></mml:mfenced></mml:mrow></mml:mfenced></mml:mrow></mml:mfenced><mml:mo>,</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable><?xmltex \hack{$\egroup}?></mml:mrow></mml:math></disp-formula>
          where each new hidden layer or neuron introduces additional nonlinearity on top of the
nonlinearities of the previous<?pagebreak page7434?> hidden layers; thus, the nonlinearity of the
DNN increases geometrically with the addition of new hidden layers much more quickly
than the nonlinearity of the SNN. Thus, controlling the nonlinearity of DNNs is
more difficult than controlling the nonlinearity of SNNs. The higher the
nonlinearity of the model the more unstable and unpredictable generalization
is (especially nonlinear extrapolation that is an ill-posed problem).</p>
      <p id="d1e1570">DNNs represent a very powerful and flexible technique that is extensively used for
emulation of model physics and their components (Kasim et al., 2020).
Discussion of its limitations can be found in Thompson et al. (2020). The
arguments listed here are intended to point out possible sources of
instability of DNNs in the models and the need for careful handling of this
very sensitive tool.</p>
</sec>
<sec id="Ch1.S5.SS2">
  <label>5.2</label><title>Preparation of training sets</title>
      <p id="d1e1581">Specifics of training set design may impact the stability of the NN as well. We
would like to point out a few aspects of training set preparation that, in
our experience, are of relevance to the development of robust ML-based
components of geophysical models.</p>
      <p id="d1e1584">A general rule of thumb when it comes to fitting statistical models to data
is that the number of records in the training set should be at least as
large as the number of model parameters or, in the context of the current
discussion, as the NN complexity introduced in Sect. 5.1. As a
consequence, NNs of larger complexity require larger training sets to
approximate a given mapping. To use DNN as an example, as the complexity of
DNN, <inline-formula><mml:math id="M59" display="inline"><mml:mrow><mml:msub><mml:mi>C</mml:mi><mml:mi mathvariant="normal">DNN</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> (Eq. 3), increases geometrically with the number of DNN
layers, so does the amount of data required for the DNN training (Thompson
et al. 2020).</p>
      <p id="d1e1598">We also find that comprehensiveness of the training set is an important
contributing factor to the generalization capability of the NN. In the
context of the application at hand, comprehensiveness of the training set means
that it should encompass as much of the complexity of the underlying physical
system as permitted by the numerical model that hosts the NN. In practice,
it translates into sampling diurnal, seasonal and annual variability, and states introduced by boundary conditions, e.g., greenhouse gas and
aerosol concentrations, realistic orography, and surface state. Inclusion of
events of special interest, e.g., hurricanes, snow storms, droughts, and extreme
precipitation events, is beneficial as well.</p>
      <p id="d1e1601">Care should be taken for proper sampling of the training data. For example,
saving the training data set on a Gaussian longitude–latitude grid will
result in overrepresentation of polar areas, and data must be resampled to
get more uniform representation over the globe.</p>
      <p id="d1e1605">Purging and normalization of inputs and outputs are important. Constant
inputs and outputs must be removed: from the standpoint of mapping
emulation, constants carry no information about the input-to-output
relation; however, with incorrect normalization, they may become a source of
noise during training. Normalization of inputs and outputs strongly affects
NN training. More specifically to the present application, if some inputs or
outputs of an NN are vertical profiles of a physical variable, as is common
in geophysical models, the profiles should be normalized as a whole, as
opposed to as a collection of independent variables, for the NN to better
capture correlations and dependencies between the levels of the profile
(Krasnopolsky, 2013).</p>
</sec>
<sec id="Ch1.S5.SS3">
  <label>5.3</label><title>Continuously vs. not continuously differentiable activation functions</title>
      <p id="d1e1616">The universal approximation theorem for SNNs is satisfied for a wide class of
bounded, nonlinear AFs. Note that many popular AFs used in DNN
applications, e.g., variants of ReLU, do not belong to this class. However,
for a specific problem of mapping approximation, it may be useful to
consider additional restrictions on AFs.</p>
      <p id="d1e1619">If the AF is almost continuous or, in other words, has only finite
discontinuities (e.g., step function), the first derivative (Jacobian) of the
NN using this AF will be singular. If the AF is not continuously
differentiable (e.g., ReLU), its first derivative will not be continuous
(will have finite discontinuities) and neither will the NN Jacobian. Using a
non-continuously differentiable NN as a model component may lead to
instability, especially if the Jacobian of this component is calculated in
the model. Using gradient-based optimization algorithms for training such
NNs may be challenging due to discontinuities in gradients.</p>
      <p id="d1e1622">If the AF is monotonic, the error surface associated with a single-layer
model is guaranteed to be convex, simplifying the training process (Wu,
2009). When AF approximates identity function near the origin (i.e., <inline-formula><mml:math id="M60" display="inline"><mml:mrow><mml:mi mathvariant="italic">ϕ</mml:mi><mml:mfenced open="(" close=")"><mml:mn mathvariant="normal">0</mml:mn></mml:mfenced><mml:mo>=</mml:mo><mml:mn mathvariant="normal">0</mml:mn><mml:mo>.</mml:mo><mml:mo>,</mml:mo><mml:msup><mml:mi mathvariant="italic">ϕ</mml:mi><mml:mo>′</mml:mo></mml:msup><mml:mfenced open="(" close=")"><mml:mn mathvariant="normal">0</mml:mn></mml:mfenced><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:math></inline-formula>, and <inline-formula><mml:math id="M61" display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="italic">ϕ</mml:mi><mml:mo>′</mml:mo></mml:msup></mml:mrow></mml:math></inline-formula> is
continuous at 0), the neural network will learn efficiently when its weights
are initialized with small random values. When the activation function does
not approximate identity near the origin, special care must be used when
initializing the weights (Sussillo and Abbott, 2015).</p>
      <p id="d1e1667">It is noteworthy that the sigmoid and hyperbolic tangent AFs, popular in SNN
applications, meet all aforementioned criteria. Additionally, in the context
of emulation of model physics parameterizations, these AFs provide one of
the lowest training losses compared to other AFs (Chantry et al., 2021).</p>
</sec>
</sec>
<sec id="Ch1.S6" sec-type="conclusions">
  <label>6</label><title>Conclusions</title>
      <p id="d1e1679">One of the major challenges in the development of ML- and/or AI-based parameterizations
for multidimensional nonlinear forward environmental models is ensuring
the stability of the coupling between deterministic and statistical components.
This problem is particularly acute for neural-network-based
parameterizations since, in theory,<?pagebreak page7435?> generalization to out-of-sample data is
not guaranteed, and, in practice, previously unseen inputs may lead to
unphysical outputs of the NN-based parameterization, often destabilizing the
hybrid model even in idealized simulations.</p>
      <p id="d1e1682">Shallow-NN-based emulators of radiative transfer parameterizations developed
almost a decade ago for a state-of-the-art GCM are stable with respect to
substantial structural and parametric change in the host model: when used in
two 7-month-long experiments with the new model, they not only remain
stable, but also generate realistic output. Two types of modifications of the
host model that NN emulators cannot tolerate are the change in the model
vertical resolution and the change in the number of model prognostic variables, in both cases
due to alteration of the dimensionality of the phase space of the
mapping (parameterization) and of the emulating NN. After the changes of
this nature are introduced into the host model, NN emulators must be
retrained.</p>
      <p id="d1e1685">We conjecture that careful control of complexity and nonlinearity of an
AI or ML model component, along with comprehensiveness and realism of its
training data set, are important factors contributing to both the
component's generalization capability and the stability of the model hosting
it.</p>
</sec>

      
      </body>
    <back><notes notes-type="codeavailability"><title>Code availability</title>

      <p id="d1e1692">The source code of the NCEP GFS atmospheric model component using the full
physics NN emulator, including the file with trained NN coefficients, is
available in the GitHub repository at
<uri>https://github.com/AlexBelochitski-NOAA/fv3atm_old_radiation_nn_emulator</uri> (last access: 2 December 2021) and
is also archived on Zenodo: <ext-link xlink:href="https://doi.org/10.5281/zenodo.4663160" ext-link-type="DOI">10.5281/zenodo.4663160</ext-link> (Belochitski, 2021).</p>
  </notes><notes notes-type="dataavailability"><title>Data availability</title>

      <p id="d1e1704">The training, validation, and independent test data sets for shallow-neural-network-based emulators of longwave and shortwave radiative transfer parameterizations used in this paper are available at the Harvard Dataverse: <ext-link xlink:href="https://doi.org/10.7910/DVN/6F74LF" ext-link-type="DOI">10.7910/DVN/6F74LF</ext-link> (Belochitski and Krasnopolsky, 2021).</p>
  </notes><notes notes-type="authorcontribution"><title>Author contributions</title>

      <p id="d1e1713">AB and VK were responsible for conceptualization, methodology, the analysis of results, and writing. AB was responsible for the training data. VK carried out NN training. AB carried out NN validation. Both authors have read and agreed to the
published version of the paper.</p>
  </notes><notes notes-type="competinginterests"><title>Competing interests</title>

      <p id="d1e1719">The contact author has declared that neither they nor their co-author has any competing interests.</p>
  </notes><notes notes-type="disclaimer"><title>Disclaimer</title>

      <p id="d1e1725">Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.</p>
  </notes><?xmltex \hack{\newpage}?><ack><title>Acknowledgements</title><p id="d1e1732">The authors would like to thank Edoardo Bucchignani and the anonymous
reviewer for their thoughtful and incisive comments that helped to improve
the paper. We also thank Ruiyu Sun and Jun Wang for valuable help
with practical use of NCEP GFS and for useful discussions and consultations.
We thank Jack Kain, Fanglin Yang, and Vijay Tallapragada for their
support.</p></ack><notes notes-type="financialsupport"><title>Financial support</title>

      <p id="d1e1738">This research has been supported by the National Oceanic and Atmospheric Administration (NOAA) Weather Forecast Office (WFO) Improving Forecasting and Assimilation (IFAA) portfolio (contract number EA133W-17-CN-0016).</p>
  </notes><notes notes-type="reviewstatement"><title>Review statement</title>

      <p id="d1e1744">This paper was edited by Rohitash Chandra and reviewed by Edoardo Bucchignani and one anonymous referee.</p>
  </notes><ref-list>
    <title>References</title>

      <ref id="bib1.bib1"><label>1</label><?label 1?><mixed-citation>Belochitski, A.: AlexBelochitski-NOAA/fv3atm_old_radiation_nn_emulator (v1.0), Zenodo [code], <ext-link xlink:href="https://doi.org/10.5281/zenodo.4663160" ext-link-type="DOI">10.5281/zenodo.4663160</ext-link>, 2021.</mixed-citation></ref>
      <ref id="bib1.bib2"><label>2</label><?label 1?><mixed-citation>Belochitski, A., Binev, P., DeVore, R., Fox-Rabinovitz, M., Krasnopolsky,
V., and Lamby, P.: Tree approximation of the long wave radiation
parameterization in the NCAR CAM global climate model, J. Comput. Appl. Math., 236, 447–460,
<ext-link xlink:href="https://doi.org/10.1016/j.cam.2011.07.013" ext-link-type="DOI">10.1016/j.cam.2011.07.013</ext-link>, 2011.</mixed-citation></ref>
      <ref id="bib1.bib3"><label>3</label><?label 1?><mixed-citation>Belochitski, A. and Krasnopolsky, V.: Datasets for “Robustness of neural network emulations of radiative transfer parameterizations in a state-of-the-art general circulation model” [data set], <ext-link xlink:href="https://doi.org/10.7910/DVN/6F74LF" ext-link-type="DOI">10.7910/DVN/6F74LF</ext-link>, Harvard Dataverse, V1, 2021.</mixed-citation></ref>
      <ref id="bib1.bib4"><label>4</label><?label 1?><mixed-citation>Brenowitz, N. D., Beucler, T., Pritchard, M., and Bretherton, C. S.:
Interpreting and Stabilizing Machine-Learning Parametrizations of
Convection, J. Atmos. Sci., 77, 4357–4375,
<ext-link xlink:href="https://doi.org/10.1175/JAS-D-20-0082.1" ext-link-type="DOI">10.1175/JAS-D-20-0082.1</ext-link>, 2020.</mixed-citation></ref>
      <ref id="bib1.bib5"><label>5</label><?label 1?><mixed-citation>Boukabara, S.-A., Krasnopolsky, V., Stewart, J. Q., Maddy, E. S., Shahroudi, N.,
and Hoffman, R. N.: Leveraging modern artificial intelligence for remote
sensing and NWP: Benefits and Challenges, B. Am. Meteorol. Soc., 100, ES473–ES491, <ext-link xlink:href="https://doi.org/10.1175/BAMS-D-18-0324.1" ext-link-type="DOI">10.1175/BAMS-D-18-0324.1</ext-link>, 2019.</mixed-citation></ref>
      <ref id="bib1.bib6"><label>6</label><?label 1?><mixed-citation>Chantry, M., Hatfield, S., Dueben, P., Polichtchouk, I., and Palmer, T.: Machine learning emulation of gravity wave drag in numerical weather forecasting, J. Adv. Model. Earth Sy., 13, 1–20, <ext-link xlink:href="https://doi.org/10.1029/2021ms002477" ext-link-type="DOI">10.1029/2021ms002477</ext-link>, 2021.</mixed-citation></ref>
      <ref id="bib1.bib7"><label>7</label><?label 1?><mixed-citation>Chen, J.-H. and Lin, S.-J.: The remarkable predictability of inter-annual
variability of Atlantic hurricanes during the past decade, Geophys.
Res. Lett., 38, L11804, <ext-link xlink:href="https://doi.org/10.1029/2011GL047629" ext-link-type="DOI">10.1029/2011GL047629</ext-link>, 2011.</mixed-citation></ref>
      <ref id="bib1.bib8"><label>8</label><?label 1?><mixed-citation>Chevallier, F., Cheruy, F., Scott, N. A., and Chedin, A.: An neural network
approach for a fast and accurate computation of longwave radiative budget,
J. Appl. Meteorol., 37, 1385–1397, <ext-link xlink:href="https://doi.org/10.1175/1520-0450(1998)037&lt;1385:ANNAFA&gt;2.0.CO;2" ext-link-type="DOI">10.1175/1520-0450(1998)037&lt;1385:ANNAFA&gt;2.0.CO;2</ext-link>,
1998.</mixed-citation></ref>
      <ref id="bib1.bib9"><label>9</label><?label 1?><mixed-citation>Chevallier, F., Morcrette, J.-J., Chéruy, F., and Scott, N. A.: Use of
a neural-network-based longwave radiative transfer scheme in the ECMWF
atmospheric model, Q. J. Roy. Meteor. Soc.,
126, 761–776, <ext-link xlink:href="https://doi.org/10.1002/qj.49712656318" ext-link-type="DOI">10.1002/qj.49712656318</ext-link>, 2000.</mixed-citation></ref>
      <ref id="bib1.bib10"><label>10</label><?label 1?><mixed-citation>Edwards, J. M. and Slingo, A.: Studies with a flexible new radiation code:
1. Choosing a configuration for a large-scale model, Q. J. Roy. Meteor. Soc., 122, 689–719, <ext-link xlink:href="https://doi.org/10.1002/qj.49712253107" ext-link-type="DOI">10.1002/qj.49712253107</ext-link>, 1996.</mixed-citation></ref>
      <ref id="bib1.bib11"><label>11</label><?label 1?><mixed-citation>Elbrachter, D., Perekrestenko, D., Grohs, P., and Bölcskei, H.: Deep Neural
Network Approximation Theory, arXiv [preprint], <ext-link xlink:href="https://arxiv.org/abs/1901.02220">arXiv:1901.02220</ext-link>, 12 March 2021.</mixed-citation></ref>
      <ref id="bib1.bib12"><label>12</label><?label 1?><mixed-citation>Fu, Q. and Liou, K. N.: On the correlated k-distribution method for
radiative transfer in nonhomogeneous atmospheres, J. Atmos. Sci., 49, 2139–2156, <ext-link xlink:href="https://doi.org/10.1175/1520-0469(1992)049&lt;2139:OTCDMF&gt;2.0.CO;2" ext-link-type="DOI">10.1175/1520-0469(1992)049&lt;2139:OTCDMF&gt;2.0.CO;2</ext-link>,
1992.</mixed-citation></ref>
      <ref id="bib1.bib13"><label>13</label><?label 1?><mixed-citation>Han, J. and Pan, H.-L.:. Revision of convection and vertical diffusion
schemes in the NCEP global forecast system, Weather Forecast., 26,
520–533, <ext-link xlink:href="https://doi.org/10.1175/WAF-D-10-05038.1" ext-link-type="DOI">10.1175/WAF-D-10-05038.1</ext-link>, 2011.</mixed-citation></ref>
      <ref id="bib1.bib14"><label>14</label><?label 1?><mixed-citation>Han, J. and Bretherton, C. S.: TKE-based moist eddy-diffusivity mass-flux
(EDMF) parameterization for vertical turbulent mixing, Weather Forecast., 34, 869–886, <ext-link xlink:href="https://doi.org/10.1175/WAF-D-18-0146.1" ext-link-type="DOI">10.1175/WAF-D-18-0146.1</ext-link>, 2019.</mixed-citation></ref>
      <ref id="bib1.bib15"><label>15</label><?label 1?><mixed-citation>Hogan, R. and Bozzo, A.: Mitigating errors in surface temperature forecasts
using approximate radiation updates, J. Adv. Model. Earth. Syst., 7, 836–853,
<ext-link xlink:href="https://doi.org/10.1002/2015MS000455" ext-link-type="DOI">10.1002/2015MS000455</ext-link>, 2015.</mixed-citation></ref>
      <ref id="bib1.bib16"><label>16</label><?label 1?><mixed-citation>Hogan, R., Ahlgrimm, M., Balsamo, G., Beljaars, A., Berrisford, P., Bozzo, A., Di Giuseppe, F., Forbes, R. M., Haiden, T., Lang, S., Mayer, M., Polichtchouk, I., Sandu, I., Vitart, F. and Wedi, N.: Radiation in numerical weather prediction, ECMWF Technical Memorandum, 816, 1–49, <ext-link xlink:href="https://doi.org/10.21957/2bd5dkj8x" ext-link-type="DOI">10.21957/2bd5dkj8x</ext-link>, 2017.</mixed-citation></ref>
      <ref id="bib1.bib17"><label>17</label><?label 1?><mixed-citation>Hornik, K.: Approximation Capabilities of Multilayer Feedforward Network.
Neural Networks, 4, 251–257, <ext-link xlink:href="https://doi.org/10.1016/0893-6080(91)90009-T" ext-link-type="DOI">10.1016/0893-6080(91)90009-T</ext-link>, 1991.</mixed-citation></ref>
      <ref id="bib1.bib18"><label>18</label><?label 1?><mixed-citation>Hornik, K., Stinchcombe, M., and White, H.: Universal approximation of an
unknown mapping and its derivatives using multilayer feedforward network,
Neural Networks, 3, 551–560, <ext-link xlink:href="https://doi.org/10.1016/0893-6080(90)90005-6" ext-link-type="DOI">10.1016/0893-6080(90)90005-6</ext-link>, 1990.</mixed-citation></ref>
      <ref id="bib1.bib19"><label>19</label><?label 1?><mixed-citation>Janiskova, M., Mahfouf, J.-F., Morcrette, J.-J., and Chevallier, F.: Linearized
radiation and cloud schemes in the ECMWF model: Development and evaluation,
Q. J. Roy. Meteor. Soc., 128, 1505–1528, <ext-link xlink:href="https://doi.org/10.1002/qj.200212858306" ext-link-type="DOI">10.1002/qj.200212858306</ext-link>,
2002.</mixed-citation></ref>
      <ref id="bib1.bib20"><label>20</label><?label 1?><mixed-citation>Kain, J. S., Moorthi, S., Yang, F., Yang, R., Wei, H., Wu, Y., Hou, Y.-T.,
Lin, H.-M., Yudin, V. A., Alpert, J. C., Tallapragada, V., and Sun, R.:
Advances in model physics for the next implementation of the GFS (GFSv16),
AMS Annual Meeting, Boston, MA, 6A.3., 2020.</mixed-citation></ref>
      <ref id="bib1.bib21"><label>21</label><?label 1?><mixed-citation>Kashinath, K., Mustafa, M., Albert, A., Wu, J-L., Jiang, C., Esmaeilzadeh, S., Azizzadenesheli, K., Wang, R., Chattopadhyay, A., Singh, A., Manepalli, A., Chirila, D., Yu, R., Walters, R., White, B., Xiao, H., Tchelepi, H. A., Marcus, P., Anandkumar, A., Hassanzadeh, P. and Prabhat: Physics-informed machine learning: case studies for
weather and climate modelling, Phil. Trans. R. Soc. A, 379, 20200093,
<ext-link xlink:href="https://doi.org/10.1098/rsta.2020.0093" ext-link-type="DOI">10.1098/rsta.2020.0093</ext-link>, 2021.</mixed-citation></ref>
      <ref id="bib1.bib22"><label>22</label><?label 1?><mixed-citation>Kasim, M. F., Watson-Parris, D., Deaconu, L., Oliver, S., Hatfield, P., Froula, D. H., Gregori, G., Jarvis, M., Khatiwala, S., Korenaga, J., Topp-Mugglestone, J., Viezzer, E. and Vinko, S. M.: Building high accuracy emulators for scientific
simulations with deep neural architecture search,  arXiv [preprint], <ext-link xlink:href="https://arxiv.org/abs/2001.08055.pdf">arXiv:2001.08055.pdf</ext-link>, 2020.</mixed-citation></ref>
      <ref id="bib1.bib23"><label>23</label><?label 1?><mixed-citation>Krasnopolsky, V.: The Application of Neural Networks in the Earth System
Sciences. Neural Network Emulations for Complex Multidimensional Mappings,
Atmospheric and Oceanic Science Library, 46, Dordrecht, Heidelberg,
New York, London, Springer, 200 pp., ISBN 978-9-4007-6072-1, <ext-link xlink:href="https://doi.org/10.1007/978-94-007-6073-8" ext-link-type="DOI">10.1007/978-94-007-6073-8</ext-link>, 2013.</mixed-citation></ref>
      <ref id="bib1.bib24"><label>24</label><?label 1?><mixed-citation>Krasnopolsky, V., Fox-Rabinovitz, M. S., and Belochitski, A. A.: Decadal
climate simulations using accurate and fast neural network emulation of
full, longwave and shortwave, radiation, Mon. Weather Rev., 136, 3683–3695,
<ext-link xlink:href="https://doi.org/10.1175/2008MWR2385.1" ext-link-type="DOI">10.1175/2008MWR2385.1</ext-link>, 2008a.</mixed-citation></ref>
      <ref id="bib1.bib25"><label>25</label><?label 1?><mixed-citation>Krasnopolsky, V. M., Fox-Rabinovitz, M. S., Tolman, H. L., and Belochitski, A.
A.: Neural network approach for robust and fast calculation of physical
processes in numerical environmental models: Compound parameterization with
a quality control of larger errors, Neural Networks, 21, 535–543,
<ext-link xlink:href="https://doi.org/10.1016/j.neunet.2007.12.019" ext-link-type="DOI">10.1016/j.neunet.2007.12.019</ext-link>, 2008b.</mixed-citation></ref>
      <ref id="bib1.bib26"><label>26</label><?label 1?><mixed-citation>Krasnopolsky, V. M., Fox-Rabinovitz, M. S., Hou, Y.-T., Lord, S. J., and
Belochitski, A. A.: Accurate and Fast Neural Network Emulations of Model
Radiation for the NCEP Coupled Climate Forecast System: Climate Simulations
and Seasonal Predictions, Mon. Weather Rev., 138, 1822–1842, <ext-link xlink:href="https://doi.org/10.1175/2009MWR3149.1" ext-link-type="DOI">10.1175/2009MWR3149.1</ext-link>, 2010.</mixed-citation></ref>
      <ref id="bib1.bib27"><label>27</label><?label 1?><mixed-citation>Krasnopolsky, V., Belochitski, A. A., Hou, Y.-T., Lord S., and Yang, F.:
Accurate and fast neural network emulations of long and short-wave radiation
for the NCEP Global Forecast System model, NCEP Office Note, 471, 1–36, available at: <uri>https://repository.library.noaa.gov/view/noaa/6951</uri> (last access: 29 November 2021), 2012.</mixed-citation></ref>
      <ref id="bib1.bib28"><label>28</label><?label 1?><mixed-citation>Krasnopolsky, V. M., Fox-Rabinovitz, M. S., and Belochitski, A. A.: Using
Ensemble of Neural Networks to Learn Stochastic Convection Parameterization
for Climate and Numerical Weather Prediction Models from Data Simulated by
Cloud Resolving Model, Advances in Artificial Neural Systems, 2013, 485913, <ext-link xlink:href="https://doi.org/10.1155/2013/485913" ext-link-type="DOI">10.1155/2013/485913</ext-link>, 2013.</mixed-citation></ref>
      <ref id="bib1.bib29"><label>29</label><?label 1?><mixed-citation>Lagerquist, R., Turner, D. D., Ebert-Uphoff, I., Hagerty, V., Kumler, C.,
and Stewart, J.: Deep Learning for Parameterization of Shortwave Radiative
Transfer, 20th Conference on Artificial Intelligence for Environmental Science, 101st AMS Annual Meeting, Virtual, 10–15 January 2021, 6.1,
2021.</mixed-citation></ref>
      <ref id="bib1.bib30"><label>30</label><?label 1?><mixed-citation>Leshno M., Lin, V. Ya., Pinkus, A., and Schocken, S.: Multilayer Feedforward
Networks With a Nonpolynomial Activation Function Can Approximate Any
Function, Neural Networks, 6, 861–867, <ext-link xlink:href="https://doi.org/10.1016/S0893-6080(05)80131-5" ext-link-type="DOI">10.1016/S0893-6080(05)80131-5</ext-link>, 1993.</mixed-citation></ref>
      <ref id="bib1.bib31"><label>31</label><?label 1?><mixed-citation>Lin, Y.-L., Farley, R. D., and Orville, H. D.: Bulk parameterization of the
snow field in a cloud model. J. Clim. Appl. Meteorol., 22,
1065–1092,
<ext-link xlink:href="https://doi.org/10.1175/1520-0450(1983)022&lt;1065:BPOTSF&gt;2.0.CO;2" ext-link-type="DOI">10.1175/1520-0450(1983)022&lt;1065:BPOTSF&gt;2.0.CO;2</ext-link>,
1983.</mixed-citation></ref>
      <ref id="bib1.bib32"><label>32</label><?label 1?><mixed-citation>Lu, Z., Pu, H., Wang, F., Hu, Z., and Wang, L.: The Expressive Power of Neural
Networks: A View from the Width, Advances in Neural Information Processing
Systems 30, Curran Associates, Inc.: 6231–6239, arXiv [preprint], <ext-link xlink:href="https://arxiv.org/abs/1709.02540">arXiv:1709.02540</ext-link>, 1 November 2017.</mixed-citation></ref>
      <ref id="bib1.bib33"><label>33</label><?label 1?><mixed-citation>Marshak, A. and Davis, A. B. (Eds.): 3D Radiative Transfer in Cloudy
Atmospheres, Springer, Berlin, 686 pp., ISBN 3-5402-3958-8, 2005.</mixed-citation></ref>
      <?pagebreak page7437?><ref id="bib1.bib34"><label>34</label><?label 1?><mixed-citation>Mlawer, E. J., Taubman, S. J., Brown, P. D., Iacono, M. J., and Clough, S. A.:
Radiative transfer for inhomogeneous atmospheres: RRTM, a validated
correlated-k model for the longwave,  J. Geophys. Res.-Atmos., 102, 16663–16682, <ext-link xlink:href="https://doi.org/10.1029/97JD00237" ext-link-type="DOI">10.1029/97JD00237</ext-link>, 1997.</mixed-citation></ref>
      <ref id="bib1.bib35"><label>35</label><?label 1?><mixed-citation>Morcrette, J.-J., Mozdzynski, G., and Leutbecher, M.: A reduced radiation
grid for the ECMWF Integrated Forecasting System, Mon. Weather Rev., 136, 4760–4772,
<ext-link xlink:href="https://doi.org/10.1175/2008MWR2590.1" ext-link-type="DOI">10.1175/2008MWR2590.1</ext-link>, 2008.</mixed-citation></ref>
      <ref id="bib1.bib36"><label>36</label><?label 1?><mixed-citation>Oreopoulos, L., Mlawer, E., Delamere, J., Shippert, T., Cole, J., Fomin, B., Iacono, M., Jin, Z., Li, J., Manners, J., Raisa- nen, P., Rose, F., Zhang, Y., Wilson, M. J., and Rossow, W. B.: The continual intercomparison of radiation codes: Results from phase I, J. Geophys. Res., 117, D06118, <ext-link xlink:href="https://doi.org/10.1029/2011JD016821" ext-link-type="DOI">10.1029/2011JD016821</ext-link>, 2012.</mixed-citation></ref>
      <ref id="bib1.bib37"><label>37</label><?label 1?><mixed-citation>Ott, J., Pritchard, M., Best, N., Linstead, E., Curcic, M., and Baldi, P.: A
Fortran-Keras deep learning bridge for scientific computing, Scientific
Programming, 2020, 8888811, <ext-link xlink:href="https://doi.org/10.1155/2020/8888811" ext-link-type="DOI">10.1155/2020/8888811</ext-link>, 2020.</mixed-citation></ref>
      <ref id="bib1.bib38"><label>38</label><?label 1?><mixed-citation>Pal, A., Mahajan, S., and Norman, M. R.: Using deep neural networks as
cost-effective surrogate models for Super-Parameterized E3SM radiative
transfer, Geophys. Res. Lett., 46, 6069–6079, <ext-link xlink:href="https://doi.org/10.1029/2018GL081646" ext-link-type="DOI">10.1029/2018GL081646</ext-link>, 2019.</mixed-citation></ref>
      <ref id="bib1.bib39"><label>39</label><?label 1?><mixed-citation>Pincus, R., Barker, H. W., and Morcrette, J.-J.: A fast, flexible, approximate
technique for computing radiative transfer in inhomogeneous clouds, J. Geophys. Res.-Atmos., 108,
4376, <ext-link xlink:href="https://doi.org/10.1029/2002JD003322" ext-link-type="DOI">10.1029/2002JD003322</ext-link>, 2003.</mixed-citation></ref>
      <ref id="bib1.bib40"><label>40</label><?label 1?><mixed-citation>Pincus, R. and Stevens, B.: Monte Carlo Spectral Integration: a Consistent
Approximation for Radiative Transfer in Large Eddy Simulations, J. Adv. Model. Earth Syst., 1, 1,
<ext-link xlink:href="https://doi.org/10.3894/JAMES.2009.1.1" ext-link-type="DOI">10.3894/JAMES.2009.1.1</ext-link>, 2009.</mixed-citation></ref>
      <ref id="bib1.bib41"><label>41</label><?label 1?><mixed-citation>Pincus, R. and Stevens, B.: Paths to accuracy for radiation
parameterizations in atmospheric models, J. Adv. Model. Earth Syst., 5, 225–233,
<ext-link xlink:href="https://doi.org/10.1002/jame.20027" ext-link-type="DOI">10.1002/jame.20027</ext-link>, 2013.</mixed-citation></ref>
      <ref id="bib1.bib42"><label>42</label><?label 1?><mixed-citation>Pincus, R., Mlawer, E. J., and Delamere, J. S.: Balancing accuracy, efficiency, and flexibility in radiation calculations for dynamical models, J. Adv. Model. Earth Sy., 11, 3074–3089, <ext-link xlink:href="https://doi.org/10.1029/2019MS001621" ext-link-type="DOI">10.1029/2019MS001621</ext-link>, 2019.</mixed-citation></ref>
      <ref id="bib1.bib43"><label>43</label><?label 1?><mixed-citation>Rasp, S.: Coupled online learning as a way to tackle instabilities and biases in neural network parameterizations: general algorithms and Lorenz 96 case study (v1.0), Geosci. Model Dev., 13, 2185–2196, <ext-link xlink:href="https://doi.org/10.5194/gmd-13-2185-2020" ext-link-type="DOI">10.5194/gmd-13-2185-2020</ext-link>, 2020.</mixed-citation></ref>
      <ref id="bib1.bib44"><label>44</label><?label 1?><mixed-citation>Roh, S. and Song, H.-J.: Evaluation of neural network emulations for
radiation parameterization in cloud resolving model, Geophys. Res. Lett., 47, e2020GL089444,
<ext-link xlink:href="https://doi.org/10.1029/2020GL089444" ext-link-type="DOI">10.1029/2020GL089444</ext-link>, 2020.</mixed-citation></ref>
      <ref id="bib1.bib45"><label>45</label><?label 1?><mixed-citation>Sussillo, D. and Abbott, L. F.: Random Walk Initialization for Training
Very Deep Feedforward Networks, arXiv [preprint], <ext-link xlink:href="https://arxiv.org/abs/1412.6558">arXiv:1412.6558</ext-link>, 27 February 2015.
</mixed-citation></ref><?xmltex \hack{\newpage}?>
      <ref id="bib1.bib46"><label>46</label><?label 1?><mixed-citation>Thompson, N. C., Greenewald, K., Lee, K., and Manso, G. F.: The Computational
Limits of Deep Learning, arXiv [preprint], <ext-link xlink:href="https://arxiv.org/abs/2007.05558">arxiv:/2007.05558</ext-link>, 10 July 2020.</mixed-citation></ref>
      <ref id="bib1.bib47"><label>47</label><?label 1?><mixed-citation>Turner, D. D., Tobin, D. C., Clough, S. A., Brown, P. D., Ellingson, R. G.,
Mlawer, E. J., Knuteson, R. O., Revercomb, H. E., Shippert, T. R., Smith, and M. W. Shephard, W. L.: The QME AERI LBLRTM: A closure experiment for
downwelling high spectral resolution infrared radiance, J. Atmos. Sci., 61,
2657–2675, <ext-link xlink:href="https://doi.org/10.1175/JAS3300.1" ext-link-type="DOI">10.1175/JAS3300.1</ext-link>, 2004.</mixed-citation></ref>
      <ref id="bib1.bib48"><label>48</label><?label 1?><mixed-citation>Ukkonen, P., Pincus, R., Hogan, R. J., Nielsen, K. P., and Kaas, E.:
Accelerating radiation computations for dynamical models with targeted
machine learning and code optimization, J. Adv. Model. Earth Sy., 12, e2020MS002226, <ext-link xlink:href="https://doi.org/10.1029/2020MS002226" ext-link-type="DOI">10.1029/2020MS002226</ext-link>, 2020.</mixed-citation></ref>
      <ref id="bib1.bib49"><label>49</label><?label 1?><mixed-citation>Vapnik, V. N.: Complete Statistical Theory of Learning, Automation and
Remote Control, 80, 1949–1975, <ext-link xlink:href="https://doi.org/10.1134/S000511791911002X" ext-link-type="DOI">10.1134/S000511791911002X</ext-link>, 2019.</mixed-citation></ref>
      <ref id="bib1.bib50"><label>50</label><?label 1?><mixed-citation>Veerman, M. A., Pincus, R., Stoffer, R., van Leeuwen, C. M., Podareanu, D., and van
Heerwaarden, C. C.: Predicting atmospheric optical properties for radiative
transfer computations using neural networks, Phil. Trans. R. Soc. A, 379, 20200095, <ext-link xlink:href="https://doi.org/10.1098/rsta.2020.0095" ext-link-type="DOI">10.1098/rsta.2020.0095</ext-link>, 2021.</mixed-citation></ref>
      <ref id="bib1.bib51"><label>51</label><?label 1?><mixed-citation>Wu, H.: Global stability analysis of a general class of discontinuous neural
networks with linear growth activation functions, Inform. Sciences, 179, 3432–3441, <ext-link xlink:href="https://doi.org/10.1016/j.ins.2009.06.006" ext-link-type="DOI">10.1016/j.ins.2009.06.006</ext-link>, 2009.</mixed-citation></ref>
      <ref id="bib1.bib52"><label>52</label><?label 1?><mixed-citation>Yuval, J. and O'Gorman, P.: Stable machine-learning parameterization of
subgrid processes for climate modeling at a range of resolutions, Nat.
Commun., 11, 3295, <ext-link xlink:href="https://doi.org/10.1038/s41467-020-17142-3" ext-link-type="DOI">10.1038/s41467-020-17142-3</ext-link>, 2020.</mixed-citation></ref>
      <ref id="bib1.bib53"><label>53</label><?label 1?><mixed-citation>Yuval, J., O'Gorman, P. A., and Hill, C. N.: Use of neural networks for
stable, accurate and physically consistent parameterization of subgrid
atmospheric processes with good performance at reduced precision.
Geophys. Res. Lett., 48, e2020GL091363,
<ext-link xlink:href="https://doi.org/10.1029/2020GL091363" ext-link-type="DOI">10.1029/2020GL091363</ext-link>, 2021.</mixed-citation></ref>
      <ref id="bib1.bib54"><label>54</label><?label 1?><mixed-citation>Zhao, Q. and Carr, F. H.: A Prognostic Cloud Scheme for Operational NWP
Models, Mon. Weather Rev., 125, 1931–1953, <ext-link xlink:href="https://doi.org/10.1175/1520-0493(1997)125&lt;1931:APCSFO&gt;2.0.CO;2" ext-link-type="DOI">10.1175/1520-0493(1997)125&lt;1931:APCSFO&gt;2.0.CO;2</ext-link>, 1997.</mixed-citation></ref>
      <ref id="bib1.bib55"><label>55</label><?label 1?><mixed-citation>Zhou, L., Lin, S., Chen, J., Harris, L. M., Chen, X., and Rees, S. L.:
Toward Convective-Scale Prediction within the Next Generation Global
Prediction System, B. Am. Meteor. Soc., 100, 1225–124,
<ext-link xlink:href="https://doi.org/10.1175/BAMS-D-17-0246.1" ext-link-type="DOI">10.1175/BAMS-D-17-0246.1</ext-link>, 2019.</mixed-citation></ref>

  </ref-list></back>
    <!--<article-title-html>Robustness of neural network emulations of radiative transfer parameterizations in a state-of-the-art general circulation model</article-title-html>
<abstract-html/>
<ref-html id="bib1.bib1"><label>1</label><mixed-citation>Belochitski, A.: AlexBelochitski-NOAA/fv3atm_old_radiation_nn_emulator (v1.0), Zenodo [code], <a href="https://doi.org/10.5281/zenodo.4663160" target="_blank">https://doi.org/10.5281/zenodo.4663160</a>, 2021.
</mixed-citation></ref-html>
<ref-html id="bib1.bib2"><label>2</label><mixed-citation>Belochitski, A., Binev, P., DeVore, R., Fox-Rabinovitz, M., Krasnopolsky,
V., and Lamby, P.: Tree approximation of the long wave radiation
parameterization in the NCAR CAM global climate model, J. Comput. Appl. Math., 236, 447–460,
<a href="https://doi.org/10.1016/j.cam.2011.07.013" target="_blank">https://doi.org/10.1016/j.cam.2011.07.013</a>, 2011.
</mixed-citation></ref-html>
<ref-html id="bib1.bib3"><label>3</label><mixed-citation>Belochitski, A. and Krasnopolsky, V.: Datasets for “Robustness of neural network emulations of radiative transfer parameterizations in a state-of-the-art general circulation model” [data set], <a href="https://doi.org/10.7910/DVN/6F74LF" target="_blank">https://doi.org/10.7910/DVN/6F74LF</a>, Harvard Dataverse, V1, 2021.
</mixed-citation></ref-html>
<ref-html id="bib1.bib4"><label>4</label><mixed-citation>Brenowitz, N. D., Beucler, T., Pritchard, M., and Bretherton, C. S.:
Interpreting and Stabilizing Machine-Learning Parametrizations of
Convection, J. Atmos. Sci., 77, 4357–4375,
<a href="https://doi.org/10.1175/JAS-D-20-0082.1" target="_blank">https://doi.org/10.1175/JAS-D-20-0082.1</a>, 2020.
</mixed-citation></ref-html>
<ref-html id="bib1.bib5"><label>5</label><mixed-citation>Boukabara, S.-A., Krasnopolsky, V., Stewart, J. Q., Maddy, E. S., Shahroudi, N.,
and Hoffman, R. N.: Leveraging modern artificial intelligence for remote
sensing and NWP: Benefits and Challenges, B. Am. Meteorol. Soc., 100, ES473–ES491, <a href="https://doi.org/10.1175/BAMS-D-18-0324.1" target="_blank">https://doi.org/10.1175/BAMS-D-18-0324.1</a>, 2019.
</mixed-citation></ref-html>
<ref-html id="bib1.bib6"><label>6</label><mixed-citation>Chantry, M., Hatfield, S., Dueben, P., Polichtchouk, I., and Palmer, T.: Machine learning emulation of gravity wave drag in numerical weather forecasting, J. Adv. Model. Earth Sy., 13, 1–20, <a href="https://doi.org/10.1029/2021ms002477" target="_blank">https://doi.org/10.1029/2021ms002477</a>, 2021.
</mixed-citation></ref-html>
<ref-html id="bib1.bib7"><label>7</label><mixed-citation>Chen, J.-H. and Lin, S.-J.: The remarkable predictability of inter-annual
variability of Atlantic hurricanes during the past decade, Geophys.
Res. Lett., 38, L11804, <a href="https://doi.org/10.1029/2011GL047629" target="_blank">https://doi.org/10.1029/2011GL047629</a>, 2011.
</mixed-citation></ref-html>
<ref-html id="bib1.bib8"><label>8</label><mixed-citation>Chevallier, F., Cheruy, F., Scott, N. A., and Chedin, A.: An neural network
approach for a fast and accurate computation of longwave radiative budget,
J. Appl. Meteorol., 37, 1385–1397, <a href="https://doi.org/10.1175/1520-0450(1998)037&lt;1385:ANNAFA&gt;2.0.CO;2" target="_blank">https://doi.org/10.1175/1520-0450(1998)037&lt;1385:ANNAFA&gt;2.0.CO;2</a>,
1998.
</mixed-citation></ref-html>
<ref-html id="bib1.bib9"><label>9</label><mixed-citation>Chevallier, F., Morcrette, J.-J., Chéruy, F., and Scott, N. A.: Use of
a neural-network-based longwave radiative transfer scheme in the ECMWF
atmospheric model, Q. J. Roy. Meteor. Soc.,
126, 761–776, <a href="https://doi.org/10.1002/qj.49712656318" target="_blank">https://doi.org/10.1002/qj.49712656318</a>, 2000.
</mixed-citation></ref-html>
<ref-html id="bib1.bib10"><label>10</label><mixed-citation>Edwards, J. M. and Slingo, A.: Studies with a flexible new radiation code:
1. Choosing a configuration for a large-scale model, Q. J. Roy. Meteor. Soc., 122, 689–719, <a href="https://doi.org/10.1002/qj.49712253107" target="_blank">https://doi.org/10.1002/qj.49712253107</a>, 1996.
</mixed-citation></ref-html>
<ref-html id="bib1.bib11"><label>11</label><mixed-citation>Elbrachter, D., Perekrestenko, D., Grohs, P., and Bölcskei, H.: Deep Neural
Network Approximation Theory, arXiv [preprint], <a href="https://arxiv.org/abs/1901.02220" target="_blank">arXiv:1901.02220</a>, 12 March 2021.
</mixed-citation></ref-html>
<ref-html id="bib1.bib12"><label>12</label><mixed-citation>Fu, Q. and Liou, K. N.: On the correlated k-distribution method for
radiative transfer in nonhomogeneous atmospheres, J. Atmos. Sci., 49, 2139–2156, <a href="https://doi.org/10.1175/1520-0469(1992)049&lt;2139:OTCDMF&gt;2.0.CO;2" target="_blank">https://doi.org/10.1175/1520-0469(1992)049&lt;2139:OTCDMF&gt;2.0.CO;2</a>,
1992.
</mixed-citation></ref-html>
<ref-html id="bib1.bib13"><label>13</label><mixed-citation>Han, J. and Pan, H.-L.:. Revision of convection and vertical diffusion
schemes in the NCEP global forecast system, Weather Forecast., 26,
520–533, <a href="https://doi.org/10.1175/WAF-D-10-05038.1" target="_blank">https://doi.org/10.1175/WAF-D-10-05038.1</a>, 2011.
</mixed-citation></ref-html>
<ref-html id="bib1.bib14"><label>14</label><mixed-citation>Han, J. and Bretherton, C. S.: TKE-based moist eddy-diffusivity mass-flux
(EDMF) parameterization for vertical turbulent mixing, Weather Forecast., 34, 869–886, <a href="https://doi.org/10.1175/WAF-D-18-0146.1" target="_blank">https://doi.org/10.1175/WAF-D-18-0146.1</a>, 2019.
</mixed-citation></ref-html>
<ref-html id="bib1.bib15"><label>15</label><mixed-citation>Hogan, R. and Bozzo, A.: Mitigating errors in surface temperature forecasts
using approximate radiation updates, J. Adv. Model. Earth. Syst., 7, 836–853,
<a href="https://doi.org/10.1002/2015MS000455" target="_blank">https://doi.org/10.1002/2015MS000455</a>, 2015.
</mixed-citation></ref-html>
<ref-html id="bib1.bib16"><label>16</label><mixed-citation>Hogan, R., Ahlgrimm, M., Balsamo, G., Beljaars, A., Berrisford, P., Bozzo, A., Di Giuseppe, F., Forbes, R. M., Haiden, T., Lang, S., Mayer, M., Polichtchouk, I., Sandu, I., Vitart, F. and Wedi, N.: Radiation in numerical weather prediction, ECMWF Technical Memorandum, 816, 1–49, <a href="https://doi.org/10.21957/2bd5dkj8x" target="_blank">https://doi.org/10.21957/2bd5dkj8x</a>, 2017.
</mixed-citation></ref-html>
<ref-html id="bib1.bib17"><label>17</label><mixed-citation>Hornik, K.: Approximation Capabilities of Multilayer Feedforward Network.
Neural Networks, 4, 251–257, <a href="https://doi.org/10.1016/0893-6080(91)90009-T" target="_blank">https://doi.org/10.1016/0893-6080(91)90009-T</a>, 1991.
</mixed-citation></ref-html>
<ref-html id="bib1.bib18"><label>18</label><mixed-citation>Hornik, K., Stinchcombe, M., and White, H.: Universal approximation of an
unknown mapping and its derivatives using multilayer feedforward network,
Neural Networks, 3, 551–560, <a href="https://doi.org/10.1016/0893-6080(90)90005-6" target="_blank">https://doi.org/10.1016/0893-6080(90)90005-6</a>, 1990.
</mixed-citation></ref-html>
<ref-html id="bib1.bib19"><label>19</label><mixed-citation>Janiskova, M., Mahfouf, J.-F., Morcrette, J.-J., and Chevallier, F.: Linearized
radiation and cloud schemes in the ECMWF model: Development and evaluation,
Q. J. Roy. Meteor. Soc., 128, 1505–1528, <a href="https://doi.org/10.1002/qj.200212858306" target="_blank">https://doi.org/10.1002/qj.200212858306</a>,
2002.
</mixed-citation></ref-html>
<ref-html id="bib1.bib20"><label>20</label><mixed-citation>Kain, J. S., Moorthi, S., Yang, F., Yang, R., Wei, H., Wu, Y., Hou, Y.-T.,
Lin, H.-M., Yudin, V. A., Alpert, J. C., Tallapragada, V., and Sun, R.:
Advances in model physics for the next implementation of the GFS (GFSv16),
AMS Annual Meeting, Boston, MA, 6A.3., 2020.
</mixed-citation></ref-html>
<ref-html id="bib1.bib21"><label>21</label><mixed-citation>Kashinath, K., Mustafa, M., Albert, A., Wu, J-L., Jiang, C., Esmaeilzadeh, S., Azizzadenesheli, K., Wang, R., Chattopadhyay, A., Singh, A., Manepalli, A., Chirila, D., Yu, R., Walters, R., White, B., Xiao, H., Tchelepi, H. A., Marcus, P., Anandkumar, A., Hassanzadeh, P. and Prabhat: Physics-informed machine learning: case studies for
weather and climate modelling, Phil. Trans. R. Soc. A, 379, 20200093,
<a href="https://doi.org/10.1098/rsta.2020.0093" target="_blank">https://doi.org/10.1098/rsta.2020.0093</a>, 2021.
</mixed-citation></ref-html>
<ref-html id="bib1.bib22"><label>22</label><mixed-citation>Kasim, M. F., Watson-Parris, D., Deaconu, L., Oliver, S., Hatfield, P., Froula, D. H., Gregori, G., Jarvis, M., Khatiwala, S., Korenaga, J., Topp-Mugglestone, J., Viezzer, E. and Vinko, S. M.: Building high accuracy emulators for scientific
simulations with deep neural architecture search,  arXiv [preprint], <a href="https://arxiv.org/abs/2001.08055.pdf" target="_blank">arXiv:2001.08055.pdf</a>, 2020.
</mixed-citation></ref-html>
<ref-html id="bib1.bib23"><label>23</label><mixed-citation>Krasnopolsky, V.: The Application of Neural Networks in the Earth System
Sciences. Neural Network Emulations for Complex Multidimensional Mappings,
Atmospheric and Oceanic Science Library, 46, Dordrecht, Heidelberg,
New York, London, Springer, 200 pp., ISBN 978-9-4007-6072-1, <a href="https://doi.org/10.1007/978-94-007-6073-8" target="_blank">https://doi.org/10.1007/978-94-007-6073-8</a>, 2013.
</mixed-citation></ref-html>
<ref-html id="bib1.bib24"><label>24</label><mixed-citation>Krasnopolsky, V., Fox-Rabinovitz, M. S., and Belochitski, A. A.: Decadal
climate simulations using accurate and fast neural network emulation of
full, longwave and shortwave, radiation, Mon. Weather Rev., 136, 3683–3695,
<a href="https://doi.org/10.1175/2008MWR2385.1" target="_blank">https://doi.org/10.1175/2008MWR2385.1</a>, 2008a.
</mixed-citation></ref-html>
<ref-html id="bib1.bib25"><label>25</label><mixed-citation>Krasnopolsky, V. M., Fox-Rabinovitz, M. S., Tolman, H. L., and Belochitski, A.
A.: Neural network approach for robust and fast calculation of physical
processes in numerical environmental models: Compound parameterization with
a quality control of larger errors, Neural Networks, 21, 535–543,
<a href="https://doi.org/10.1016/j.neunet.2007.12.019" target="_blank">https://doi.org/10.1016/j.neunet.2007.12.019</a>, 2008b.
</mixed-citation></ref-html>
<ref-html id="bib1.bib26"><label>26</label><mixed-citation>Krasnopolsky, V. M., Fox-Rabinovitz, M. S., Hou, Y.-T., Lord, S. J., and
Belochitski, A. A.: Accurate and Fast Neural Network Emulations of Model
Radiation for the NCEP Coupled Climate Forecast System: Climate Simulations
and Seasonal Predictions, Mon. Weather Rev., 138, 1822–1842, <a href="https://doi.org/10.1175/2009MWR3149.1" target="_blank">https://doi.org/10.1175/2009MWR3149.1</a>, 2010.
</mixed-citation></ref-html>
<ref-html id="bib1.bib27"><label>27</label><mixed-citation>Krasnopolsky, V., Belochitski, A. A., Hou, Y.-T., Lord S., and Yang, F.:
Accurate and fast neural network emulations of long and short-wave radiation
for the NCEP Global Forecast System model, NCEP Office Note, 471, 1–36, available at: <a href="https://repository.library.noaa.gov/view/noaa/6951" target="_blank"/> (last access: 29 November 2021), 2012.
</mixed-citation></ref-html>
<ref-html id="bib1.bib28"><label>28</label><mixed-citation>Krasnopolsky, V. M., Fox-Rabinovitz, M. S., and Belochitski, A. A.: Using
Ensemble of Neural Networks to Learn Stochastic Convection Parameterization
for Climate and Numerical Weather Prediction Models from Data Simulated by
Cloud Resolving Model, Advances in Artificial Neural Systems, 2013, 485913, <a href="https://doi.org/10.1155/2013/485913" target="_blank">https://doi.org/10.1155/2013/485913</a>, 2013.
</mixed-citation></ref-html>
<ref-html id="bib1.bib29"><label>29</label><mixed-citation>Lagerquist, R., Turner, D. D., Ebert-Uphoff, I., Hagerty, V., Kumler, C.,
and Stewart, J.: Deep Learning for Parameterization of Shortwave Radiative
Transfer, 20th Conference on Artificial Intelligence for Environmental Science, 101st AMS Annual Meeting, Virtual, 10–15 January 2021, 6.1,
2021.
</mixed-citation></ref-html>
<ref-html id="bib1.bib30"><label>30</label><mixed-citation>Leshno M., Lin, V. Ya., Pinkus, A., and Schocken, S.: Multilayer Feedforward
Networks With a Nonpolynomial Activation Function Can Approximate Any
Function, Neural Networks, 6, 861–867, <a href="https://doi.org/10.1016/S0893-6080(05)80131-5" target="_blank">https://doi.org/10.1016/S0893-6080(05)80131-5</a>, 1993.
</mixed-citation></ref-html>
<ref-html id="bib1.bib31"><label>31</label><mixed-citation>Lin, Y.-L., Farley, R. D., and Orville, H. D.: Bulk parameterization of the
snow field in a cloud model. J. Clim. Appl. Meteorol., 22,
1065–1092,
<a href="https://doi.org/10.1175/1520-0450(1983)022&lt;1065:BPOTSF&gt;2.0.CO;2" target="_blank">https://doi.org/10.1175/1520-0450(1983)022&lt;1065:BPOTSF&gt;2.0.CO;2</a>,
1983.
</mixed-citation></ref-html>
<ref-html id="bib1.bib32"><label>32</label><mixed-citation>Lu, Z., Pu, H., Wang, F., Hu, Z., and Wang, L.: The Expressive Power of Neural
Networks: A View from the Width, Advances in Neural Information Processing
Systems 30, Curran Associates, Inc.: 6231–6239, arXiv [preprint], <a href="https://arxiv.org/abs/1709.02540" target="_blank">arXiv:1709.02540</a>, 1 November 2017.
</mixed-citation></ref-html>
<ref-html id="bib1.bib33"><label>33</label><mixed-citation>Marshak, A. and Davis, A. B. (Eds.): 3D Radiative Transfer in Cloudy
Atmospheres, Springer, Berlin, 686 pp., ISBN 3-5402-3958-8, 2005.
</mixed-citation></ref-html>
<ref-html id="bib1.bib34"><label>34</label><mixed-citation>Mlawer, E. J., Taubman, S. J., Brown, P. D., Iacono, M. J., and Clough, S. A.:
Radiative transfer for inhomogeneous atmospheres: RRTM, a validated
correlated-k model for the longwave,  J. Geophys. Res.-Atmos., 102, 16663–16682, <a href="https://doi.org/10.1029/97JD00237" target="_blank">https://doi.org/10.1029/97JD00237</a>, 1997.
</mixed-citation></ref-html>
<ref-html id="bib1.bib35"><label>35</label><mixed-citation>Morcrette, J.-J., Mozdzynski, G., and Leutbecher, M.: A reduced radiation
grid for the ECMWF Integrated Forecasting System, Mon. Weather Rev., 136, 4760–4772,
<a href="https://doi.org/10.1175/2008MWR2590.1" target="_blank">https://doi.org/10.1175/2008MWR2590.1</a>, 2008.
</mixed-citation></ref-html>
<ref-html id="bib1.bib36"><label>36</label><mixed-citation>Oreopoulos, L., Mlawer, E., Delamere, J., Shippert, T., Cole, J., Fomin, B., Iacono, M., Jin, Z., Li, J., Manners, J., Raisa- nen, P., Rose, F., Zhang, Y., Wilson, M. J., and Rossow, W. B.: The continual intercomparison of radiation codes: Results from phase I, J. Geophys. Res., 117, D06118, <a href="https://doi.org/10.1029/2011JD016821" target="_blank">https://doi.org/10.1029/2011JD016821</a>, 2012.
</mixed-citation></ref-html>
<ref-html id="bib1.bib37"><label>37</label><mixed-citation>Ott, J., Pritchard, M., Best, N., Linstead, E., Curcic, M., and Baldi, P.: A
Fortran-Keras deep learning bridge for scientific computing, Scientific
Programming, 2020, 8888811, <a href="https://doi.org/10.1155/2020/8888811" target="_blank">https://doi.org/10.1155/2020/8888811</a>, 2020.
</mixed-citation></ref-html>
<ref-html id="bib1.bib38"><label>38</label><mixed-citation>Pal, A., Mahajan, S., and Norman, M. R.: Using deep neural networks as
cost-effective surrogate models for Super-Parameterized E3SM radiative
transfer, Geophys. Res. Lett., 46, 6069–6079, <a href="https://doi.org/10.1029/2018GL081646" target="_blank">https://doi.org/10.1029/2018GL081646</a>, 2019.
</mixed-citation></ref-html>
<ref-html id="bib1.bib39"><label>39</label><mixed-citation>Pincus, R., Barker, H. W., and Morcrette, J.-J.: A fast, flexible, approximate
technique for computing radiative transfer in inhomogeneous clouds, J. Geophys. Res.-Atmos., 108,
4376, <a href="https://doi.org/10.1029/2002JD003322" target="_blank">https://doi.org/10.1029/2002JD003322</a>, 2003.
</mixed-citation></ref-html>
<ref-html id="bib1.bib40"><label>40</label><mixed-citation>Pincus, R. and Stevens, B.: Monte Carlo Spectral Integration: a Consistent
Approximation for Radiative Transfer in Large Eddy Simulations, J. Adv. Model. Earth Syst., 1, 1,
<a href="https://doi.org/10.3894/JAMES.2009.1.1" target="_blank">https://doi.org/10.3894/JAMES.2009.1.1</a>, 2009.
</mixed-citation></ref-html>
<ref-html id="bib1.bib41"><label>41</label><mixed-citation>Pincus, R. and Stevens, B.: Paths to accuracy for radiation
parameterizations in atmospheric models, J. Adv. Model. Earth Syst., 5, 225–233,
<a href="https://doi.org/10.1002/jame.20027" target="_blank">https://doi.org/10.1002/jame.20027</a>, 2013.
</mixed-citation></ref-html>
<ref-html id="bib1.bib42"><label>42</label><mixed-citation>Pincus, R., Mlawer, E. J., and Delamere, J. S.: Balancing accuracy, efficiency, and flexibility in radiation calculations for dynamical models, J. Adv. Model. Earth Sy., 11, 3074–3089, <a href="https://doi.org/10.1029/2019MS001621" target="_blank">https://doi.org/10.1029/2019MS001621</a>, 2019.
</mixed-citation></ref-html>
<ref-html id="bib1.bib43"><label>43</label><mixed-citation>Rasp, S.: Coupled online learning as a way to tackle instabilities and biases in neural network parameterizations: general algorithms and Lorenz 96 case study (v1.0), Geosci. Model Dev., 13, 2185–2196, <a href="https://doi.org/10.5194/gmd-13-2185-2020" target="_blank">https://doi.org/10.5194/gmd-13-2185-2020</a>, 2020.
</mixed-citation></ref-html>
<ref-html id="bib1.bib44"><label>44</label><mixed-citation>Roh, S. and Song, H.-J.: Evaluation of neural network emulations for
radiation parameterization in cloud resolving model, Geophys. Res. Lett., 47, e2020GL089444,
<a href="https://doi.org/10.1029/2020GL089444" target="_blank">https://doi.org/10.1029/2020GL089444</a>, 2020.
</mixed-citation></ref-html>
<ref-html id="bib1.bib45"><label>45</label><mixed-citation>Sussillo, D. and Abbott, L. F.: Random Walk Initialization for Training
Very Deep Feedforward Networks, arXiv [preprint], <a href="https://arxiv.org/abs/1412.6558" target="_blank">arXiv:1412.6558</a>, 27 February 2015.

</mixed-citation></ref-html>
<ref-html id="bib1.bib46"><label>46</label><mixed-citation>Thompson, N. C., Greenewald, K., Lee, K., and Manso, G. F.: The Computational
Limits of Deep Learning, arXiv [preprint], <a href="https://arxiv.org/abs/2007.05558" target="_blank">arxiv:/2007.05558</a>, 10 July 2020.
</mixed-citation></ref-html>
<ref-html id="bib1.bib47"><label>47</label><mixed-citation>Turner, D. D., Tobin, D. C., Clough, S. A., Brown, P. D., Ellingson, R. G.,
Mlawer, E. J., Knuteson, R. O., Revercomb, H. E., Shippert, T. R., Smith, and M. W. Shephard, W. L.: The QME AERI LBLRTM: A closure experiment for
downwelling high spectral resolution infrared radiance, J. Atmos. Sci., 61,
2657–2675, <a href="https://doi.org/10.1175/JAS3300.1" target="_blank">https://doi.org/10.1175/JAS3300.1</a>, 2004.
</mixed-citation></ref-html>
<ref-html id="bib1.bib48"><label>48</label><mixed-citation>Ukkonen, P., Pincus, R., Hogan, R. J., Nielsen, K. P., and Kaas, E.:
Accelerating radiation computations for dynamical models with targeted
machine learning and code optimization, J. Adv. Model. Earth Sy., 12, e2020MS002226, <a href="https://doi.org/10.1029/2020MS002226" target="_blank">https://doi.org/10.1029/2020MS002226</a>, 2020.
</mixed-citation></ref-html>
<ref-html id="bib1.bib49"><label>49</label><mixed-citation>Vapnik, V. N.: Complete Statistical Theory of Learning, Automation and
Remote Control, 80, 1949–1975, <a href="https://doi.org/10.1134/S000511791911002X" target="_blank">https://doi.org/10.1134/S000511791911002X</a>, 2019.
</mixed-citation></ref-html>
<ref-html id="bib1.bib50"><label>50</label><mixed-citation>Veerman, M. A., Pincus, R., Stoffer, R., van Leeuwen, C. M., Podareanu, D., and van
Heerwaarden, C. C.: Predicting atmospheric optical properties for radiative
transfer computations using neural networks, Phil. Trans. R. Soc. A, 379, 20200095, <a href="https://doi.org/10.1098/rsta.2020.0095" target="_blank">https://doi.org/10.1098/rsta.2020.0095</a>, 2021.
</mixed-citation></ref-html>
<ref-html id="bib1.bib51"><label>51</label><mixed-citation>Wu, H.: Global stability analysis of a general class of discontinuous neural
networks with linear growth activation functions, Inform. Sciences, 179, 3432–3441, <a href="https://doi.org/10.1016/j.ins.2009.06.006" target="_blank">https://doi.org/10.1016/j.ins.2009.06.006</a>, 2009.
</mixed-citation></ref-html>
<ref-html id="bib1.bib52"><label>52</label><mixed-citation>Yuval, J. and O'Gorman, P.: Stable machine-learning parameterization of
subgrid processes for climate modeling at a range of resolutions, Nat.
Commun., 11, 3295, <a href="https://doi.org/10.1038/s41467-020-17142-3" target="_blank">https://doi.org/10.1038/s41467-020-17142-3</a>, 2020.
</mixed-citation></ref-html>
<ref-html id="bib1.bib53"><label>53</label><mixed-citation>Yuval, J., O'Gorman, P. A., and Hill, C. N.: Use of neural networks for
stable, accurate and physically consistent parameterization of subgrid
atmospheric processes with good performance at reduced precision.
Geophys. Res. Lett., 48, e2020GL091363,
<a href="https://doi.org/10.1029/2020GL091363" target="_blank">https://doi.org/10.1029/2020GL091363</a>, 2021.
</mixed-citation></ref-html>
<ref-html id="bib1.bib54"><label>54</label><mixed-citation>Zhao, Q. and Carr, F. H.: A Prognostic Cloud Scheme for Operational NWP
Models, Mon. Weather Rev., 125, 1931–1953, <a href="https://doi.org/10.1175/1520-0493(1997)125&lt;1931:APCSFO&gt;2.0.CO;2" target="_blank">https://doi.org/10.1175/1520-0493(1997)125&lt;1931:APCSFO&gt;2.0.CO;2</a>, 1997.
</mixed-citation></ref-html>
<ref-html id="bib1.bib55"><label>55</label><mixed-citation>Zhou, L., Lin, S., Chen, J., Harris, L. M., Chen, X., and Rees, S. L.:
Toward Convective-Scale Prediction within the Next Generation Global
Prediction System, B. Am. Meteor. Soc., 100, 1225–124,
<a href="https://doi.org/10.1175/BAMS-D-17-0246.1" target="_blank">https://doi.org/10.1175/BAMS-D-17-0246.1</a>, 2019.
</mixed-citation></ref-html>--></article>
