<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing with OASIS Tables v3.0 20080202//EN" "journalpub-oasis3.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:oasis="http://docs.oasis-open.org/ns/oasis-exchange/table" xml:lang="en" dtd-version="3.0" article-type="research-article">
  <front>
    <journal-meta><journal-id journal-id-type="publisher">GMD</journal-id><journal-title-group>
    <journal-title>Geoscientific Model Development</journal-title>
    <abbrev-journal-title abbrev-type="publisher">GMD</abbrev-journal-title><abbrev-journal-title abbrev-type="nlm-ta">Geosci. Model Dev.</abbrev-journal-title>
  </journal-title-group><issn pub-type="epub">1991-9603</issn><publisher>
    <publisher-name>Copernicus Publications</publisher-name>
    <publisher-loc>Göttingen, Germany</publisher-loc>
  </publisher></journal-meta>
    <article-meta>
      <article-id pub-id-type="doi">10.5194/gmd-14-6977-2021</article-id><title-group><article-title>ENSO-ASC 1.0.0: ENSO deep learning forecast model <?xmltex \hack{\break}?>with a multivariate
air–sea coupler</article-title><alt-title>ENSO-ASC 1.0.0</alt-title>
      </title-group><?xmltex \runningtitle{ENSO-ASC 1.0.0}?><?xmltex \runningauthor{B. Mu et al.}?>
      <contrib-group>
        <contrib contrib-type="author" equal-contrib="yes" corresp="no" rid="aff1">
          <name><surname>Mu</surname><given-names>Bin</given-names></name>
          
        </contrib>
        <contrib contrib-type="author" equal-contrib="yes" corresp="no" rid="aff1">
          <name><surname>Qin</surname><given-names>Bo</given-names></name>
          
        <ext-link>https://orcid.org/0000-0001-7093-6531</ext-link></contrib>
        <contrib contrib-type="author" corresp="yes" rid="aff1">
          <name><surname>Yuan</surname><given-names>Shijin</given-names></name>
          <email>yuanshijin2003@163.com</email>
        </contrib>
        <aff id="aff1"><label>1</label><institution>School of Software Engineering, Tongji University, Shanghai, 201804,
China</institution>
        </aff><author-comment content-type="econtrib"><p>These authors contributed equally to this work.</p></author-comment>
      </contrib-group>
      <author-notes><corresp id="corr1">Shijin Yuan (yuanshijin2003@163.com)</corresp></author-notes><pub-date><day>17</day><month>November</month><year>2021</year></pub-date>
      
      <volume>14</volume>
      <issue>11</issue>
      <fpage>6977</fpage><lpage>6999</lpage>
      <history>
        <date date-type="received"><day>24</day><month>June</month><year>2021</year></date>
           <date date-type="rev-request"><day>14</day><month>July</month><year>2021</year></date>
           <date date-type="rev-recd"><day>3</day><month>October</month><year>2021</year></date>
           <date date-type="accepted"><day>10</day><month>October</month><year>2021</year></date>
      </history>
      <permissions>
        <copyright-statement>Copyright: © 2021 Bin Mu et al.</copyright-statement>
        <copyright-year>2021</copyright-year>
      <license license-type="open-access"><license-p>This work is licensed under the Creative Commons Attribution 4.0 International License. To view a copy of this licence, visit <ext-link ext-link-type="uri" xlink:href="https://creativecommons.org/licenses/by/4.0/">https://creativecommons.org/licenses/by/4.0/</ext-link></license-p></license></permissions><self-uri xlink:href="https://gmd.copernicus.org/articles/14/6977/2021/gmd-14-6977-2021.html">This article is available from https://gmd.copernicus.org/articles/14/6977/2021/gmd-14-6977-2021.html</self-uri><self-uri xlink:href="https://gmd.copernicus.org/articles/14/6977/2021/gmd-14-6977-2021.pdf">The full text article is available as a PDF file from https://gmd.copernicus.org/articles/14/6977/2021/gmd-14-6977-2021.pdf</self-uri>
      <abstract><title>Abstract</title>

      <p id="d1e102">The El Niño–Southern Oscillation (ENSO) is an extremely complicated ocean–atmosphere
coupling event, the development and decay of which are usually modulated by
the energy interactions between multiple physical variables. In this paper,
we design a graph-based multivariate air–sea coupler (ASC) that uses the
features of multiple physical variables. On the basis of this coupler, an
ENSO deep learning forecast model (named ENSO-ASC) is proposed, whose
structure is adapted to the characteristics of the ENSO dynamics, including
the encoder and decoder for capturing and restoring the multi-scale spatial–temporal
correlations, and two attention weights for grasping the different air–sea
coupling strengths on different start calendar months and varied effects of
physical variables on ENSO amplitudes. In addition, two datasets adjusted
to the same resolution are used to train the model. We first tune the
model for optimal performance and compare it with other state-of-the-art
ENSO deep learning forecast models. Then, we evaluate the ENSO forecast
skill from the contributions of different predictors, the effective lead
time with different start calendar months, and the forecast spatial
uncertainties, to further analyze the underlying ENSO mechanisms. Finally, we
make ENSO predictions over the validation period from 2014 to 2020.
Experiment results demonstrate that ENSO-ASC outperforms the other models.
Sea surface temperature (SST) and zonal wind are two crucial predictors. The
correlation skill of the Niño 3.4 index is over 0.78, 0.65, and 0.5 within lead
times of 6, 12, and 18 months, respectively. From two heat map analyses, we also identify
common challenges in ENSO predictability, such as the forecasting skills
declining faster when making forecasts through June–July–August and the
forecast errors being more likely to show up in the western and central tropical
Pacific Ocean in longer-term forecasts. ENSO-ASC can simulate ENSO with
different strengths, and the forecasted SST and wind patterns reflect an
obvious Bjerknes positive feedback mechanism. These results indicate the
effectiveness and superiority of our model with the multivariate air–sea
coupler in predicting ENSO and analyzing the underlying
dynamic mechanisms in a sophisticated way.</p>
  </abstract>
    </article-meta>
  </front>
<body>
      

<sec id="Ch1.S1" sec-type="intro">
  <label>1</label><title>Introduction</title>
      <p id="d1e114">The El Niño–Southern Oscillation (ENSO) is the dominant
source of interannual climate variability and can induce global climate
extremes and ecosystem impacts (Zhang et al., 2016). El Niño (La Niña) is the
oceanic phenomenon of ENSO and is usually characterized by large-scale
positive (negative) sea surface temperature (SST) anomalies in the tropical
Pacific Ocean. The Niño 3 (Niño 4) index, a common indicator in ENSO
research for measuring cold tongue (warm pool) variability, is the
SST anomaly averaged over the Niño 3 (Niño 4) region (see Fig. 1). Besides these two indicators, the ONI (oceanic Niño index, 3-month
running mean of SST anomalies in the Niño 3.4 region) has become the
de facto standard to identify the occurrence of El Niño and La Niña
events: if the ONIs of 5 consecutive months are over 0.5 <inline-formula><mml:math id="M1" display="inline"><mml:msup><mml:mi/><mml:mo>∘</mml:mo></mml:msup></mml:math></inline-formula>C (below
<inline-formula><mml:math id="M2" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula>0.5 <inline-formula><mml:math id="M3" display="inline"><mml:msup><mml:mi/><mml:mo>∘</mml:mo></mml:msup></mml:math></inline-formula>C), El Niño (La Niña) occurs.</p>
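      <p>The ONI criterion above can be sketched in a few lines of Python. This is a minimal illustration rather than part of ENSO-ASC: the function name classify_enso, its parameters, and the assumption that a monthly ONI series is already available as a one-dimensional array are all hypothetical.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import numpy as np


def classify_enso(oni, threshold=0.5, min_run=5):
    """Label each month as 1 (El Nino), -1 (La Nina), or 0 (neutral).

    A month is labelled only if it belongs to a run of at least `min_run`
    consecutive months whose ONI is above +threshold (El Nino) or below
    -threshold (La Nina). Names and defaults are illustrative assumptions.
    """
    oni = np.asarray(oni, dtype=float)
    labels = np.zeros(oni.shape[0], dtype=int)
    for sign in (1, -1):
        # For sign = -1 this tests oni < -threshold.
        exceeds = sign * oni > threshold
        start = None
        # Append a False sentinel so that a run reaching the end is closed.
        for i, flag in enumerate(np.append(exceeds, False)):
            if flag and start is None:
                start = i
            elif not flag and start is not None:
                if i - start >= min_run:
                    labels[start:i] = sign
                start = None
    return labels
```
]]></preformat>
      <p>Under these assumptions, a run of only four warm months is left as neutral, whereas five or more consecutive months beyond the threshold are labelled as an event.</p>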

      <?xmltex \floatpos{t}?><fig id="Ch1.F1"><?xmltex \currentcnt{1}?><?xmltex \def\figurename{Figure}?><label>Figure 1</label><caption><p id="d1e144">Regions most affected by ENSO events. The blue rectangle covers
the Niño 3 region
(<inline-formula><mml:math id="M4" display="inline"><mml:mn mathvariant="normal">5</mml:mn></mml:math></inline-formula><inline-formula><mml:math id="M5" display="inline"><mml:msup><mml:mi/><mml:mo>∘</mml:mo></mml:msup></mml:math></inline-formula> N–<inline-formula><mml:math id="M6" display="inline"><mml:mn mathvariant="normal">5</mml:mn></mml:math></inline-formula><inline-formula><mml:math id="M7" display="inline"><mml:msup><mml:mi/><mml:mo>∘</mml:mo></mml:msup></mml:math></inline-formula> S,
<inline-formula><mml:math id="M8" display="inline"><mml:mn mathvariant="normal">150</mml:mn></mml:math></inline-formula><inline-formula><mml:math id="M9" display="inline"><mml:msup><mml:mi/><mml:mo>∘</mml:mo></mml:msup></mml:math></inline-formula> W–<inline-formula><mml:math id="M10" display="inline"><mml:mn mathvariant="normal">90</mml:mn></mml:math></inline-formula><inline-formula><mml:math id="M11" display="inline"><mml:msup><mml:mi/><mml:mo>∘</mml:mo></mml:msup></mml:math></inline-formula> W),
and the green rectangle covers the Niño 4 region
(<inline-formula><mml:math id="M12" display="inline"><mml:mn mathvariant="normal">5</mml:mn></mml:math></inline-formula><inline-formula><mml:math id="M13" display="inline"><mml:msup><mml:mi/><mml:mo>∘</mml:mo></mml:msup></mml:math></inline-formula> N–<inline-formula><mml:math id="M14" display="inline"><mml:mn mathvariant="normal">5</mml:mn></mml:math></inline-formula><inline-formula><mml:math id="M15" display="inline"><mml:msup><mml:mi/><mml:mo>∘</mml:mo></mml:msup></mml:math></inline-formula> S,
<inline-formula><mml:math id="M16" display="inline"><mml:mn mathvariant="normal">160</mml:mn></mml:math></inline-formula><inline-formula><mml:math id="M17" display="inline"><mml:msup><mml:mi/><mml:mo>∘</mml:mo></mml:msup></mml:math></inline-formula> E–<inline-formula><mml:math id="M18" display="inline"><mml:mn mathvariant="normal">150</mml:mn></mml:math></inline-formula><inline-formula><mml:math id="M19" display="inline"><mml:msup><mml:mi/><mml:mo>∘</mml:mo></mml:msup></mml:math></inline-formula> W).</p></caption>
        <?xmltex \igopts{width=241.848425pt}?><graphic xlink:href="https://gmd.copernicus.org/articles/14/6977/2021/gmd-14-6977-2021-f01.png"/>

      </fig>

      <p id="d1e275">Conventional forecast approaches mainly rely on numerical climate models.
However, the model biases of these traditional approaches
have long been an obstacle to accurate ENSO predictions (Xue et al.,<?pagebreak page6978?> 2013).
In addition, many intrinsic factors also limit ENSO predictability,
such as natural decadal variations in ENSO amplitude. For example,
predictability tends to be higher when the ENSO cycle is strong than when it
is weak (Barnston et al., 2012; Balmaseda et al., 1995; McPhaden, 2012).
Recently, as large volumes of multi-source real-world geoscience data,
e.g., remote sensing and buoy observations, have accumulated, meteorological
researchers have been inspired to build lightweight and convenient data-driven
models at a low computational cost (Rolnick et al., 2019). This has led to a
wave of deep learning formulations of ENSO forecasting, which produce
more skilful ENSO predictions (Ham et al., 2019).</p>
      <p id="d1e279">In the field of deep learning, ENSO prediction is usually regarded as
forecasting the future evolution tendency of SST and related Niño
indexes directly, subsequently analyzing the associated sophisticated
mechanisms, and measuring the intrinsic characteristics such as intensity and
duration. Therefore, the simplest and most practical forecast approaches
fall into two intuitive categories: Niño index forecasts and SST
pattern forecasts.</p>
      <p id="d1e282">As for Niño index forecasting, several neural networks have made
accurate predictions 6, 9, and 12 months ahead. Representative works include the ensemble
QESN (McDermott and Wikle, 2017), BAST-RNN (McDermott and Wikle, 2019), and
LSTM (long short-term memory) (Broni-Bedaiko et al., 2019).
These studies demonstrate that deep learning can
capture the nonlinear characteristics of non-stationary time series and
attain outstanding regressions of the Niño index.</p>
      <p id="d1e285">Notwithstanding the successful attempts at Niño index regression,
there are still many pitfalls in measuring ENSO forecast skill with a
single scalar. For example, the important spatial–temporal energy
propagations and teleconnections cannot be described by the indexes. This may
lead to the blind pursuit of accuracy in a certain indicator while
seriously hampering the understanding of the underlying physical mechanisms. Therefore,
many studies suggest exploiting spatial–temporal dependencies and
predicting the evolution of SST patterns. Ham et al. (2019) applied transfer
learning (Yosinski et al., 2014) to historical simulations from CMIP5
(Coupled Model Intercomparison Project phase 5; Bellenger et al., 2014) and
reanalysis data with a CNN model to predict ENSO events, resulting in a
robust and long-term forecast for up to 1.5 years, which outperforms
current numerical predictions. (Although the output of their model is still
the Niño 3.4 index, they construct the model and make forecasts by
absorbing the historical spatial–temporal features from variable patterns
instead of previous index records, so we classify this study as an SST pattern
forecast in this paper.) Mu et al. (2019) and He et al. (2019) built
ConvLSTM (Shi et al., 2015) models to capture the spatial–temporal
dependencies of ENSO SST patterns over multiple time horizons and obtained
better predictions. Zheng et al. (2020) constructed a purely satellite-data-driven deep learning model to forecast the evolution of tropical
instability waves, which are closely related to ENSO phenomena, and obtained
accurate and efficient forecasts. These deep learning models tend to
simulate the behavior of numerical climate models: their inputs are
historical geoscience data, and their outputs are the forecasted SST
patterns.</p>
      <p id="d1e288">The great progress in these works is no accident. On the one
hand, deep learning models have much more complex structures and can mine
the complicated features hidden in the samples more effectively, which
allows them to be substantially more expressive in blending
temporal non-stationarity with multi-scale spatial teleconnections.
On the other hand, it is very convenient to migrate deep learning computer
vision technologies to ENSO forecasting due to the natural analogy between the
format of image/video frame data and meteorological time-series grid data,
which holds promise for extracting the spatial–temporal mechanisms of ENSO via
advanced deep learning techniques. Therefore, data-driven deep learning
can be a reliable alternative to traditional numerical models and a powerful
tool for ENSO forecasting.</p>
      <p id="d1e291">However, there are still some obstacles in the deep learning modeling
process for ENSO forecasting. Most existing models are confined to
limited or even single input predictors, such as only using historical SST
(and wind) data as the model input. Meanwhile, climate deep learning
models are rarely customized to the specific physical mechanisms
of ENSO. These situations lead to poor interpretability and low confidence
in ENSO-related deep learning models. ENSO is an extremely complicated
ocean–atmosphere coupling event, and its development and decay phases are
closely associated with some crucial dynamic mechanisms and the Walker
circulation (Bayr et al., 2020), whose state has a great impact. The Walker
circulation is<?pagebreak page6979?> usually modulated by multiple physical variables (such as SST,
wind, and precipitation), and there are always coupling interactions
between different variables. More specifically, variations in the Walker
circulation have strong temporal-lag effects on ENSO (“memory effects”).
The position of the ascending branch is also a very important climatic
condition for the occurrence of El Niño. Such a priori ENSO knowledge has
not been effectively used in deep learning models.</p>
      <p id="d1e294">Therefore, in order to further improve the ENSO prediction skill, there is
an essential principle that should be reflected in climate deep learning
models: subjectively incorporating the a priori ENSO knowledge into the deep
learning formalization and deriving hand-crafted features to make
predictions.</p>
      <p id="d1e298">In this paper, according to the important synergies of multiple variables in
crucial ENSO dynamic mechanisms and Walker circulation, we select six
indispensable variables (SST, <inline-formula><mml:math id="M20" display="inline"><mml:mi>u</mml:mi></mml:math></inline-formula> wind, <inline-formula><mml:math id="M21" display="inline"><mml:mi>v</mml:mi></mml:math></inline-formula> wind, rain, cloud, and vapor) that
are derived from ENSO-related key processes to build a graph-based multivariate air–sea
coupler (ASC), which emphasizes the energy
exchange between multiple variables. We then leverage this coupler to build
up the ENSO deep learning forecast model, named ENSO-ASC, with an
encoder–coupler–decoder structure to extract the multi-scale
spatial–temporal features of multiple physical variables. Two attention
weights are also proposed to grasp the different air–sea coupling strengths
on different start calendar months and varied effects of these variables. A
loss function combining MSE (mean squared error) and MAE (mean absolute
error) is used to guide the model training precisely, and SSIM (structural
similarity) (Wang et al., 2004) and PSNR (peak signal-to-noise ratio) are
used as metrics to evaluate the spatial consistency of the forecasted
patterns.</p>
      <p id="d1e315">Two datasets are applied for model training to ensure that the systematic
forecast errors are fully corrected after tuning by the higher quality
dataset: we first train the ENSO-ASC on the numerous reanalysis samples from
January 1850 to December 2015 and subsequently on the high-quality remote sensing
samples from December 1997 to December 2012 for fine-tuning. This procedure is also
known as transfer learning. These two datasets are adjusted to the same
resolution. The validation period is from January 2014 to August 2020 in the remote sensing
dataset. The gap between the fine-tuning set and validation set is used to
remove the possible influence of oceanic memory (Ham et al., 2019).</p>
      <p id="d1e318">This is the first time that a multivariate air–sea coupler considering energy interactions has been designed. We evaluate ENSO-ASC from three aspects. First,
we evaluate the model performance from the perspective of model structure,
including the input sequence length, the benefits of transfer learning,
the multivariate air–sea coupler, and the attention weights, and tune the model
structure for optimal performance. Then, we analyze the ENSO forecast skill of
ENSO-ASC from the meteorological perspective, including the contributions of
different input physical variables, the effective forecast lead time,
the changes in forecast skill with different start calendar months, and the
forecast spatial uncertainties. Finally, we make real-world ENSO
simulations over the validation period by tracing the evolution of
multiple physical variables. The experimental results show that ENSO-ASC performs
better in both SSIM and PSNR of the forecasted SST patterns, which
effectively raises the upper limit of ENSO forecast skill. The forecasted
ENSO events are more consistent with real-world observations, and the related
Niño indexes have higher correlations with observations than traditional
methods and current state-of-the-art deep learning models; the correlation skill is over
<inline-formula><mml:math id="M22" display="inline"><mml:mrow><mml:mn mathvariant="normal">0.78</mml:mn><mml:mo>/</mml:mo><mml:mn mathvariant="normal">0.65</mml:mn><mml:mo>/</mml:mo><mml:mn mathvariant="normal">0.5</mml:mn></mml:mrow></mml:math></inline-formula> within the lead time of <inline-formula><mml:math id="M23" display="inline"><mml:mrow><mml:mn mathvariant="normal">6</mml:mn><mml:mo>/</mml:mo><mml:mn mathvariant="normal">12</mml:mn><mml:mo>/</mml:mo><mml:mn mathvariant="normal">18</mml:mn></mml:mrow></mml:math></inline-formula> months for Niño 3.4 index.
SST and zonal wind are two crucial predictors, which can be considered
the major triggers of ENSO. A temporal heat map analysis illustrates that
ENSO forecast skill declines faster when making forecasts through
June–July–August, and a spatial heat map analysis shows that the forecast
errors are more likely to show up over the central tropical Pacific Ocean in
longer-term forecasts. Meanwhile, in the validation period from 2014
to 2020, the multivariate air–sea coupler can capture the latent ENSO
dynamical mechanisms and provide multivariate evolution simulations with a
high degree of physical consistency: the positive SST anomalies first show
up over the eastern equatorial Pacific with westerly wind anomalies in
the western and central tropical Pacific Ocean (and vice versa in La
Niña events), which induces the Bjerknes positive feedback mechanism. It is
worth noting that for the simulation of the 2015–2016 super El Niño,
ENSO-ASC captures the strong evolution of SST anomalies over the northeast
subtropical Pacific in the peak phase and successfully predicts its
very high intensity and very long duration, while many dynamic or
statistical models fail. At the same time, ENSO-ASC can also reduce the false
alarm rate, such as in 2014. In terms of its mathematical expression, the
multivariate air–sea coupler captures the spatial–temporal multi-scale
oscillations of the Walker circulation and performs the ocean–atmosphere energy
exchange simultaneously, which tries to avoid the interval flux exchange in
the geoscience fluid programming of traditional numerical climate models. In
conclusion, the graph-based multivariate air–sea coupler not only exhibits
effectiveness and superiority in predicting sophisticated climate phenomena but is also a promising tool for exploring the underlying dynamic mechanisms in
the future.</p>
      <p id="d1e353">The remainder of this paper is organized as follows. Section 2 introduces
the proposed multivariate air–sea coupler. Section 3 describes the ENSO deep
learning forecast model with the coupler (ENSO-ASC) in detail. Section 4
illustrates the datasets, experiment schemas and result analyses. Finally,
Sect. 5 offers further discussions and summarizes the paper.</p>
</sec>
<?pagebreak page6980?><sec id="Ch1.S2">
  <label>2</label><title>Multivariate air–sea coupler based on graph</title>
      <p id="d1e364">ENSO is the most dominant phenomenon of air–sea coupling over the equatorial
Pacific, and many complex dynamical mechanisms modulate the ENSO amplitude.
Bjerknes positive feedback (Bjerknes, 1969) is one of the most significant
effects, the processes of which are closely related to the state of the Walker
circulation. The multiple physical
variables influenced by the Walker circulation exchange energy at every moment, and the
ENSO-related SST variations are greatly affected by such air–sea coupling
activities (Gao and Zhang, 2017; Lau et al., 1989; Lau et al., 1996).</p>
      <p id="d1e367">Many atmospheric and oceanic anomalies are known triggers of ENSO
events, which establish the Bjerknes positive feedback. The warm SST
anomalies propagate gradually to the central and eastern equatorial Pacific.
However, even as SST gradually rises, it is virtually impossible for the equatorial Pacific
to enter a never-ending warm state. Therefore, some negative feedbacks
cause turnabouts from warm phases to cold phases (Wang et al., 2017). These
negative feedback mechanisms all emphasize air–sea interactions. For
example, westerly wind anomalies in the central tropical Pacific Ocean induce
upwelling Rossby and downwelling Kelvin oceanic waves, both of which
propagate and reflect at the continental boundaries and then tend to push the warm
pool back to its original position in the western Pacific. Over the
ENSO life cycle, atmospheric and oceanic variables thus play crucial
roles together.</p>
      <p id="d1e370">Meanwhile, during the development and decay phases of ENSO, there
are also nonlinear interactions between atmospheric and oceanic variables.
Wind anomalies are the most obvious and direct response to the ENSO-driven
large-scale oceanic variations, and they change the ocean–atmosphere
heat transfer (Cheng et al., 2019). Once the ocean state changes, the
thermal energy contained in the sea will escalate or dissipate into the air,
hindering or promoting the precipitation and surface humidity over the
equatorial Pacific. These changes also feed back on ENSO.</p>
      <p id="d1e373">Meteorological researchers have identified the key physical
processes in ENSO in recent years. If such knowledge can be subjectively incorporated
into ENSO deep learning forecast modeling, breaking away from
the current limitation of using single predictors, the accuracy of ENSO
prediction could improve substantially. In this paper, we choose six
indispensable ENSO-related variables from two different multivariate
datasets, as shown in Table 1, all of which are strongly
correlated with the evolution of ENSO events according to Bjerknes
positive feedback and other dynamical processes. Furthermore, in order to
comprehensively represent the coupling interactions, a multivariate air–sea
coupler coupler(<inline-formula><mml:math id="M24" display="inline"><mml:mi>G</mml:mi></mml:math></inline-formula>) is designed to simulate their
synergies with an undirected graph <inline-formula><mml:math id="M25" display="inline"><mml:mrow><mml:mi>G</mml:mi><mml:mo>=</mml:mo><mml:mo>(</mml:mo><mml:mi>V</mml:mi><mml:mo>,</mml:mo><mml:mi>A</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula>
as shown in Fig. 2, where
<inline-formula><mml:math id="M26" display="inline"><mml:mrow><mml:mi>V</mml:mi><mml:mo>=</mml:mo><mml:mo>(</mml:mo><mml:msub><mml:mi>f</mml:mi><mml:mrow><mml:msub><mml:mi>v</mml:mi><mml:mn mathvariant="normal">1</mml:mn></mml:msub></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>f</mml:mi><mml:mrow><mml:msub><mml:mi>v</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:mi mathvariant="normal">…</mml:mi><mml:mo>,</mml:mo><mml:msub><mml:mi>f</mml:mi><mml:mrow><mml:msub><mml:mi>v</mml:mi><mml:mi>N</mml:mi></mml:msub></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula>
represents the vertices of the graph and <inline-formula><mml:math id="M27" display="inline"><mml:mrow><mml:msub><mml:mi>f</mml:mi><mml:mrow><mml:msub><mml:mi>v</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula> is the feature of every
physical variable <inline-formula><mml:math id="M28" display="inline"><mml:mrow><mml:msub><mml:mi>v</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> (<inline-formula><mml:math id="M29" display="inline"><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mo>,</mml:mo><mml:mn mathvariant="normal">2</mml:mn><mml:mo>,</mml:mo><mml:mi mathvariant="normal">…</mml:mi><mml:mo>,</mml:mo><mml:mi>N</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula>. <inline-formula><mml:math id="M30" display="inline"><mml:mrow><mml:mi mathvariant="bold">A</mml:mi><mml:mo>∈</mml:mo><mml:msup><mml:mi>R</mml:mi><mml:mrow><mml:mi>N</mml:mi><mml:mo>×</mml:mo><mml:mi>N</mml:mi></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula>
is the pre-designed adjacency matrix, where <inline-formula><mml:math id="M31" display="inline"><mml:mrow><mml:msub><mml:mi>A</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:math></inline-formula> (<inline-formula><mml:math id="M32" display="inline"><mml:mrow><mml:msub><mml:mi>A</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mn mathvariant="normal">0</mml:mn><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula>
indicates the presence (absence) of energy interactions between the
connected variables <inline-formula><mml:math id="M33" display="inline"><mml:mrow><mml:msub><mml:mi>v</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M34" display="inline"><mml:mrow><mml:msub><mml:mi>v</mml:mi><mml:mi>j</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>. The variables exchange energies
simultaneously every moment, and the directions of edges in this graph can
be neglected because the energy interactions are two-way (transfer and
feedback).</p>
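      <p>The graph definition above can be illustrated with a short Python sketch that assembles a symmetric 0/1 adjacency matrix for the six variables. The edge list EXAMPLE_EDGES and the helper build_adjacency are hypothetical and chosen only for illustration; the connections actually used by the coupler are those drawn in Fig. 2.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import numpy as np

# The six physical variables used as graph vertices (N = 6).
VARIABLES = ["sst", "uwind", "vwind", "rain", "cloud", "vapor"]

# Hypothetical edge list, for illustration only. The real connections
# are the ones shown in the coupler figure, not these.
EXAMPLE_EDGES = [
    ("sst", "uwind"),
    ("sst", "vwind"),
    ("sst", "rain"),
    ("rain", "cloud"),
    ("cloud", "vapor"),
]


def build_adjacency(variables, edges):
    """Return an N x N matrix with A[i, j] = 1 where variables i and j interact."""
    index = {name: k for k, name in enumerate(variables)}
    n = len(variables)
    adjacency = np.zeros((n, n), dtype=int)
    for a, b in edges:
        i, j = index[a], index[b]
        # The graph is undirected, so every edge is written in both directions.
        adjacency[i, j] = 1
        adjacency[j, i] = 1
    return adjacency


A = build_adjacency(VARIABLES, EXAMPLE_EDGES)
```
]]></preformat>
      <p>Because each edge is stored in both directions, the resulting matrix is symmetric, which reflects the two-way (transfer and feedback) nature of the energy interactions.</p>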

      <?xmltex \floatpos{t}?><fig id="Ch1.F2"><?xmltex \currentcnt{2}?><?xmltex \def\figurename{Figure}?><label>Figure 2</label><caption><p id="d1e590">A description of our proposed multivariate air–sea coupler, which
utilizes the spatial–temporal features of multiple physical variables to
simulate their simultaneous energy exchange.</p></caption>
        <?xmltex \igopts{width=213.395669pt}?><graphic xlink:href="https://gmd.copernicus.org/articles/14/6977/2021/gmd-14-6977-2021-f02.png"/>

      </fig>

      <p id="d1e599">Here, <inline-formula><mml:math id="M35" display="inline"><mml:mi>V</mml:mi></mml:math></inline-formula> in <inline-formula><mml:math id="M36" display="inline"><mml:mi>G</mml:mi></mml:math></inline-formula> is not the physical variable on a single grid point, but
the features of the entire variable pattern. There are two reasons for this. First,
the coupler pays more attention to the global and local
spatial–temporal correlations in the variable fields of ENSO than to the
variations on an isolated grid point. Second, the coupler provides
higher computational efficiency and consumes fewer computational resources
for ENSO forecasting. Improvements to the coupler, such as designing
individual graphs for smaller-scale regions or even single grid points, are
left for future work.</p>
</sec>
<sec id="Ch1.S3">
  <label>3</label><title>ENSO-ASC: ENSO deep learning forecast model with the multivariate air–sea
coupler</title>
      <?pagebreak page6981?><p id="d1e624">Inspired by previous ENSO deep learning forecast models, we can define the
ENSO forecast as a multivariate spatial–temporal sequence forecast problem,
as expressed in Eq. (1),
          <disp-formula id="Ch1.E1" content-type="numbered"><label>1</label><mml:math id="M37" display="block"><mml:mrow><?xmltex \hack{\hbox\bgroup\fontsize{9.5}{9.5}\selectfont$\displaystyle}?><mml:mover accent="true"><mml:mi>s</mml:mi><mml:mo stretchy="false" mathvariant="normal">^</mml:mo></mml:mover><mml:mo>=</mml:mo><mml:msub><mml:mi>F</mml:mi><mml:mi mathvariant="italic">θ</mml:mi></mml:msub><mml:mo>(</mml:mo><mml:msup><mml:mi>s</mml:mi><mml:mi mathvariant="normal">scm</mml:mi></mml:msup><mml:mo>)</mml:mo><mml:mo>,</mml:mo><mml:mo mathvariant="italic">{</mml:mo><mml:mtext>sst,uwind,vwind,rain,cloud,vapor</mml:mtext><mml:mo mathvariant="italic">}</mml:mo><mml:mo>⊆</mml:mo><mml:msup><mml:mi>s</mml:mi><mml:mi mathvariant="normal">scm</mml:mi></mml:msup><?xmltex \hack{$\egroup}?><mml:mo>,</mml:mo></mml:mrow></mml:math></disp-formula>
        where <inline-formula><mml:math id="M38" display="inline"><mml:mrow><mml:msup><mml:mi>s</mml:mi><mml:mi mathvariant="normal">scm</mml:mi></mml:msup><mml:mo>∈</mml:mo><mml:msup><mml:mi>R</mml:mi><mml:mrow><mml:mi>N</mml:mi><mml:mo>×</mml:mo><mml:mi>M</mml:mi></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula> is <inline-formula><mml:math id="M39" display="inline"><mml:mi>N</mml:mi></mml:math></inline-formula> multivariate observations in
historical <inline-formula><mml:math id="M40" display="inline"><mml:mi>M</mml:mi></mml:math></inline-formula> months (<inline-formula><mml:math id="M41" display="inline"><mml:mrow><mml:mi>N</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">6</mml:mn></mml:mrow></mml:math></inline-formula>), and <inline-formula><mml:math id="M42" display="inline"><mml:mrow><mml:mover accent="true"><mml:mi>s</mml:mi><mml:mo stretchy="false" mathvariant="normal">^</mml:mo></mml:mover><mml:mo>∈</mml:mo><mml:msup><mml:mi>R</mml:mi><mml:mrow><mml:mi>N</mml:mi><mml:mo>×</mml:mo><mml:mi>H</mml:mi></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula> is the
prediction result for the future <inline-formula><mml:math id="M43" display="inline"><mml:mi>H</mml:mi></mml:math></inline-formula> months (<inline-formula><mml:math id="M44" display="inline"><mml:mi>H</mml:mi></mml:math></inline-formula> can also be treated as the forecast
lead time). scm<inline-formula><mml:math id="M45" display="inline"><mml:mrow><mml:mo>∈</mml:mo><mml:mo mathvariant="italic">{</mml:mo><mml:mtext>Jan</mml:mtext><mml:mo>,</mml:mo><mml:mtext>Feb</mml:mtext><mml:mo>,</mml:mo><mml:mi mathvariant="normal">…</mml:mi><mml:mo>,</mml:mo><mml:mtext>Dec</mml:mtext><mml:mo mathvariant="italic">}</mml:mo></mml:mrow></mml:math></inline-formula> (start calendar
month) represents the last month in the input series <inline-formula><mml:math id="M46" display="inline"><mml:mrow><mml:msup><mml:mi>s</mml:mi><mml:mi mathvariant="normal">scm</mml:mi></mml:msup></mml:mrow></mml:math></inline-formula>. <inline-formula><mml:math id="M47" display="inline"><mml:mrow><mml:msub><mml:mi>F</mml:mi><mml:mi mathvariant="italic">θ</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> represents the forecast system (<inline-formula><mml:math id="M48" display="inline"><mml:mi mathvariant="italic">θ</mml:mi></mml:math></inline-formula> denotes the trainable parameters in
the system).</p>
      <p id="d1e817">In order to incorporate the multivariate coupler, we break down the
conventional formulation and redefine the multivariate ENSO forecast model
as an encoder–coupler–decoder structure, as shown in Eqs. (2) to (4),

              <disp-formula specific-use="gather" content-type="numbered"><mml:math id="M49" display="block"><mml:mtable displaystyle="true"><mml:mlabeledtr id="Ch1.E2"><mml:mtd><mml:mtext>2</mml:mtext></mml:mtd><mml:mtd><mml:mrow><mml:mstyle displaystyle="true" class="stylechange"/><mml:mtext>encoder:</mml:mtext><mml:mspace linebreak="nobreak" width="0.25em"/><mml:msub><mml:mi>f</mml:mi><mml:mrow><mml:msub><mml:mi>v</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mi mathvariant="normal">encoder</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>(</mml:mo><mml:msubsup><mml:mi>s</mml:mi><mml:mi>i</mml:mi><mml:mrow><mml:mi>s</mml:mi><mml:mi>c</mml:mi><mml:mi>m</mml:mi></mml:mrow></mml:msubsup><mml:mo>)</mml:mo><mml:mo>,</mml:mo></mml:mrow></mml:mtd></mml:mlabeledtr><mml:mlabeledtr id="Ch1.E3"><mml:mtd><mml:mtext>3</mml:mtext></mml:mtd><mml:mtd><mml:mrow><mml:mstyle displaystyle="true" class="stylechange"/><mml:mtable rowspacing="0.2ex" class="split" displaystyle="true" columnalign="right left"><mml:mtr><mml:mtd><mml:mrow><mml:mtext>coupler:</mml:mtext><mml:mspace linebreak="nobreak" width="0.25em"/><mml:msub><mml:mi>f</mml:mi><mml:mi>c</mml:mi></mml:msub></mml:mrow></mml:mtd><mml:mtd><mml:mrow><mml:mo>=</mml:mo><mml:mtext>coupler</mml:mtext><mml:mfenced close=")" open="("><mml:mi>G</mml:mi></mml:mfenced></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd/><mml:mtd><mml:mrow><mml:mo>=</mml:mo><mml:mtext>coupler</mml:mtext><mml:mo>(</mml:mo><mml:mo>(</mml:mo><mml:msub><mml:mi>f</mml:mi><mml:mrow><mml:msub><mml:mi>v</mml:mi><mml:mn mathvariant="normal">1</mml:mn></mml:msub></mml:mrow></mml:msub><mml:mi mathvariant="normal">|</mml:mi><mml:mi mathvariant="normal">|</mml:mi><mml:msub><mml:mi>f</mml:mi><mml:mrow><mml:msub><mml:mi>v</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:mrow></mml:msub><mml:mi mathvariant="normal">|</mml:mi><mml:mi mathvariant="normal">|</mml:mi><mml:mi mathvariant="normal">…</mml:mi><mml:mi 
mathvariant="normal">|</mml:mi><mml:mi mathvariant="normal">|</mml:mi><mml:msub><mml:mi>f</mml:mi><mml:mrow><mml:msub><mml:mi>v</mml:mi><mml:mi>N</mml:mi></mml:msub></mml:mrow></mml:msub><mml:mo>)</mml:mo><mml:mo>,</mml:mo><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mi>A</mml:mi><mml:mo>)</mml:mo><mml:mo>,</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mrow></mml:mtd></mml:mlabeledtr><mml:mlabeledtr id="Ch1.E4"><mml:mtd><mml:mtext>4</mml:mtext></mml:mtd><mml:mtd><mml:mrow><mml:mstyle class="stylechange" displaystyle="true"/><mml:mtext>decoder:</mml:mtext><mml:mspace linebreak="nobreak" width="0.25em"/><mml:msub><mml:mover accent="true"><mml:mi>s</mml:mi><mml:mo stretchy="false" mathvariant="normal">^</mml:mo></mml:mover><mml:mi>i</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mi mathvariant="normal">decoder</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>(</mml:mo><mml:msub><mml:mi>f</mml:mi><mml:mrow><mml:msub><mml:mi>v</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:msub><mml:mi mathvariant="normal">|</mml:mi><mml:mi mathvariant="normal">|</mml:mi><mml:msub><mml:mi>f</mml:mi><mml:mrow><mml:msub><mml:mi>c</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:msub><mml:mo>)</mml:mo><mml:mo>,</mml:mo></mml:mrow></mml:mtd></mml:mlabeledtr></mml:mtable></mml:math></disp-formula>

          where the <inline-formula><mml:math id="M50" display="inline"><mml:mrow><mml:msubsup><mml:mi>s</mml:mi><mml:mi>i</mml:mi><mml:mi mathvariant="normal">scm</mml:mi></mml:msubsup><mml:mo>(</mml:mo><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mo>,</mml:mo><mml:mn mathvariant="normal">2</mml:mn><mml:mo>,</mml:mo><mml:mi mathvariant="normal">…</mml:mi><mml:mo>,</mml:mo><mml:mi>N</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> represents the individual
physical data and <inline-formula><mml:math id="M51" display="inline"><mml:mrow><mml:msub><mml:mi>f</mml:mi><mml:mrow><mml:msub><mml:mi>v</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula> represents the corresponding extracted
features by respective encoders. The coupler (<inline-formula><mml:math id="M52" display="inline"><mml:mi class="Radical" mathvariant="normal">⚫</mml:mi></mml:math></inline-formula>) simulates the
latent multivariate interactions on the physical features and the
pre-designed interaction graph <inline-formula><mml:math id="M53" display="inline"><mml:mi>A</mml:mi></mml:math></inline-formula>, where the operator <inline-formula><mml:math id="M54" display="inline"><mml:mrow><mml:mi mathvariant="normal">|</mml:mi><mml:mi mathvariant="normal">|</mml:mi></mml:mrow></mml:math></inline-formula>
represents the concatenation of features of different physical variables.
Then the respective decoders will restore the physical features end-to-end
by the coupled multivariate features <inline-formula><mml:math id="M55" display="inline"><mml:mrow><mml:msub><mml:mi>f</mml:mi><mml:mrow><mml:msub><mml:mi>c</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula> and original physical
features <inline-formula><mml:math id="M56" display="inline"><mml:mrow><mml:msub><mml:mi>f</mml:mi><mml:mrow><mml:msub><mml:mi>v</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula>, the concatenation of which can be regarded as
skip-layer connections. These connections propagate low-level
features directly to the higher levels of the model, preserving the raw information
and accelerating feature transfer to some extent. The sub-modules
<inline-formula><mml:math id="M57" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="normal">encoder</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>(</mml:mo><mml:mi mathvariant="normal" class="Radical">⚫</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula>, <inline-formula><mml:math id="M58" display="inline"><mml:mrow><mml:mtext>coupler</mml:mtext><mml:mo>(</mml:mo><mml:mi mathvariant="normal" class="Radical">⚫</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula>, and <inline-formula><mml:math id="M59" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="normal">decoder</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>(</mml:mo><mml:mi class="Radical" mathvariant="normal">⚫</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> form the ENSO-ASC together.</p>
      <p id="d1e1171">As mentioned before, the strength of the multivariate coupling and the
effects of multivariate temporal memories in ENSO change with the
input sequence length <inline-formula><mml:math id="M60" display="inline"><mml:mi>M</mml:mi></mml:math></inline-formula> and the forecast start calendar month scm.
To capture these effects on forecasts, we design two self-supervised
attention weights, <inline-formula><mml:math id="M61" display="inline"><mml:mrow><mml:mi mathvariant="italic">α</mml:mi><mml:mo>=</mml:mo><mml:mo>(</mml:mo><mml:msub><mml:mi>a</mml:mi><mml:mn mathvariant="normal">1</mml:mn></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>a</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub><mml:mo>,</mml:mo><mml:mi mathvariant="normal">…</mml:mi><mml:mo>,</mml:mo><mml:msub><mml:mi>a</mml:mi><mml:mi>M</mml:mi></mml:msub><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M62" display="inline"><mml:mrow><mml:mi mathvariant="italic">β</mml:mi><mml:mo>=</mml:mo><mml:mo>(</mml:mo><mml:msub><mml:mi>b</mml:mi><mml:mn mathvariant="normal">1</mml:mn></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>b</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub><mml:mo>,</mml:mo><mml:mi mathvariant="normal">…</mml:mi><mml:mo>,</mml:mo><mml:msub><mml:mi>b</mml:mi><mml:mi>N</mml:mi></mml:msub><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula>, in the encoder and coupler respectively
to capture the dynamic time-series non-stationarity and re-weight the
multivariate contributions. The final formulation of the forecast model can be
written as shown in Eqs. (5) to (7) and in Fig. 3, where <inline-formula><mml:math id="M63" display="inline"><mml:mo>∘</mml:mo></mml:math></inline-formula>
represents element-wise multiplication.

              <disp-formula specific-use="gather" content-type="numbered"><mml:math id="M64" display="block"><mml:mtable displaystyle="true"><mml:mlabeledtr id="Ch1.E5"><mml:mtd><mml:mtext>5</mml:mtext></mml:mtd><mml:mtd><mml:mrow><mml:mstyle displaystyle="true" class="stylechange"/><mml:mtext>encoder:</mml:mtext><mml:mspace width="0.25em" linebreak="nobreak"/><mml:msub><mml:mi>f</mml:mi><mml:mrow><mml:msub><mml:mi>v</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mi mathvariant="italic">α</mml:mi><mml:mo>∘</mml:mo><mml:msub><mml:mtext>encoder</mml:mtext><mml:mi>i</mml:mi></mml:msub><mml:mo>(</mml:mo><mml:msubsup><mml:mi>s</mml:mi><mml:mi>i</mml:mi><mml:mrow><mml:mi>s</mml:mi><mml:mi>c</mml:mi><mml:mi>m</mml:mi></mml:mrow></mml:msubsup><mml:mo>)</mml:mo></mml:mrow></mml:mtd></mml:mlabeledtr><mml:mlabeledtr id="Ch1.E6"><mml:mtd><mml:mtext>6</mml:mtext></mml:mtd><mml:mtd><mml:mrow><mml:mstyle class="stylechange" displaystyle="true"/><mml:mtable class="split" rowspacing="0.2ex" displaystyle="true" columnalign="right left"><mml:mtr><mml:mtd><mml:mrow><mml:mtext>coupler:</mml:mtext><mml:mspace width="0.25em" linebreak="nobreak"/><mml:msub><mml:mi>f</mml:mi><mml:mi mathvariant="normal">c</mml:mi></mml:msub></mml:mrow></mml:mtd><mml:mtd><mml:mrow><mml:mo>=</mml:mo><mml:mtext>coupler</mml:mtext><mml:mfenced open="(" close=")"><mml:mrow><mml:mi mathvariant="italic">β</mml:mi><mml:mo>∘</mml:mo><mml:mi>G</mml:mi></mml:mrow></mml:mfenced></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd/><mml:mtd><mml:mrow><mml:mo>=</mml:mo><mml:mtext>coupler</mml:mtext><mml:mo>(</mml:mo><mml:mo>(</mml:mo><mml:msub><mml:mi>b</mml:mi><mml:mn mathvariant="normal">1</mml:mn></mml:msub><mml:msub><mml:mi>f</mml:mi><mml:mrow><mml:msub><mml:mi>v</mml:mi><mml:mn mathvariant="normal">1</mml:mn></mml:msub></mml:mrow></mml:msub><mml:mi mathvariant="normal">|</mml:mi><mml:mi mathvariant="normal">|</mml:mi><mml:msub><mml:mi>b</mml:mi><mml:mn 
mathvariant="normal">2</mml:mn></mml:msub><mml:msub><mml:mi>f</mml:mi><mml:mrow><mml:msub><mml:mi>v</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:mrow></mml:msub><mml:mi mathvariant="normal">|</mml:mi><mml:mi mathvariant="normal">|</mml:mi><mml:mi mathvariant="normal">…</mml:mi><mml:mi mathvariant="normal">|</mml:mi><mml:mi mathvariant="normal">|</mml:mi><mml:msub><mml:mi>b</mml:mi><mml:mi>N</mml:mi></mml:msub><mml:msub><mml:mi>f</mml:mi><mml:mrow><mml:msub><mml:mi>v</mml:mi><mml:mi>N</mml:mi></mml:msub></mml:mrow></mml:msub><mml:mo>)</mml:mo><mml:mo>,</mml:mo><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mi>A</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mrow></mml:mtd></mml:mlabeledtr><mml:mlabeledtr id="Ch1.E7"><mml:mtd><mml:mtext>7</mml:mtext></mml:mtd><mml:mtd><mml:mrow><mml:mstyle class="stylechange" displaystyle="true"/><mml:mtext>decoder:</mml:mtext><mml:mspace width="0.25em" linebreak="nobreak"/><mml:msub><mml:mover accent="true"><mml:mi>s</mml:mi><mml:mo stretchy="false" mathvariant="normal">^</mml:mo></mml:mover><mml:mi>i</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mi mathvariant="normal">decoder</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>(</mml:mo><mml:msub><mml:mi>f</mml:mi><mml:mrow><mml:msub><mml:mi>v</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:msub><mml:mi mathvariant="normal">|</mml:mi><mml:mi mathvariant="normal">|</mml:mi><mml:msub><mml:mi>f</mml:mi><mml:mrow><mml:msub><mml:mi>c</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow></mml:mtd></mml:mlabeledtr></mml:mtable></mml:math></disp-formula></p>
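      <p>The data flow of Eqs. (5) to (7) can be illustrated with a minimal NumPy sketch. The functions <monospace>encoder</monospace>, <monospace>coupler</monospace> and <monospace>decoder</monospace> below are hypothetical stand-ins (simple array operations, not the ConvLSTM and graph-convolution modules of ENSO-ASC), the feature size <monospace>F</monospace> is an arbitrary toy value, and the attention weights are fixed to uniform values rather than learned. The sketch is only meant to show where the temporal weight, the multivariate weight and the concatenation enter.</p>

```python
import numpy as np

rng = np.random.default_rng(0)
N, M, F = 6, 12, 8   # variables, input months, toy feature size (F is hypothetical)

# Toy inputs: for each variable, M monthly feature vectors of length F.
s = rng.normal(size=(N, M, F))

# Attention weights, fixed to uniform values here instead of being learned.
alpha = np.full(M, 1.0 / M)   # temporal weights over the M time steps
beta = np.full(N, 1.0 / N)    # multivariate weights over the N variables

# Fully connected interaction graph, as in Eq. (10).
A = np.ones((N, N))

def encoder(s_i, alpha):
    """Stand-in for Eq. (5): weight each time step by alpha and fuse over time."""
    return (alpha[:, None] * np.tanh(s_i)).sum(axis=0)      # shape (F,)

def coupler(G, A):
    """Stand-in for Eq. (6): mix per-variable features along the graph A."""
    return (A / A.sum(axis=1, keepdims=True)) @ G            # shape (N, F)

def decoder(f_v_i, f_c_i):
    """Stand-in for Eq. (7): operate on the concatenation f_v_i || f_c_i."""
    return np.concatenate([f_v_i, f_c_i])                    # shape (2F,)

f_v = np.stack([encoder(s[i], alpha) for i in range(N)])     # (N, F)
G = beta[:, None] * f_v                                      # rows are b_i * f_v_i
f_c = coupler(G, A)                                          # (N, F)
s_hat = np.stack([decoder(f_v[i], f_c[i]) for i in range(N)])
print(s_hat.shape)  # (6, 16)
```

      <p>Each decoder input keeps the uncoupled feature of its own variable next to the coupled feature, which is the skip-layer connection described above.</p>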

      <?xmltex \floatpos{t}?><fig id="Ch1.F3" specific-use="star"><?xmltex \currentcnt{3}?><?xmltex \def\figurename{Figure}?><label>Figure 3</label><caption><p id="d1e1479">The structure of ENSO-ASC. There are six encoders for the chosen
variables to extract spatial–temporal information and a multivariate air–sea
coupler to simulate interactions. After interactions, we design six decoders
to restore the major variable SST and other variables. The training loss and
performance metrics are also displayed.</p></caption>
        <?xmltex \igopts{width=341.433071pt}?><graphic xlink:href="https://gmd.copernicus.org/articles/14/6977/2021/gmd-14-6977-2021-f03.png"/>

      </fig>

      <p id="d1e1488">In addition, there are basically two forecast strategies for ENSO
prediction: direct multi-step (DMS) and iterated multi-step (IMS)
(Chevillon, 2007). The former means predicting the future <inline-formula><mml:math id="M65" display="inline"><mml:mi>H</mml:mi></mml:math></inline-formula>th-month
multivariate pattern directly, and the latter means utilizing the forecast
output result as the input for future iterated predictions. Figure 4
displays the differences between DMS and IMS. In general, DMS is often
unstable and more difficult to train for a deep learning model (Shi and
Yeung, 2018). Therefore, we choose IMS to handle chaotic data and provide more
accurate predictions; that is, the model forecasts the multivariate data for the next month (<inline-formula><mml:math id="M66" display="inline"><mml:mrow><mml:mi>H</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:math></inline-formula>)
and then uses this output as model input to
predict the subsequent evolution iteratively. We also design a combined loss
function to train our model and use two spatial metrics to evaluate the
forecast results. The motivation and detailed implementation of each part
of the model are explained in the following sections.</p>
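      <p>The IMS strategy can be sketched as a simple rollout loop. The function names and the toy one-step model below are hypothetical and only illustrate how each one-month forecast is fed back as input; they do not reproduce the ENSO-ASC network.</p>

```python
from collections import deque

def ims_forecast(step_model, history, lead_months):
    """Iterated multi-step forecast: apply a one-month model repeatedly.

    step_model maps the last M monthly states to the next month's state (H = 1).
    """
    window = deque(history, maxlen=len(history))   # sliding window of M months
    predictions = []
    for _ in range(lead_months):
        next_state = step_model(list(window))
        predictions.append(next_state)
        window.append(next_state)   # the forecast becomes part of the next input
    return predictions

# Hypothetical one-step model: damped persistence of the most recent month.
def toy_model(window):
    return 0.5 * window[-1]

print(ims_forecast(toy_model, [4.0, 8.0], lead_months=3))  # [4.0, 2.0, 1.0]
```

      <p>Because every step consumes earlier forecasts, errors can accumulate with lead time; a DMS model would instead be trained separately to output the target month directly.</p>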

      <?xmltex \floatpos{t}?><fig id="Ch1.F4" specific-use="star"><?xmltex \currentcnt{4}?><?xmltex \def\figurename{Figure}?><label>Figure 4</label><caption><p id="d1e1512">Two common strategies for sequence prediction in ENSO forecasts: direct
multi-step (DMS) and iterated multi-step (IMS).</p></caption>
        <?xmltex \igopts{width=341.433071pt}?><graphic xlink:href="https://gmd.copernicus.org/articles/14/6977/2021/gmd-14-6977-2021-f04.png"/>

      </fig>

<sec id="Ch1.S3.SS1">
  <label>3.1</label><title>Encoder: stacked ConvLSTM layers for extracting spatial–temporal
features</title>
      <p id="d1e1528">The ENSO evolution has a strong correlation with historical
atmospheric and oceanic memory (W. Zhang et al., 2019). An ENSO deep learning
forecast model should be able to simultaneously extract the long-term
spatial–temporal features from multivariate geoscience grid data and
effectively mine the complicated nonlinearity hidden in the data. Stacked
ConvLSTM layers are constructed as the skeleton of the encoder (see orange
arrows in Fig. 5) for each chosen physical variable individually.</p>

      <?xmltex \floatpos{t}?><fig id="Ch1.F5"><?xmltex \currentcnt{5}?><?xmltex \def\figurename{Figure}?><label>Figure 5</label><caption><p id="d1e1533">A detailed structure for encoder: a stacked ConvLSTM encoder for
extracting spatial–temporal features simultaneously. There is also temporal
attention weight for skip-layer connections in the grey box.</p></caption>
          <?xmltex \igopts{width=241.848425pt}?><graphic xlink:href="https://gmd.copernicus.org/articles/14/6977/2021/gmd-14-6977-2021-f05.png"/>

        </fig>

      <p id="d1e1542">In order to capture the multi-scale spatial teleconnections in ENSO
amplitudes, we place a 3D max-pool layer between each pair of adjacent ConvLSTM layers
(blue arrows in Fig. 5), the stride of which on the time axis is
set to 1 to retain the sequence length <inline-formula><mml:math id="M67" display="inline"><mml:mi>M</mml:mi></mml:math></inline-formula>. To exploit the
multi-scale spatial features obtained after every 3D max-pool layer, we design the
skip-layer connections shown in the grey box in Fig. 5. These connections
propagate and cascade the raw features of the same variable from its encoder
(lower levels) to its decoder (higher levels) directly (see Fig. 9), like
dense connections (Huang et al., 2017). Such a structure can preserve more
details of the multi-scale spatial teleconnections and also alleviate the problem of
vanishing gradients. In addition, we design the encoders and decoders to
be symmetric, ensuring that the feature maps in these connections have the
same shape.</p>
      <p id="d1e1553">Since we set <inline-formula><mml:math id="M68" display="inline"><mml:mrow><mml:mi>H</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:math></inline-formula> for the IMS forecast strategy, the feature maps in the
encoder all have a time axis, which those in the decoders do not have. The memory
effects on the forecast vary with the input sequence length
and the forecast start calendar month; propagating the feature
maps of all time steps from the encoder to the coupler and the decoder would be redundant
and could even cause over-averaged forecast results, which hinders the
description of the specific seasonal amplitudes. Therefore, before the
skip-layer connections, we first determine attention weights to
dynamically fuse the feature maps of multiple time steps in the encoder. These weights can
capture the seasonal periodicity hidden in the physical variables and are
referred to as the temporal attention weight <inline-formula><mml:math id="M69" display="inline"><mml:mi mathvariant="italic">α</mml:mi></mml:math></inline-formula> (shown as <inline-formula><mml:math id="M70" display="inline"><mml:mo>⊕</mml:mo></mml:math></inline-formula> symbols in Fig. 5).</p>
      <?pagebreak page6982?><p id="d1e1582">After obtaining sequential feature maps <inline-formula><mml:math id="M71" display="inline"><mml:mrow><mml:mi>T</mml:mi><mml:mo>=</mml:mo><mml:mfenced close="]" open="["><mml:mrow><mml:msub><mml:mi>t</mml:mi><mml:mn mathvariant="normal">1</mml:mn></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>t</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub><mml:mo>,</mml:mo><mml:mi mathvariant="normal">…</mml:mi><mml:mo>,</mml:mo><mml:msub><mml:mi>t</mml:mi><mml:mi>m</mml:mi></mml:msub></mml:mrow></mml:mfenced><mml:mi>m</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mo>,</mml:mo><mml:mn mathvariant="normal">2</mml:mn><mml:mo>,</mml:mo><mml:mi mathvariant="normal">…</mml:mi><mml:mo>,</mml:mo><mml:mi>M</mml:mi></mml:mrow></mml:math></inline-formula> from each 3D max-pool layer, we first
flatten every time step feature map <inline-formula><mml:math id="M72" display="inline"><mml:mrow><mml:msub><mml:mi>t</mml:mi><mml:mi>m</mml:mi></mml:msub><mml:mo>∈</mml:mo><mml:msup><mml:mi>R</mml:mi><mml:mrow><mml:mi>w</mml:mi><mml:mo>×</mml:mo><mml:mi>h</mml:mi><mml:mo>×</mml:mo><mml:mi>c</mml:mi></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula> by the
width <inline-formula><mml:math id="M73" display="inline"><mml:mi>w</mml:mi></mml:math></inline-formula>, height <inline-formula><mml:math id="M74" display="inline"><mml:mi>h</mml:mi></mml:math></inline-formula>, and channel <inline-formula><mml:math id="M75" display="inline"><mml:mi>c</mml:mi></mml:math></inline-formula> as <inline-formula><mml:math id="M76" display="inline"><mml:mrow><mml:msubsup><mml:mi>t</mml:mi><mml:mi>m</mml:mi><mml:mo>′</mml:mo></mml:msubsup><mml:mo>∈</mml:mo><mml:msup><mml:mi>R</mml:mi><mml:mrow><mml:mn mathvariant="normal">1</mml:mn><mml:mo>×</mml:mo><mml:mo>(</mml:mo><mml:mi>w</mml:mi><mml:mo>×</mml:mo><mml:mi>h</mml:mi><mml:mo>×</mml:mo><mml:mi>c</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula> and then cascade them together along the time axis as <inline-formula><mml:math id="M77" display="inline"><mml:mrow><mml:msup><mml:mi>T</mml:mi><mml:mo>′</mml:mo></mml:msup><mml:mo>=</mml:mo><mml:mfenced close="]" open="["><mml:mrow><mml:msubsup><mml:mi>t</mml:mi><mml:mn mathvariant="normal">1</mml:mn><mml:mo>′</mml:mo></mml:msubsup><mml:mo>,</mml:mo><mml:msubsup><mml:mi>t</mml:mi><mml:mn mathvariant="normal">2</mml:mn><mml:mo>′</mml:mo></mml:msubsup><mml:mo>,</mml:mo><mml:mi mathvariant="normal">…</mml:mi><mml:mo>,</mml:mo><mml:msubsup><mml:mi>t</mml:mi><mml:mi>m</mml:mi><mml:mo>′</mml:mo></mml:msubsup></mml:mrow></mml:mfenced><mml:mi>m</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mo>,</mml:mo><mml:mn mathvariant="normal">2</mml:mn><mml:mo>,</mml:mo><mml:mi mathvariant="normal">…</mml:mi><mml:mo>,</mml:mo><mml:mi>M</mml:mi></mml:mrow></mml:math></inline-formula>,
where <inline-formula><mml:math id="M78" display="inline"><mml:mrow><mml:msup><mml:mi>T</mml:mi><mml:mo>′</mml:mo></mml:msup><mml:mo>∈</mml:mo><mml:msup><mml:mi>R</mml:mi><mml:mrow><mml:mi>M</mml:mi><mml:mo>×</mml:mo><mml:mo>(</mml:mo><mml:mi>w</mml:mi><mml:mo>×</mml:mo><mml:mi>h</mml:mi><mml:mo>×</mml:mo><mml:mi>c</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula>. We apply Eq. (8) on
<inline-formula><mml:math id="M79" display="inline"><mml:mrow><mml:msup><mml:mi>T</mml:mi><mml:mo>′</mml:mo></mml:msup></mml:mrow></mml:math></inline-formula> to determine the self-supervised attentive weight <inline-formula><mml:math id="M80" display="inline"><mml:mrow><mml:mi mathvariant="italic">α</mml:mi><mml:mo>∈</mml:mo><mml:msup><mml:mi>R</mml:mi><mml:mi>M</mml:mi></mml:msup></mml:mrow></mml:math></inline-formula>
for each time step's feature map <inline-formula><mml:math id="M81" display="inline"><mml:mrow><mml:msub><mml:mi>t</mml:mi><mml:mi>m</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>.<?xmltex \hack{\newpage}?>
            <disp-formula id="Ch1.E8" content-type="numbered"><label>8</label><mml:math id="M82" display="block"><mml:mrow><mml:mi mathvariant="italic">α</mml:mi><mml:mo>=</mml:mo><mml:mtext>softmax</mml:mtext><mml:mo>(</mml:mo><mml:msub><mml:mi mathvariant="bold">W</mml:mi><mml:mrow><mml:mi mathvariant="italic">α</mml:mi><mml:mi mathvariant="normal">t</mml:mi></mml:mrow></mml:msub><mml:mi>tanh⁡</mml:mi><mml:mfenced open="(" close=")"><mml:mrow><mml:msub><mml:mi mathvariant="bold">W</mml:mi><mml:mi mathvariant="normal">t</mml:mi></mml:msub><mml:msup><mml:mi>T</mml:mi><mml:mo>′</mml:mo></mml:msup><mml:mo>+</mml:mo><mml:msub><mml:mi>b</mml:mi><mml:mi mathvariant="normal">t</mml:mi></mml:msub></mml:mrow></mml:mfenced><mml:mo>+</mml:mo><mml:msub><mml:mi>b</mml:mi><mml:mrow><mml:mi mathvariant="italic">α</mml:mi><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:mo>)</mml:mo><mml:mo>,</mml:mo></mml:mrow></mml:math></disp-formula>
          where <inline-formula><mml:math id="M83" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="bold">W</mml:mi><mml:mi mathvariant="normal">t</mml:mi></mml:msub><mml:mo>∈</mml:mo><mml:msup><mml:mi>R</mml:mi><mml:mrow><mml:msub><mml:mi>d</mml:mi><mml:mn mathvariant="normal">1</mml:mn></mml:msub><mml:mo>×</mml:mo><mml:mi>M</mml:mi></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M84" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="bold">W</mml:mi><mml:mrow><mml:mi mathvariant="italic">α</mml:mi><mml:mi mathvariant="normal">t</mml:mi></mml:mrow></mml:msub><mml:mo>∈</mml:mo><mml:msup><mml:mi>R</mml:mi><mml:mrow><mml:msub><mml:mi>d</mml:mi><mml:mn mathvariant="normal">1</mml:mn></mml:msub></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula> are
transformation matrices, <inline-formula><mml:math id="M85" display="inline"><mml:mrow><mml:msub><mml:mi>d</mml:mi><mml:mn mathvariant="normal">1</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula> is a hyperparameter, and <inline-formula><mml:math id="M86" display="inline"><mml:mrow><mml:msub><mml:mi>b</mml:mi><mml:mi mathvariant="normal">t</mml:mi></mml:msub><mml:mo>∈</mml:mo><mml:msup><mml:mi>R</mml:mi><mml:mrow><mml:msub><mml:mi>d</mml:mi><mml:mn mathvariant="normal">1</mml:mn></mml:msub></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula>
and <inline-formula><mml:math id="M87" display="inline"><mml:mrow><mml:msub><mml:mi>b</mml:mi><mml:mrow><mml:mi mathvariant="italic">α</mml:mi><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:mo>∈</mml:mo><mml:mi>R</mml:mi></mml:mrow></mml:math></inline-formula> are biases. Every dimension in <inline-formula><mml:math id="M88" display="inline"><mml:mi mathvariant="italic">α</mml:mi></mml:math></inline-formula> represents
the contribution of the corresponding time step to the forecast, and we use Eq. (9) to fuse the original features <inline-formula><mml:math id="M89" display="inline"><mml:mrow><mml:mi>T</mml:mi><mml:mo>=</mml:mo><mml:mfenced close="]" open="["><mml:mrow><mml:msub><mml:mi>t</mml:mi><mml:mn mathvariant="normal">1</mml:mn></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>t</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub><mml:mo>,</mml:mo><mml:mi mathvariant="normal">…</mml:mi><mml:mo>,</mml:mo><mml:msub><mml:mi>t</mml:mi><mml:mi>m</mml:mi></mml:msub></mml:mrow></mml:mfenced><mml:mi>m</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mo>,</mml:mo><mml:mn mathvariant="normal">2</mml:mn><mml:mo>,</mml:mo><mml:mi mathvariant="normal">…</mml:mi><mml:mo>,</mml:mo><mml:mi>M</mml:mi></mml:mrow></mml:math></inline-formula>.
            <disp-formula id="Ch1.E9" content-type="numbered"><label>9</label><mml:math id="M90" display="block"><mml:mrow><mml:mover accent="true"><mml:mi>T</mml:mi><mml:mo stretchy="false" mathvariant="normal">̃</mml:mo></mml:mover><mml:mo>=</mml:mo><mml:mi>h</mml:mi><mml:mfenced open="(" close=")"><mml:mrow><mml:mi mathvariant="italic">α</mml:mi><mml:mo>,</mml:mo><mml:mi>T</mml:mi></mml:mrow></mml:mfenced><mml:mo>=</mml:mo><mml:munderover><mml:mo movablelimits="false">∑</mml:mo><mml:mrow><mml:mi>m</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow><mml:mi>M</mml:mi></mml:munderover><mml:mo>(</mml:mo><mml:msub><mml:mi mathvariant="italic">α</mml:mi><mml:mi>m</mml:mi></mml:msub><mml:mo>∘</mml:mo><mml:msub><mml:mi>t</mml:mi><mml:mi>m</mml:mi></mml:msub><mml:mo>)</mml:mo><mml:mo>,</mml:mo></mml:mrow></mml:math></disp-formula>
          where <inline-formula><mml:math id="M91" display="inline"><mml:mover accent="true"><mml:mi>T</mml:mi><mml:mo stretchy="false" mathvariant="normal">̃</mml:mo></mml:mover></mml:math></inline-formula> is the aggregated feature map for skip-layer connections, and the function <inline-formula><mml:math id="M92" display="inline"><mml:mrow><mml:mi>h</mml:mi><mml:mfenced close=")" open="("><mml:mi mathvariant="normal" class="Radical">⚫</mml:mi></mml:mfenced></mml:mrow></mml:math></inline-formula> represents the sum of the element-wise
products.</p>
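      <p>The temporal attention of Eqs. (8) and (9) can be sketched in NumPy as follows. One assumption is made loudly here: with the shapes as printed, the product of the first transformation matrix and the flattened features does not directly yield a vector of length M, so this sketch assumes that the first projection acts on each flattened time step, i.e. <monospace>W_t</monospace> has shape (d1, w*h*c). The toy sizes and random weights are hypothetical and not taken from the paper.</p>

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())   # subtract the maximum for numerical stability
    return e / e.sum()

def temporal_attention(T, W_t, b_t, W_at, b_at):
    """Fuse M feature maps into one, in the spirit of Eqs. (8) and (9).

    T has shape (M, w, h, c). Assumption: W_t has shape (d1, w*h*c), so each
    flattened time step is scored separately and alpha has length M.
    """
    M = T.shape[0]
    T_flat = T.reshape(M, -1)                        # T' with shape (M, w*h*c)
    hidden = np.tanh(T_flat @ W_t.T + b_t)           # (M, d1)
    alpha = softmax(hidden @ W_at + b_at)            # (M,), sums to 1
    T_fused = np.tensordot(alpha, T, axes=(0, 0))    # sum over m of alpha_m * t_m
    return alpha, T_fused

rng = np.random.default_rng(0)
M, w, h, c, d1 = 12, 5, 4, 2, 16                     # toy sizes
T = rng.normal(size=(M, w, h, c))
alpha, T_fused = temporal_attention(
    T,
    W_t=rng.normal(size=(d1, w * h * c)) * 0.1,
    b_t=np.zeros(d1),
    W_at=rng.normal(size=d1),
    b_at=0.0,
)
print(alpha.shape, T_fused.shape)  # (12,) (5, 4, 2)
```

      <p>The fused map has no time axis, which is what allows it to be passed through the skip-layer connection to a decoder that works on a single time step.</p>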
      <?pagebreak page6983?><p id="d1e2161">The feature map sizes are described in Fig. 5 in detail. The sizes of
ConvLSTM kernels are all <inline-formula><mml:math id="M93" display="inline"><mml:mrow><mml:mn mathvariant="normal">3</mml:mn><mml:mo>×</mml:mo><mml:mn mathvariant="normal">3</mml:mn></mml:mrow></mml:math></inline-formula> and the channel sizes are
<inline-formula><mml:math id="M94" display="inline"><mml:mrow><mml:mo>[</mml:mo><mml:mn mathvariant="normal">2</mml:mn><mml:mo>,</mml:mo><mml:mn mathvariant="normal">4</mml:mn><mml:mo>,</mml:mo><mml:mn mathvariant="normal">8</mml:mn><mml:mo>,</mml:mo><mml:mn mathvariant="normal">16</mml:mn><mml:mo>]</mml:mo></mml:mrow></mml:math></inline-formula> during forward propagation, where the changes between
adjacent layers are smooth and small. The last feature map (with size of
<inline-formula><mml:math id="M95" display="inline"><mml:mrow><mml:mn mathvariant="normal">16</mml:mn><mml:mo>×</mml:mo><mml:mn mathvariant="normal">40</mml:mn><mml:mo>×</mml:mo><mml:mn mathvariant="normal">55</mml:mn><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> is passed through a
convolution layer of <inline-formula><mml:math id="M96" display="inline"><mml:mrow><mml:mn mathvariant="normal">32</mml:mn><mml:mo>×</mml:mo><mml:mn mathvariant="normal">5</mml:mn><mml:mo>×</mml:mo><mml:mn mathvariant="normal">5</mml:mn></mml:mrow></mml:math></inline-formula> with stride 5, which outputs
the final encoder feature map with size of <inline-formula><mml:math id="M97" display="inline"><mml:mrow><mml:mn mathvariant="normal">8</mml:mn><mml:mo>×</mml:mo><mml:mn mathvariant="normal">11</mml:mn><mml:mo>×</mml:mo><mml:mn mathvariant="normal">32</mml:mn></mml:mrow></mml:math></inline-formula>; this layer is used to
filter the noise introduced by such a deep-layer structure.
<?xmltex \hack{\newpage}?></p>
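      <p>The spatial sizes quoted above are consistent with the standard convolution output-size formula, as the short check below shows. It assumes no padding, which the text does not state explicitly.</p>

```python
def conv_output_size(size, kernel, stride, padding=0):
    """Output length of a convolution along one spatial axis."""
    return (size + 2 * padding - kernel) // stride + 1

# A 40 x 55 map convolved with a 5 x 5 kernel at stride 5 (padding assumed 0).
height = conv_output_size(40, kernel=5, stride=5)
width = conv_output_size(55, kernel=5, stride=5)
print(height, width)  # 8 11
```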
</sec>
<sec id="Ch1.S3.SS2">
  <label>3.2</label><title>Multivariate air–sea coupler: learning multivariate synergies via graph
convolution</title>
      <p id="d1e2259">From the perspective of ENSO dynamics, the occurrences of ENSO are
accompanied by energy interactions. Based on our formalization and chosen
physical variables, we define the corresponding adjacency matrix <inline-formula><mml:math id="M98" display="inline"><mml:mrow><mml:mi mathvariant="bold">A</mml:mi><mml:mo>∈</mml:mo><mml:msup><mml:mi>R</mml:mi><mml:mrow><mml:mn mathvariant="normal">6</mml:mn><mml:mo>×</mml:mo><mml:mn mathvariant="normal">6</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula> and degree matrix <inline-formula><mml:math id="M99" display="inline"><mml:mrow><mml:mi mathvariant="bold">D</mml:mi><mml:mo>∈</mml:mo><mml:msup><mml:mi>R</mml:mi><mml:mrow><mml:mn mathvariant="normal">6</mml:mn><mml:mo>×</mml:mo><mml:mn mathvariant="normal">6</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula> (the diagonal
matrix whose diagonal values indicate the number of vertices
connected to each vertex) as in Eq. (10), which means that all physical variables
have coupling interactions with each other (fully connected).
            <disp-formula id="Ch1.E10" content-type="numbered"><label>10</label><mml:math id="M100" display="block"><mml:mrow><mml:mi mathvariant="bold">A</mml:mi><mml:mo>=</mml:mo><mml:mfenced open="[" close="]"><mml:mtable class="array" columnalign="center center center"><mml:mtr><mml:mtd><mml:mn mathvariant="normal">1</mml:mn></mml:mtd><mml:mtd><mml:mi mathvariant="normal">⋯</mml:mi></mml:mtd><mml:mtd><mml:mn mathvariant="normal">1</mml:mn></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mi mathvariant="normal">⋮</mml:mi></mml:mtd><mml:mtd><mml:mi mathvariant="normal">⋱</mml:mi></mml:mtd><mml:mtd><mml:mi mathvariant="normal">⋮</mml:mi></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mn mathvariant="normal">1</mml:mn></mml:mtd><mml:mtd><mml:mi mathvariant="normal">⋯</mml:mi></mml:mtd><mml:mtd><mml:mn mathvariant="normal">1</mml:mn></mml:mtd></mml:mtr></mml:mtable></mml:mfenced><mml:mi mathvariant="bold">D</mml:mi><mml:mo>=</mml:mo><mml:mfenced close="]" open="["><mml:mtable class="array" columnalign="center center center"><mml:mtr><mml:mtd><mml:mn mathvariant="normal">6</mml:mn></mml:mtd><mml:mtd><mml:mi mathvariant="normal">⋯</mml:mi></mml:mtd><mml:mtd><mml:mn mathvariant="normal">0</mml:mn></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mi mathvariant="normal">⋮</mml:mi></mml:mtd><mml:mtd><mml:mi mathvariant="normal">⋱</mml:mi></mml:mtd><mml:mtd><mml:mi mathvariant="normal">⋮</mml:mi></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mn mathvariant="normal">0</mml:mn></mml:mtd><mml:mtd><mml:mi mathvariant="normal">⋯</mml:mi></mml:mtd><mml:mtd><mml:mn mathvariant="normal">6</mml:mn></mml:mtd></mml:mtr></mml:mtable></mml:mfenced></mml:mrow></mml:math></disp-formula>
          In the practical implementation, we use a graph Laplacian matrix
<inline-formula><mml:math id="M101" display="inline"><mml:mi mathvariant="bold">L</mml:mi></mml:math></inline-formula> to normalize the energy flow of the original adjacency matrix <inline-formula><mml:math id="M102" display="inline"><mml:mi mathvariant="bold">A</mml:mi></mml:math></inline-formula> as in Eq.
(11), where <inline-formula><mml:math id="M103" display="inline"><mml:mrow><mml:msub><mml:mi>I</mml:mi><mml:mi>n</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> is the identity matrix of size <inline-formula><mml:math id="M104" display="inline"><mml:mrow><mml:mi>n</mml:mi><mml:mo>×</mml:mo><mml:mi>n</mml:mi></mml:mrow></mml:math></inline-formula>. <inline-formula><mml:math id="M105" display="inline"><mml:mi mathvariant="bold">L</mml:mi></mml:math></inline-formula>
can be interpreted as the directions in which excess unstable energy
propagates to other variables when the entire system is perturbed (for example, by
external wind forcing).
            <disp-formula id="Ch1.E11" content-type="numbered"><label>11</label><mml:math id="M106" display="block"><mml:mrow><mml:mi mathvariant="bold">L</mml:mi><mml:mo>=</mml:mo><mml:msub><mml:mi>I</mml:mi><mml:mi>n</mml:mi></mml:msub><mml:mo>-</mml:mo><mml:msup><mml:mi mathvariant="bold">D</mml:mi><mml:mrow><mml:mo>-</mml:mo><mml:mstyle scriptlevel="+1"><mml:mfrac><mml:mn mathvariant="normal">1</mml:mn><mml:mn mathvariant="normal">2</mml:mn></mml:mfrac></mml:mstyle></mml:mrow></mml:msup><mml:msup><mml:mi mathvariant="bold">AD</mml:mi><mml:mrow><mml:mo>-</mml:mo><mml:mstyle scriptlevel="+1"><mml:mfrac><mml:mn mathvariant="normal">1</mml:mn><mml:mn mathvariant="normal">2</mml:mn></mml:mfrac></mml:mstyle></mml:mrow></mml:msup></mml:mrow></mml:math></disp-formula></p>
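      <p>As a minimal sketch of Eqs. (10) and (11), the following Python snippet builds the fully connected adjacency matrix for six variables and computes the normalized Laplacian. The function name <monospace>normalized_laplacian</monospace> and the list-of-lists matrix representation are illustrative assumptions and are not taken from the ENSO-ASC code.</p>
      <preformat preformat-type="code">
```python
import math


def normalized_laplacian(adjacency):
    """Return L = I - D^(-1/2) A D^(-1/2) for a square adjacency matrix."""
    n = len(adjacency)
    # The degree of each vertex is the row sum of A (the diagonal of D).
    # Every vertex is assumed to have a nonzero degree.
    degree = [sum(row) for row in adjacency]
    inv_sqrt = [1.0 / math.sqrt(d) for d in degree]
    laplacian = []
    for i in range(n):
        row = []
        for j in range(n):
            identity = 1.0 if i == j else 0.0
            row.append(identity - inv_sqrt[i] * adjacency[i][j] * inv_sqrt[j])
        laplacian.append(row)
    return laplacian


# Fully connected graph over six physical variables, as in Eq. (10).
N_VARIABLES = 6
A = [[1.0] * N_VARIABLES for _ in range(N_VARIABLES)]
L = normalized_laplacian(A)
```
      </preformat>
      <p>For this graph, every diagonal entry of the resulting matrix is 5/6 and every off-diagonal entry is −1/6, so each row sums to zero.</p>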

      <?xmltex \floatpos{t}?><fig id="Ch1.F6"><?xmltex \currentcnt{6}?><?xmltex \def\figurename{Figure}?><label>Figure 6</label><caption><p id="d1e2467">Multivariate coupling interactions within <inline-formula><mml:math id="M107" display="inline"><mml:mi>K</mml:mi></mml:math></inline-formula>-order neighbors
(taking SST as the center).</p></caption>
          <?xmltex \igopts{width=241.848425pt}?><graphic xlink:href="https://gmd.copernicus.org/articles/14/6977/2021/gmd-14-6977-2021-f06.png"/>

        </fig>

      <p id="d1e2483">Meanwhile, the interactions between ENSO-related variables are cascaded,
which means that the effects between physical variables are multi-order as
depicted in Fig. 6. For example, the precipitation anomalies affect the wind
anomalies, which in turn affect the evolutions of SST, as depicted in Fig. 6
(right). According to the properties of the Laplacian matrix, <inline-formula><mml:math id="M108" display="inline"><mml:mrow><mml:msup><mml:mi>L</mml:mi><mml:mi>K</mml:mi></mml:msup></mml:mrow></mml:math></inline-formula> is
employed to determine the cascaded interactions between <inline-formula><mml:math id="M109" display="inline"><mml:mi>K</mml:mi></mml:math></inline-formula>-order neighbors.
So, if we consider <inline-formula><mml:math id="M110" display="inline"><mml:mi>K</mml:mi></mml:math></inline-formula>-order effects, the whole process is defined by Eq.
(12),
            <disp-formula id="Ch1.E12" content-type="numbered"><label>12</label><mml:math id="M111" display="block"><mml:mrow><mml:msup><mml:mi>P</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>l</mml:mi><mml:mo>+</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mo>)</mml:mo></mml:mrow></mml:msup><mml:mo>=</mml:mo><mml:mi mathvariant="italic">σ</mml:mi><mml:mfenced close=")" open="("><mml:mrow><mml:munderover><mml:mo movablelimits="false">∑</mml:mo><mml:mrow><mml:mi>k</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow><mml:mi>K</mml:mi></mml:munderover><mml:msup><mml:mi mathvariant="normal">Θ</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>k</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:msup><mml:msup><mml:mi>L</mml:mi><mml:mi>k</mml:mi></mml:msup><mml:msup><mml:mi>P</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>l</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:msup></mml:mrow></mml:mfenced><mml:mo>,</mml:mo></mml:mrow></mml:math></disp-formula>
          where <inline-formula><mml:math id="M112" display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="normal">Θ</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>k</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula> represents the latent trainable multivariate
interactions. <inline-formula><mml:math id="M113" display="inline"><mml:mi>K</mml:mi></mml:math></inline-formula> represents the truncated order of effects concerned.
<inline-formula><mml:math id="M114" display="inline"><mml:mrow><mml:msup><mml:mi>P</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>l</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula> represents the input features before coupling interactions and
<inline-formula><mml:math id="M115" display="inline"><mml:mrow><mml:msup><mml:mi>P</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>l</mml:mi><mml:mo>+</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mo>)</mml:mo></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula> represents the coupled features. Each row in both <inline-formula><mml:math id="M116" display="inline"><mml:mrow><mml:msup><mml:mi>P</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>l</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula> and
<inline-formula><mml:math id="M117" display="inline"><mml:mrow><mml:msup><mml:mi>P</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>l</mml:mi><mml:mo>+</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mo>)</mml:mo></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula> represents the same variable. The activation function <inline-formula><mml:math id="M118" display="inline"><mml:mi mathvariant="italic">σ</mml:mi></mml:math></inline-formula>
introduces nonlinearity. Figure 7 illustrates this process,
which we refer to as the <inline-formula><mml:math id="M119" display="inline"><mml:mi>K</mml:mi></mml:math></inline-formula>-order graph convolution layer.</p>
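      <p>The layer in Eq. (12) can be sketched in plain Python as follows. Several choices here are assumptions made only for illustration: each trainable term is taken to be an N × N matrix applied from the left, the k-th term uses the k-th power of the Laplacian, and ReLU stands in for the activation function, which the text does not specify.</p>
      <preformat preformat-type="code">
```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def k_order_graph_conv(laplacian, features, thetas):
    """Sketch of Eq. (12): relu(sum over k of Theta^(k) L^k P^(l)), k = 1..K.

    laplacian: N x N matrix L
    features:  N x F matrix P^(l), one row per physical variable
    thetas:    list of K matrices, each N x N (assumed shape)
    """
    n_rows, n_cols = len(features), len(features[0])
    total = [[0.0] * n_cols for _ in range(n_rows)]
    power_times_p = features
    for theta in thetas:
        # Reuse L^(k-1) P^(l) to obtain L^k P^(l) with one extra product.
        power_times_p = matmul(laplacian, power_times_p)
        term = matmul(theta, power_times_p)
        total = [[t + u for t, u in zip(t_row, u_row)]
                 for t_row, u_row in zip(total, term)]
    # ReLU is an assumed stand-in for the activation sigma.
    return [[max(0.0, value) for value in row] for row in total]
```
      </preformat>
      <p>Because the output has the same number of rows as the input, each row still corresponds to the same physical variable after the layer is applied.</p>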

      <?xmltex \floatpos{t}?><fig id="Ch1.F7" specific-use="star"><?xmltex \currentcnt{7}?><?xmltex \def\figurename{Figure}?><label>Figure 7</label><caption><p id="d1e2690"><inline-formula><mml:math id="M120" display="inline"><mml:mi>K</mml:mi></mml:math></inline-formula>-order graph convolution layer. <inline-formula><mml:math id="M121" display="inline"><mml:mi>K</mml:mi></mml:math></inline-formula> is the
truncated order and <inline-formula><mml:math id="M122" display="inline"><mml:mrow><mml:msup><mml:mi>L</mml:mi><mml:mi>K</mml:mi></mml:msup></mml:mrow></mml:math></inline-formula> represents the interactions
between <inline-formula><mml:math id="M123" display="inline"><mml:mi>K</mml:mi></mml:math></inline-formula>-order neighbors. Each row in
<inline-formula><mml:math id="M124" display="inline"><mml:mrow><mml:msup><mml:mi>p</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>l</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M125" display="inline"><mml:mrow><mml:msup><mml:mi>p</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>l</mml:mi><mml:mo>+</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mo>)</mml:mo></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula> represents
features of different physical variables, and their positions are not changed
during forward propagation.</p></caption>
          <?xmltex \igopts{width=341.433071pt}?><graphic xlink:href="https://gmd.copernicus.org/articles/14/6977/2021/gmd-14-6977-2021-f07.png"/>

        </fig>

      <p id="d1e2766">The <inline-formula><mml:math id="M126" display="inline"><mml:mi>K</mml:mi></mml:math></inline-formula>-order graph convolution network (GCN) in Eq. (12) is a
higher-order extension of the original GCN (Bruna et al., 2013). Furthermore, we
use Chebyshev polynomial <inline-formula><mml:math id="M127" display="inline"><mml:mrow><mml:msub><mml:mi>T</mml:mi><mml:mi>K</mml:mi></mml:msub><mml:mo>(</mml:mo><mml:mover accent="true"><mml:mi>L</mml:mi><mml:mo mathvariant="normal" stretchy="false">̃</mml:mo></mml:mover><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> to approximate the higher-order
polynomial <inline-formula><mml:math id="M128" display="inline"><mml:mrow><mml:mfenced open="[" close="]"><mml:mrow><mml:msup><mml:mi>L</mml:mi><mml:mn mathvariant="normal">1</mml:mn></mml:msup><mml:mo>,</mml:mo><mml:msup><mml:mi>L</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup><mml:mo>,</mml:mo><mml:mi mathvariant="normal">…</mml:mi><mml:mo>,</mml:mo><mml:msup><mml:mi>L</mml:mi><mml:mi>K</mml:mi></mml:msup></mml:mrow></mml:mfenced><mml:mi>k</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mo>,</mml:mo><mml:mn mathvariant="normal">2</mml:mn><mml:mo>,</mml:mo><mml:mi mathvariant="normal">…</mml:mi><mml:mo>,</mml:mo><mml:mi>K</mml:mi></mml:mrow></mml:math></inline-formula> to accelerate calculation, where
<inline-formula><mml:math id="M129" display="inline"><mml:mrow><mml:mover accent="true"><mml:mi>L</mml:mi><mml:mo mathvariant="normal" stretchy="false">̃</mml:mo></mml:mover><mml:mo>=</mml:mo><mml:mn mathvariant="normal">2</mml:mn><mml:mi>L</mml:mi><mml:mo>/</mml:mo><mml:msub><mml:mi mathvariant="italic">λ</mml:mi><mml:mo>max⁡</mml:mo></mml:msub><mml:mo>-</mml:mo><mml:msub><mml:mi>I</mml:mi><mml:mi>n</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> scales <inline-formula><mml:math id="M130" display="inline"><mml:mi>L</mml:mi></mml:math></inline-formula> within
<inline-formula><mml:math id="M131" display="inline"><mml:mrow><mml:mo>[</mml:mo><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mo>,</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mo>]</mml:mo></mml:mrow></mml:math></inline-formula>, the domain on which the Chebyshev polynomials are defined, and <inline-formula><mml:math id="M132" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="italic">λ</mml:mi><mml:mo>max⁡</mml:mo></mml:msub></mml:mrow></mml:math></inline-formula> is
the maximum eigenvalue of <inline-formula><mml:math id="M133" display="inline"><mml:mi>L</mml:mi></mml:math></inline-formula> (Hammond et al., 2011; Defferrard et al.,
2016). This approximation accelerates the calculation of the <inline-formula><mml:math id="M134" display="inline"><mml:mi>K</mml:mi></mml:math></inline-formula>-order GCN
by reducing the computational complexity from <inline-formula><mml:math id="M135" display="inline"><mml:mrow><mml:mi mathvariant="normal">O</mml:mi><mml:mo>(</mml:mo><mml:msup><mml:mi>n</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula>
to <inline-formula><mml:math id="M136" display="inline"><mml:mrow><mml:mi mathvariant="normal">O</mml:mi><mml:mo>(</mml:mo><mml:mi>K</mml:mi><mml:mfenced close="|" open="|"><mml:mi mathvariant="italic">ε</mml:mi></mml:mfenced><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> (<inline-formula><mml:math id="M137" display="inline"><mml:mrow><mml:mfenced close="|" open="|"><mml:mi mathvariant="italic">ε</mml:mi></mml:mfenced></mml:mrow></mml:math></inline-formula> is the number of edges in the graph). Based on this neural structure, we
construct the multivariate air–sea coupler (ASC) to learn synergies related
to ENSO, as shown in Fig. 8.</p>
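      <p>The Chebyshev approximation relies on the standard three-term recurrence, in which the zeroth-order term is the identity, the first-order term is the scaled Laplacian, and each later term equals twice the scaled Laplacian times the previous term minus the term before that. The sketch below applies this recurrence directly to the feature matrix, so only matrix-feature products are needed. The function names are hypothetical, and the maximum eigenvalue is passed in by the caller rather than computed, which is an assumption of this example.</p>
      <preformat preformat-type="code">
```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def scaled_laplacian(laplacian, lambda_max):
    """Return 2 L / lambda_max - I, mapping the spectrum of L into [-1, 1]."""
    n = len(laplacian)
    return [[2.0 * laplacian[i][j] / lambda_max - (1.0 if i == j else 0.0)
             for j in range(n)] for i in range(n)]


def chebyshev_terms(scaled, features, order):
    """Return [T_0 P, T_1 P, ..., T_order P] for the scaled Laplacian.

    Uses T_0 P = P, T_1 P = L~ P and T_k P = 2 L~ (T_{k-1} P) - T_{k-2} P.
    """
    terms = [features]
    if order >= 1:
        terms.append(matmul(scaled, features))
    for _ in range(2, order + 1):
        product = matmul(scaled, terms[-1])
        terms.append([[2.0 * a - b for a, b in zip(row_a, row_b)]
                      for row_a, row_b in zip(product, terms[-2])])
    return terms
```
      </preformat>
      <p>In a graph convolution layer, the returned terms would each be weighted by a trainable coefficient and summed before the activation is applied.</p>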

      <?xmltex \floatpos{t}?><fig id="Ch1.F8"><?xmltex \currentcnt{8}?><?xmltex \def\figurename{Figure}?><label>Figure 8</label><caption><p id="d1e2973">A detailed structure for multivariate air–sea coupler between
encoder and decoder: pre-processes for input (upper row), and dual-layer
structure for residual learning (lower row). There is also a multivariate
attention weight in the coupler.</p></caption>
          <?xmltex \igopts{width=241.848425pt}?><graphic xlink:href="https://gmd.copernicus.org/articles/14/6977/2021/gmd-14-6977-2021-f08.png"/>

        </fig>

      <p id="d1e2982">After obtaining the spatial–temporal features (such as the colored feature
maps in Fig. 8) from the respective multivariate encoders, we first flatten and
cascade them as <inline-formula><mml:math id="M138" display="inline"><mml:mrow><mml:msup><mml:mi>P</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>l</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula>, shown as the blue box in Fig. 8. As mentioned above,
each row of <inline-formula><mml:math id="M139" display="inline"><mml:mrow><mml:msup><mml:mi>P</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>l</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula> represents a different variable. <inline-formula><mml:math id="M140" display="inline"><mml:mrow><mml:msup><mml:mi>P</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>l</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula> is referred to as the
multivariate feature map and acts as the input of the coupler. The coupler
is designed as a dual-layer summation structure, shown as the yellow box in Fig. 8. The input for the second layer is the sum of the input and the output of
the first layer, and the output of the second layer is determined by the
weighted fusion of the outputs of these two layers in the manner designed by
Chen et al. (2019). This is a form of residual learning, which enhances the
generalization ability of the network (He et al., 2016).</p>
      <p id="d1e3033">Because the variables contribute differently to the ENSO forecast, especially
for different start calendar months, we propose a multivariate
self-supervised attention weight that determines the effect of each input
physical variable, shown as <inline-formula><mml:math id="M141" display="inline"><mml:mo>⊗</mml:mo></mml:math></inline-formula> symbols in Fig. 8. Before <inline-formula><mml:math id="M142" display="inline"><mml:mrow><mml:msup><mml:mi>P</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>l</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula>
passes into the multivariate coupler, the weight <inline-formula><mml:math id="M143" display="inline"><mml:mrow><mml:mi mathvariant="italic">β</mml:mi><mml:mo>∈</mml:mo><mml:msup><mml:mi>R</mml:mi><mml:mi>N</mml:mi></mml:msup></mml:mrow></mml:math></inline-formula> for each
variable is determined by Eq. (13).
            <disp-formula id="Ch1.E13" content-type="numbered"><label>13</label><mml:math id="M144" display="block"><mml:mrow><mml:mi mathvariant="italic">β</mml:mi><mml:mo>=</mml:mo><mml:mtext>softmax</mml:mtext><mml:mfenced open="(" close=")"><mml:mrow><mml:msub><mml:mi mathvariant="bold">W</mml:mi><mml:mrow><mml:mi mathvariant="italic">β</mml:mi><mml:mi>p</mml:mi></mml:mrow></mml:msub><mml:mi>tanh⁡</mml:mi><mml:mfenced close=")" open="("><mml:mrow><mml:msub><mml:mi mathvariant="bold">W</mml:mi><mml:mi>p</mml:mi></mml:msub><mml:msup><mml:mi>P</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>l</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:msup><mml:mo>+</mml:mo><mml:msub><mml:mi>b</mml:mi><mml:mi>p</mml:mi></mml:msub></mml:mrow></mml:mfenced><mml:mo>+</mml:mo><mml:msub><mml:mi>b</mml:mi><mml:mrow><mml:mi mathvariant="italic">β</mml:mi><mml:mi>p</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:mfenced><mml:mo>,</mml:mo></mml:mrow></mml:math></disp-formula>
          where <inline-formula><mml:math id="M145" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="bold">W</mml:mi><mml:mi>p</mml:mi></mml:msub><mml:mo>∈</mml:mo><mml:msup><mml:mi>R</mml:mi><mml:mrow><mml:msub><mml:mi>d</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub><mml:mo>×</mml:mo><mml:mi>N</mml:mi></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M146" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="bold">W</mml:mi><mml:mrow><mml:mi mathvariant="italic">β</mml:mi><mml:mi>p</mml:mi></mml:mrow></mml:msub><mml:mo>∈</mml:mo><mml:msup><mml:mi>R</mml:mi><mml:mrow><mml:msub><mml:mi>d</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula> are
transformation matrices, <inline-formula><mml:math id="M147" display="inline"><mml:mrow><mml:msub><mml:mi>d</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula> is a hyperparameter, and <inline-formula><mml:math id="M148" display="inline"><mml:mrow><mml:msub><mml:mi>b</mml:mi><mml:mi>p</mml:mi></mml:msub><mml:mo>∈</mml:mo><mml:msup><mml:mi>R</mml:mi><mml:mrow><mml:msub><mml:mi>d</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula>
and <inline-formula><mml:math id="M149" display="inline"><mml:mrow><mml:msub><mml:mi>b</mml:mi><mml:mrow><mml:mi mathvariant="italic">β</mml:mi><mml:mi>p</mml:mi></mml:mrow></mml:msub><mml:mo>∈</mml:mo><mml:mi>R</mml:mi></mml:mrow></mml:math></inline-formula> are biases. Then we use Eq. (14) to calculate the
modulated multi-physical variable feature map, where <inline-formula><mml:math id="M150" display="inline"><mml:mrow><mml:mi>g</mml:mi><mml:mo>(</mml:mo><mml:mi class="Radical" mathvariant="normal">⚫</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula>
denotes element-wise multiplication.
            <disp-formula id="Ch1.E14" content-type="numbered"><label>14</label><mml:math id="M151" display="block"><mml:mrow><mml:msup><mml:mi>P</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>l</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:msup><mml:mo>=</mml:mo><mml:mi>g</mml:mi><mml:mfenced close=")" open="("><mml:mrow><mml:mi mathvariant="italic">β</mml:mi><mml:mo>,</mml:mo><mml:msup><mml:mi>P</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>l</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:msup></mml:mrow></mml:mfenced><mml:mo>=</mml:mo><mml:mi mathvariant="italic">β</mml:mi><mml:mo movablelimits="false">⊡</mml:mo><mml:msup><mml:mi>P</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>l</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:msup></mml:mrow></mml:math></disp-formula>
          In the multivariate air–sea coupler, the corresponding locations of physical
variables on the input feature map and output feature map are fixed. For
example, if we set the SST feature in the last row of the input multivariate
feature map<?pagebreak page6984?> of the coupler, the SST feature will be in the last row of the
output multivariate feature map as shown in Fig. 7 and will be propagated
to the decoder for pattern prediction later.</p>
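      <p>A rough sketch of Eqs. (13) and (14) is given below. The shapes are an assumption of this example: each variable contributes one row of F features, the first transformation matrix is taken as d2 × F so that a single score results per variable, and the softmax is taken over the N variable scores. The function names are hypothetical.</p>
      <preformat preformat-type="code">
```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(features, w_p, b_p, w_beta, b_beta):
    """Return one weight per variable (row of features), following Eq. (13).

    features: N x F list of rows, w_p: d2 x F (assumed), b_p: length d2,
    w_beta: length d2, b_beta: scalar.
    """
    scores = []
    for row in features:
        hidden = [math.tanh(sum(w * x for w, x in zip(w_row, row)) + b)
                  for w_row, b in zip(w_p, b_p)]
        scores.append(sum(w * h for w, h in zip(w_beta, hidden)) + b_beta)
    return softmax(scores)


def modulate(features, beta):
    """Eq. (14): scale every row of P^(l) by the weight of its variable."""
    return [[weight * value for value in row]
            for weight, row in zip(beta, features)]
```
      </preformat>
      <p>Because the modulation only rescales rows, the row order of the feature map, and therefore the position of each variable, is unchanged.</p>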
</sec>
<sec id="Ch1.S3.SS3">
  <label>3.3</label><title>Decoder: end-to-end learning to restore the forecasted multivariate
patterns</title>
      <p id="d1e3312">ENSO evolution is considered a hydrodynamic process. Meteorologists
usually use linear methods, such as empirical orthogonal function (EOF) or singular value decomposition (SVD), to extract features and then analyze the potential characteristics and predict the future evolution
of ENSO. In these methods, complex dynamical processes are usually simplified
to facilitate calculations, while unknown detailed processes are not
comprehensively revealed or are even neglected, which leads to low prediction
accuracy. Therefore, we use end-to-end learning to restore the
evolutions of multi-physical variable patterns. The multi-scale
spatial–temporal correlations should also be considered in this process, so
the decoder consists of stacked transform-convolution layers and up-sampling
layers.</p>

      <?xmltex \floatpos{t}?><fig id="Ch1.F9"><?xmltex \currentcnt{9}?><?xmltex \def\figurename{Figure}?><label>Figure 9</label><caption><p id="d1e3317">A detailed structure for decoder, skip-layer connections from
encoder for helping end-to-end learning to restore the forecasted patterns
at different spatial scales.</p></caption>
          <?xmltex \igopts{width=241.848425pt}?><graphic xlink:href="https://gmd.copernicus.org/articles/14/6977/2021/gmd-14-6977-2021-f09.png"/>

        </fig>

      <p id="d1e3326">From the output feature map of the multivariate air–sea coupler, we select
the corresponding row (taking SST as an example) as <inline-formula><mml:math id="M152" display="inline"><mml:mrow><mml:msub><mml:mi>P</mml:mi><mml:mi mathvariant="normal">SST</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> (such as the red row and red circle in Fig. 8) and reshape it into its original shape
<inline-formula><mml:math id="M153" display="inline"><mml:mrow><mml:msubsup><mml:mi>P</mml:mi><mml:mi mathvariant="normal">SST</mml:mi><mml:mo>′</mml:mo></mml:msubsup><mml:mo>∈</mml:mo><mml:msup><mml:mi>R</mml:mi><mml:mrow><mml:mi>w</mml:mi><mml:mo>×</mml:mo><mml:mi>h</mml:mi><mml:mo>×</mml:mo><mml:mi>c</mml:mi></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula>. Then <inline-formula><mml:math id="M154" display="inline"><mml:mrow><mml:msubsup><mml:mi>P</mml:mi><mml:mi mathvariant="normal">SST</mml:mi><mml:mo>′</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula> is gradually
amplified and restored in the decoder by the stacked transform–convolution
layers and up-sampling layers (see Fig. 9). Skip-layer feature maps from the
encoder are cascaded with corresponding layers with the same shape. The
sizes of convolution kernels are all <inline-formula><mml:math id="M155" display="inline"><mml:mrow><mml:mn mathvariant="normal">3</mml:mn><mml:mo>×</mml:mo><mml:mn mathvariant="normal">3</mml:mn></mml:mrow></mml:math></inline-formula>, the same as
in the encoder, and the channel sizes are <inline-formula><mml:math id="M156" display="inline"><mml:mrow><mml:mo>[</mml:mo><mml:mn mathvariant="normal">16</mml:mn><mml:mo>,</mml:mo><mml:mn mathvariant="normal">8</mml:mn><mml:mo>,</mml:mo><mml:mn mathvariant="normal">4</mml:mn><mml:mo>,</mml:mo><mml:mn mathvariant="normal">2</mml:mn><mml:mo>,</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mo>]</mml:mo></mml:mrow></mml:math></inline-formula>, so that
the number of channels shrinks gradually during forward propagation.</p>
</sec>
<sec id="Ch1.S3.SS4">
  <label>3.4</label><title>Loss functions for model training</title>
      <?pagebreak page6985?><p id="d1e3430">Our goal is to predict the evolutions of multiple physical variables (marked
as <inline-formula><mml:math id="M157" display="inline"><mml:mrow><mml:mover accent="true"><mml:mi>s</mml:mi><mml:mo mathvariant="normal" stretchy="false">^</mml:mo></mml:mover><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> as accurately as possible compared with the real-world
observation <inline-formula><mml:math id="M158" display="inline"><mml:mi>s</mml:mi></mml:math></inline-formula>. Therefore, we combine two different error measures
as the loss function <inline-formula><mml:math id="M159" display="inline"><mml:mi>l</mml:mi></mml:math></inline-formula> in Eq. (15) to ensure the grid-by-grid accuracy
of the multivariate patterns,
            <disp-formula id="Ch1.E15" content-type="numbered"><label>15</label><mml:math id="M160" display="block"><mml:mrow><mml:mfenced close="" open="{"><mml:mrow><mml:mtable class="array" columnalign="left left"><mml:mtr><mml:mtd><mml:mrow/></mml:mtd><mml:mtd><mml:mrow><mml:mtext>MSE</mml:mtext><mml:mo>=</mml:mo><mml:mstyle displaystyle="false"><mml:mstyle displaystyle="false"><mml:mfrac style="text"><mml:mn mathvariant="normal">1</mml:mn><mml:mrow><mml:mi>N</mml:mi><mml:mi mathvariant="normal">Ω</mml:mi></mml:mrow></mml:mfrac></mml:mstyle></mml:mstyle><mml:msub><mml:mo>∑</mml:mo><mml:mi>N</mml:mi></mml:msub><mml:msub><mml:mo>∑</mml:mo><mml:mi mathvariant="normal">Ω</mml:mi></mml:msub><mml:msup><mml:mfenced close=")" open="("><mml:mrow><mml:msub><mml:mover accent="true"><mml:mi>s</mml:mi><mml:mo mathvariant="normal" stretchy="false">^</mml:mo></mml:mover><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>-</mml:mo><mml:msub><mml:mi>s</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:mfenced><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd/><mml:mtd><mml:mrow><mml:mtext>MAE</mml:mtext><mml:mo>=</mml:mo><mml:mstyle displaystyle="false"><mml:mstyle displaystyle="false"><mml:mfrac style="text"><mml:mn mathvariant="normal">1</mml:mn><mml:mrow><mml:mi>N</mml:mi><mml:mi mathvariant="normal">Ω</mml:mi></mml:mrow></mml:mfrac></mml:mstyle></mml:mstyle><mml:msub><mml:mo>∑</mml:mo><mml:mi>N</mml:mi></mml:msub><mml:msub><mml:mo>∑</mml:mo><mml:mi mathvariant="normal">Ω</mml:mi></mml:msub><mml:mfenced open="|" close="|"><mml:mrow><mml:msub><mml:mover accent="true"><mml:mi>s</mml:mi><mml:mo stretchy="false" 
mathvariant="normal">^</mml:mo></mml:mover><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>-</mml:mo><mml:msub><mml:mi>s</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:mfenced></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd/><mml:mtd><mml:mrow><mml:mi>l</mml:mi><mml:mo>=</mml:mo><mml:mi mathvariant="normal">MSE</mml:mi><mml:mo>+</mml:mo><mml:mi mathvariant="normal">MAE</mml:mi></mml:mrow></mml:mtd></mml:mtr></mml:mtable><mml:mo>,</mml:mo><mml:mfenced open="(" close=")"><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:mfenced><mml:mo>∈</mml:mo><mml:mi mathvariant="normal">Ω</mml:mi></mml:mrow></mml:mfenced><mml:mo>,</mml:mo></mml:mrow></mml:math></disp-formula>
          where <inline-formula><mml:math id="M161" display="inline"><mml:mi>N</mml:mi></mml:math></inline-formula> is the number of variables and <inline-formula><mml:math id="M162" display="inline"><mml:mi mathvariant="normal">Ω</mml:mi></mml:math></inline-formula> represents the number of
grid points for every physical pattern; <inline-formula><mml:math id="M163" display="inline"><mml:mrow><mml:mfenced close=")" open="("><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:mfenced></mml:mrow></mml:math></inline-formula> denotes
the latitude and longitude indices of a grid point; <inline-formula><mml:math id="M164" display="inline"><mml:mi>l</mml:mi></mml:math></inline-formula> is the sum of MSE and
MAE, where MSE is used to preserve the smoothness of the forecasted
patterns and MAE is used to retain the peak distribution of all grid
points.</p>
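      <p>A minimal sketch of the loss in Eq. (15) is shown below, assuming that the forecasted and observed fields are stored as nested lists indexed by variable, row, and column. The function name <monospace>combined_loss</monospace> is hypothetical.</p>
      <preformat preformat-type="code">
```python
def combined_loss(predicted, observed):
    """Return (mse, mae, mse + mae) averaged over all variables and grid points.

    predicted, observed: nested lists indexed as [variable][row][column].
    """
    squared_sum = 0.0
    absolute_sum = 0.0
    count = 0
    for pred_field, obs_field in zip(predicted, observed):
        for pred_row, obs_row in zip(pred_field, obs_field):
            for p, o in zip(pred_row, obs_row):
                diff = p - o
                squared_sum += diff * diff
                absolute_sum += abs(diff)
                count += 1
    mse = squared_sum / count
    mae = absolute_sum / count
    return mse, mae, mse + mae
```
      </preformat>
      <p>For a single 2 × 2 field in which one grid point is off by 2 and the others match, this gives an MSE of 1.0, an MAE of 0.5, and a loss of 1.5.</p>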
</sec>
<sec id="Ch1.S3.SS5">
  <label>3.5</label><title>Metrics to evaluate the forecast results</title>
      <p id="d1e3657">The loss function in Eq. (15) mainly
compares values at individual grid points in the fields. However, the detailed
spatial distributions of every physical variable, such as the location of
the maximum-value region for SST and wind anomalies, are more important in the
ENSO forecast. Therefore, we use the following two common spatial
metrics for the forecasted patterns to evaluate the ENSO forecast skill:
the peak signal-to-noise ratio (PSNR) and the structural similarity index (SSIM), as in Eqs. (16) and (17).

                <disp-formula specific-use="gather" content-type="numbered"><mml:math id="M165" display="block"><mml:mtable displaystyle="true"><mml:mlabeledtr id="Ch1.E16"><mml:mtd><mml:mtext>16</mml:mtext></mml:mtd><mml:mtd><mml:mrow><mml:mstyle displaystyle="true" class="stylechange"/><mml:mtext>PSNR</mml:mtext><mml:mo>=</mml:mo><mml:mn mathvariant="normal">10</mml:mn><mml:mo>×</mml:mo><mml:msub><mml:mi>log⁡</mml:mi><mml:mn mathvariant="normal">10</mml:mn></mml:msub><mml:mfenced close=")" open="("><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mrow><mml:msup><mml:mi mathvariant="normal">MAX</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow><mml:mi mathvariant="normal">MSE</mml:mi></mml:mfrac></mml:mstyle></mml:mfenced></mml:mrow></mml:mtd></mml:mlabeledtr><mml:mlabeledtr id="Ch1.E17"><mml:mtd><mml:mtext>17</mml:mtext></mml:mtd><mml:mtd><mml:mrow><mml:mstyle class="stylechange" displaystyle="true"/><mml:mfenced close="" open="{"><mml:mtable class="array" columnalign="left left"><mml:mtr><mml:mtd><mml:mrow/></mml:mtd><mml:mtd><mml:mrow><mml:mtext>luminance</mml:mtext><mml:mo>=</mml:mo><mml:mstyle displaystyle="false"><mml:mfrac style="text"><mml:mrow><mml:mn mathvariant="normal">2</mml:mn><mml:msub><mml:mi mathvariant="italic">μ</mml:mi><mml:mover accent="true"><mml:mi>s</mml:mi><mml:mo stretchy="false" mathvariant="normal">^</mml:mo></mml:mover></mml:msub><mml:msub><mml:mi mathvariant="italic">μ</mml:mi><mml:mi mathvariant="normal">s</mml:mi></mml:msub><mml:mo>+</mml:mo><mml:msub><mml:mi>c</mml:mi><mml:mn mathvariant="normal">1</mml:mn></mml:msub></mml:mrow><mml:mrow><mml:msubsup><mml:mi mathvariant="italic">μ</mml:mi><mml:mover accent="true"><mml:mi>s</mml:mi><mml:mo mathvariant="normal" stretchy="false">^</mml:mo></mml:mover><mml:mn mathvariant="normal">2</mml:mn></mml:msubsup><mml:mo>+</mml:mo><mml:msubsup><mml:mi 
mathvariant="italic">μ</mml:mi><mml:mi mathvariant="normal">s</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msubsup><mml:mo>+</mml:mo><mml:msub><mml:mi>c</mml:mi><mml:mn mathvariant="normal">1</mml:mn></mml:msub></mml:mrow></mml:mfrac></mml:mstyle></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd/><mml:mtd><mml:mrow><mml:mtext>contrast</mml:mtext><mml:mo>=</mml:mo><mml:mstyle displaystyle="false"><mml:mfrac style="text"><mml:mrow><mml:mn mathvariant="normal">2</mml:mn><mml:msub><mml:mi mathvariant="italic">σ</mml:mi><mml:mover accent="true"><mml:mi>s</mml:mi><mml:mo mathvariant="normal" stretchy="false">^</mml:mo></mml:mover></mml:msub><mml:msub><mml:mi mathvariant="italic">σ</mml:mi><mml:mi mathvariant="normal">s</mml:mi></mml:msub><mml:mo>+</mml:mo><mml:msub><mml:mi>c</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:mrow><mml:mrow><mml:msubsup><mml:mi mathvariant="italic">σ</mml:mi><mml:mover accent="true"><mml:mi>s</mml:mi><mml:mo stretchy="false" mathvariant="normal">^</mml:mo></mml:mover><mml:mn mathvariant="normal">2</mml:mn></mml:msubsup><mml:mo>+</mml:mo><mml:msubsup><mml:mi mathvariant="italic">σ</mml:mi><mml:mi mathvariant="normal">s</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msubsup><mml:mo>+</mml:mo><mml:msub><mml:mi>c</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:mrow></mml:mfrac></mml:mstyle></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd/><mml:mtd><mml:mrow><mml:mtext>structure</mml:mtext><mml:mo>=</mml:mo><mml:mstyle displaystyle="false"><mml:mfrac style="text"><mml:mrow><mml:msub><mml:mi mathvariant="italic">σ</mml:mi><mml:mrow><mml:mover accent="true"><mml:mi>s</mml:mi><mml:mo stretchy="false" mathvariant="normal">^</mml:mo></mml:mover><mml:mi>s</mml:mi></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:msub><mml:mi>c</mml:mi><mml:mn mathvariant="normal">3</mml:mn></mml:msub></mml:mrow><mml:mrow><mml:msub><mml:mi mathvariant="italic">σ</mml:mi><mml:mover accent="true"><mml:mi>s</mml:mi><mml:mo stretchy="false" 
mathvariant="normal">^</mml:mo></mml:mover></mml:msub><mml:msub><mml:mi mathvariant="italic">σ</mml:mi><mml:mi mathvariant="normal">s</mml:mi></mml:msub><mml:mo>+</mml:mo><mml:msub><mml:mi>c</mml:mi><mml:mn mathvariant="normal">3</mml:mn></mml:msub></mml:mrow></mml:mfrac></mml:mstyle></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd/><mml:mtd><mml:mrow><mml:mtext>SSIM</mml:mtext><mml:mo>=</mml:mo><mml:msup><mml:mtext>luminance</mml:mtext><mml:mi>a</mml:mi></mml:msup><mml:mo>⋅</mml:mo><mml:msup><mml:mtext>contrast</mml:mtext><mml:mi>b</mml:mi></mml:msup><mml:mo>⋅</mml:mo><mml:msup><mml:mtext>structure</mml:mtext><mml:mi>c</mml:mi></mml:msup></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mfenced></mml:mrow></mml:mtd></mml:mlabeledtr></mml:mtable></mml:math></disp-formula>

            In PSNR, MAX is the maximum value over all grid points. In ENSO-ASC, before the
historical multivariate data are propagated into the model, we first
normalize them to the range <inline-formula><mml:math id="M166" display="inline"><mml:mrow><mml:mo>[</mml:mo><mml:mn mathvariant="normal">0</mml:mn><mml:mo>,</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mo>]</mml:mo></mml:mrow></mml:math></inline-formula> as in Eq. (18). Therefore, MAX
is set to 1.
            <disp-formula id="Ch1.E18" content-type="numbered"><label>18</label><mml:math id="M167" display="block"><mml:mrow><mml:msup><mml:mi>x</mml:mi><mml:mo>∗</mml:mo></mml:msup><mml:mo>=</mml:mo><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mrow><mml:mi>x</mml:mi><mml:mo>-</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mo>min⁡</mml:mo></mml:msub></mml:mrow><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mo>max⁡</mml:mo></mml:msub><mml:mo>-</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mo>min⁡</mml:mo></mml:msub></mml:mrow></mml:mfrac></mml:mstyle></mml:mrow></mml:math></disp-formula>
          SSIM is a combined metric of luminance, contrast, and structure between two
patterns. <inline-formula><mml:math id="M168" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="italic">μ</mml:mi><mml:mover accent="true"><mml:mi>s</mml:mi><mml:mo stretchy="false" mathvariant="normal">^</mml:mo></mml:mover></mml:msub></mml:mrow></mml:math></inline-formula> (<inline-formula><mml:math id="M169" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="italic">μ</mml:mi><mml:mi mathvariant="normal">s</mml:mi></mml:msub><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> is the average for
<inline-formula><mml:math id="M170" display="inline"><mml:mover accent="true"><mml:mi>s</mml:mi><mml:mo stretchy="false" mathvariant="normal">^</mml:mo></mml:mover></mml:math></inline-formula> (<inline-formula><mml:math id="M171" display="inline"><mml:mi>s</mml:mi></mml:math></inline-formula>), and <inline-formula><mml:math id="M172" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="italic">σ</mml:mi><mml:mover accent="true"><mml:mi>s</mml:mi><mml:mo mathvariant="normal" stretchy="false">^</mml:mo></mml:mover></mml:msub></mml:mrow></mml:math></inline-formula> (<inline-formula><mml:math id="M173" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="italic">σ</mml:mi><mml:mi mathvariant="normal">s</mml:mi></mml:msub><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> is the
corresponding standard deviation. <inline-formula><mml:math id="M174" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="italic">σ</mml:mi><mml:mrow><mml:mover accent="true"><mml:mi>s</mml:mi><mml:mo stretchy="false" mathvariant="normal">^</mml:mo></mml:mover><mml:mi>s</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula> represents the
covariance, and <inline-formula><mml:math id="M175" display="inline"><mml:mrow><mml:mi>a</mml:mi><mml:mo>=</mml:mo><mml:mi>b</mml:mi><mml:mo>=</mml:mo><mml:mi>c</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:math></inline-formula> so that every component of SSIM is weighted equally.
<inline-formula><mml:math id="M176" display="inline"><mml:mrow><mml:msub><mml:mi>c</mml:mi><mml:mn mathvariant="normal">1</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula>, <inline-formula><mml:math id="M177" display="inline"><mml:mrow><mml:msub><mml:mi>c</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula>, and <inline-formula><mml:math id="M178" display="inline"><mml:mrow><mml:msub><mml:mi>c</mml:mi><mml:mn mathvariant="normal">3</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula> are small constants that prevent the
denominator from being 0.</p>
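      <p>The normalization in Eq. (18) and the resulting choice of MAX can be illustrated with a minimal Python sketch. It assumes the standard PSNR form 10 log10(MAX^2/MSE); the function names and the toy values are hypothetical and are not taken from the ENSO-ASC code.</p>
<preformat>
```python
import math


def min_max_normalize(values):
    """Scale a flat list of grid values into [0, 1], following Eq. (18)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def psnr(forecast, observation, max_value=1.0):
    """PSNR in dB, assuming the standard form 10 * log10(MAX^2 / MSE)."""
    squared_errors = [(f - o) ** 2 for f, o in zip(forecast, observation)]
    mse = sum(squared_errors) / len(squared_errors)
    return 10.0 * math.log10(max_value ** 2 / mse)


# Toy example (made-up values): normalize a tiny field, then score a
# forecast that is offset from the observation by 0.1 at every grid point.
observation = min_max_normalize([10.0, 20.0, 30.0, 50.0])
forecast = [v + 0.1 for v in observation]
score = psnr(forecast, observation)
```
</preformat>
      <p>Because the normalized fields lie in [0, 1], MAX equals 1 and, under this assumed form, the PSNR depends only on the mean squared error.</p>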
      <p id="d1e4103">Besides these two metrics, the correlations between the calculated and the
official Niño indexes are also used to evaluate forecast skill.</p>
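      <p>As an illustration of the correlation-based evaluation, the following Python sketch computes a correlation between a forecast index series and an official one. It assumes the Pearson coefficient, which the text does not name explicitly; the function name and the index values are hypothetical.</p>
<preformat>
```python
import math


def pearson_correlation(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


# Made-up monthly index values, for illustration only.
forecast_index = [0.2, 0.5, 1.1, 1.6, 0.9]
official_index = [0.1, 0.6, 1.0, 1.8, 1.0]
r = pearson_correlation(forecast_index, official_index)
```
</preformat>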
</sec>
</sec>
<sec id="Ch1.S4">
  <label>4</label><title>Experiment results and analysis</title>
<sec id="Ch1.S4.SS1">
  <label>4.1</label><title>Dataset description</title>
      <p id="d1e4122">Once the deep learning model structure is determined, the quantity and
quality of the training dataset decisively affect the forecast performance. As
observational capabilities improve, a growing number of sources, such as
remote sensing satellites and buoys, provide real-world observations, which
is increasingly beneficial to building our ENSO forecast
model. However, one of the biggest limitations of high-quality climate
datasets is that the real-world observation period is too short to provide
adequate samples. For example, extensive satellite observations only started
in the 1980s, and the number of El Niño events that have occurred since then is
small, which can easily lead to the under-fitting of the deep learning
network.</p>

<?xmltex \floatpos{t}?><table-wrap id="Ch1.T1" specific-use="star"><?xmltex \currentcnt{1}?><label>Table 1</label><caption><p id="d1e4128">Physical variables in the two datasets.</p></caption><oasis:table frame="topbot"><oasis:tgroup cols="4">
     <oasis:colspec colnum="1" colname="col1" align="left"/>
     <oasis:colspec colnum="2" colname="col2" align="left" colsep="1"/>
     <oasis:colspec colnum="3" colname="col3" align="left"/>
     <oasis:colspec colnum="4" colname="col4" align="left"/>
     <oasis:thead>
       <oasis:row>
         <oasis:entry namest="col1" nameend="col2" align="center" colsep="1">NOAA/CIRES </oasis:entry>
         <oasis:entry namest="col3" nameend="col4" align="center">REMSS </oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">Variable</oasis:entry>
         <oasis:entry colname="col2">Description</oasis:entry>
         <oasis:entry colname="col3">Variable</oasis:entry>
         <oasis:entry colname="col4">Description</oasis:entry>
       </oasis:row>
     </oasis:thead>
     <oasis:tbody>
       <oasis:row>
         <oasis:entry colname="col1">SST</oasis:entry>
         <oasis:entry colname="col2">Sea surface temperature</oasis:entry>
         <oasis:entry colname="col3">SST</oasis:entry>
         <oasis:entry colname="col4">Sea surface temperature</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">PWAT</oasis:entry>
         <oasis:entry colname="col2">Precipitable water (atmospheric column)</oasis:entry>
         <oasis:entry colname="col3">RAIN</oasis:entry>
         <oasis:entry colname="col4">Rate of liquid water precipitation</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">CWAT</oasis:entry>
         <oasis:entry colname="col2">Cloud water (atmospheric column)</oasis:entry>
         <oasis:entry colname="col3">CLOUD</oasis:entry>
         <oasis:entry colname="col4">Total cloud liquid water (atmospheric column)</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">RH</oasis:entry>
         <oasis:entry colname="col2">Surface relative humidity</oasis:entry>
         <oasis:entry colname="col3">VAPOR</oasis:entry>
         <oasis:entry colname="col4">Total gaseous water (atmospheric column)</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">UWIND</oasis:entry>
         <oasis:entry colname="col2">Surface zonal wind speed</oasis:entry>
         <oasis:entry colname="col3">UWIND</oasis:entry>
         <oasis:entry colname="col4">Zonal wind speed</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">VWIND</oasis:entry>
         <oasis:entry colname="col2">Surface meridional wind speed</oasis:entry>
         <oasis:entry colname="col3">VWIND</oasis:entry>
         <oasis:entry colname="col4">Meridional wind speed</oasis:entry>
       </oasis:row>
     </oasis:tbody>
   </oasis:tgroup></oasis:table><table-wrap-foot><p id="d1e4131">Note: The reanalysis dataset is provided by NOAA/CIRES and covers January 1850
to December 2015 at a resolution of 2 by 2<inline-formula><mml:math id="M179" display="inline"><mml:msup><mml:mi/><mml:mo>∘</mml:mo></mml:msup></mml:math></inline-formula>; the remote sensing dataset is provided by
REMSS and covers December 1997 to August 2020 at a resolution of 0.25 by 0.25<inline-formula><mml:math id="M180" display="inline"><mml:msup><mml:mi/><mml:mo>∘</mml:mo></mml:msup></mml:math></inline-formula>. In REMSS,
UWIND and VWIND are collected from REMSS/CCMP
(<uri>https://www.remss.com/measurements/ccmp/</uri>, last access: 15 November 2021), and the other variables are
collected from REMSS/TMI (1997–2012, <uri>http://www.remss.com/missions/tmi/</uri>, last access: 15 November 2021) and
REMSS/AMSR2 (2012–2020, <uri>http://www.remss.com/missions/amsr/</uri>, last access: 15 November 2021). Where possible, we
choose physical variables in NOAA/CIRES with the same meaning as those in
REMSS, such as CWAT, CLOUD, RH, and VAPOR. Because of the limits of these two datasets,
some variables can only be paired with their closest match even though they describe
different characteristics of the ocean–atmosphere cycle, such as PWAT and RAIN.</p></table-wrap-foot></table-wrap>

      <p id="d1e4296">To greatly increase the quantity of training data, we use transfer
learning to train our model progressively, first with long-term climate reanalysis data
and then with high-resolution remote sensing data. Both datasets
provide multivariate global gridded data. The reanalysis data are
provided by NOAA/CIRES (<uri>https://rda.ucar.edu/datasets/ds131.2/index.html</uri>, last access: 15 November 2021),
which is a 6-hourly multivariate global climate dataset from January 1850 to
December 2015 at a resolution of 2<inline-formula><mml:math id="M181" display="inline"><mml:msup><mml:mi/><mml:mo>∘</mml:mo></mml:msup></mml:math></inline-formula>. The remote sensing data are obtained from Remote
Sensing Systems (REMSS, <uri>http://www.remss.com/</uri>, last access: 15 November 2021), which is a daily multivariate
global climate dataset from December 1997 to August 2020; its resolution is much
higher than that of the reanalysis data, at 0.25<inline-formula><mml:math id="M182" display="inline"><mml:msup><mml:mi/><mml:mo>∘</mml:mo></mml:msup></mml:math></inline-formula>. According to our chosen
physical variables, we obtain the corresponding sub-datasets, and all the
variables are preprocessed and averaged monthly. The detailed dataset
descriptions are shown in Table 1. Note that, where possible, we choose physical
variables in NOAA/CIRES with the same meaning as those in REMSS, such as
CWAT, CLOUD, RH, and VAPOR. Some variables can only be paired with their closest match
in these two datasets even though they describe slightly different
characteristics of the ocean–atmosphere cycle, such as PWAT and RAIN.</p>
      <p id="d1e4324">In addition, we collect the historical Niño 3, Niño 4, and
Niño 3.4 index data from the China Meteorological Administration National
Climate Centre (<uri>https://cmdp.ncc-cma.net/</uri>, last access: 15 November 2021). We use the records from
January 2014 to August 2020 to analyze the results of the following experiments.</p>
      <p id="d1e4330">The major active region of ENSO is concentrated in the tropical Pacific, so
we crop the multivariate data with the region
(<inline-formula><mml:math id="M183" display="inline"><mml:mrow><mml:mn mathvariant="normal">40</mml:mn><mml:msup><mml:mi/><mml:mo>∘</mml:mo></mml:msup></mml:mrow></mml:math></inline-formula> N–<inline-formula><mml:math id="M184" display="inline"><mml:mrow><mml:mn mathvariant="normal">40</mml:mn><mml:msup><mml:mi/><mml:mo>∘</mml:mo></mml:msup></mml:mrow></mml:math></inline-formula> S,
<inline-formula><mml:math id="M185" display="inline"><mml:mrow><mml:mn mathvariant="normal">160</mml:mn><mml:msup><mml:mi/><mml:mo>∘</mml:mo></mml:msup></mml:mrow></mml:math></inline-formula> E–<inline-formula><mml:math id="M186" display="inline"><mml:mrow><mml:mn mathvariant="normal">90</mml:mn><mml:msup><mml:mi/><mml:mo>∘</mml:mo></mml:msup></mml:mrow></mml:math></inline-formula> W) as the geographic
boundaries of ENSO-ASC, which covers the Niño 3 and Niño 4 regions. The
reanalysis data have the size (40, 55) for every single-month variable, and
the remote sensing data have the size (320, 440). To unify the two grids and
improve dataset quality, we use bicubic interpolation (Keys, 1981) to
enlarge the reanalysis data by 8<inline-formula><mml:math id="M187" display="inline"><mml:mo>×</mml:mo></mml:math></inline-formula> magnification and a
soft-impute algorithm (Mazumder et al., 2010) to fill up missing values in
both datasets. We train the model first on the whole reanalysis dataset and
subsequently on the remote sensing dataset from December 1997 to December 2012 for
fine-tuning. The samples from January 2014 to August 2020 in the remote sensing dataset
serve as the validation set. There is<?pagebreak page6986?> a 1-year gap between the fine-tuning
set and the validation set to reduce the possible influence of oceanic memory.</p>
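      <p>The dataset periods and grid sizes described above can be checked with a few lines of arithmetic. The Python sketch below counts calendar months only (not training samples, whose number also depends on the input and forecast window lengths); all names are hypothetical.</p>
<preformat>
```python
def month_index(year, month):
    """Map a (year, month) pair to a running month number."""
    return year * 12 + (month - 1)


def months_inclusive(start, end):
    """Number of calendar months from start to end, both included."""
    return month_index(*end) - month_index(*start) + 1


# Periods taken from the text above.
reanalysis_months = months_inclusive((1850, 1), (2015, 12))  # pre-training
fine_tune_months = months_inclusive((1997, 12), (2012, 12))  # fine-tuning
validation_months = months_inclusive((2014, 1), (2020, 8))   # validation

# Months strictly between the end of fine-tuning and the start of validation.
gap_months = month_index(2014, 1) - month_index(2012, 12) - 1

# Reanalysis grid (40, 55) after the 8x enlargement.
upscaled_shape = (40 * 8, 55 * 8)
```
</preformat>
      <p>Under these assumptions, the gap is 12 months and the enlarged reanalysis grid is (320, 440), which matches the size of the remote sensing grid stated in the text.</p>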
</sec>
<sec id="Ch1.S4.SS2">
  <label>4.2</label><title>Experiment setting</title>
      <p id="d1e4396">We train and evaluate the ENSO-ASC on a high-performance server. Given
the available computing resources, the hyper-parameters of our proposed model are set
as follows: <inline-formula><mml:math id="M188" display="inline"><mml:mrow><mml:mi>K</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">4</mml:mn></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M189" display="inline"><mml:mrow><mml:msub><mml:mi>d</mml:mi><mml:mn mathvariant="normal">1</mml:mn></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mi>d</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub><mml:mo>=</mml:mo><mml:mn mathvariant="normal">16</mml:mn></mml:mrow></mml:math></inline-formula>,
which is the best combination found in extensive experiments (the tuning
process is omitted because it is not the focus of this paper).
Adjacency matrix <inline-formula><mml:math id="M190" display="inline"><mml:mi mathvariant="bold">A</mml:mi></mml:math></inline-formula> and corresponding Laplacian matrix <inline-formula><mml:math id="M191" display="inline"><mml:mover accent="true"><mml:mi mathvariant="bold">L</mml:mi><mml:mo stretchy="false" mathvariant="normal">̃</mml:mo></mml:mover></mml:math></inline-formula> are
designed as in Sect. 3. All the following analyses are based on stable
results obtained from repeated experiments.</p>
      <p id="d1e4450">We evaluate the ENSO-ASC from three aspects. Firstly, according to our
proposed ENSO forecast formalization in Eqs. (5) to (7), there are
several factors that may influence the performance from the perspective of
model structure: the input sequence length <inline-formula><mml:math id="M192" display="inline"><mml:mi>M</mml:mi></mml:math></inline-formula>, the multivariate coupler
<inline-formula><mml:math id="M193" display="inline"><mml:mrow><mml:mtext>coupler</mml:mtext><mml:mo>(</mml:mo><mml:mi mathvariant="normal" class="Radical">⚫</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula>, the attention weights <inline-formula><mml:math id="M194" display="inline"><mml:mi mathvariant="italic">α</mml:mi></mml:math></inline-formula> and <inline-formula><mml:math id="M195" display="inline"><mml:mi mathvariant="italic">β</mml:mi></mml:math></inline-formula>, and the
benefits of transfer training. We design some comparison experiments to
investigate the model performance and determine the optimal model structure.
A comparison with the other state-of-the-art models is also included.
Secondly, we evaluate the forecast skill of the ENSO-ASC from the
meteorological aspects according to Eqs. (5) to (7): the contributions of
different input physical variables <inline-formula><mml:math id="M196" display="inline"><mml:mi>V</mml:mi></mml:math></inline-formula> in the pre-designed coupling graph
<inline-formula><mml:math id="M197" display="inline"><mml:mi>G</mml:mi></mml:math></inline-formula>, the effective forecast lead months under the IMS strategy, the forecast skill
for different start calendar months (scm), and the spatial uncertainties in
longer-term forecasts. Finally, we forecast the real-world ENSO over the
validation period and compare our results with the observations.
<?xmltex \hack{\newpage}?></p>
</sec>
<sec id="Ch1.S4.SS3">
  <label>4.3</label><title>Evaluation of model performance</title>
<sec id="Ch1.S4.SS3.SSS1">
  <label>4.3.1</label><title>Influence of the input sequence length</title>
      <p id="d1e4519">The input sequence length <inline-formula><mml:math id="M198" display="inline"><mml:mi>M</mml:mi></mml:math></inline-formula> is very important for the forecasting model because of
the rich spatial–temporal information the sequence contains. In general, the longer the sequence
length <inline-formula><mml:math id="M199" display="inline"><mml:mi>M</mml:mi></mml:math></inline-formula>, the better the ENSO forecast skill. However, a longer input
sequence also increases the computational burden and raises the
requirements for data quantity and quality when training and running
complex deep learning networks, especially at the high resolution of
our model. Therefore, the balance between forecast performance and
efficiency must be considered. We gradually increase the sequence length <inline-formula><mml:math id="M200" display="inline"><mml:mi>M</mml:mi></mml:math></inline-formula>
to detect the changes in forecast skill. Figure 10 displays the results. As
the sequence length gradually increases, both metrics improve (become larger).
While the sequence length is less than 3 months, the forecast skill
increases rapidly; once it exceeds 3 months, the growth rate slows
down.</p>

      <?xmltex \floatpos{t}?><fig id="Ch1.F10" specific-use="star"><?xmltex \currentcnt{10}?><?xmltex \def\figurename{Figure}?><label>Figure 10</label><caption><p id="d1e4545">The performances of the ENSO-ASC when the input sequence length
increases under IMS forecast strategy.</p></caption>
            <?xmltex \igopts{width=341.433071pt}?><graphic xlink:href="https://gmd.copernicus.org/articles/14/6977/2021/gmd-14-6977-2021-f10.png"/>

          </fig>

      <?pagebreak page6987?><p id="d1e4554">Clearly, increasing the sequence length does not improve
forecast skill indefinitely. In ENSO-ASC, making predictions
from the previous 3 months of multivariate data is the more efficient choice. In
fact, many successful studies suggest that a climate deep learning model does
not require a long input sequence to make skilful predictions, such as
using data from the previous two consecutive time steps to estimate
tropical cyclone intensity (R. Zhang et al., 2019) and using the previous 3 months of ocean heat
content and wind to predict ENSO evolution (Ham et al., 2019). A long-term
temporal sequence contains strong trends and periodicities, but the
underlying chaos is more dominant, which seriously hinders the prediction.
The subsequent experiments all use the historical 3-month
multivariate sequence (<inline-formula><mml:math id="M201" display="inline"><mml:mrow><mml:mi>M</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">3</mml:mn></mml:mrow></mml:math></inline-formula>) as model input.
<?xmltex \hack{\newpage}?></p>
</sec>
<sec id="Ch1.S4.SS3.SSS2">
  <label>4.3.2</label><title>Benefit of the transfer learning</title>
      <p id="d1e4578">For model training, we use transfer learning to overcome the
shortage of samples and obtain the optimally trained model. More
specifically, we first train the model on the reanalysis dataset for 90 epochs
and subsequently on the remote sensing dataset until full convergence (about
110 epochs). Here, 90 epochs are enough to train the ENSO-ASC on the
reanalysis dataset until convergence, because the interpolated reanalysis
data are smoother and lack detail, which makes training easy. To
verify the benefit of transfer learning, we also conduct
comparative experiments in which the model is trained only on the remote sensing
dataset. This training process needs more epochs (such as 200 epochs),
because the remote sensing dataset contains much more detailed high-level
climate information.</p>

      <?xmltex \floatpos{t}?><fig id="Ch1.F11"><?xmltex \currentcnt{11}?><?xmltex \def\figurename{Figure}?><label>Figure 11</label><caption><p id="d1e4583">Loss curves when training with only the reanalysis dataset (blue
line), with only the remote sensing dataset (green line), and with transfer learning
on these two datasets in order (black arrow and yellow line after 90
epochs).</p></caption>
            <?xmltex \igopts{width=241.848425pt}?><graphic xlink:href="https://gmd.copernicus.org/articles/14/6977/2021/gmd-14-6977-2021-f11.png"/>

          </fig>

      <p id="d1e4592">The averaged loss curves for the different training sets are depicted in Fig. 11. When training with the reanalysis dataset, the loss
drops quickly, whereas when training with the remote sensing dataset, the model
converges slowly and the loss remains large. With transfer learning, the
loss on the remote sensing dataset is reduced by at least 15 %, which
indicates that the systematic errors of ENSO-ASC are indeed corrected to
some extent.</p>
      <p id="d1e4596">Compared with the remote sensing dataset, training with the reanalysis dataset
always yields a much smaller loss. This is because, as mentioned above, the
reanalysis dataset is smoother and lacks detail, so
the model can learn its characteristics more easily. However, the
high-resolution remote sensing dataset reflects the real-world state more
accurately and contains more comprehensive and nonlinear details and
amplitudes. If sufficient remote sensing
data were available, the forecast skill would improve further.</p>
</sec>
<sec id="Ch1.S4.SS3.SSS3">
  <label>4.3.3</label><title>Effectiveness of the multivariate air–sea coupler</title>
      <p id="d1e4607">We subjectively incorporate a priori ENSO coupled interactions into the
graph-based multivariate coupler and select six critical physical variables as
the predictors of the ENSO-ASC. The formalization not only treats each
physical variable as a separate individual but also emphasizes the nonlinear
interactions between them. However, it is not clear whether this graph
formalization is the reason for the improved ENSO forecast
performance. To validate the effectiveness of our proposed
formalization, we design two other deep learning couplers for ENSO forecasting
with the same datasets and transfer learning and then compare their
performance with that of the ENSO-ASC.</p>

      <?xmltex \floatpos{t}?><fig id="Ch1.F12" specific-use="star"><?xmltex \currentcnt{12}?><?xmltex \def\figurename{Figure}?><label>Figure 12</label><caption><p id="d1e4612">The performances of the ENSO-ASC when replacing the multivariate
air–sea coupler with other deep learning structures.</p></caption>
            <?xmltex \igopts{width=341.433071pt}?><graphic xlink:href="https://gmd.copernicus.org/articles/14/6977/2021/gmd-14-6977-2021-f12.png"/>

          </fig>

      <p id="d1e4621">The first coupler replaces GCN with a dual-layer 3D-convolution block,
which treats all variables as a whole system and ignores the specific
directions and neighbor-orders in coupling interactions between them. The
second coupler just replaces the GCN with the concatenation of features from
multivariate encoders, which treats the multiple variables as the channel
stacking (or data overlay) and simply extracts global features of them
together (the cascaded multivariate features are propagated to the decoders
for every<?pagebreak page6988?> variable directly). The results are illustrated in Fig. 12.
Our graph formalization achieves the best performance in terms of
both SSIM and PSNR; the Conv3D coupler is slightly worse. The
results indicate that using a graph to simulate multivariate interactions is
a more reasonable approach, which can learn more ENSO-related dynamical
interactions and underlying physical processes than the other formalizations.
The comparative experiments also offer a general lesson:
when building a climate forecast deep learning network that incorporates
physical mechanisms, it is necessary to customize a suitable structure to
represent and reflect the specific mechanisms mathematically.</p>
      <p id="d1e4625">We design a fully connected adjacency matrix <inline-formula><mml:math id="M202" display="inline"><mml:mi mathvariant="bold">A</mml:mi></mml:math></inline-formula> in the GCN coupler, which means
that we consider all physical variables to interact with each other.
Conv3D also extracts the features of all variables together. Under these
circumstances, why does the GCN coupler perform better than the Conv3D coupler?
From a mathematical perspective, the GCN considers the pair-wise
coupling between variables and learns the features of every coupled
interaction individually according to the hidden nonlinearity in the samples, as in Eq. (12). The Conv3D coupler, in contrast, inherits the
global weight sharing and local connectivity of classic convolution, so it
treats all variables equally and lacks a description of the specific
interactions.</p>
</sec>
<sec id="Ch1.S4.SS3.SSS4">
  <label>4.3.4</label><title>Effects of attention weights</title>
      <p id="d1e4643">We customize two attention weights in the model to dynamically represent the
effects of different temporal memories and multiple variables. Here, we
analyze the influences of two proposed self-supervised attention weights by
removing one of them from ENSO-ASC; the results are illustrated in Fig. 13.
The results suggest that the prediction skill declines when either of the
attention weights is removed. More specifically, for shorter
forecasts (less than about 9 months) the performance drops more when the
multivariate attention is removed, whereas for longer forecasts (more than
about 15 months) it drops more when the temporal attention is removed. This is because
multivariate correlations are higher and temporal non-stationarity is lower in the
short term, whereas temporal memory effects dominate the long-term
evolution.</p>

      <?xmltex \floatpos{t}?><fig id="Ch1.F13" specific-use="star"><?xmltex \currentcnt{13}?><?xmltex \def\figurename{Figure}?><label>Figure 13</label><caption><p id="d1e4648">The performances of the ENSO-ASC when removing one of the
attention weights.</p></caption>
            <?xmltex \igopts{width=398.338583pt}?><graphic xlink:href="https://gmd.copernicus.org/articles/14/6977/2021/gmd-14-6977-2021-f13.png"/>

          </fig>

      <p id="d1e4657">In fact, thanks to the self-supervised attention weights <inline-formula><mml:math id="M203" display="inline"><mml:mi mathvariant="italic">α</mml:mi></mml:math></inline-formula> and <inline-formula><mml:math id="M204" display="inline"><mml:mi mathvariant="italic">β</mml:mi></mml:math></inline-formula>, the forecast
skill does not change much with different start calendar months
and variable combinations, even though the multivariate graph and model structure are fixed (see Sect. 4.4). However, if these two weights
are removed, the model cannot adaptively distinguish the contributions of
multivariate oceanic memories in different forecast start months,
which seriously misleads the forecasts.</p>
      <p id="d1e4675">Indeed, building and selecting a separate optimal model for each
start calendar month, forecast lead time,
and set of predictors may yield better ENSO forecasts, but it also consumes more resources and time. The two attention weights <inline-formula><mml:math id="M205" display="inline"><mml:mi mathvariant="italic">α</mml:mi></mml:math></inline-formula> and <inline-formula><mml:math id="M206" display="inline"><mml:mi mathvariant="italic">β</mml:mi></mml:math></inline-formula> can reasonably
prune the model while keeping prediction errors within an acceptable range. In
operational forecasts, separate models for different scenarios can still be
used to pursue higher accuracy and skill.</p>
</sec>
<sec id="Ch1.S4.SS3.SSS5">
  <label>4.3.5</label><title>Comparison with other state-of-the-art ENSO deep learning models</title>
      <p id="d1e4701">We compare the ENSO-ASC with other state-of-the-art data-driven ENSO
forecast models, including (1) convolutional neural networks
(encoder–decoder structure with 12 layers, which has the same number of
trainable layers as our model); (2) long short-term memory networks (6 LSTM
layers and a final fully connected layer); (3) a ConvLSTM network (CL-6 means a
6-layer structure, CL-12 means a 12-layer structure); and (4) a 3D-convolution
coupler to simulate multivariate interactions as mentioned above (Conv3D).
To ensure a fair comparison, we use the same input
physical variables, training and validation datasets, and training criteria for
the above models and train them via transfer learning for enough
epochs to reach their optimal performance. Table 2 displays the
comparative results for 12-, 15-, and 18-month forecasts. In general, the forecast
models that consider ENSO spatial–temporal correlations (e.g.,<?pagebreak page6989?> ConvLSTM,
Conv3D) outperform the basic deep learning models (e.g., CNN, LSTM), which
implies that the more complicated network structures mine the sophisticated
dependencies hidden deep in long-term ENSO evolution more effectively. As
the lead time increases, the performance of all models gradually decreases.
However, the ENSO-ASC still maintains high accuracy and is always better
than the other models, with an improvement of about 5 %, which indicates the
superiority of our model.</p>

<?xmltex \floatpos{t}?><table-wrap id="Ch1.T2"><?xmltex \currentcnt{2}?><label>Table 2</label><caption><p id="d1e4707">Performance comparisons with other state-of-the-art deep learning
models.</p></caption><oasis:table frame="topbot"><oasis:tgroup cols="4">
     <oasis:colspec colnum="1" colname="col1" align="left"/>
     <oasis:colspec colnum="2" colname="col2" align="right"/>
     <oasis:colspec colnum="3" colname="col3" align="right"/>
     <oasis:colspec colnum="4" colname="col4" align="right"/>
     <oasis:thead>
       <oasis:row>
         <oasis:entry colname="col1">Model</oasis:entry>
         <oasis:entry rowsep="1" colname="col2">12-month</oasis:entry>
         <oasis:entry rowsep="1" colname="col3">15-month</oasis:entry>
         <oasis:entry rowsep="1" colname="col4">18-month</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1"/>
         <oasis:entry colname="col2">SSIM<inline-formula><mml:math id="M207" display="inline"><mml:mo>/</mml:mo></mml:math></inline-formula>PSNR</oasis:entry>
         <oasis:entry colname="col3">SSIM<inline-formula><mml:math id="M208" display="inline"><mml:mo>/</mml:mo></mml:math></inline-formula>PSNR</oasis:entry>
         <oasis:entry colname="col4">SSIM<inline-formula><mml:math id="M209" display="inline"><mml:mo>/</mml:mo></mml:math></inline-formula>PSNR</oasis:entry>
       </oasis:row>
     </oasis:thead>
     <oasis:tbody>
       <oasis:row>
         <oasis:entry colname="col1">CNN</oasis:entry>
         <oasis:entry colname="col2">86.32<inline-formula><mml:math id="M210" display="inline"><mml:mo>/</mml:mo></mml:math></inline-formula>17.47</oasis:entry>
         <oasis:entry colname="col3">83.97<inline-formula><mml:math id="M211" display="inline"><mml:mo>/</mml:mo></mml:math></inline-formula>14.40</oasis:entry>
         <oasis:entry colname="col4">79.85<inline-formula><mml:math id="M212" display="inline"><mml:mo>/</mml:mo></mml:math></inline-formula>12.20</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">LSTM</oasis:entry>
         <oasis:entry colname="col2">88.57<inline-formula><mml:math id="M213" display="inline"><mml:mo>/</mml:mo></mml:math></inline-formula>18.08</oasis:entry>
         <oasis:entry colname="col3">84.19<inline-formula><mml:math id="M214" display="inline"><mml:mo>/</mml:mo></mml:math></inline-formula>15.41</oasis:entry>
         <oasis:entry colname="col4">81.59<inline-formula><mml:math id="M215" display="inline"><mml:mo>/</mml:mo></mml:math></inline-formula>13.58</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">CL-6</oasis:entry>
         <oasis:entry colname="col2">88.70<inline-formula><mml:math id="M216" display="inline"><mml:mo>/</mml:mo></mml:math></inline-formula>18.37</oasis:entry>
         <oasis:entry colname="col3">84.36<inline-formula><mml:math id="M217" display="inline"><mml:mo>/</mml:mo></mml:math></inline-formula>16.25</oasis:entry>
         <oasis:entry colname="col4">81.73<inline-formula><mml:math id="M218" display="inline"><mml:mo>/</mml:mo></mml:math></inline-formula>14.04</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">CL-12</oasis:entry>
         <oasis:entry colname="col2">89.78<inline-formula><mml:math id="M219" display="inline"><mml:mo>/</mml:mo></mml:math></inline-formula>19.45</oasis:entry>
         <oasis:entry colname="col3">84.74<inline-formula><mml:math id="M220" display="inline"><mml:mo>/</mml:mo></mml:math></inline-formula>17.34</oasis:entry>
         <oasis:entry colname="col4">82.03<inline-formula><mml:math id="M221" display="inline"><mml:mo>/</mml:mo></mml:math></inline-formula>15.37</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Conv3D</oasis:entry>
         <oasis:entry colname="col2">90.93<inline-formula><mml:math id="M222" display="inline"><mml:mo>/</mml:mo></mml:math></inline-formula>20.16</oasis:entry>
         <oasis:entry colname="col3">85.59<inline-formula><mml:math id="M223" display="inline"><mml:mo>/</mml:mo></mml:math></inline-formula>18.01</oasis:entry>
         <oasis:entry colname="col4">82.50<inline-formula><mml:math id="M224" display="inline"><mml:mo>/</mml:mo></mml:math></inline-formula>15.98</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">ENSO-ASC</oasis:entry>
         <oasis:entry colname="col2">92.65<inline-formula><mml:math id="M225" display="inline"><mml:mo>/</mml:mo></mml:math></inline-formula>22.05</oasis:entry>
         <oasis:entry colname="col3">90.31<inline-formula><mml:math id="M226" display="inline"><mml:mo>/</mml:mo></mml:math></inline-formula>20.97</oasis:entry>
         <oasis:entry colname="col4">87.53<inline-formula><mml:math id="M227" display="inline"><mml:mo>/</mml:mo></mml:math></inline-formula>17.17</oasis:entry>
       </oasis:row>
     </oasis:tbody>
   </oasis:tgroup></oasis:table></table-wrap>

      <p id="d1e4997">Consider how PSNR and SSIM are calculated according to
Eq. (11). PSNR is derived from the MSE, which measures the error at
individual grid points, whereas SSIM compares the spatial characteristics and
distributions of two patterns based on many correlation coefficients and can
therefore serve, to some extent, as a measure of evolution tendency and physical
consistency. On this basis, Table 2 also indicates that the
forecasts of ENSO-ASC are more physically consistent than those of the other
models: its SSIM is clearly higher, especially at
longer lead times. In addition, the ENSO-ASC pays more attention to
detailed spatial distributions, which benefits further analysis
of ENSO dynamical mechanisms.</p>
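      <p>As a rough illustration of the distinction drawn above, the following sketch computes PSNR from the MSE and a single-window (global) SSIM for two small gridded patterns. It uses the standard textbook definitions, which are assumed here and may differ in detail from Eq. (11), for example in the data range or the windowing. The function names <monospace>psnr</monospace> and <monospace>ssim_global</monospace>, the data range of 1.0, the constants and the toy fields are illustrative choices and are not taken from the ENSO-ASC code.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import math


def _flatten(grid):
    """Flatten a 2-D list of floats into a 1-D list."""
    return [v for row in grid for v in row]


def mse(pred, obs):
    """Mean squared error over all grid points."""
    p, o = _flatten(pred), _flatten(obs)
    return sum((a - b) ** 2 for a, b in zip(p, o)) / len(p)


def psnr(pred, obs, data_range=1.0):
    """PSNR in dB: 10 * log10(data_range^2 / MSE); infinite for identical fields."""
    err = mse(pred, obs)
    if err == 0.0:
        return math.inf
    return 10.0 * math.log10(data_range ** 2 / err)


def ssim_global(pred, obs, data_range=1.0, k1=0.01, k2=0.03):
    """SSIM evaluated on one window covering the whole field (an assumption)."""
    p, o = _flatten(pred), _flatten(obs)
    n = len(p)
    mu_p, mu_o = sum(p) / n, sum(o) / n
    var_p = sum((a - mu_p) ** 2 for a in p) / n
    var_o = sum((b - mu_o) ** 2 for b in o) / n
    cov = sum((a - mu_p) * (b - mu_o) for a, b in zip(p, o)) / n
    c1, c2 = (k1 * data_range) ** 2, (k2 * data_range) ** 2
    numerator = (2 * mu_p * mu_o + c1) * (2 * cov + c2)
    denominator = (mu_p ** 2 + mu_o ** 2 + c1) * (var_p + var_o + c2)
    return numerator / denominator


# Toy fields: the second one has a uniform bias but the same spatial pattern.
obs = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
shifted = [[v + 0.1 for v in row] for row in obs]

print(psnr(shifted, obs), ssim_global(shifted, obs))
```
]]></preformat>
      <p>In this toy case the uniform bias lowers the PSNR through the grid-point error, while the single-window SSIM stays close to 1 because the spatial structure is unchanged.</p>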
      <p id="d1e5001">On the other hand, the ENSO-ASC is the first attempt to forecast ENSO at
such a high resolution (0.25<inline-formula><mml:math id="M228" display="inline"><mml:msup><mml:mi/><mml:mo>∘</mml:mo></mml:msup></mml:math></inline-formula>). Although this makes
training more difficult, our model still achieves good results. Interestingly,
although the ENSO-ASC has the most trainable parameters, it converges in about
one-fifth as many epochs, on average, as the other forecast models.</p>
</sec>
</sec>
<sec id="Ch1.S4.SS4">
  <label>4.4</label><title>Analysis of ENSO forecast skill</title>
<sec id="Ch1.S4.SS4.SSS1">
  <label>4.4.1</label><title>Contributions of different predictors to the forecast skill</title>
      <p id="d1e5029">The superiority of our proposed model derives from the graph formalization:
the special multivariate coupler can effectively express the synergies
between multiple physical variables. From another perspective, the
improvement in forecast skill is due not only to the graph
formalization but also to the use of multiple variables highly
related to ENSO, in contrast to the limited set of variables used in
previous works. For ENSO forecasting, SST is definitely the most critical
predictor. Besides SST, the other variables contribute differently to the
forecast results. Therefore, we design an ablation experiment that removes
one predictor at a time from our proposed model and measures the reduction in
forecast skill (upper part of Table 3). We also add one extra
predictor (surface air temperature, surface pressure, or upper ocean heat
content) to our proposed model to investigate the improvement
in forecast skill (lower part of Table 3). Here, the input sequence length is still
set to 3 months.</p>

<?xmltex \floatpos{t}?><table-wrap id="Ch1.T3" specific-use="star"><?xmltex \currentcnt{3}?><label>Table 3</label><caption><p id="d1e5035">Model performance when one existing variable is removed or one extra
variable is added.</p></caption><oasis:table frame="topbot"><oasis:tgroup cols="4">
     <oasis:colspec colnum="1" colname="col1" align="left"/>
     <oasis:colspec colnum="2" colname="col2" align="right"/>
     <oasis:colspec colnum="3" colname="col3" align="right"/>
     <oasis:colspec colnum="4" colname="col4" align="right"/>
     <oasis:thead>
       <oasis:row>
         <oasis:entry colname="col1">Removed variable</oasis:entry>
         <oasis:entry rowsep="1" colname="col2">12-month</oasis:entry>
         <oasis:entry rowsep="1" colname="col3">15-month</oasis:entry>
         <oasis:entry rowsep="1" colname="col4">18-month</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1"/>
         <oasis:entry colname="col2">SSIM<inline-formula><mml:math id="M229" display="inline"><mml:mo>/</mml:mo></mml:math></inline-formula>PSNR</oasis:entry>
         <oasis:entry colname="col3">SSIM<inline-formula><mml:math id="M230" display="inline"><mml:mo>/</mml:mo></mml:math></inline-formula>PSNR</oasis:entry>
         <oasis:entry colname="col4">SSIM<inline-formula><mml:math id="M231" display="inline"><mml:mo>/</mml:mo></mml:math></inline-formula>PSNR</oasis:entry>
       </oasis:row>
     </oasis:thead>
     <oasis:tbody>
       <oasis:row>
         <oasis:entry colname="col1"><inline-formula><mml:math id="M232" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col2">92.65<inline-formula><mml:math id="M233" display="inline"><mml:mo>/</mml:mo></mml:math></inline-formula>22.05</oasis:entry>
         <oasis:entry colname="col3">90.31<inline-formula><mml:math id="M234" display="inline"><mml:mo>/</mml:mo></mml:math></inline-formula>20.97</oasis:entry>
         <oasis:entry colname="col4">87.53<inline-formula><mml:math id="M235" display="inline"><mml:mo>/</mml:mo></mml:math></inline-formula>18.17</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">RAIN</oasis:entry>
         <oasis:entry colname="col2">91.46<inline-formula><mml:math id="M236" display="inline"><mml:mo>/</mml:mo></mml:math></inline-formula>21.34</oasis:entry>
         <oasis:entry colname="col3">88.74<inline-formula><mml:math id="M237" display="inline"><mml:mo>/</mml:mo></mml:math></inline-formula>18.32</oasis:entry>
         <oasis:entry colname="col4">85.86<inline-formula><mml:math id="M238" display="inline"><mml:mo>/</mml:mo></mml:math></inline-formula>17.35</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">CLOUD</oasis:entry>
         <oasis:entry colname="col2">91.53<inline-formula><mml:math id="M239" display="inline"><mml:mo>/</mml:mo></mml:math></inline-formula>21.65</oasis:entry>
         <oasis:entry colname="col3">88.81<inline-formula><mml:math id="M240" display="inline"><mml:mo>/</mml:mo></mml:math></inline-formula>18.54</oasis:entry>
         <oasis:entry colname="col4">85.93<inline-formula><mml:math id="M241" display="inline"><mml:mo>/</mml:mo></mml:math></inline-formula>16.16</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">VAPOR</oasis:entry>
         <oasis:entry colname="col2">91.52<inline-formula><mml:math id="M242" display="inline"><mml:mo>/</mml:mo></mml:math></inline-formula>21.65</oasis:entry>
         <oasis:entry colname="col3">88.82<inline-formula><mml:math id="M243" display="inline"><mml:mo>/</mml:mo></mml:math></inline-formula>18.53</oasis:entry>
         <oasis:entry colname="col4">85.92<inline-formula><mml:math id="M244" display="inline"><mml:mo>/</mml:mo></mml:math></inline-formula>16.16</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">UWIND</oasis:entry>
         <oasis:entry colname="col2">90.08<inline-formula><mml:math id="M245" display="inline"><mml:mo>/</mml:mo></mml:math></inline-formula>20.93</oasis:entry>
         <oasis:entry colname="col3">87.03<inline-formula><mml:math id="M246" display="inline"><mml:mo>/</mml:mo></mml:math></inline-formula>17.81</oasis:entry>
         <oasis:entry colname="col4">83.72<inline-formula><mml:math id="M247" display="inline"><mml:mo>/</mml:mo></mml:math></inline-formula>13.58</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">VWIND</oasis:entry>
         <oasis:entry colname="col2">91.47<inline-formula><mml:math id="M248" display="inline"><mml:mo>/</mml:mo></mml:math></inline-formula>21.62</oasis:entry>
         <oasis:entry colname="col3">88.65<inline-formula><mml:math id="M249" display="inline"><mml:mo>/</mml:mo></mml:math></inline-formula>18.42</oasis:entry>
         <oasis:entry colname="col4">85.31<inline-formula><mml:math id="M250" display="inline"><mml:mo>/</mml:mo></mml:math></inline-formula>15.07</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Added variable</oasis:entry>
         <oasis:entry rowsep="1" colname="col2">12-month</oasis:entry>
         <oasis:entry rowsep="1" colname="col3">15-month</oasis:entry>
         <oasis:entry rowsep="1" colname="col4">18-month</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1"/>
         <oasis:entry colname="col2">SSIM<inline-formula><mml:math id="M251" display="inline"><mml:mo>/</mml:mo></mml:math></inline-formula>PSNR</oasis:entry>
         <oasis:entry colname="col3">SSIM<inline-formula><mml:math id="M252" display="inline"><mml:mo>/</mml:mo></mml:math></inline-formula>PSNR</oasis:entry>
         <oasis:entry colname="col4">SSIM<inline-formula><mml:math id="M253" display="inline"><mml:mo>/</mml:mo></mml:math></inline-formula>PSNR</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Surface pressure</oasis:entry>
         <oasis:entry colname="col2">92.74<inline-formula><mml:math id="M254" display="inline"><mml:mo>/</mml:mo></mml:math></inline-formula>22.13</oasis:entry>
         <oasis:entry colname="col3">90.33<inline-formula><mml:math id="M255" display="inline"><mml:mo>/</mml:mo></mml:math></inline-formula>20.99</oasis:entry>
         <oasis:entry colname="col4">87.64<inline-formula><mml:math id="M256" display="inline"><mml:mo>/</mml:mo></mml:math></inline-formula>17.26</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Surface air temperature</oasis:entry>
         <oasis:entry colname="col2">92.75<inline-formula><mml:math id="M257" display="inline"><mml:mo>/</mml:mo></mml:math></inline-formula>22.15</oasis:entry>
         <oasis:entry colname="col3">90.40<inline-formula><mml:math id="M258" display="inline"><mml:mo>/</mml:mo></mml:math></inline-formula>21.07</oasis:entry>
         <oasis:entry colname="col4">87.71<inline-formula><mml:math id="M259" display="inline"><mml:mo>/</mml:mo></mml:math></inline-formula>17.25</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Upper ocean heat content</oasis:entry>
         <oasis:entry colname="col2">92.98<inline-formula><mml:math id="M260" display="inline"><mml:mo>/</mml:mo></mml:math></inline-formula>22.14</oasis:entry>
         <oasis:entry colname="col3">90.45<inline-formula><mml:math id="M261" display="inline"><mml:mo>/</mml:mo></mml:math></inline-formula>21.10</oasis:entry>
         <oasis:entry colname="col4">87.79<inline-formula><mml:math id="M262" display="inline"><mml:mo>/</mml:mo></mml:math></inline-formula>17.34</oasis:entry>
       </oasis:row>
     </oasis:tbody>
   </oasis:tgroup></oasis:table></table-wrap>

      <p id="d1e5488">The upper part of Table 3 shows that removing any variable from the input of the
deep learning model reduces the ENSO forecast skill. More
specifically, the reduction is largest when the zonal wind speed (UWIND) is removed.
From the perspective of the ENSO physical mechanism, zonal wind
anomalies (ZWAs) always play a necessary role and are even considered a
co-trigger or driver of ENSO events. As an atmospheric variable, the ZWA often
feeds back directly on oceanic variations with a shorter response time
than oceanic memory. ENSO-ASC uses 3 months<?pagebreak page6990?> of historical multivariate data to
predict ENSO evolution, which is quite a short sequence length. With such a
sequence length, wind speed (including <inline-formula><mml:math id="M263" display="inline"><mml:mi>u</mml:mi></mml:math></inline-formula> wind and <inline-formula><mml:math id="M264" display="inline"><mml:mi>v</mml:mi></mml:math></inline-formula> wind) has a relatively
high correlation with SST. In addition, RAIN is another variable that
slightly affects the forecast. This is because precipitation is in
direct contact with the sea surface, which makes energy transfer
easier.</p>
      <p id="d1e5506">The lower part of Table 3 indicates that the model performance improves only slightly when
surface air temperature, surface pressure, or upper ocean heat content is added to the
multivariate coupler. This is because the multivariate graph with the
existing variables in the ENSO-ASC can already describe an almost complete
energy loop in the Walker circulation, so the effects of the extra
variables on the ENSO forecasts are not obvious. It is worth noting that the
input sequence length should be longer when feeding the ocean heat content
into the multivariate coupler, because this predictor has long memory
(Ham et al., 2019; McPhaden, 2003; Jin, 1997; Meinen and McPhaden, 2000).
However, as the input sequence length varies from 3 to 9 months, the
forecast skill of ENSO-ASC does not change much. This is mainly
because the global spatial teleconnections and temporally lagged
correlations associated with the Walker circulation and ocean waves (such as Kelvin and
Rossby waves) (Exarchou et al., 2021; Dommenget et al., 2006) are not
captured by the model, whose input region mainly covers the equatorial
Pacific. In addition, the model contains only one long-memory predictor
besides SST.</p>
      <p id="d1e5509">Among the three extra physical variables, the upper ocean heat content
is of particular interest, because it can reflect the vertical and horizontal
propagation of ocean waves and help interpret the dynamical mechanisms.
Therefore, we compare two ENSO-ASC configurations that have the
same output (SST <inline-formula><mml:math id="M265" display="inline"><mml:mo>+</mml:mo></mml:math></inline-formula> <inline-formula><mml:math id="M266" display="inline"><mml:mi>u</mml:mi></mml:math></inline-formula> wind, <inline-formula><mml:math id="M267" display="inline"><mml:mi>v</mml:mi></mml:math></inline-formula> wind, rain, cloud, and vapor) but
different inputs. One, denoted EXAM, uses upper ocean heat content <inline-formula><mml:math id="M268" display="inline"><mml:mo>+</mml:mo></mml:math></inline-formula> <inline-formula><mml:math id="M269" display="inline"><mml:mi>u</mml:mi></mml:math></inline-formula> wind, <inline-formula><mml:math id="M270" display="inline"><mml:mi>v</mml:mi></mml:math></inline-formula> wind,
rain, cloud, and vapor as input; the other, denoted CTRL, uses SST <inline-formula><mml:math id="M271" display="inline"><mml:mo>+</mml:mo></mml:math></inline-formula>
<inline-formula><mml:math id="M272" display="inline"><mml:mi>u</mml:mi></mml:math></inline-formula> wind, <inline-formula><mml:math id="M273" display="inline"><mml:mi>v</mml:mi></mml:math></inline-formula> wind, rain, cloud, and vapor. The results
are shown in Table 4.</p>

<?xmltex \floatpos{t}?><table-wrap id="Ch1.T4" specific-use="star"><?xmltex \currentcnt{4}?><label>Table 4</label><caption><p id="d1e5579">Model performance comparison when using upper ocean heat content to
replace SST in the input.</p></caption><oasis:table frame="topbot"><oasis:tgroup cols="4">
     <oasis:colspec colnum="1" colname="col1" align="left"/>
     <oasis:colspec colnum="2" colname="col2" align="right"/>
     <oasis:colspec colnum="3" colname="col3" align="right"/>
     <oasis:colspec colnum="4" colname="col4" align="right"/>
     <oasis:thead>
       <oasis:row>
         <oasis:entry colname="col1">Model paradigm</oasis:entry>
         <oasis:entry rowsep="1" colname="col2">12-month</oasis:entry>
         <oasis:entry rowsep="1" colname="col3">15-month</oasis:entry>
         <oasis:entry rowsep="1" colname="col4">18-month</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1"/>
         <oasis:entry colname="col2">SSIM<inline-formula><mml:math id="M277" display="inline"><mml:mo>/</mml:mo></mml:math></inline-formula>PSNR</oasis:entry>
         <oasis:entry colname="col3">SSIM<inline-formula><mml:math id="M278" display="inline"><mml:mo>/</mml:mo></mml:math></inline-formula>PSNR</oasis:entry>
         <oasis:entry colname="col4">SSIM<inline-formula><mml:math id="M279" display="inline"><mml:mo>/</mml:mo></mml:math></inline-formula>PSNR</oasis:entry>
       </oasis:row>
     </oasis:thead>
     <oasis:tbody>
       <oasis:row>
         <oasis:entry colname="col1">CTRL: SST <inline-formula><mml:math id="M280" display="inline"><mml:mo>+</mml:mo></mml:math></inline-formula> others <inline-formula><mml:math id="M281" display="inline"><mml:mo>→</mml:mo></mml:math></inline-formula><?xmltex \hack{\hfill\break}?>SST <inline-formula><mml:math id="M282" display="inline"><mml:mo>+</mml:mo></mml:math></inline-formula> others</oasis:entry>
         <oasis:entry colname="col2">92.65<inline-formula><mml:math id="M283" display="inline"><mml:mo>/</mml:mo></mml:math></inline-formula>22.05</oasis:entry>
         <oasis:entry colname="col3">90.31<inline-formula><mml:math id="M284" display="inline"><mml:mo>/</mml:mo></mml:math></inline-formula>20.97</oasis:entry>
         <oasis:entry colname="col4">87.53<inline-formula><mml:math id="M285" display="inline"><mml:mo>/</mml:mo></mml:math></inline-formula>18.17</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">EXAM: upper ocean heat content <inline-formula><mml:math id="M286" display="inline"><mml:mo>+</mml:mo></mml:math></inline-formula> others <inline-formula><mml:math id="M287" display="inline"><mml:mo>→</mml:mo></mml:math></inline-formula><?xmltex \hack{\hfill\break}?>SST <inline-formula><mml:math id="M288" display="inline"><mml:mo>+</mml:mo></mml:math></inline-formula> others</oasis:entry>
         <oasis:entry colname="col2">90.96<inline-formula><mml:math id="M289" display="inline"><mml:mo>/</mml:mo></mml:math></inline-formula>20.87</oasis:entry>
         <oasis:entry colname="col3">88.45<inline-formula><mml:math id="M290" display="inline"><mml:mo>/</mml:mo></mml:math></inline-formula>18.23</oasis:entry>
         <oasis:entry colname="col4">84.76<inline-formula><mml:math id="M291" display="inline"><mml:mo>/</mml:mo></mml:math></inline-formula>14.90</oasis:entry>
       </oasis:row>
     </oasis:tbody>
   </oasis:tgroup></oasis:table><table-wrap-foot><p id="d1e5582">Note: Model paradigm represents the input and the output for the ENSO-ASC,
where <inline-formula><mml:math id="M274" display="inline"><mml:mo>→</mml:mo></mml:math></inline-formula> means “forecast”. “Others” represents five variables,
including <inline-formula><mml:math id="M275" display="inline"><mml:mi>u</mml:mi></mml:math></inline-formula> wind, <inline-formula><mml:math id="M276" display="inline"><mml:mi>v</mml:mi></mml:math></inline-formula> wind, rain, cloud, and vapor. The first row is the
control experiment, which is the same as the result in Table 3, and the
second row is the examined experiment, which replaces SST with the upper ocean
heat content in the model input.</p></table-wrap-foot></table-wrap>

      <p id="d1e5792">The forecast skill of EXAM is slightly lower than that of CTRL.
The upper ocean heat content is the average oceanic temperature from the sea surface down to a depth of 300 m. When it is used as a predictor to forecast SST, our
model extracts features of oceanic temperature not only from the sea
surface but also from the deeper ocean, which inevitably introduces more noise.
This may be one reason for the above result. Therefore, we keep SST
rather than the upper ocean heat content as the key predictor, since it
yields higher forecast skill.</p>
      <p id="d1e5795">In the subsequent experiments, the model uses the six chosen variables
(SST, <inline-formula><mml:math id="M292" display="inline"><mml:mi>u</mml:mi></mml:math></inline-formula> wind, <inline-formula><mml:math id="M293" display="inline"><mml:mi>v</mml:mi></mml:math></inline-formula> wind, rain, cloud, and vapor), and the input sequence length
is set to 3 months.</p>
</sec>
<sec id="Ch1.S4.SS4.SSS2">
  <label>4.4.2</label><title>Analysis of effective forecast lead month</title>
      <p id="d1e5820">The accuracy of long-term prediction is one of the most crucial issues in
meteorological research. In ENSO events, although the periodicity dominates the
amplitude, the intrinsic intensity and duration often induce large
uncertainties and forecast errors. Therefore, over the validation period, we
make predictions with multiple lead times and calculate the corresponding
Niño indexes from the forecasted SST patterns to investigate the
effective forecast lead month of our model. The correlations between the
forecasted Niño indexes and the official records are depicted in Fig. 14. As the lead time gradually increases to 24 months, the correlation skill
slowly decreases. It is worth noting that when the lead time is between 10 and
13 months, the decline in forecast skill slows down a little. This is
because the periodicity in ENSO events becomes stronger after a 1-year
iteration in the IMS strategy. These results demonstrate that the ENSO-ASC can
provide reliable predictions up to at least 18 to 20 months on average (with
correlations over 0.5). Within a 6-month lead time, the<?pagebreak page6991?> correlation skill is
above about 0.78, and for lead times of 6 to 12 months, it is
above about 0.65.</p>
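      <p>The index calculation described above can be sketched as follows. A Niño index is taken here as the area-mean SST anomaly over the standard region boxes (Niño 3: 5°S–5°N, 150–90°W; Niño 3.4: 5°S–5°N, 170–120°W; Niño 4: 5°S–5°N, 160°E–150°W). The grid layout, the names <monospace>NINO_BOXES</monospace> and <monospace>nino_index</monospace>, the unweighted mean and the toy numbers are assumptions for illustration only and do not reproduce the ENSO-ASC code.</p>
      <preformat preformat-type="code"><![CDATA[
```python
# Standard Niño boxes as (lat_min, lat_max, lon_min, lon_max),
# with longitudes expressed in degrees east (0-360).
NINO_BOXES = {
    "nino3": (-5.0, 5.0, 210.0, 270.0),
    "nino3.4": (-5.0, 5.0, 190.0, 240.0),
    "nino4": (-5.0, 5.0, 160.0, 210.0),
}


def nino_index(anom, lats, lons, box):
    """Unweighted mean of an SST-anomaly field anom[i][j] inside a box.

    anom: 2-D list indexed as [latitude][longitude]
    lats, lons: grid-point coordinates matching the two axes of anom
    """
    lat0, lat1, lon0, lon1 = box
    vals = [
        anom[i][j]
        for i, lat in enumerate(lats)
        if lat0 <= lat <= lat1
        for j, lon in enumerate(lons)
        if lon0 <= lon <= lon1
    ]
    return sum(vals) / len(vals)


# Toy 3 x 4 anomaly field; only the equatorial row falls inside the boxes.
lats = [-10.0, 0.0, 10.0]
lons = [180.0, 200.0, 220.0, 260.0]
field = [
    [9.0, 9.0, 9.0, 9.0],
    [1.0, 2.0, 4.0, 6.0],
    [9.0, 9.0, 9.0, 9.0],
]

for name, box in NINO_BOXES.items():
    print(name, nino_index(field, lats, lons, box))
```
]]></preformat>
      <p>Applying such a function to each forecasted SST-anomaly pattern gives one index series per lead time, which can then be correlated with the recorded index.</p>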

      <?xmltex \floatpos{t}?><fig id="Ch1.F14"><?xmltex \currentcnt{14}?><?xmltex \def\figurename{Figure}?><label>Figure 14</label><caption><p id="d1e5825">The correlation skills between the forecast results of the
ENSO-ASC and real-world observations on three Niño indexes with the
forecast lead time increasing over the validation period.</p></caption>
            <?xmltex \igopts{width=241.848425pt}?><graphic xlink:href="https://gmd.copernicus.org/articles/14/6977/2021/gmd-14-6977-2021-f14.png"/>

          </fig>

      <p id="d1e5834">In addition, the forecast skills for the Niño 3 and Niño 3.4 indexes
are a little higher than that for the Niño 4 index. This indicates that our
model has higher forecast skill for EP-El Niño events (in which the active
area of SST anomalies is mainly over the eastern tropical Pacific Ocean)
than for CP-El Niño events (in which the active area of SST anomalies is mainly
over the western and central tropical Pacific Ocean). This is because
the input area of the model mainly covers the tropical Pacific,
which can be considered the sensitive area for EP-El Niño events and
is therefore more favorable for their prediction. As for the
prediction of CP-El Niño events, the extratropical Pacific or other oceanic
regions may have stronger impacts on the western–central equatorial Pacific
(Park et al., 2018).</p>
</sec>
<sec id="Ch1.S4.SS4.SSS3">
  <label>4.4.3</label><title>Temporal persistence barrier with different start calendar months</title>
      <p id="d1e5845">Deep learning models can extend the effective lead time of ENSO forecasts,
which means that they can raise the upper limit of ENSO prediction to some
extent. From the perspective of the IMS strategy, if a well-trained model can
predict next-month SST almost perfectly (in other words, with a very low prediction
error), the model can in theory be iterated many times. However, our proposed
model is affected by a variety of factors, which lead to performance
degradation.</p>
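      <p>The iterated multi-step (IMS) procedure referred to above can be sketched as follows: a one-step model is applied repeatedly, and each output is fed back into the input window for the next step. The function <monospace>toy_model</monospace> is a hypothetical stand-in for the trained network (a damped-persistence rule on scalar values rather than multivariate fields), and the window length of 3 mirrors the input sequence length used in this study.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def ims_rollout(one_step_model, history, n_steps):
    """Iterate a one-step forecaster n_steps times, feeding predictions back in.

    one_step_model: callable mapping a window (list of states) to the next state
    history: the most recent observed states, oldest first
    """
    window = list(history)
    forecasts = []
    for _ in range(n_steps):
        nxt = one_step_model(window)
        forecasts.append(nxt)
        # Slide the window forward; its length stays fixed.
        window = window[1:] + [nxt]
    return forecasts


def toy_model(window):
    """Hypothetical one-step model: damped persistence of the latest state."""
    return 0.5 * window[-1]


print(ims_rollout(toy_model, [1.0, 2.0, 4.0], 3))  # [2.0, 1.0, 0.5]
```
]]></preformat>
      <p>After the third step, the window in this sketch contains only predicted states, so every later forecast depends entirely on earlier forecasts.</p>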
      <p id="d1e5848">One disadvantage of the IMS strategy is that once a relatively large
forecast error appears in a certain iteration, this error is
continuously amplified in subsequent iterations. In ENSO forecasting, such a
decline in forecast skill is regarded as a persistence barrier and usually occurs
in spring (i.e., the spring predictability barrier, SPB) (Webster, 1995; Zheng
and Zhu, 2010). The SPB limits the long-term forecast skill not only of
numerical models but also of statistical models (Kirtman et al., 2001).
To investigate the performance degradation further, we first make
continuous predictions over the validation period from different start
calendar months with different lead times and then calculate the
correlations between the calculated Niño 3.4 index and the official
records. Figure 15 shows the results.</p>
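      <p>A minimal sketch of this evaluation is given below: forecasts are grouped by start calendar month and lead time, and a Pearson correlation is computed for each group. The record layout <monospace>(start_month, lead, forecast_index, observed_index)</monospace>, the function names and the toy values are assumptions for illustration. The target calendar month of each cell is derived here by counting the lead in months after the start month, which is also an assumed convention.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import math
from collections import defaultdict


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def target_month(start_month, lead):
    """Calendar month (1-12) reached `lead` months after start_month."""
    return (start_month - 1 + lead) % 12 + 1


def skill_table(records):
    """Map (start_month, lead) to the correlation over all forecasts in that cell."""
    groups = defaultdict(lambda: ([], []))
    for start_month, lead, forecast, observed in records:
        groups[(start_month, lead)][0].append(forecast)
        groups[(start_month, lead)][1].append(observed)
    return {key: pearson(fc, ob) for key, (fc, ob) in groups.items()}


# Toy records: July starts track the observations, March starts do not.
records = [
    (7, 1, 1.0, 2.0), (7, 1, 2.0, 4.0), (7, 1, 3.0, 6.0),
    (3, 1, 1.0, 3.0), (3, 1, 2.0, 2.0), (3, 1, 3.0, 1.0),
]
table = skill_table(records)
```
]]></preformat>
      <p>Each entry of <monospace>table</monospace> corresponds to one cell of a start-month by lead-time heat map, and <monospace>target_month</monospace> gives the calendar month that could be printed on that cell.</p>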

      <?xmltex \floatpos{t}?><fig id="Ch1.F15" specific-use="star"><?xmltex \currentcnt{15}?><?xmltex \def\figurename{Figure}?><label>Figure 15</label><caption><p id="d1e5853">The correlation skill heat map between the forecast results of
the ENSO-ASC and real-world observations on Niño 3.4 index with different
forecast start months over the validation period. Hatched cells
indicate correlations exceeding 0.5, and the white numbers on the cells
denote the calendar months.</p></caption>
            <?xmltex \igopts{width=355.659449pt}?><graphic xlink:href="https://gmd.copernicus.org/articles/14/6977/2021/gmd-14-6977-2021-f15.png"/>

          </fig>

      <p id="d1e5863">In Fig. 15, the darker the cell color, the higher the correlation between
the forecast and observation and thus the higher the forecast skill. Hatched
cells indicate correlations exceeding 0.5. Overall, ENSO
forecasts starting in the calendar months MAM (March, April, and May) are not very
reliable, while long-term forecasts of ENSO are more accurate when starting in the
months JAS (July, August, and September). In addition, there are two
obvious color change zones among the cells, in which the
correlations drop markedly (the cell color becomes lighter);
both occur in the months JJA (June, July, and August), as indicated by
the white numbers on the cells. The first zone reduces the correlations by
about 0.03 and the second zone by about 0.06. This
demonstrates that ENSO predictions tend to be less successful when the
forecast passes through the months JJA. This is why the model exhibits
more skilful forecasts with the start months JAS, which avoid forecasting
through the months JJA for as long as possible and preserve more accurate
features during iterations, resulting in a relatively long and efficient lead
time. Analogous to the SPB in traditional ENSO forecasting, our proposed ENSO-ASC
has a forecast persistence barrier in boreal summer (JJA). This may be
because the real-world dataset contains more frequent CP-El Niño
samples after the 1990s (Kao<?pagebreak page6992?> and Yu, 2009; Kug et al., 2009), which are
significantly impacted by the summer predictability barrier (Ren et al., 2019;
Ren et al., 2016). At the same time, it also implies that there are still
forecast obstacles to be overcome in the ENSO deep learning forecast
model and that more unknown key factors need to be considered and explored, such as
more variables, larger input regions, and more complex mechanisms. We expect that building deep learning models based on prior
meteorological knowledge will bring further progress in the future.</p>
</sec>
<sec id="Ch1.S4.SS4.SSS4">
  <label>4.4.4</label><title>Spatial uncertainties with a longer lead time</title>
      <p id="d1e5874">In ENSO forecasts, the areas where forecast uncertainties occur are
usually not randomly distributed, and such areas should be given more
attention in operational target observation. Over the validation period, we
make 12-month and 18-month forecasts and then compare the forecast results
with observations. More specifically, we calculate the covariance between the
forecast sequence <inline-formula><mml:math id="M294" display="inline"><mml:mover accent="true"><mml:mi>s</mml:mi><mml:mo mathvariant="normal" stretchy="false">^</mml:mo></mml:mover></mml:math></inline-formula> and the observation sequence <inline-formula><mml:math id="M295" display="inline"><mml:mi>s</mml:mi></mml:math></inline-formula> at every grid point and
combine the values into a spatial heat map.</p>
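      <p>The per-grid-point calculation above can be sketched as follows, assuming that the forecasts and observations are stored as time-ordered lists of two-dimensional fields of identical shape. The function name <monospace>covariance_map</monospace> and the population (1/N) normalization are illustrative assumptions, since the normalization is not stated in the text.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def covariance_map(forecasts, observations):
    """Covariance over time between forecast and observation at each grid point.

    forecasts, observations: lists (time) of 2-D lists (lat x lon), same shape.
    Returns a 2-D list holding one covariance value per grid point.
    """
    n_t = len(forecasts)
    n_lat, n_lon = len(forecasts[0]), len(forecasts[0][0])
    cov = [[0.0] * n_lon for _ in range(n_lat)]
    for i in range(n_lat):
        for j in range(n_lon):
            f = [forecasts[t][i][j] for t in range(n_t)]
            o = [observations[t][i][j] for t in range(n_t)]
            mf, mo = sum(f) / n_t, sum(o) / n_t
            # Population covariance (divide by N); an assumed normalization.
            cov[i][j] = sum((a - mf) * (b - mo) for a, b in zip(f, o)) / n_t
    return cov


# Toy example: two time steps on a 1 x 2 grid.
forecast_fields = [[[0.0, 1.0]], [[2.0, 1.0]]]
observed_fields = [[[0.0, 5.0]], [[2.0, 7.0]]]
heat_map = covariance_map(forecast_fields, observed_fields)
print(heat_map)  # [[1.0, 0.0]]
```
]]></preformat>
      <p>In the toy example, the first grid point varies together with the observation and has a positive covariance, whereas the second forecast is constant in time and has zero covariance with the observation.</p>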

      <?xmltex \floatpos{t}?><fig id="Ch1.F16" specific-use="star"><?xmltex \currentcnt{16}?><?xmltex \def\figurename{Figure}?><label>Figure 16</label><caption><p id="d1e5896">The spatial covariance heat map between the forecast results of
the ENSO-ASC and real-world observations at 12- and 18-month lead times over the
validation period.</p></caption>
            <?xmltex \igopts{width=355.659449pt}?><graphic xlink:href="https://gmd.copernicus.org/articles/14/6977/2021/gmd-14-6977-2021-f16.png"/>

          </fig>

      <p id="d1e5905">The results are shown in Fig. 16. The spatial uncertainties first appear
over the western equatorial Pacific, and as the lead time increases, the
uncertainty area gradually expands eastward to the central equatorial
Pacific. This indicates that the Niño 3 and Niño 3.4 regions both have
very high forecast skill at a short forecast lead time (see the
12-month forecast in Fig. 14), while the predictability of the central equatorial
Pacific gradually drops at longer lead times, which leads to a rapid
reduction in forecast skill for the Niño 3.4 and Niño 4 regions, as shown by
the 15-month forecast in Fig. 14. This suggests that the areas with larger
forecast uncertainties should be observed at a higher frequency. Another
possible reason for these uncertainties is that the multivariate input region is confined to
the Pacific, whereas the ocean–atmosphere coupling interactions in the western
tropical Pacific can be profoundly influenced by extratropical Pacific areas
and other ocean basins, as mentioned above. Therefore, our proposed model has
a relatively weak ability to capture the development of SST over the
western–central equatorial Pacific.</p>
</sec>
</sec>
<sec id="Ch1.S4.SS5">
  <label>4.5</label><title>Simulation of the real-world ENSO events</title>
      <p id="d1e5918">Since the 21st century, the occurrences of ENSO are more and more
frequent. In particular, the duration and intensity of ENSO have largely
changed. For example, many numerical climate models failed to forecast the
2015–2016 super El Niño. We simulate several ENSO events during the
validation period and compare the forecast results with real-world
observations. As mentioned above, wind (<inline-formula><mml:math id="M296" display="inline"><mml:mi>u</mml:mi></mml:math></inline-formula> wind and <inline-formula><mml:math id="M297" display="inline"><mml:mi>v</mml:mi></mml:math></inline-formula> wind) is also a
relatively important and sensitive predictor in the ENSO-ASC for ENSO
forecasts. Therefore, we make long-term forecasts and mainly trace the
evolutions of SST and wind (<inline-formula><mml:math id="M298" display="inline"><mml:mi>u</mml:mi></mml:math></inline-formula> wind and <inline-formula><mml:math id="M299" display="inline"><mml:mi>v</mml:mi></mml:math></inline-formula> wind). Note that all of the
following patterns show SST and wind anomalies, which are obtained by
subtracting the climatology (the climate mean state, i.e., the monthly averaged SST
and wind from 1981 to 2010) of the corresponding month from the forecasted SST and wind
patterns.</p>
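      <p>The anomaly computation described above can be illustrated with a minimal NumPy sketch. It is not taken from the ENSO-ASC code; the function name monthly_anomalies, the array layout (year, month, latitude, longitude), and the synthetic SST field are assumptions made only for illustration.</p>
<preformat>
```python
import numpy as np


def monthly_anomalies(field, years, base_start=1981, base_end=2010):
    """Subtract the calendar-month climatology from a monthly field.

    field: array of shape (n_years, 12, n_lat, n_lon).
    years: 1-D array giving the calendar year of each row of `field`.
    """
    field = np.asarray(field, dtype=float)
    years = np.asarray(years)
    # Rows that fall inside the climatological base period (inclusive).
    in_base = np.isin(years, np.arange(base_start, base_end + 1))
    # Climatology: base-period mean, computed separately for each month.
    climatology = field[in_base].mean(axis=0)  # (12, n_lat, n_lon)
    # Broadcasting removes the matching calendar month from every year.
    return field - climatology[np.newaxis]


# Synthetic example (hypothetical data): a seasonal cycle plus noise.
years = np.arange(1981, 2021)
rng = np.random.default_rng(0)
seasonal = 2.0 * np.cos(2.0 * np.pi * np.arange(12) / 12.0)
sst = 27.0 + seasonal[None, :, None, None] + rng.normal(0.0, 0.3, (years.size, 12, 4, 8))
anom = monthly_anomalies(sst, years)
print(anom.shape)
```
</preformat>
      <p>By construction, the anomalies of each calendar month average to zero over the base period, so the seasonal cycle no longer masks the interannual signal.</p>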

      <?xmltex \floatpos{t}?><fig id="Ch1.F17" specific-use="star"><?xmltex \currentcnt{17}?><?xmltex \def\figurename{Figure}?><label>Figure 17</label><caption><p id="d1e5951">The growth phase of SST and wind anomalies in the 2015–2016 super
El Niño event from April to June 2015. Panels <bold>(a)</bold>–<bold>(c)</bold> show the forecast results of the
ENSO-ASC and <bold>(d)</bold>–<bold>(f)</bold> show real-world observations.</p></caption>
          <?xmltex \igopts{width=469.470472pt}?><graphic xlink:href="https://gmd.copernicus.org/articles/14/6977/2021/gmd-14-6977-2021-f17.png"/>

        </fig>

      <p id="d1e5972">Figure 17 displays the evolution of SST and wind anomalies in the growth phase
of the 2015–2016 super El Niño event from April to June 2015, where panels (a)–(c)
are forecasts and panels (d)–(f) are observations. Figure 18
displays the peak phase from September 2015 to February 2016, where panels (a)–(f) are
forecasts and panels (g)–(l) are observations. Both forecasts
start in January 2015. During the growth phase, the warm
SST anomalies first appear over the eastern tropical Pacific Ocean, which
reduces the east–west gradient of SST. Meanwhile, the westerly wind
anomalies over the western–central equatorial Pacific further enhance the
SST anomalies over the central–eastern equatorial Pacific and weaken the Walker
circulation (Fig. 17a–c). Together, the SST and wind anomalies trigger the Bjerknes
positive feedback, which causes the SST anomalies to be continuously
amplified. During the peak phase, in addition to the<?pagebreak page6993?> local evolution of the
equatorial Pacific SST anomalies, there are obvious warm SST anomalies over
the northeast subtropical Pacific near Baja California, induced by
extratropical atmospheric variations (Yu et al., 2010; Yu and Kim, 2011),
which gradually propagate southwestward and merge with the warm SST
anomalies over the central equatorial Pacific (Fig. 18a–d). In conclusion,
the ENSO-ASC can steadily track the large-scale oceanic–atmospheric variations
and can successfully predict an ENSO event with strong intensity and
long duration, where many dynamic or statistical models fail. At the same
time, our proposed model makes the prediction at the beginning of the
calendar year and still produces a low prediction error, which suggests
that the model can overcome the negative impacts of the SPB to some
extent.</p>

      <?xmltex \floatpos{t}?><fig id="Ch1.F18" specific-use="star"><?xmltex \currentcnt{18}?><?xmltex \def\figurename{Figure}?><label>Figure 18</label><caption><p id="d1e5978">The peak phase of SST and wind anomalies in the 2015–2016 super
El Niño event from September 2015 to February 2016. Panels <bold>(a)</bold>–<bold>(f)</bold> show the forecast results of the
ENSO-ASC and <bold>(g)</bold>–<bold>(l)</bold> show real-world observations.</p></caption>
          <?xmltex \igopts{width=469.470472pt}?><graphic xlink:href="https://gmd.copernicus.org/articles/14/6977/2021/gmd-14-6977-2021-f18.png"/>

        </fig>

      <?xmltex \floatpos{t}?><fig id="Ch1.F19" specific-use="star"><?xmltex \currentcnt{19}?><?xmltex \def\figurename{Figure}?><label>Figure 19</label><caption><p id="d1e6001">The same as Fig. 17 but for the growth phase of SST and wind
anomalies of the weak La Niña event from September to November 2017.</p></caption>
          <?xmltex \igopts{width=469.470472pt}?><graphic xlink:href="https://gmd.copernicus.org/articles/14/6977/2021/gmd-14-6977-2021-f19.png"/>

        </fig>

      <p id="d1e6010">Besides the super El Niño event, the ENSO-ASC also has high simulation
capabilities for weak nonlinear unstable evolutions. In reality, neutral or
weak events account for most of the time. Judging from the saliency
of the extracted features, neutral or weak events may contain more “mediocre
and fuzzy” characteristics, which make it difficult to accurately grasp their
meta features during their evolution. For example, it is much easier
to overestimate or underestimate their intensities. Therefore, we choose a
hindcast of such an event over the validation period. Figure 19 shows the peak phase of a
weak La Niña event from September to November 2017 with a forecast start time
of June 2016, where panels (a)–(c) are forecasts and panels (d)–(f) are
observations. In its evolution, there are negative SST anomalies over the
eastern equatorial Pacific and easterly wind anomalies in the western
tropical Pacific Ocean, which enhance the Walker circulation. In
addition, the Bjerknes positive feedback is the dominant factor favoring the
rapid anomaly growth in this simulation.</p>

      <?xmltex \floatpos{t}?><fig id="Ch1.F20" specific-use="star"><?xmltex \currentcnt{20}?><?xmltex \def\figurename{Figure}?><label>Figure 20</label><caption><p id="d1e6015">The same as Fig. 17 but for the evolution of neutral SST and wind anomalies
from January to March 2020.</p></caption>
          <?xmltex \igopts{width=469.470472pt}?><graphic xlink:href="https://gmd.copernicus.org/articles/14/6977/2021/gmd-14-6977-2021-f20.png"/>

        </fig>

      <p id="d1e6024">Another limitation of ENSO forecasts is predicting a neutral year as an
El Niño (or La Niña) event, which is<?pagebreak page6994?> known as a false alarm.
Figure 20 displays the neutral conditions from January to March 2020 with a forecast
start time of January 2019 (the ONI did not reach the intensity of El
Niño). After calculating the corresponding Niño index, we
find that the ENSO-ASC avoids the false alarm and
credibly reflects the real magnitude of the development of important
variables such as SST. We have also verified the case of 2014, and the result
is consistent with the observations: many operational centers erroneously predicted
that an El Niño would develop in 2014, but it did not.</p>
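      <p>The false-alarm check described above can be sketched as follows. This is an illustrative assumption rather than the authors' procedure: it uses the common Niño 3.4 box (5° S–5° N, 170–120° W), a 3-month running mean as an ONI-like index, and the widely used criterion of at least 0.5 °C for five consecutive overlapping seasons. The function names, the grid, and the constant anomaly fields are hypothetical.</p>
<preformat>
```python
import numpy as np


def nino34_index(sst_anom, lats, lons):
    """Cosine-latitude-weighted mean SST anomaly over 5S-5N, 170W-120W.

    sst_anom: array (n_months, n_lat, n_lon); lons in degrees east (0 to 360).
    """
    lat_mask = 5.0 >= np.abs(lats)
    lon_mask = np.logical_and(lons >= 190.0, 240.0 >= lons)
    box = sst_anom[:, lat_mask][:, :, lon_mask]
    weights = np.cos(np.deg2rad(lats[lat_mask]))
    zonal_mean = box.mean(axis=2)  # (n_months, n_lat_in_box)
    return (zonal_mean * weights).sum(axis=1) / weights.sum()


def running_mean_3(index):
    """Three-month running mean, as used for an ONI-like index."""
    return np.convolve(index, np.ones(3) / 3.0, mode="valid")


def is_el_nino(oni, threshold=0.5, min_seasons=5):
    """True if the index stays at or above the threshold for enough seasons."""
    run = 0
    for value in oni:
        run = run + 1 if value >= threshold else 0
        if run >= min_seasons:
            return True
    return False


# Hypothetical forecast anomalies on a coarse tropical Pacific grid.
lats = np.arange(-10.0, 10.1, 2.5)
lons = np.arange(120.0, 290.0, 5.0)
neutral = np.full((12, lats.size, lons.size), 0.2)
warm = np.full((12, lats.size, lons.size), 1.5)

oni_neutral = running_mean_3(nino34_index(neutral, lats, lons))
oni_warm = running_mean_3(nino34_index(warm, lats, lons))
print(is_el_nino(oni_neutral), is_el_nino(oni_warm))
```
</preformat>
      <p>A forecast whose index stays below the threshold, as in the neutral field here, would not be classified as El Niño, which is the behaviour expected for a neutral year.</p>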
      <p id="d1e6028">The forecasted SST and wind anomaly patterns are consistent with the observations in intensity and
tendency. Our model achieves better forecast
skill in a variety of situations because our proposed deep learning
coupler comprehensively absorbs the sophisticated oceanic and atmospheric
variations, and its deep and intricate structure can almost simulate the
air–sea energy exchange simultaneously. In contrast, traditional geoscience fluid
programming in numerical climate models usually applies interval flux
exchange and parameterized approximations for unknown mechanisms, which blocks
the continuous interactions.</p>
</sec>
</sec>
<sec id="Ch1.S5" sec-type="conclusions">
  <label>5</label><title>Discussions and conclusions</title>
      <p id="d1e6041">ENSO is a very complicated air–sea coupled phenomenon, the life cycle of
which is closely related to the large-scale<?pagebreak page6995?> nonlinear interactions between
various oceanic and atmospheric processes. ENSO is one of the most critical
factors causing extreme climatic and socioeconomic effects. Therefore,
meteorological researchers have started to look for more accurate and less
resource-consuming data-driven models to forecast ENSO, especially deep learning
methods. There have already been many successful attempts that extend the
effective forecast lead time of ENSO up to 1.5 years. They all
extract the rich spatial–temporal information deeply hidden in the
historical geoscience data.</p>
      <p id="d1e6044">However, most of these models use limited variables or even a single variable to
predict ENSO, ignoring the coupled multivariate interactions in ENSO
events. At the same time, the generic ENSO deep learning forecast models
seem to have reached a performance bottleneck, which means that deeper or more
complex model structures can neither extend the effective forecast lead time
nor provide a more detailed description of the dynamical evolution. In order
to overcome these two barriers, we subjectively incorporate a priori ENSO
knowledge into the deep learning formalization and feed hand-crafted
features into the model to make predictions.<?pagebreak page6996?> More specifically, considering the
multivariate coupling in the Walker circulation related to ENSO amplitudes,
we select six indispensable physical variables and focus on the synergies
between them in ENSO events. Instead of simply stacking the variables, we treat
them as separate individuals and formulate the nonlinear
interactions between them on a graph. Based on this formalization, we design
a multivariate air–sea coupler (ASC) using graph convolution,
which can mine the coupling interactions between every pair of physical variables
and perform the multivariate synergies simultaneously.</p>
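      <p>To make the idea of coupling variables on a graph concrete, the following sketch applies one generic graph-convolution layer to six nodes, one per physical variable. It is a stand-in and not the ASC itself: the symmetric normalization with self-loops, the random edge weights, the feature sizes, and all names are assumptions chosen for illustration.</p>
<preformat>
```python
import numpy as np


def normalized_adjacency(adj):
    """Symmetric normalization D^(-1/2) (A + I) D^(-1/2) with self-loops."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def graph_conv(features, adj, weight):
    """One graph-convolution layer: ReLU(A_norm X W)."""
    mixed = normalized_adjacency(adj) @ features @ weight
    return np.maximum(mixed, 0.0)


n_vars, d_in, d_out = 6, 16, 8  # six variables; feature sizes are arbitrary
rng = np.random.default_rng(0)

# Hypothetical symmetric coupling strengths between every pair of variables.
raw = rng.uniform(0.1, 1.0, size=(n_vars, n_vars))
adj = (raw + raw.T) / 2.0
np.fill_diagonal(adj, 0.0)

features = rng.normal(size=(n_vars, d_in))     # one encoded feature row per variable
weight = rng.normal(size=(d_in, d_out)) * 0.1  # would be learned in practice
coupled = graph_conv(features, adj, weight)
print(coupled.shape)
```
</preformat>
      <p>Each output row mixes the features of all variables in proportion to the edge weights, which is the sense in which pairwise interactions are processed simultaneously rather than by stacking channels.</p>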
      <p id="d1e6047">We then implement an ENSO deep learning forecast model (ENSO-ASC) with an
encoder–coupler–decoder structure, together with two self-supervised attention
weights. The multivariate time-series data are first passed
to the encoder, which extracts spatial–temporal features for each variable. Then the
multivariate features are aggregated for interaction in the
multivariate air–sea coupler. Finally, the coupled features are separated again,
and the feature corresponding to each variable is restored
to a forecast pattern in the decoder. The IMS strategy, which is a more stable
forecasting method, is applied to make predictions. We use transfer
learning to provide a better model initialization and to overcome the
lack of observation samples: the model is first trained on the reanalysis
dataset and subsequently on the remote sensing dataset. After constructing the
model, we design extensive experiments to investigate the model
performance and ENSO forecast skill. Several successful simulations in the
validation period are also provided. The conclusions can be summarized as
follows:
<list list-type="order"><list-item>
      <p id="d1e6052">According to the forecast model described in Eqs. (5) to (7), we adjust
the model settings of the input sequence length <inline-formula><mml:math id="M300" display="inline"><mml:mi>M</mml:mi></mml:math></inline-formula>, the multivariate
coupler <inline-formula><mml:math id="M301" display="inline"><mml:mrow><mml:mtext>coupler</mml:mtext><mml:mo>(</mml:mo><mml:mi mathvariant="normal" class="Radical">⚫</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula>, the attention weights <inline-formula><mml:math id="M302" display="inline"><mml:mi mathvariant="italic">α</mml:mi></mml:math></inline-formula> and <inline-formula><mml:math id="M303" display="inline"><mml:mi mathvariant="italic">β</mml:mi></mml:math></inline-formula>, and the transfer training and then investigate the performance changes. The
optimal input sequence length of the model is 3 according to the trade-off
between forecast skill and computational resource consumption. This implies that
the ENSO deep learning forecast model does not need a relatively
long input sequence. Although a long sequence contains rich information on the tendency and
periodicity of ENSO events, the meteorological chaos is more dominant and
seriously hinders the prediction. Transfer learning is a practical method:
training the model on the reanalysis dataset and subsequently on the remote
sensing dataset can effectively reduce the systematic forecast errors by at
least 15 %. When the graph-based multivariate air–sea coupler in the
ENSO forecast model is replaced with other deep learning structures, the forecast skill
drops noticeably. This demonstrates that the graph formalization is a
powerful expression for simulating the underlying air–sea interactions and that the
corresponding ENSO forecast model with the novel multivariate air–sea coupler
can forecast more realistic meteorological details. It also demonstrates
that choosing suitable deep learning structures to
incorporate prior climate mechanisms is critical for improving forecast skill. The
self-supervised attention weights are also promising tools to grasp the
contributions of different predictors and the memory variations of different
forecast start calendar months. In addition, in comparison with other
state-of-the-art ENSO forecast models, the ENSO-ASC achieves at least a 5 %
improvement in SSIM and PSNR for long-term forecasts.</p></list-item><list-item>
      <p id="d1e6091">In the ablation experiment, the forecast skill drops
significantly when the zonal wind is removed from the model input,
because it is a co-trigger of the Bjerknes positive feedback in ENSO events and
gives direct feedback on oceanic variations with a shorter lag time. Adding
extra predictors improves the performance only slightly, because
the existing multivariate graph can almost describe a relatively complete
energy loop in the Walker circulation. By tracing the upper limit of the
forecast lead time, we find that the ENSO-ASC can provide a reliable high-resolution ENSO
forecast up to at least 18 to 20 months on average, judging from
correlation skills of the Niño indexes greater than 0.5. Within a 6-month lead
time, the correlation skill is above about 0.78, and from a 6- to 12-month lead
time, the correlation skill is above about 0.65. The corresponding correlation
skills decline slowly from a 10- to 13-month lead time and then decline
rapidly. This is because of the stronger periodicity in ENSO events after a
1-year iteration of the IMS strategy. At the same time, the forecast
start calendar month also influences the forecast skill. The temporal heat
map analysis shows that an obvious skill reduction usually appears in JJA
and produces a boreal summer persistence barrier in our model. In addition,
according to the spatial uncertainty heat map, our model exhibits larger forecast
uncertainties over the western–central equatorial Pacific. Such
spatial–temporal predictability barriers are widely present in dynamic or
statistical models, but the ENSO-ASC effectively prolongs the forecast lead
time and reduces the corresponding uncertainties to a large extent.</p></list-item><list-item>
      <p id="d1e6095">Some successful simulations exhibit the effectiveness and superiority of the
ENSO-ASC. We make real-world ENSO simulations during the validation period
by tracing the evolutions of SST and wind anomalies (<inline-formula><mml:math id="M304" display="inline"><mml:mi>u</mml:mi></mml:math></inline-formula> wind and <inline-formula><mml:math id="M305" display="inline"><mml:mi>v</mml:mi></mml:math></inline-formula> wind). In
the forecasted El Niño (La Niña) events, the air–sea patterns
clearly show that the positive (negative) SST anomalies first appear
over the eastern equatorial Pacific with westerly (easterly) wind anomalies
in the western–central tropical Pacific Ocean, which induce the Bjerknes
positive feedback mechanism. For the 2015–2016 super El Niño, the
ENSO-ASC captures the strong evolution of SST anomalies over the<?pagebreak page6997?> northeast
subtropical Pacific in the peak phase and successfully predicts its
very high intensity and very long duration, where many dynamic or
statistical models fail. The ENSO-ASC can also credibly reflect the real
situation and reduce the false alarm rate of ENSO, such as the false alarm in 2014. In
conclusion, our model can track the large-scale oceanic and atmospheric
variations and simulate the air–sea energy exchange simultaneously. This
demonstrates that the multivariate air–sea coupler effectively simulates the
oscillations of the Walker circulation and reveals more complex dynamic
mechanisms such as the Bjerknes positive feedback.</p></list-item></list>
The extensive experiments demonstrate that the ENSO forecast model with a
multivariate air–sea coupler (ENSO-ASC) is a powerful tool for the analysis of
ENSO-related complex mechanisms. Meteorological research not only
pursues skilful models and accurate forecasts but also requires a comprehensive
understanding of the underlying dynamical mechanisms. In the future, we will
extend our model to more global physical variables with informative vertical
layers, such as the thermocline depth and the ocean heat
content, to explore global remote spatial teleconnections, temporal
lagged correlations, the optimal precursor, etc.</p>
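      <p>The IMS strategy and the correlation skills summarized above can be illustrated with a short sketch. It is a toy setting under stated assumptions, not the ENSO-ASC pipeline: a damped-persistence rule stands in for the trained one-step model, the index is a synthetic noisy sinusoid, and the helper names are hypothetical. Only the input length of 3 and the skill threshold of 0.5 are taken from the text.</p>
<preformat>
```python
import numpy as np


def ims_forecast(history, step_model, n_steps, seq_len=3):
    """Iterated multi-step forecast: each prediction is fed back as input.

    history: sequence holding at least `seq_len` most recent states.
    step_model: callable mapping the last `seq_len` states to the next state.
    """
    window = list(history[-seq_len:])
    outputs = []
    for _ in range(n_steps):
        nxt = step_model(np.asarray(window))
        outputs.append(nxt)
        window = window[1:] + [nxt]  # slide the window over the prediction
    return np.asarray(outputs)


def correlation_skill(forecasts, observations):
    """Pearson correlation per lead; inputs have shape (n_starts, n_leads)."""
    f = forecasts - forecasts.mean(axis=0)
    o = observations - observations.mean(axis=0)
    return (f * o).sum(axis=0) / np.sqrt((f ** 2).sum(axis=0) * (o ** 2).sum(axis=0))


def effective_lead(skill, threshold=0.5):
    """Number of consecutive leads, from the first, with skill above threshold."""
    above = skill > threshold
    return int(above.size if above.all() else np.argmin(above))


# Toy monthly index (hypothetical): a 48-month oscillation plus noise.
rng = np.random.default_rng(0)
t = np.arange(600)
series = np.sin(2.0 * np.pi * t / 48.0) + rng.normal(0.0, 0.1, t.size)


def damped_persistence(window):
    """Stand-in one-step model: damp the most recent state."""
    return 0.95 * window[-1]


n_leads, seq_len = 12, 3
starts = np.arange(seq_len, t.size - n_leads)
forecasts = np.array(
    [ims_forecast(series[:s], damped_persistence, n_leads, seq_len) for s in starts]
)
observations = np.array([series[s:s + n_leads] for s in starts])
skill = correlation_skill(forecasts, observations)
print(np.round(skill, 2), effective_lead(skill))
```
</preformat>
      <p>With a real forecast model in place of the stand-in rule, the same two helpers would give the skill curve per lead month and the longest lead time at which the correlation stays above 0.5.</p>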
</sec>

      
      </body>
    <back><notes notes-type="codeavailability"><title>Code availability</title>

      <p id="d1e6117">The source code of the ENSO-ASC is available in the GitHub repository
<uri>https://github.com/BrunoQin/ENSO-ASC</uri> (last access: 14 August 2021) and is
implemented using Python 3.6 (or 3.7) and CUDA 11.0. The present version of
ENSO-ASC 1.0.0 is available at <ext-link xlink:href="https://doi.org/10.5281/zenodo.5081793" ext-link-type="DOI">10.5281/zenodo.5081793</ext-link> (Qin, 2021a).</p>
  </notes><notes notes-type="dataavailability"><title>Data availability</title>

      <p id="d1e6129">Thanks to NOAA/CIRES, Remote Sensing Systems, and the China Meteorological
Administration for providing the historical geoscience data and analysis
tools (<uri>https://rda.ucar.edu/</uri> (Compo et al., 2015); <uri>http://www.remss.com/</uri> (Wentz et al., 2015, 2014, 2013);
<uri>https://cmdp.ncc-cma.net</uri>, Behringer and Xue, 2004, last access: 8 July 2021). The related
training and validation datasets can also be accessed at
<ext-link xlink:href="https://doi.org/10.5281/zenodo.5179867" ext-link-type="DOI">10.5281/zenodo.5179867</ext-link> (Qin, 2021b).</p>
  </notes><notes notes-type="authorcontribution"><title>Author contributions</title>

      <p id="d1e6147">All authors designed the experiments and carried them out. BQ developed the model
code and performed the simulations. BQ and SY prepared the
paper with contributions from all co-authors.</p>
  </notes><notes notes-type="competinginterests"><title>Competing interests</title>

      <p id="d1e6153">The contact author has declared that neither they nor their co-authors have any competing interests.</p>
  </notes><notes notes-type="disclaimer"><title>Disclaimer</title>

      <p id="d1e6159">Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.</p>
  </notes><notes notes-type="financialsupport"><title>Financial support</title>

      <p id="d1e6165">This study is supported in part by the National Key Research and Development Program of China under grant 2020YFA0608002, in part by the National Natural Science Foundation of China under Grant 42075141, in part by the Key Project Fund of Shanghai 2020 “Science and Technology Innovation Action Plan” for Social Development under grant 20dz1200702, and in part by the Fundamental Research Funds for the Central Universities under grant 13502150039/003.</p>
  </notes><notes notes-type="reviewstatement"><title>Review statement</title>

      <p id="d1e6171">This paper was edited by Xiaomeng Huang and reviewed by two anonymous referees.</p>
  </notes><ref-list>
    <title>References</title>

      <ref id="bib1.bib1"><label>1</label><?label 1?><mixed-citation>
Balmaseda, M. A., Davey, M. K., and Anderson, D. L.: Decadal and seasonal
dependence of ENSO prediction skill, J. Climate, 8, 2705–2715,
1995.</mixed-citation></ref>
      <ref id="bib1.bib2"><label>2</label><?label 1?><mixed-citation>Barnston, A. G., Tippett, M. K., L'Heureux, M. L., Li, S., and DeWitt, D.
G.: Skill of real-time seasonal ENSO model predictions during 2002–11: Is
our capability increasing?, B. Am. Meteorol. Soc.,
93, 631–651, 2012.</mixed-citation></ref>
      <ref id="bib1.bib3"><label>3</label><?label 1?><mixed-citation>Bayr, T., Dommenget, D., and Latif, M.: Walker circulation controls ENSO
atmospheric feedbacks in uncoupled and coupled climate model simulations,
Clim. Dynam., 54, 2831–2846, <ext-link xlink:href="https://doi.org/10.1007/s00382-020-05152-2" ext-link-type="DOI">10.1007/s00382-020-05152-2</ext-link>, 2020.</mixed-citation></ref>
      <ref id="bib1.bib4"><label>4</label><?label 1?><mixed-citation>
Behringer, D. W. and  Xue, Y.: Evaluation of the global ocean data assimilation system at NCEP: The Pacific Ocean, in: Proc. Eighth Symp. on Integrated Observing and Assimilation Systems for Atmosphere, Oceans, and Land Surface, 2004.</mixed-citation></ref>
      <ref id="bib1.bib5"><label>5</label><?label 1?><mixed-citation>Bellenger, H., Guilyardi, É., Leloup, J., Lengaigne, M., and Vialard,
J.: ENSO representation in climate models: From CMIP3 to CMIP5, Clim.
Dynam., 42, 1999–2018, 2014.</mixed-citation></ref>
      <ref id="bib1.bib6"><label>6</label><?label 1?><mixed-citation>Bjerknes, J.: Atmospheric teleconnections from the equatorial Pacific,
Mon. Weather Rev., 97, 163–172, 1969.</mixed-citation></ref>
      <ref id="bib1.bib7"><label>7</label><?label 1?><mixed-citation>Broni-Bedaiko, C., Katsriku, F. A., Unemi, T., Atsumi, M., Abdulai, J.-D.,
Shinomiya, N., and Owusu, E.: El Niño-Southern Oscillation forecasting
using complex networks analysis of LSTM neural networks, Artificial Life and
Robotics, 24, 445–451, 2019.</mixed-citation></ref>
      <ref id="bib1.bib8"><label>8</label><?label 1?><mixed-citation>Bruna, J., Zaremba, W., Szlam, A., and LeCun, Y.: Spectral networks and
locally connected networks on graphs, arXiv [preprint], <ext-link xlink:href="https://arxiv.org/abs/1312.6203">arXiv:1312.6203</ext-link>, 2013.</mixed-citation></ref>
      <ref id="bib1.bib9"><label>9</label><?label 1?><mixed-citation>Chen, F., Pan, S., Jiang, J., Huo, H., and Long, G.: DAGCN: dual attention graph convolutional networks, in: 2019 International Joint Conference on Neural Networks (IJCNN),  IEEE, 1–8, 2019.</mixed-citation></ref>
      <ref id="bib1.bib10"><label>10</label><?label 1?><mixed-citation>Cheng, L., Trenberth, K. E., Fasullo, J. T., Mayer, M., Balmaseda, M., and
Zhu, J.: Evolution of ocean heat content related to ENSO, J.
Climate, 32, 3529–3556, 2019.</mixed-citation></ref>
      <ref id="bib1.bib11"><label>11</label><?label 1?><mixed-citation>Chevillon, G.: Direct multi-step estimation and forecasting, J.
Econ. Surv., 21, 746–785, 2007.</mixed-citation></ref>
      <ref id="bib1.bib12"><label>12</label><?label 1?><mixed-citation>Compo, G. P., Whitaker, J. S., Sardeshmukh, P. D., Allan, R. J., McColl, C., Yin, X., Giese, B. S., Vose, R. S., Matsui, N., Ashcroft, L., Auchmann, R., Benoy, M., Bessemoulin, P., Brandsma, T., Brohan, P., Brunet, M., Comeaux, J., Cram, T., Crouthamel, R., Groisman, P. Y., Hersbach, H., Jones, P. D., Jonsson, T., Jourdain, S., Kelly, G., Knapp, K. R., Kruger, A., Kubota, H., Lentini, G., Lorrey, A., Lott, N., Lubker, S. J., Luterbacher, J., Marshall, G<?pagebreak page6998?>. J., Maugeri, M., Mock, C. J., Mok, H. Y., Nordli, O., Przybylak, R., Rodwell, M. J., Ross, T. F., Schuster, D., Srnec, L., Valente, M. A., Vizi, Z., Wang, X. L., Westcott, N., Woollen, J. S., and Worley, S. J.: NOAA/CIRES Twentieth Century Global Reanalysis Version 2c, Research Data Archive at the National Center for Atmospheric Research, Computational and Information Systems Laboratory [data set], <ext-link xlink:href="https://doi.org/10.5065/D6N877TW" ext-link-type="DOI">10.5065/D6N877TW</ext-link>, 2015.</mixed-citation></ref>
      <ref id="bib1.bib13"><label>13</label><?label 1?><mixed-citation>Defferrard, M., Bresson, X., and Vandergheynst, P.: Convolutional neural
networks on graphs with fast localized spectral filtering, arXiv [preprint], <ext-link xlink:href="https://arxiv.org/abs/1606.09375">arXiv:1606.09375</ext-link>, 2016.</mixed-citation></ref>
      <ref id="bib1.bib14"><label>14</label><?label 1?><mixed-citation>Dommenget, D., Semenov, V., and Latif, M.: Impacts of the tropical Indian
and Atlantic Oceans on ENSO, Geophys. Res. Lett., 33, L11701, <ext-link xlink:href="https://doi.org/10.1029/2006GL025871" ext-link-type="DOI">10.1029/2006GL025871</ext-link>, 2006.</mixed-citation></ref>
      <ref id="bib1.bib15"><label>15</label><?label 1?><mixed-citation>Exarchou, E., Ortega, P., Rodríguez-Fonseca, B., Losada, T., Polo, I.,
and Prodhomme, C.: Impact of equatorial Atlantic variability on ENSO
predictive skill, Nat. Commun., 12, 1–8, 2021.</mixed-citation></ref>
      <ref id="bib1.bib16"><label>16</label><?label 1?><mixed-citation>Gao, C. and Zhang, R.-H.: The roles of atmospheric wind and entrained water
temperature (T<sub>e</sub>) in the second-year cooling of the 2010–12 La Niña
event, Clim. Dynam., 48, 597–617, 2017.</mixed-citation></ref>
      <ref id="bib1.bib17"><label>17</label><?label 1?><mixed-citation>Ham, Y.-G., Kim, J.-H., and Luo, J.-J.: Deep learning for multi-year ENSO
forecasts, Nature, 573, 568–572, 2019.</mixed-citation></ref>
      <ref id="bib1.bib18"><label>18</label><?label 1?><mixed-citation>Hammond, D. K., Vandergheynst, P., and Gribonval, R.: Wavelets on graphs via
spectral graph theory, Appl. Comput. Harmon. A., 30,
129–150, 2011.</mixed-citation></ref>
      <ref id="bib1.bib19"><label>19</label><?label 1?><mixed-citation>He, D., Lin, P., Liu, H., Ding, L., and Jiang, J.: DLENSO: A deep learning
ENSO forecasting model, in: Pacific Rim International Conference on
Artificial Intelligence, Springer, 12–23, 2019.</mixed-citation></ref>
      <ref id="bib1.bib20"><label>20</label><?label 1?><mixed-citation>He, K., Zhang, X., Ren, S., and Sun, J.: Deep residual learning for image
recognition, in: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR),  770–778,  <ext-link xlink:href="https://doi.org/10.1109/CVPR.2016.90" ext-link-type="DOI">10.1109/CVPR.2016.90</ext-link>, 2016.</mixed-citation></ref>
      <ref id="bib1.bib21"><label>21</label><?label 1?><mixed-citation>Huang, G., Liu, Z., Van Der Maaten, L., and Weinberger, K. Q.: Densely
connected convolutional networks, in: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR),
4700–4708, <ext-link xlink:href="https://doi.org/10.1109/CVPR.2017.243" ext-link-type="DOI">10.1109/CVPR.2017.243</ext-link>, 2017.</mixed-citation></ref>
      <ref id="bib1.bib22"><label>22</label><?label 1?><mixed-citation>Jin, F.-F.: An equatorial ocean recharge paradigm for ENSO. Part I:
Conceptual model, J. Atmos. Sci., 54, 811–829, 1997.</mixed-citation></ref>
      <ref id="bib1.bib23"><label>23</label><?label 1?><mixed-citation>Kao, H.-Y. and Yu, J.-Y.: Contrasting eastern-Pacific and central-Pacific
types of ENSO, J. Climate, 22, 615–632, 2009.</mixed-citation></ref>
      <ref id="bib1.bib24"><label>24</label><?label 1?><mixed-citation>Keys, R.: Cubic convolution interpolation for digital image processing, IEEE
T. Acoust., 29, 1153–1160,
1981.</mixed-citation></ref>
      <ref id="bib1.bib25"><label>25</label><?label 1?><mixed-citation>Kirtman, B., Shukla, J., Balmaseda, M., Graham, N., Penland, C., Xue, Y.,
and Zebiak, S.: Current status of ENSO forecast skill: A report to the
CLIVAR Working Group on Seasonal to Interannual Prediction, available at: <uri>http://nora.nerc.ac.uk/id/eprint/144128/1/nino3.pdf</uri> (last access: 15 November 2021),
2001.</mixed-citation></ref>
      <ref id="bib1.bib26"><label>26</label><?label 1?><mixed-citation>Kug, J.-S., Jin, F.-F., and An, S.-I.: Two types of El Niño events: cold
tongue El Niño and warm pool El Niño, J. Climate, 22,
1499–1515, 2009.</mixed-citation></ref>
      <ref id="bib1.bib27"><label>27</label><?label 1?><mixed-citation>Lau, K.-M., Li, P., and Nakazawa, T.: Dynamics of super cloud clusters,
westerly wind bursts, 30–60 day oscillations and ENSO: An unified view,
J. Meteorol. Soc. Jpn., 67, 205–219, 1989.</mixed-citation></ref>
      <ref id="bib1.bib28"><label>28</label><?label 1?><mixed-citation>Lau, K.-M., Ho, C.-H., and Chou, M.-D.: Water vapor and cloud feedback over
the tropical oceans: Can we use ENSO as a surrogate for climate change?,
Geophys. Res. Lett., 23, 2971–2974, 1996.</mixed-citation></ref>
      <ref id="bib1.bib29"><label>29</label><?label 1?><mixed-citation>Mazumder, R., Hastie, T., and Tibshirani, R.: Spectral regularization
algorithms for learning large incomplete matrices,   J. Mach.
Learn. Res., 11, 2287–2322, 2010.</mixed-citation></ref>
      <ref id="bib1.bib30"><label>30</label><?label 1?><mixed-citation>McDermott, P. L. and Wikle, C. K.: An ensemble quadratic echo state network
for non-linear spatio-temporal forecasting, Stat, 6, 315–330, <ext-link xlink:href="https://doi.org/10.1002/sta4.160" ext-link-type="DOI">10.1002/sta4.160</ext-link>, 2017.</mixed-citation></ref>
      <ref id="bib1.bib31"><label>31</label><?label 1?><mixed-citation>McDermott, P. L. and Wikle, C. K.: Bayesian recurrent neural network models
for forecasting and quantifying uncertainty in spatial-temporal data,
Entropy, 21,  184, <ext-link xlink:href="https://doi.org/10.3390/e21020184" ext-link-type="DOI">10.3390/e21020184</ext-link>, 2019.</mixed-citation></ref>
      <ref id="bib1.bib32"><label>32</label><?label 1?><mixed-citation>McPhaden, M. J.: Tropical Pacific Ocean heat content variations and ENSO
persistence barriers, Geophys. Res. Lett., 30, 1480, <ext-link xlink:href="https://doi.org/10.1029/2003GL016872" ext-link-type="DOI">10.1029/2003GL016872</ext-link>, 2003.</mixed-citation></ref>
      <ref id="bib1.bib33"><label>33</label><?label 1?><mixed-citation>McPhaden, M. J.: A 21st century shift in the relationship between ENSO SST
and warm water volume anomalies, Geophys. Res. Lett., 39,   L09706, <ext-link xlink:href="https://doi.org/10.1029/2012GL051826" ext-link-type="DOI">10.1029/2012GL051826</ext-link>,
2012.</mixed-citation></ref>
      <ref id="bib1.bib34"><label>34</label><?label 1?><mixed-citation>Meinen, C. S. and McPhaden, M. J.: Observations of warm water volume changes
in the equatorial Pacific and their relationship to El Niño and La
Niña, J. Climate, 13, 3551–3559, 2000.</mixed-citation></ref>
      <ref id="bib1.bib35"><label>35</label><?label 1?><mixed-citation>Mu, B., Peng, C., Yuan, S., and Chen, L.: ENSO forecasting over multiple
time horizons using ConvLSTM network and rolling mechanism, in: 2019
International Joint Conference on Neural Networks (IJCNN),  IEEE, 1–8,
2019.</mixed-citation></ref>
      <ref id="bib1.bib36"><label>36</label><?label 1?><mixed-citation>Park, J.-H., Kug, J.-S., Li, T., and Behera, S. K.: Predicting El Niño
beyond 1-year lead: effect of the Western Hemisphere warm pool, Sci.
Rep., 8, 1–8, 2018.</mixed-citation></ref>
      <ref id="bib1.bib37"><label>37</label><?label 1?><mixed-citation>Qin, B.: BrunoQin/ENSO-ASC: ENSO-ASC 1.0.1 (1.0.1), Zenodo [code], <ext-link xlink:href="https://doi.org/10.5281/zenodo.5201715" ext-link-type="DOI">10.5281/zenodo.5201715</ext-link>, 2021a.</mixed-citation></ref>
      <ref id="bib1.bib38"><label>38</label><?label 1?><mixed-citation>Qin, B.:  The training and validation dataset for ENSO-ASC model, Zenodo [data set], <ext-link xlink:href="https://doi.org/10.5281/zenodo.5179867" ext-link-type="DOI">10.5281/zenodo.5179867</ext-link>, 2021b.</mixed-citation></ref>
      <ref id="bib1.bib39"><label>39</label><?label 1?><mixed-citation>Ren, H.-L., Jin, F.-F., Tian, B., and Scaife, A. A.: Distinct persistence
barriers in two types of ENSO, Geophys. Res. Lett., 43, 10–973,
2016.</mixed-citation></ref>
      <ref id="bib1.bib40"><label>40</label><?label 1?><mixed-citation>Ren, H.-L., Zuo, J., and Deng, Y.: Statistical predictability of Niño
indices for two types of ENSO, Clim. Dynam., 52, 5361–5382, 2019.</mixed-citation></ref>
      <ref id="bib1.bib41"><label>41</label><?label 1?><mixed-citation>Rolnick, D., Donti, P. L., Kaack, L. H., Kochanski, K., Lacoste, A., Sankaran, K., Ross, A. S., Milojevic-Dupont, N., Jaques, N., Waldman-Brown, A., Luccioni, A., Maharaj, T., Sherwin, E. D., Mukkavilli, S. K., Kording, K. P., Gomes, C., Ng, A. Y., Hassabis, D., Platt, J. C., Creutzig, F., Chayes, J., and Bengio, Y.: Tackling climate change with machine learning, arXiv [preprint], <ext-link xlink:href="https://arxiv.org/abs/1906.05433">arXiv:1906.05433</ext-link>, 2019.</mixed-citation></ref>
      <ref id="bib1.bib42"><label>42</label><?label 1?><mixed-citation>Shi, X. and Yeung, D.-Y.: Machine learning for spatiotemporal sequence
forecasting: A survey, arXiv [preprint], <ext-link xlink:href="https://arxiv.org/abs/1808.06865">arXiv:1808.06865</ext-link>, 2018.</mixed-citation></ref>
      <ref id="bib1.bib43"><label>43</label><?label 1?><mixed-citation>Shi, X., Chen, Z., Wang, H., Yeung, D.-Y., Wong, W.-K., and Woo, W.-C.:
Convolutional LSTM network: A machine learning approach for precipitation
nowcasting, arXiv [preprint], <ext-link xlink:href="https://arxiv.org/abs/1506.04214">arXiv:1506.04214</ext-link>, 2015.</mixed-citation></ref>
      <ref id="bib1.bib44"><label>44</label><?label 1?><mixed-citation>Wang, C., Deser, C., Yu, J.-Y., DiNezio, P., and Clement, A.: El Niño
and Southern Oscillation (ENSO): a review, in: Coral Reefs of the Eastern
Tropical Pacific, 85–106, <ext-link xlink:href="https://doi.org/10.1007/978-94-017-7499-4_4" ext-link-type="DOI">10.1007/978-94-017-7499-4_4</ext-link>, 2017.</mixed-citation></ref>
      <ref id="bib1.bib45"><label>45</label><?label 1?><mixed-citation>Wang, Z., Bovik, A. C., Sheikh, H. R., and Simoncelli, E. P.: Image quality
assessment: from error visibility to structural similarity, IEEE
T. Image Process., 13, 600–612, 2004.</mixed-citation></ref>
      <?pagebreak page6999?><ref id="bib1.bib46"><label>46</label><?label 1?><mixed-citation>Webster, P.: The annual cycle and the predictability of the tropical coupled
ocean-atmosphere system, Meteorol. Atmos. Phys., 56, 33–55,
1995.</mixed-citation></ref>
      <ref id="bib1.bib47"><label>47</label><?label 1?><mixed-citation>Wentz, F. J., Ricciardulli, L., Gentemann, C., Meissner, T., Hilburn, K. A., and Scott, J.: Remote Sensing Systems Coriolis WindSat Environmental Suite on 0.25 deg grid, Version 7.0.1, Remote Sensing Systems, Santa Rosa, CA [data set], available at: <uri>http://www.remss.com/missions/windsat</uri> (last access: 15 November 2021), 2013.</mixed-citation></ref>
      <ref id="bib1.bib48"><label>48</label><?label 1?><mixed-citation>Wentz, F. J., Meissner, T., Gentemann, C., Hilburn, K. A., and Scott, J.: Remote Sensing Systems GCOM-W1 AMSR2 Environmental Suite on 0.25 deg grid, Version 8.0, Remote Sensing Systems, Santa Rosa, CA [data set], available at: <uri>http://www.remss.com/missions/amsr</uri> (last access: 15 November 2021), 2014.</mixed-citation></ref>
      <ref id="bib1.bib49"><label>49</label><?label 1?><mixed-citation>Wentz, F. J., Gentemann, C., and Hilburn, K. A.: Remote Sensing Systems TRMM TMI Environmental Suite on 0.25 deg grid, Version 7.1, Remote Sensing Systems, Santa Rosa, CA [data set], available at: <uri>http://www.remss.com/missions/tmi</uri> (last access: 15 November 2021), 2015.</mixed-citation></ref>
      <ref id="bib1.bib50"><label>50</label><?label 1?><mixed-citation>Xue, Y., Chen, M., Kumar, A., Hu, Z.-Z., and Wang, W.: Prediction skill and
bias of tropical Pacific sea surface temperatures in the NCEP Climate
Forecast System version 2, J. Climate, 26, 5358–5378, 2013.</mixed-citation></ref>
      <ref id="bib1.bib51"><label>51</label><?label 1?><mixed-citation>Yosinski, J., Clune, J., Bengio, Y., and Lipson, H.: How transferable are
features in deep neural networks?, arXiv [preprint], <ext-link xlink:href="https://arxiv.org/abs/1411.1792">arXiv:1411.1792</ext-link>, 2014.
</mixed-citation></ref><?xmltex \hack{\newpage}?>
      <ref id="bib1.bib52"><label>52</label><?label 1?><mixed-citation>Yu, J.-Y. and Kim, S. T.: Relationships between extratropical sea level
pressure variations and the central Pacific and eastern Pacific types of
ENSO, J. Climate, 24, 708–720, 2011.</mixed-citation></ref>
      <ref id="bib1.bib53"><label>53</label><?label 1?><mixed-citation>Yu, J.-Y., Kao, H.-Y., and Lee, T.: Subtropics-related interannual sea
surface temperature variability in the central equatorial Pacific, J.
Climate, 23, 2869–2884, 2010.</mixed-citation></ref>
      <ref id="bib1.bib54"><label>54</label><?label 1?><mixed-citation>Zhang, R., Liu, Q., and Hang, R.: Tropical Cyclone Intensity Estimation
Using Two-Branch Convolutional Neural Network From Infrared and Water Vapor
Images, IEEE T. Geosci. Remote S., 58, 586–597,
2019.</mixed-citation></ref>
      <ref id="bib1.bib55"><label>55</label><?label 1?><mixed-citation>Zhang, W., Jin, F.-F., Stuecker, M. F., Wittenberg, A. T., Timmermann, A.,
Ren, H.-L., Kug, J.-S., Cai, W., and Cane, M.: Unraveling El Niño's
impact on the East Asian monsoon and Yangtze River summer flooding,
Geophys. Res. Lett., 43, 11375–11382, 2016.</mixed-citation></ref>
      <ref id="bib1.bib56"><label>56</label><?label 1?><mixed-citation>Zhang, W., Li, S., Jin, F.-F., Xie, R., Liu, C., Stuecker, M. F., and Xue,
A.: ENSO regime changes responsible for decadal phase relationship
variations between ENSO sea surface temperature and warm water volume,
Geophys. Res. Lett., 46, 7546–7553, 2019.</mixed-citation></ref>
      <ref id="bib1.bib57"><label>57</label><?label 1?><mixed-citation>Zheng, F. and Zhu, J.: Spring predictability barrier of ENSO events from the
perspective of an ensemble prediction system, Global Planet. Change,
72, 108–117, 2010.</mixed-citation></ref>
      <ref id="bib1.bib58"><label>58</label><?label 1?><mixed-citation>Zheng, G., Li, X., Zhang, R.-H., and Liu, B.: Purely satellite data–driven
deep learning forecast of complicated tropical instability waves, Sci.
Adv., 6, eaba1482, <ext-link xlink:href="https://doi.org/10.1126/sciadv.aba1482" ext-link-type="DOI">10.1126/sciadv.aba1482</ext-link>, 2020.</mixed-citation></ref>

  </ref-list></back>
</article>
