<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing with OASIS Tables v3.0 20080202//EN" "https://jats.nlm.nih.gov/nlm-dtd/publishing/3.0/journalpub-oasis3.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:oasis="http://docs.oasis-open.org/ns/oasis-exchange/table" xml:lang="en" dtd-version="3.0" article-type="research-article">
  <front>
    <journal-meta><journal-id journal-id-type="publisher">GMD</journal-id><journal-title-group>
    <journal-title>Geoscientific Model Development</journal-title>
    <abbrev-journal-title abbrev-type="publisher">GMD</abbrev-journal-title><abbrev-journal-title abbrev-type="nlm-ta">Geosci. Model Dev.</abbrev-journal-title>
  </journal-title-group><issn pub-type="epub">1991-9603</issn><publisher>
    <publisher-name>Copernicus Publications</publisher-name>
    <publisher-loc>Göttingen, Germany</publisher-loc>
  </publisher></journal-meta>
    <article-meta>
      <article-id pub-id-type="doi">10.5194/gmd-19-4009-2026</article-id><title-group><article-title>From reanalysis to climatology: deep learning reconstruction of tropical cyclogenesis in the western North Pacific</article-title><alt-title>Reconstructing tropical cyclogenesis climatology using deep learning</alt-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author" corresp="yes" rid="aff1">
          <name><surname>Le</surname><given-names>Duc-Trong</given-names></name>
          <email>trongld@vnu.edu.vn</email>
        <ext-link>https://orcid.org/0000-0003-4621-8956</ext-link></contrib>
        <contrib contrib-type="author" corresp="no" rid="aff1">
          <name><surname>Dang</surname><given-names>Tran-Binh</given-names></name>
          
        </contrib>
        <contrib contrib-type="author" corresp="no" rid="aff1">
          <name><surname>Hoang Gia</surname><given-names>Anh-Duc</given-names></name>
          
        </contrib>
        <contrib contrib-type="author" corresp="no" rid="aff1">
          <name><surname>Nguyen</surname><given-names>Duc-Hai</given-names></name>
          
        </contrib>
        <contrib contrib-type="author" corresp="no" rid="aff1">
          <name><surname>Tien</surname><given-names>Minh-Hoa</given-names></name>
          
        </contrib>
        <contrib contrib-type="author" corresp="no" rid="aff1">
          <name><surname>Ngo</surname><given-names>Xuan-Truong</given-names></name>
          
        </contrib>
        <contrib contrib-type="author" corresp="no" rid="aff3">
          <name><surname>Luu</surname><given-names>Quang-Trung</given-names></name>
          
        </contrib>
        <contrib contrib-type="author" corresp="no" rid="aff4">
          <name><surname>Luu</surname><given-names>Quang-Lap</given-names></name>
          
        </contrib>
        <contrib contrib-type="author" corresp="no" rid="aff4">
          <name><surname>Nguyen</surname><given-names>Tai-Hung</given-names></name>
          
        <ext-link>https://orcid.org/0000-0002-6098-2136</ext-link></contrib>
        <contrib contrib-type="author" corresp="no" rid="aff1">
          <name><surname>Nguyen</surname><given-names>Thanh T. N.</given-names></name>
          
        </contrib>
        <contrib contrib-type="author" corresp="yes" rid="aff2">
          <name><surname>Kieu</surname><given-names>Chanh</given-names></name>
          <email>ckieu@iu.edu</email>
        <ext-link>https://orcid.org/0000-0001-8947-8534</ext-link></contrib>
        <aff id="aff1"><label>1</label><institution>Faculty of Information Technology, VNU University of Engineering and Technology, Hanoi, Vietnam</institution>
        </aff>
        <aff id="aff2"><label>2</label><institution>Department of Earth and Atmospheric Sciences, Indiana University, Bloomington, IN 47405, USA</institution>
        </aff>
        <aff id="aff3"><label>3</label><institution>Université Paris-Saclay – CNRS – CentraleSupélec – L2S, Gif-sur-Yvette, 91192, France</institution>
        </aff>
        <aff id="aff4"><label>4</label><institution>School of Electrical and Electronic Engineering, Hanoi University of Science and Technology, Hanoi, Vietnam</institution>
        </aff>
      </contrib-group>
      <author-notes><corresp id="corr1">Duc-Trong Le (trongld@vnu.edu.vn) and Chanh Kieu (ckieu@iu.edu)</corresp></author-notes><pub-date><day>18</day><month>May</month><year>2026</year></pub-date>
      
      <volume>19</volume>
      <issue>10</issue>
      <fpage>4009</fpage><lpage>4030</lpage>
      <history>
        <date date-type="received"><day>4</day><month>September</month><year>2025</year></date>
           <date date-type="rev-request"><day>9</day><month>December</month><year>2025</year></date>
           <date date-type="rev-recd"><day>4</day><month>March</month><year>2026</year></date>
           <date date-type="accepted"><day>19</day><month>April</month><year>2026</year></date>
      </history>
      <permissions>
        <copyright-statement>Copyright: © 2026 Duc-Trong Le et al.</copyright-statement>
        <copyright-year>2026</copyright-year>
      <license license-type="open-access"><license-p>This work is licensed under the Creative Commons Attribution 4.0 International License. To view a copy of this licence, visit <ext-link ext-link-type="uri" xlink:href="https://creativecommons.org/licenses/by/4.0/">https://creativecommons.org/licenses/by/4.0/</ext-link></license-p></license></permissions><self-uri xlink:href="https://gmd.copernicus.org/articles/19/4009/2026/gmd-19-4009-2026.html">This article is available from https://gmd.copernicus.org/articles/19/4009/2026/gmd-19-4009-2026.html</self-uri><self-uri xlink:href="https://gmd.copernicus.org/articles/19/4009/2026/gmd-19-4009-2026.pdf">The full text article is available as a PDF file from https://gmd.copernicus.org/articles/19/4009/2026/gmd-19-4009-2026.pdf</self-uri>
      <abstract><title>Abstract</title>

      <p id="d2e197">Tropical cyclogenesis (TCG) climatology is key to understanding regional weather extremes and long-term risk, yet its large-scale environmental drivers remain difficult to characterize from observations or traditional physics-based modeling. In this study, we present a deep learning (DL) framework based on an 18-layer residual convolutional neural network (TCG-Net) to reconstruct TCG climatology in the western North Pacific (WNP) basin from climate reanalysis data. The framework addresses two tasks: (1) the Past Domain (PD) task, which predicts when TCG occurs in the WNP within the next 48 h, and (2) the Dynamic Domain (DD) task, which predicts the spatial distribution of TCG at a given date and time. For each task, a different labeling strategy is used to generate negative samples that maximize the distinction between TCG and non-TCG conditions. To help the model cope with the rarity of TCG events, temporal feature enrichment is used to incorporate environmental information from the preceding 6 h time steps, which improves the representation for each training task. In addition, random under-sampling is combined with class weighting to address the severe class imbalance caused by the large number of negative samples produced by these labeling strategies. Using NASA's Modern-Era Retrospective analysis for Research and Applications, Version 2 (MERRA-2) with a training period of 1980–2016 and a test period of 2017–2022, we show that TCG-Net achieves an overall <inline-formula><mml:math id="M1" display="inline"><mml:mrow><mml:msub><mml:mi>F</mml:mi><mml:mn mathvariant="normal">1</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula>-score of 0.39 for the PD task and 0.33 for the DD task. In the PD task, feature selection experiments reveal that only a subset of environmental variables is required for robust performance, consistent with prior physical studies. 
In contrast, for the DD task, full-feature models perform better, likely because they can exploit unknown or latent feature interactions. Both tasks reproduce key characteristics of the observed seasonality and spatial distribution of TCG when evaluated against the best-track dataset. These results demonstrate that DL-based reconstructions, when coupled with task-specific labeling, temporal enrichment, and imbalance-aware training, can complement physics-based models and vortex-tracking algorithms and provide an efficient pathway for downscaling or projecting TCG climatology from coarse-resolution climate model outputs.</p>
  </abstract>
    
<funding-group>
<award-group id="gs1">
<funding-source>Quỹ Đổi mới sáng tạo Vingroup</funding-source>
<award-id>VINIF.2023.DA019</award-id>
</award-group>
</funding-group>
</article-meta>
  </front>
<body>
      

<sec id="Ch1.S1" sec-type="intro">
  <label>1</label><title>Introduction</title>
      <p id="d2e220">The western North Pacific (WNP) basin is well documented as the most active basin for tropical cyclone (TC) activity <xref ref-type="bibr" rid="bib1.bibx24 bib1.bibx37 bib1.bibx49" id="paren.1"/>. With conditions favorable for TC formation (also known as tropical cyclogenesis, or TCG), such as warm sea surface temperature (SST), an active monsoon trough, and frequent convectively coupled equatorial waves, the WNP produces about 25–30 TCs annually, about a quarter of which affect the coastal regions of Vietnam <xref ref-type="bibr" rid="bib1.bibx10 bib1.bibx50 bib1.bibx46" id="paren.2"/>. From a climate perspective, changes in key TC characteristics such as frequency, genesis location, or intensity are often considered manifestations of climate change. Developing effective methods to construct TC climatology from different climate datasets is therefore important for studying future projections of TC climatology from climate model outputs or outlooks from global forecasting systems <xref ref-type="bibr" rid="bib1.bibx23 bib1.bibx54 bib1.bibx1 bib1.bibx27 bib1.bibx7 bib1.bibx6 bib1.bibx18" id="paren.3"><named-content content-type="pre">e.g.,</named-content></xref>.</p>
      <p id="d2e234">Traditionally, creating a TC climatology from gridded climate output involves using a vortex tracking algorithm to detect TC centers. This algorithm relies on a set of TC characteristics, such as absolute vorticity, maximum surface wind, minimum central pressure, warm core, or lifetime, which are checked at each model grid point <xref ref-type="bibr" rid="bib1.bibx55 bib1.bibx64 bib1.bibx43 bib1.bibx5 bib1.bibx16 bib1.bibx60 bib1.bibx51 bib1.bibx52" id="paren.4"/>. Vortex tracking methods are effective for high-resolution model outputs in which TC characteristics are well captured, which makes them suitable for weather forecast models or well-developed TCs. However, they face challenges with coarse-resolution climate models (<inline-formula><mml:math id="M2" display="inline"><mml:mrow><mml:mo>&gt;</mml:mo><mml:mn mathvariant="normal">0.5</mml:mn><mml:mi mathvariant="italic">°</mml:mi></mml:mrow></mml:math></inline-formula>). In such coarse-resolution models or datasets, TC characteristics, especially during early formation, are often poorly resolved <xref ref-type="bibr" rid="bib1.bibx60 bib1.bibx47" id="paren.5"/>. Consequently, tracking vortices directly in such outputs can introduce uncertainties in the timing and location of early TCG. This issue becomes more apparent when studying climate change aspects such as shifts in TCG location or timing across different climate datasets <xref ref-type="bibr" rid="bib1.bibx45 bib1.bibx47" id="paren.6"><named-content content-type="pre">e.g.,</named-content></xref>.</p>
      <p id="d2e260">The rapid advancement of machine learning (ML) techniques has opened new avenues for atmospheric research as well as operational forecasting. Given the vast amount of observational and model-generated data, weather and climate systems naturally provide a “big data” platform that is well suited for training ML models, not only for short-term weather prediction but also for capturing long-term climate variability <xref ref-type="bibr" rid="bib1.bibx41 bib1.bibx36 bib1.bibx2 bib1.bibx25 bib1.bibx33" id="paren.7"><named-content content-type="pre">e.g.,</named-content></xref>. Indeed, many private and governmental organizations have recently developed deep learning architectures that outperform traditional physics-based models in weather forecasting, as demonstrated by <xref ref-type="bibr" rid="bib1.bibx36" id="text.8"/> and <xref ref-type="bibr" rid="bib1.bibx25" id="text.9"/>.</p>
      <p id="d2e274">Among recent applications of ML to TC research, most efforts have been limited to short-term contexts such as weather forecasting, satellite retrieval, or diagnostic studies. For example, <xref ref-type="bibr" rid="bib1.bibx30" id="text.10"/> and <xref ref-type="bibr" rid="bib1.bibx58" id="text.11"/> developed deep learning (DL) models that use satellite data to train convolutional neural networks (CNNs) to categorize TC intensity based on cloud patterns in different satellite channels. This line of research has since been extended to improve TC forecasts by integrating tracking information and/or other reanalysis data, with modest skill for nowcasting and diagnosis <xref ref-type="bibr" rid="bib1.bibx11 bib1.bibx21 bib1.bibx9 bib1.bibx14" id="paren.12"><named-content content-type="pre">e.g.,</named-content></xref>.</p>
      <p id="d2e289">For TCG studies based on climate data, a recent study <xref ref-type="bibr" rid="bib1.bibx33" id="paren.13"/>, hereinafter NK2024, evaluated several DL architectures and identified promising potential for early warning of TCG events in the WNP basin. Using NCEP reanalysis data and formulating TCG detection as a classification problem, they demonstrated reasonable forecast skill of DL models, even at a coarse spatial resolution of <inline-formula><mml:math id="M3" display="inline"><mml:mrow><mml:mn mathvariant="normal">1</mml:mn><mml:mi mathvariant="italic">°</mml:mi><mml:mo>×</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mi mathvariant="italic">°</mml:mi></mml:mrow></mml:math></inline-formula>. Although the predictive skill declines with increasing forecast lead time, their approach highlights an important capability: detecting TCG directly from climate outputs. In particular, certain DL architectures can identify both the location and timing of TCG events within a given domain, which is valuable for broader applications in TCG climatology or future projections. It is important to emphasize here the distinction between predicting and detecting TCG. Whereas long-lead TCG prediction suffers from rapidly decreasing skill due to the inherent limits of predictability in tropical dynamics, detecting TCG from coarse-resolution climate data (a form of downscaling) is much more skillful and feasible. This is because detection relies primarily on identifying favorable environmental conditions at the time of genesis, which makes it a climate downscaling problem rather than one constrained by the chaotic dynamics of the atmosphere.</p>
      <p id="d2e311">Although existing ML applications in TC research show promise, they have so far focused mostly on short-term prediction or nowcasting rather than on TC climatology. In particular, the use of ML for constructing TC climatology remains preliminary <xref ref-type="bibr" rid="bib1.bibx40 bib1.bibx56 bib1.bibx8" id="paren.14"><named-content content-type="pre">e.g.,</named-content></xref>. Key challenges include how to leverage ML to diagnose climate model outputs, identify and correct model biases, and construct robust statistics of large-scale climate features. The development of ML-based techniques for downscaling TC climatology, or the distributions of other extreme events, from reanalysis data thus remains largely unexplored, yet it represents an important direction for future applications of DL in climate research.</p>
      <p id="d2e319">Given these rapid advances in DL techniques, the main objective of this study is to introduce TCG-Net, a framework for reconstructing TCG climatology from climate reanalysis datasets. Specifically, rather than forecasting TCG as in NK2024, we extract key TCG characteristics such as frequency and spatial distribution from the NASA Modern-Era Retrospective analysis for Research and Applications, Version 2 (MERRA-2). Our DL-derived TCG climatology can serve as an independent validation of, and complement to, climatologies obtained from traditional vortex tracking methods. Furthermore, a TC climatology obtained from MERRA-2 can serve as a reference for examining changes in TC climatology in future projections, which motivates our DL approach.</p>
      <p id="d2e322">The rest of this paper is organized as follows. Section <xref ref-type="sec" rid="Ch1.S2"/> presents the data pre-processing and our CNN algorithms, together with our approach to generating and labeling a binary TCG dataset for each application. Section <xref ref-type="sec" rid="Ch1.S3"/> details the design of our DL pipeline and the experimental setup. Section <xref ref-type="sec" rid="Ch1.S4"/> presents the results and related discussion. Finally, a summary and concluding remarks are given in Sect. <xref ref-type="sec" rid="Ch1.S5"/>.</p>
</sec>
<sec id="Ch1.S2">
  <label>2</label><title>Methodology and Data</title>
<sec id="Ch1.S2.SS1">
  <label>2.1</label><title>Input data</title>
      <p id="d2e349">To make our work directly applicable to research on TC climate downscaling, we use the MERRA-2 reanalysis dataset <xref ref-type="bibr" rid="bib1.bibx13" id="paren.15"/>. MERRA-2 is an atmospheric reanalysis based on the Goddard Earth Observing System Model, Version 5 (GEOS-5) data assimilation system <xref ref-type="bibr" rid="bib1.bibx13" id="paren.16"/>. Unlike the original MERRA, MERRA-2 uses a newer version of GEOS-5 that assimilates newer microwave sounder and infrared radiance observations, as well as other data types. All MERRA-2 data collections are provided on the same horizontal grid, which has <inline-formula><mml:math id="M4" display="inline"><mml:mrow><mml:mn mathvariant="normal">576</mml:mn><mml:mo>×</mml:mo><mml:mn mathvariant="normal">361</mml:mn></mml:mrow></mml:math></inline-formula> points in the longitudinal and latitudinal directions, respectively (a <inline-formula><mml:math id="M5" display="inline"><mml:mrow><mml:mn mathvariant="normal">0.625</mml:mn><mml:mo>×</mml:mo><mml:mn mathvariant="normal">0.5</mml:mn><mml:mi mathvariant="italic">°</mml:mi></mml:mrow></mml:math></inline-formula> longitude-by-latitude grid), with three-dimensional fields interpolated to 42 standard pressure levels. Although this regular <inline-formula><mml:math id="M6" display="inline"><mml:mrow><mml:mn mathvariant="normal">0.625</mml:mn><mml:mo>×</mml:mo><mml:mn mathvariant="normal">0.5</mml:mn><mml:mi mathvariant="italic">°</mml:mi></mml:mrow></mml:math></inline-formula> grid is relatively coarse for resolving the TC inner core, our focus in this study is on TCG climatology, which depends mostly on meso- to synoptic-scale environmental conditions. As discussed in NK2024, data at 0.5° resolution are therefore sufficient for our purposes.</p>
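<p>To make the grid dimensions quoted above concrete, the following minimal sketch rebuilds the MERRA-2 longitude and latitude axes from the stated 0.625° × 0.5° spacing. It assumes that longitudes start at 180° W without repeating the 180° meridian and that latitudes run from pole to pole inclusive; the function name <monospace>merra2_axes</monospace> is illustrative and not part of any MERRA-2 tool.</p>
<preformat preformat-type="code"><![CDATA[
```python
def merra2_axes(dlon=0.625, dlat=0.5):
    """Rebuild regular MERRA-2-style longitude/latitude axes in degrees.

    Assumption: longitudes start at -180 and stop one step short of +180,
    while latitudes run from -90 to +90 with both poles included.
    """
    nlon = round(360 / dlon)        # 360 / 0.625 = 576 columns
    nlat = round(180 / dlat) + 1    # 180 / 0.5 + 1 = 361 rows
    lons = [-180 + i * dlon for i in range(nlon)]
    lats = [-90 + j * dlat for j in range(nlat)]
    return lons, lats


lons, lats = merra2_axes()
print(len(lons), len(lats))   # grid size as (longitudes, latitudes)
print(lons[0], lons[-1])      # western and eastern grid columns
print(lats[0], lats[-1])      # southern and northern grid rows
```
]]></preformat>
<p>Under these assumptions, the easternmost grid column lies at 179.375°, so the 180° limit listed in Table <xref ref-type="table" rid="T1"/> is best read as the nominal domain edge rather than an additional column.</p>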
      <p id="d2e398">Although several reanalysis datasets are available, MERRA-2 was selected for this study for three main reasons: (i) its spatial and temporal resolution is suitable for detecting TCG, (ii) its data format is convenient for ML model development, and (iii) TC-related information is assimilated into MERRA-2 via derived satellite products <xref ref-type="bibr" rid="bib1.bibx13" id="paren.17"/>. Unlike TC metrics such as intensity or accumulated cyclone energy, which require a detailed inner-core structure, TCG is a process largely governed by environmental conditions. With its 0.5° resolution, MERRA-2 is therefore expected to capture the environmental conditions relevant for DL purposes. The MERRA-2 dataset provides gridded atmospheric data, including 11 key meteorological variables at standard pressure levels, spanning the global domain from <inline-formula><mml:math id="M7" display="inline"><mml:mrow><mml:mn mathvariant="normal">90</mml:mn><mml:mi mathvariant="italic">°</mml:mi></mml:mrow></mml:math></inline-formula> S to <inline-formula><mml:math id="M8" display="inline"><mml:mrow><mml:mn mathvariant="normal">90</mml:mn><mml:mi mathvariant="italic">°</mml:mi></mml:mrow></mml:math></inline-formula> N latitude (at <inline-formula><mml:math id="M9" display="inline"><mml:mrow><mml:mn mathvariant="normal">0.5</mml:mn><mml:mi mathvariant="italic">°</mml:mi></mml:mrow></mml:math></inline-formula> resolution) and from <inline-formula><mml:math id="M10" display="inline"><mml:mrow><mml:mn mathvariant="normal">180</mml:mn><mml:mi mathvariant="italic">°</mml:mi></mml:mrow></mml:math></inline-formula> W to <inline-formula><mml:math id="M11" display="inline"><mml:mrow><mml:mn mathvariant="normal">180</mml:mn><mml:mi mathvariant="italic">°</mml:mi></mml:mrow></mml:math></inline-formula> E longitude (at <inline-formula><mml:math id="M12" display="inline"><mml:mrow><mml:mn mathvariant="normal">0.625</mml:mn><mml:mi mathvariant="italic">°</mml:mi></mml:mrow></mml:math></inline-formula> resolution). The data used in this study cover 1 January 1980 to 31 December 2022 and are sampled every 3 h. Each daily file contains 8 time slices and is stored in NetCDF4 format, with a file size of approximately 2.2–2.3 GB.</p>
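<p>As a rough illustration of the data volume involved, and hence of why the cropping and subsampling described in Sect. <xref ref-type="sec" rid="Ch1.S2.SS2"/> matter, the following sketch counts the daily files and time slices between 1 January 1980 and 31 December 2022 and multiplies the file count by the 2.2–2.3 GB per-file size quoted above. The totals are our own back-of-the-envelope estimate in decimal units, not figures reported by the data provider.</p>
<preformat preformat-type="code"><![CDATA[
```python
from datetime import date

# Period of MERRA-2 data used in this study (both ends inclusive).
start, end = date(1980, 1, 1), date(2022, 12, 31)
n_days = (end - start).days + 1          # one daily NetCDF4 file per day

n_slices_3h = n_days * (24 // 3)         # 8 three-hourly slices per file
n_slices_6h = n_days * (24 // 6)         # 4 slices per day after 6-hourly subsampling

# Rough archive size from the quoted 2.2-2.3 GB per daily file (1 TB = 1000 GB).
low_tb = n_days * 2.2 / 1000
high_tb = n_days * 2.3 / 1000

print(n_days, n_slices_3h, n_slices_6h)
print(f"{low_tb:.1f}-{high_tb:.1f} TB")
```
]]></preformat>
<p>The calculation gives 15706 daily files, 125648 three-hourly slices (62824 after 6-hourly subsampling), and roughly 35–36 TB of raw files.</p>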
      <p id="d2e465">One limitation of MERRA-2 compared with other reanalysis datasets is that it is provided at a single resolution of <inline-formula><mml:math id="M13" display="inline"><mml:mrow><mml:mn mathvariant="normal">0.5</mml:mn><mml:mo>×</mml:mo><mml:mn mathvariant="normal">0.625</mml:mn><mml:mi mathvariant="italic">°</mml:mi></mml:mrow></mml:math></inline-formula>, whereas other reanalyses such as ERA5 offer a higher resolution of <inline-formula><mml:math id="M14" display="inline"><mml:mrow><mml:mn mathvariant="normal">0.25</mml:mn><mml:mi mathvariant="italic">°</mml:mi></mml:mrow></mml:math></inline-formula> at hourly intervals. Using such higher-resolution datasets is certainly an advantage, as it could help improve ML models. However, most current global climate projection outputs are provided at <inline-formula><mml:math id="M15" display="inline"><mml:mn mathvariant="normal">0.5</mml:mn></mml:math></inline-formula> to <inline-formula><mml:math id="M16" display="inline"><mml:mrow><mml:mn mathvariant="normal">1</mml:mn><mml:mi mathvariant="italic">°</mml:mi></mml:mrow></mml:math></inline-formula> resolution. Using <inline-formula><mml:math id="M17" display="inline"><mml:mrow><mml:mn mathvariant="normal">0.5</mml:mn><mml:mi mathvariant="italic">°</mml:mi></mml:mrow></mml:math></inline-formula> data thus better demonstrates the usefulness of our DL models and facilitates their application or fine-tuning for reconstructing TC climatology from global climate model outputs. While ERA5 is considered among the best reanalyses for climate research and DL model development, it has not been demonstrated whether ERA5 is better than MERRA-2 at representing TC climatology at <inline-formula><mml:math id="M18" display="inline"><mml:mrow><mml:mn mathvariant="normal">0.5</mml:mn><mml:mi mathvariant="italic">°</mml:mi></mml:mrow></mml:math></inline-formula> resolution. 
In this regard, our choice of MERRA-2 can be considered a pre-learning step that could be refined with ERA5 or other climate datasets. MERRA-2 data are therefore sufficient for implementing and evaluating TCG-Net.</p>
      <p id="d2e530">In addition to MERRA-2, the International Best Track Archive for Climate Stewardship (IBTrACS) <xref ref-type="bibr" rid="bib1.bibx22" id="paren.18"/>, which contains global TC records, was used to label all TCG events and their locations. For the WNP basin, we select the Joint Typhoon Warning Center (JTWC) records in IBTrACS, which span 1 January 1890 to the present day, are sampled every <inline-formula><mml:math id="M19" display="inline"><mml:mn mathvariant="normal">3</mml:mn></mml:math></inline-formula> h, and are archived in a single CSV file. Because IBTrACS uses the same synoptic times as MERRA-2, the two datasets can be paired directly for supervised DL. Details of these datasets and their corresponding variables are summarized in Table <xref ref-type="table" rid="T1"/>.</p>
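<p>The pairing by synoptic time can be sketched as follows: each best-track timestamp is mapped to a MERRA-2 daily file and a 3-hourly slice index, and records falling between slices are discarded. The CSV excerpt, the storm identifier <monospace>STORM_A</monospace>, and the helper <monospace>merra2_slot</monospace> are hypothetical, and the column names <monospace>SID</monospace>, <monospace>ISO_TIME</monospace>, <monospace>LAT</monospace>, and <monospace>LON</monospace> are assumed to follow the IBTrACS CSV layout; the actual file should be checked before reuse.</p>
<preformat preformat-type="code"><![CDATA[
```python
import csv
import io
from datetime import datetime

# Hypothetical IBTrACS-style excerpt: storm ID and values are made up, and the
# column names are an assumption about the real CSV layout.
SAMPLE = """SID,ISO_TIME,LAT,LON
STORM_A,2017-07-01 00:00:00,15.2,130.1
STORM_A,2017-07-01 03:00:00,15.4,129.8
STORM_A,2017-07-01 04:30:00,15.5,129.7
STORM_A,2017-07-01 06:00:00,15.7,129.5
"""


def merra2_slot(iso_time, step_hours=3):
    """Map a best-track timestamp to (daily-file date, slice index).

    Assumes each MERRA-2 daily file holds one slice every `step_hours` hours
    starting at 00 UTC; timestamps between slices return None.
    """
    t = datetime.strptime(iso_time, "%Y-%m-%d %H:%M:%S")
    if t.minute or t.second or t.hour % step_hours:
        return None
    return t.date().isoformat(), t.hour // step_hours


pairs = []
for row in csv.DictReader(io.StringIO(SAMPLE)):
    slot = merra2_slot(row["ISO_TIME"])
    if slot is not None:  # keep only records that coincide with a MERRA-2 slice
        pairs.append((row["SID"], slot, float(row["LAT"]), float(row["LON"])))

for pair in pairs:
    print(pair)
```
]]></preformat>
<p>In this toy example, the 04:30 UTC record is dropped because no MERRA-2 slice exists at that time, while the other three records are paired with slices 0, 1, and 2 of the same daily file.</p>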

<table-wrap id="T1"><label>Table 1</label><caption><p id="d2e549">Variables and their ranges in the raw MERRA-2 and IBTrACS datasets.</p></caption><oasis:table frame="topbot"><oasis:tgroup cols="3">
     <oasis:colspec colnum="1" colname="col1" align="left"/>
     <oasis:colspec colnum="2" colname="col2" align="left"/>
     <oasis:colspec colnum="3" colname="col3" align="left"/>
     <oasis:thead>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">Variable</oasis:entry>
         <oasis:entry colname="col2">MERRA-2</oasis:entry>
         <oasis:entry colname="col3">IBTrACS</oasis:entry>
       </oasis:row>
     </oasis:thead>
     <oasis:tbody>
       <oasis:row>
         <oasis:entry colname="col1">LAT MIN</oasis:entry>
         <oasis:entry colname="col2"><inline-formula><mml:math id="M20" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula>90</oasis:entry>
         <oasis:entry colname="col3"><inline-formula><mml:math id="M21" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula>90</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">LAT MAX</oasis:entry>
         <oasis:entry colname="col2">90</oasis:entry>
         <oasis:entry colname="col3">90</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">LAT STEP</oasis:entry>
         <oasis:entry colname="col2">0.5</oasis:entry>
         <oasis:entry colname="col3">0.01</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">LON MIN</oasis:entry>
         <oasis:entry colname="col2"><inline-formula><mml:math id="M22" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula>180</oasis:entry>
         <oasis:entry colname="col3">0</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">LON MAX</oasis:entry>
         <oasis:entry colname="col2">180</oasis:entry>
         <oasis:entry colname="col3">360</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">LON STEP</oasis:entry>
         <oasis:entry colname="col2">0.625</oasis:entry>
         <oasis:entry colname="col3">0.01</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">TIME MIN</oasis:entry>
         <oasis:entry colname="col2">1 Jan 1980</oasis:entry>
         <oasis:entry colname="col3">1 Jan 1890</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">TIME MAX</oasis:entry>
         <oasis:entry colname="col2">31 Dec 2022</oasis:entry>
         <oasis:entry colname="col3">31 Dec 2022</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">TIME STEP</oasis:entry>
         <oasis:entry colname="col2">3 h</oasis:entry>
         <oasis:entry colname="col3">6 or 3 h</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">File format</oasis:entry>
         <oasis:entry colname="col2">NetCDF4</oasis:entry>
         <oasis:entry colname="col3">CSV</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">File split</oasis:entry>
         <oasis:entry colname="col2">By day</oasis:entry>
         <oasis:entry colname="col3">No splitting</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">File size</oasis:entry>
         <oasis:entry colname="col2">2.2–2.3 GB</oasis:entry>
         <oasis:entry colname="col3">306 MB</oasis:entry>
       </oasis:row>
     </oasis:tbody>
   </oasis:tgroup></oasis:table></table-wrap>

</sec>
<sec id="Ch1.S2.SS2">
  <label>2.2</label><title>Data pre-processing</title>
      <p id="d2e753">Because the MERRA-2 and IBTrACS datasets differ in format, structure, and parameters, they must first be synchronized before any DL training can be carried out. In this pre-processing step, we convert the longitude axis of MERRA-2 from <inline-formula><mml:math id="M23" display="inline"><mml:mrow><mml:mo>[</mml:mo><mml:mo>-</mml:mo><mml:mn mathvariant="normal">180</mml:mn><mml:mo>:</mml:mo><mml:mo>+</mml:mo><mml:mn mathvariant="normal">180</mml:mn><mml:mo>]</mml:mo></mml:mrow></mml:math></inline-formula> to <inline-formula><mml:math id="M24" display="inline"><mml:mrow><mml:mo>[</mml:mo><mml:mn mathvariant="normal">0</mml:mn><mml:mo>:</mml:mo><mml:mn mathvariant="normal">360</mml:mn><mml:mo>]</mml:mo></mml:mrow></mml:math></inline-formula> to match the coordinate convention of IBTrACS, so that the IBTrACS <monospace>latitude</monospace> and <monospace>longitude</monospace> values can be located correctly on the MERRA-2 grid for data extraction. A Pacific domain spanning [35° S–70° N] <inline-formula><mml:math id="M25" display="inline"><mml:mo>×</mml:mo></mml:math></inline-formula> [60–220° E] is then extracted from the global MERRA-2 dataset. From each daily MERRA-2 file, time slices are extracted at 6 h intervals to match the best-track data (the original MERRA-2 daily files contain data at 3 h intervals). Details of the pre-processed MERRA-2 and IBTrACS data are provided in Table <xref ref-type="table" rid="T2"/>. The pre-processing workflow is described in detail in our Zenodo repository <xref ref-type="bibr" rid="bib1.bibx26" id="paren.19"/>.</p>
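<p>The three pre-processing operations above (longitude conversion, domain cropping, and 6-hourly subsampling) are sketched below on the coordinate axes only. The conversion to [0, 360) matters here because the 60–220° E domain crosses the 180° meridian and would otherwise be split into two pieces. The sketch assumes that the native MERRA-2 axes start at 180° W and 90° S and that daily files hold slices at 00, 03, ..., 21 UTC; the helper names are illustrative, and in practice the same index permutations would be applied to the data arrays themselves.</p>
<preformat preformat-type="code"><![CDATA[
```python
def to_0_360(lon):
    """Map a longitude in [-180, 180) to the [0, 360) convention."""
    return lon % 360.0


def crop_indices(axis, lo, hi):
    """Return indices of axis values lying in the closed interval [lo, hi]."""
    return [i for i, v in enumerate(axis) if lo <= v <= hi]


# Assumed native MERRA-2 axes: 576 longitudes from -180 and 361 latitudes from -90.
lons = [-180 + i * 0.625 for i in range(576)]
lats = [-90 + j * 0.5 for j in range(361)]

# Step 1: re-express longitudes in [0, 360) and re-sort; `order` is the column
# permutation that would also be applied to every data array.
order = sorted(range(len(lons)), key=lambda i: to_0_360(lons[i]))
lons360 = [to_0_360(lons[i]) for i in order]

# Step 2: crop the Pacific domain [35 S, 70 N] x [60 E, 220 E].
ilon = crop_indices(lons360, 60.0, 220.0)
ilat = crop_indices(lats, -35.0, 70.0)

# Step 3: of the 8 daily 3-hourly slices (00, 03, ..., 21 UTC), keep 00/06/12/18 UTC.
slice_hours = list(range(0, 24, 3))
keep = [k for k, h in enumerate(slice_hours) if h % 6 == 0]

print(len(ilon), len(ilat), keep)
```
]]></preformat>
<p>Because 60° and 220° are exact multiples of 0.625°, both domain edges fall on grid columns, so under these assumptions the cropped domain contains 257 longitudes by 211 latitudes per variable and level.</p>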

<table-wrap id="T2"><label>Table 2</label><caption><p id="d2e814">Format and structure of the pre-processed MERRA-2 and IBTrACS data used to train the DL models.</p></caption><oasis:table frame="topbot"><oasis:tgroup cols="3">
     <oasis:colspec colnum="1" colname="col1" align="left"/>
     <oasis:colspec colnum="2" colname="col2" align="left"/>
     <oasis:colspec colnum="3" colname="col3" align="left"/>
     <oasis:thead>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">Variables</oasis:entry>
         <oasis:entry colname="col2">MERRA-2</oasis:entry>
         <oasis:entry colname="col3">IBTrACS</oasis:entry>
       </oasis:row>
     </oasis:thead>
     <oasis:tbody>
       <oasis:row>
         <oasis:entry colname="col1">LAT MIN</oasis:entry>
         <oasis:entry colname="col2"><inline-formula><mml:math id="M26" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula>35</oasis:entry>
         <oasis:entry colname="col3">0</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">LAT MAX</oasis:entry>
         <oasis:entry colname="col2">70</oasis:entry>
         <oasis:entry colname="col3">60</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">LAT STEP</oasis:entry>
         <oasis:entry colname="col2">0.5</oasis:entry>
         <oasis:entry colname="col3">0.01</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">LON MIN</oasis:entry>
         <oasis:entry colname="col2">60</oasis:entry>
         <oasis:entry colname="col3">100</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">LON MAX</oasis:entry>
         <oasis:entry colname="col2">220</oasis:entry>
         <oasis:entry colname="col3">180</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">LON STEP (°)</oasis:entry>
         <oasis:entry colname="col2">0.625</oasis:entry>
         <oasis:entry colname="col3">0.01</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">TIME MIN</oasis:entry>
         <oasis:entry colname="col2">1 Jan 1980</oasis:entry>
         <oasis:entry colname="col3">1 Jan 1890</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">TIME MAX</oasis:entry>
         <oasis:entry colname="col2">31 Dec 2022</oasis:entry>
         <oasis:entry colname="col3">31 Dec 2022</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">TIME STEP</oasis:entry>
         <oasis:entry colname="col2">6 h</oasis:entry>
         <oasis:entry colname="col3">6 h</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">File format</oasis:entry>
         <oasis:entry colname="col2">NetCDF4</oasis:entry>
         <oasis:entry colname="col3">CSV</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Sample size</oasis:entry>
         <oasis:entry colname="col2">By forecasts</oasis:entry>
         <oasis:entry colname="col3">1024 records</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">File size</oasis:entry>
         <oasis:entry colname="col2">97 MB</oasis:entry>
         <oasis:entry colname="col3">4.0 KB</oasis:entry>
       </oasis:row>
     </oasis:tbody>
   </oasis:tgroup></oasis:table></table-wrap>

</sec>
<sec id="Ch1.S2.SS3">
  <label>2.3</label><title>Supervised dataset design</title>
      <p id="d2e1006">With the pre-processed data described above, the next step is to generate a labeled dataset for supervised ML models. For the study of TCG, reconstructing TCG climatology requires a binary dataset indicating whether or not a TCG event occurs in the WNP basin. Creating such a binary dataset is critical, as it must include not only positive TCG cases but also a well-designed set of negative samples, so that ML models can effectively learn the features that distinguish TCG from non-TCG conditions. On the one hand, a strong contrast between positive and negative samples increases the likelihood that ML models will identify key patterns, thereby improving performance. On the other hand, the selection of positive and negative samples must also align with practical applications and purposes in climate research. In fact, the criteria for labeling positive/negative TCG events vary with the specific type of TCG climatology, as will be shown in Sect. <xref ref-type="sec" rid="Ch1.S4"/>. In this study, we follow NK2024 and define a positive TCG event as the first time a storm is recorded in the best-track data. One could alternatively define TCG as the first time that a tropical depression stage is recorded, to ensure that TCG characteristics are well defined for DL training <xref ref-type="bibr" rid="bib1.bibx17" id="paren.20"/>. However, our choice of the first time that a TC is recorded in the best track has the benefit of training a DL model that can detect TCG earlier, and so we use this definition to label positive TCG events. Additional steps to handle the uncertainties in this TCG timing are presented as part of our data enrichment in Sect. <xref ref-type="sec" rid="Ch1.S3.SS3"/>.</p>
      <p id="d2e1018">With the above definition of <italic>positive TCG labels</italic>, we scan through all TC track histories and take the first recorded location of each storm to create a data domain for each positive TCG event. Given the typical scale of TCs, this positive TCG domain is chosen to be a square box of size <inline-formula><mml:math id="M27" display="inline"><mml:mrow><mml:mn mathvariant="normal">18</mml:mn><mml:mo>×</mml:mo><mml:mn mathvariant="normal">18</mml:mn><mml:mi mathvariant="italic">°</mml:mi></mml:mrow></mml:math></inline-formula> centered on the first recorded TCG location, which corresponds to roughly <inline-formula><mml:math id="M28" display="inline"><mml:mrow><mml:mn mathvariant="normal">33</mml:mn><mml:mo>×</mml:mo><mml:mn mathvariant="normal">32</mml:mn></mml:mrow></mml:math></inline-formula> grid points at the <inline-formula><mml:math id="M29" display="inline"><mml:mrow><mml:mn mathvariant="normal">0.5</mml:mn><mml:mi mathvariant="italic">°</mml:mi></mml:mrow></mml:math></inline-formula> MERRA-2 resolution. Finally, all relevant information on each TCG event, including its longitude, latitude, date, and time, is stored in a CSV database to facilitate data sharing and input to the DL interface.</p>
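<p>A minimal sketch of extracting the positive TCG domain around a best-track genesis point is given below. It assumes a regular latitude–longitude grid in the 0–360° convention and specifies the patch size in grid points (33 × 32 by default, as quoted above) rather than in degrees; the nearest-grid-point centering and the function names are illustrative assumptions, not the exact procedure used in this study.</p>

```python
import numpy as np


def nearest_index(axis, value):
    """Index of the grid point closest to `value` along a 1-D coordinate axis."""
    return int(np.argmin(np.abs(np.asarray(axis) - value)))


def extract_patch(lat, lon, field, tc_lat, tc_lon, n_lat=33, n_lon=32):
    """Return the (..., n_lat, n_lon) patch of `field` centered on a TCG point.

    `tc_lon` may be given in [-180, 180]; it is mapped to [0, 360) to match
    the longitude convention of the pre-processed grid.
    """
    i = nearest_index(lat, tc_lat)
    j = nearest_index(lon, tc_lon % 360.0)
    i0, j0 = i - n_lat // 2, j - n_lon // 2
    i1, j1 = i0 + n_lat, j0 + n_lon
    if 0 > min(i0, j0) or i1 > len(lat) or j1 > len(lon):
        raise ValueError("patch extends beyond the data domain")
    return field[..., i0:i1, j0:j1]


# Example on the pre-processed Pacific grid (values = latitude, for checking).
lat = np.arange(-35.0, 70.01, 0.5)
lon = np.arange(60.0, 220.01, 0.625)
field = np.repeat(lat[:, None], lon.size, axis=1)
patch = extract_patch(lat, lon, field, tc_lat=15.0, tc_lon=135.0)
print(patch.shape, patch[16, 0])  # (33, 32) 15.0
```

<p>Raising an error for patches that cross the domain edge, instead of padding them, is one possible design choice; genesis points near the boundary would then have to be skipped or handled separately.</p>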
      <p id="d2e1060">For the negative-labeled TCG data, the issue is more subtle, as discussed in <xref ref-type="bibr" rid="bib1.bibx17" id="text.21"/>. Depending on the context and application, there are several ways to define a <italic>negative TCG event</italic>. From a practical perspective, we propose two different sampling strategies for negative TCG labels in this study. The first strategy, referred to as the Past Domain (PD) strategy, uses temporal context to distinguish between positive and negative labels, i.e., to predict when TCG occurs. Specifically, for each positive TCG event occurring at time <inline-formula><mml:math id="M30" display="inline"><mml:mi>t</mml:mi></mml:math></inline-formula>, all samples within a past window from <inline-formula><mml:math id="M31" display="inline"><mml:mrow><mml:mi>t</mml:mi><mml:mo>-</mml:mo><mml:mi>k</mml:mi></mml:mrow></mml:math></inline-formula> to <inline-formula><mml:math id="M32" display="inline"><mml:mi>t</mml:mi></mml:math></inline-formula> are labeled as positive, capturing the precursors leading up to the TCG time at <inline-formula><mml:math id="M33" display="inline"><mml:mrow><mml:mi>t</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">0</mml:mn></mml:mrow></mml:math></inline-formula>. All earlier samples, from <inline-formula><mml:math id="M34" display="inline"><mml:mrow><mml:mi>t</mml:mi><mml:mo>-</mml:mo><mml:mi>k</mml:mi><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:math></inline-formula> back to time <inline-formula><mml:math id="M35" display="inline"><mml:mrow><mml:mi>t</mml:mi><mml:mo>=</mml:mo><mml:mo>-</mml:mo><mml:mi>N</mml:mi></mml:mrow></mml:math></inline-formula>, are labeled as negative. This approach aims to answer the question of why a TC forms at a specific time but not earlier. 
A key advantage of this strategy is that positive and negative samples share the same geographical location and differ only in time. However, depending on the chosen value of <inline-formula><mml:math id="M36" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula>, the positive and negative samples may partly overlap in their favorable environmental conditions.</p>
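<p>The PD labeling rule can be written as a short function. In this sketch, time is counted in 6 h best-track steps, so that, e.g., <monospace>k = 4</monospace> corresponds to a 24 h positive window; the step unit and the function name are assumptions made for illustration.</p>

```python
def pd_labels(t, k, n):
    """Past Domain labels for one TCG event at time index t.

    Steps t-k, ..., t are positive (label 1) and the earlier steps
    t-n, ..., t-k-1 are negative (label 0). All samples share the
    same geographical domain. Returns sorted (time_index, label) pairs.
    """
    if not n > k >= 0:
        raise ValueError("require n > k >= 0")
    positives = [(t - i, 1) for i in range(k + 1)]
    negatives = [(t - i, 0) for i in range(k + 1, n + 1)]
    return sorted(positives + negatives)


print(pd_labels(t=10, k=2, n=5))
# [(5, 0), (6, 0), (7, 0), (8, 1), (9, 1), (10, 1)]
```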
      <p id="d2e1145">The second approach for generating negative TCG data is referred to as the Dynamic Domain (DD) strategy. In this approach, for each positive TCG event at time <inline-formula><mml:math id="M37" display="inline"><mml:mi>t</mml:mi></mml:math></inline-formula>, the surrounding spatial regions are taken as negative samples. Specifically, we define eight regions adjacent to a positive TCG location, in the north-west, north, north-east, east, south-east, south, south-west, and west directions, as negative TCG labels (see Table <xref ref-type="table" rid="T3"/>). As in the PD strategy, these eight surrounding domains can also be shifted back in time to further account for the uncertainties in the TCG timing recorded in the best track. This type of domain selection for negative TCG data helps answer the question of why TCG occurs at one location but not at neighboring locations at the same time. One could randomly choose one of the eight negative domains to construct a more balanced binary dataset, as in NK2024, or use all eight domains to increase the sample size. Unlike the PD approach, the DD approach has a small chance of including a co-existing TC in the negative samples. This issue can be addressed by checking whether any co-existing TC lies within the negative TCG domain and then either removing that domain or filtering the TC out <xref ref-type="bibr" rid="bib1.bibx32" id="paren.22"/>. In this study, we simply discard all negative TCG domains that contain a co-existing TC, to avoid complications from the changes in environmental conditions introduced by vortex removal. 
This affects <inline-formula><mml:math id="M38" display="inline"><mml:mo>&lt;</mml:mo></mml:math></inline-formula> 1 % of all data points in the WNP basin, as TCG events in this basin rarely overlap within a domain of <inline-formula><mml:math id="M39" display="inline"><mml:mrow><mml:mn mathvariant="normal">18</mml:mn><mml:mi mathvariant="italic">°</mml:mi><mml:mo>×</mml:mo><mml:mn mathvariant="normal">18</mml:mn><mml:mi mathvariant="italic">°</mml:mi></mml:mrow></mml:math></inline-formula> in area.</p>
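<p>The DD sampling, including the removal of negative domains that contain a co-existing TC, can be sketched as follows. The sketch assumes that the eight negative domains are obtained by shifting the 18° × 18° box by one full box width in each direction and that a co-existing TC is any other best-track position falling inside a shifted box; both choices, as well as the names used, are illustrative assumptions. Longitude wrap-around is ignored, which is adequate for the WNP in the 0–360° convention.</p>

```python
# Row/column shifts (in box widths) of the eight neighboring domains (Table 3).
OFFSETS = {
    "NW": (1, -1), "N": (1, 0), "NE": (1, 1),
    "W": (0, -1), "E": (0, 1),
    "SW": (-1, -1), "S": (-1, 0), "SE": (-1, 1),
}


def contains_tc(center, tc_positions, box=18.0):
    """True if any (lat, lon) position lies inside the box around `center`."""
    half = box / 2.0
    clat, clon = center
    return any(half >= abs(tlat - clat) and half >= abs(tlon - clon)
               for tlat, tlon in tc_positions)


def dd_negative_centers(lat0, lon0, other_tcs, box=18.0):
    """Centers of the DD negative domains around a TCG point at (lat0, lon0),
    discarding every domain that contains a co-existing TC."""
    centers = {name: (lat0 + dy * box, lon0 + dx * box)
               for name, (dy, dx) in OFFSETS.items()}
    return {name: c for name, c in centers.items()
            if not contains_tc(c, other_tcs, box)}


print(sorted(dd_negative_centers(15.0, 135.0, other_tcs=[(15.0, 155.0)])))
# ['N', 'NE', 'NW', 'S', 'SE', 'SW', 'W']
```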

<table-wrap id="T3"><label>Table 3</label><caption><p id="d2e1187">Illustration of the TCG data labeling strategy based on the Dynamic Domain approach, in which a positive TCG label at one location is surrounded by eight negative TCG labels. The bold text highlights the location with a positive TCG label, where the TCG occurs.</p></caption><oasis:table frame="topbot"><oasis:tgroup cols="3">
     <oasis:colspec colnum="1" colname="col1" align="justify" colwidth="1.5cm"/>
     <oasis:colspec colnum="2" colname="col2" align="justify" colwidth="1.5cm"/>
     <oasis:colspec colnum="3" colname="col3" align="justify" colwidth="1.5cm"/>
     <oasis:thead>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1"><monospace>NW</monospace> north-west</oasis:entry>
         <oasis:entry colname="col2"><monospace>N</monospace> north</oasis:entry>
         <oasis:entry colname="col3"><monospace>NE</monospace> north-east</oasis:entry>
       </oasis:row>
     </oasis:thead>
     <oasis:tbody>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1"><monospace>W</monospace> west</oasis:entry>
         <oasis:entry colname="col2"><monospace>P</monospace> <bold>positive TCG</bold></oasis:entry>
         <oasis:entry colname="col3"><monospace>E</monospace> east</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1"><monospace>SW</monospace> south-west</oasis:entry>
         <oasis:entry colname="col2"><monospace>S</monospace> south</oasis:entry>
         <oasis:entry colname="col3"><monospace>SE</monospace> south-east</oasis:entry>
       </oasis:row>
     </oasis:tbody>
   </oasis:tgroup></oasis:table></table-wrap>

      <p id="d2e1271">As part of constructing the binary TCG dataset described above, it is important to note that TCs can form at any time of day, whereas the IBTrACS dataset provides records at fixed 6 h intervals. Consequently, the actual genesis location of a TC may differ slightly from the position recorded in the best-track data. This spatial discrepancy typically depends on storm motion and on the algorithms that operational centers use to detect vortex centers. In this study, we assume that the TCG location does not vary significantly within a 6 h window, which provides a reasonable basis for maintaining consistency in our binary TCG dataset design. This assumption also allows us to use the first recorded time in the IBTrACS dataset to define positively labeled TCG samples, which by convention are assigned a label of 1. All negative samples are labeled as 0.</p>

      <fig id="F1" specific-use="star"><label>Figure 1</label><caption><p id="d2e1276"><bold>(a)</bold> The pipeline of TCG-Net for reconstructing TCG climatology from the MERRA-2 dataset, and <bold>(b)</bold> the core DL model based on the ResNet-18 architecture used for TCG reconstruction in this study.</p></caption>
          <graphic xlink:href="https://gmd.copernicus.org/articles/19/4009/2026/gmd-19-4009-2026-f01.png"/>

        </fig>

</sec>
</sec>
<sec id="Ch1.S3">
  <label>3</label><title>TCG-Net: Deep Learning Framework</title>
<sec id="Ch1.S3.SS1">
  <label>3.1</label><title>Model design</title>
      <p id="d2e1306">Recent advances in DL research have demonstrated substantial potential across a range of fields, including image recognition, natural language processing, autonomous driving, and weather prediction <xref ref-type="bibr" rid="bib1.bibx12 bib1.bibx20 bib1.bibx14 bib1.bibx30 bib1.bibx44 bib1.bibx2 bib1.bibx57" id="paren.23"/>. In the context of TCG applications, several DL approaches using satellite imagery and TCG predictors have been proposed <xref ref-type="bibr" rid="bib1.bibx63 bib1.bibx35 bib1.bibx28 bib1.bibx62 bib1.bibx20" id="paren.24"/>. To make broader use of climate data, NK2024 employed a convolutional neural network (CNN)-based DL framework and showed promising results for TCG prediction despite limited training data. Their study underscores the image recognition capabilities of DL, which enable the detection and analysis of relevant environmental patterns in meteorological fields for TCG prediction.</p>
      <p id="d2e1315">Given the promising performance of CNN-based models and the TCG-specific features reported in NK2024, we propose TCG-Net, a deep learning architecture designed to predict tropical cyclogenesis from MERRA-2 atmospheric reanalysis data. The pipeline begins with data preprocessing, in which inputs are organized into dynamic and past domains to capture both current atmospheric conditions and historical context. These data are then split into training, development, and testing sets to support reliable model evaluation. At its core, TCG-Net uses a backbone model, e.g., a CNN or ResNet, to extract meaningful features. Supporting methods, including attention or temporal fusion techniques, enhance the model’s ability to capture complex patterns associated with cyclone formation. The framework jointly predicts two key outputs, namely the location and timing of cyclone genesis. The entire workflow is shown in Fig. <xref ref-type="fig" rid="F1"/>a.</p>
      <p id="d2e1320">In this study, we use ResNet-18 as the primary backbone model. It consists of eight residual blocks, preceded by an initial convolutional layer for input embedding and followed by a fully connected layer with a softmax activation that predicts the probability of storm occurrence, for a total of 18 layers (Fig. <xref ref-type="fig" rid="F1"/>b). This architecture differs slightly from that used in NK2024, because our study employs the MERRA-2 dataset, which lacks several environmental variables such as CAPE and tropopause-level features. Consequently, the ResNet-18 model required refinement and testing to achieve optimal performance, and the resulting configuration differs somewhat from the architecture tailored to the NCEP Final reanalysis dataset in NK2024.</p>
      <p id="d2e1325">Our experiments with alternative DL architectures, such as additional convolutional layers, vision transformers, and pre-trained models, all yielded minimal improvement over the adapted ResNet-18 used in this study. Although this conclusion is based only on the few models we tested, it is possible that TCG has limited predictability, or that the MERRA-2 dataset at 0.5° resolution contains too little TCG-relevant information for more sophisticated models or architectures to learn from. This is a known problem in DL training and may explain why a simple CNN model can perform as well as more complex models <xref ref-type="bibr" rid="bib1.bibx4" id="paren.25"/>. Note also that complex models with more parameters generally have a very high “capacity” to learn. When the training data do not contain a wide variety of patterns, a high-capacity model can easily memorize the training data, including their noise and idiosyncrasies, instead of learning generalizable features. This leads to good performance on the training set but poor performance on unseen data, which is known as overfitting. For these reasons, we adopt the adapted ResNet-18 model for our TCG reconstruction problem.</p>
</sec>
<sec id="Ch1.S3.SS2">
  <label>3.2</label><title>Evaluation metrics</title>
      <p id="d2e1339">To monitor and verify our DL models, we employ several metrics from traditional classification problems for model training, together with meteorological metrics specific to TCG climatology, such as seasonal and spatial distributions, for validation. During DL training, model performance is evaluated using Precision (<inline-formula><mml:math id="M40" display="inline"><mml:mi>P</mml:mi></mml:math></inline-formula>), Recall (<inline-formula><mml:math id="M41" display="inline"><mml:mi>R</mml:mi></mml:math></inline-formula>), and <inline-formula><mml:math id="M42" display="inline"><mml:mrow><mml:msub><mml:mi>F</mml:mi><mml:mn mathvariant="normal">1</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula> scores. These scores are well suited to the inherent class imbalance of the problem, in which TC occurrences (positive samples) are far fewer than non-TC cases (negative samples). By definition, <inline-formula><mml:math id="M43" display="inline"><mml:mi>P</mml:mi></mml:math></inline-formula> measures the proportion of correctly identified TC instances out of all instances predicted as TC:

            <disp-formula id="Ch1.E1" content-type="numbered"><label>1</label><mml:math id="M44" display="block"><mml:mrow><mml:mtext>Precision</mml:mtext><mml:mo>=</mml:mo><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mtext>TP</mml:mtext><mml:mrow><mml:mtext>TP</mml:mtext><mml:mo>+</mml:mo><mml:mtext>FP</mml:mtext></mml:mrow></mml:mfrac></mml:mstyle><mml:mo>,</mml:mo></mml:mrow></mml:math></disp-formula>

          
          where TP (True Positives) represents correctly detected TCs, and FP (False Positives) denotes non-TC cases incorrectly classified as TC. A high precision indicates that the model produces few false alarms, which is critical for reliable early warning systems.</p>
      <p id="d2e1400">In contrast, <inline-formula><mml:math id="M45" display="inline"><mml:mi>R</mml:mi></mml:math></inline-formula> quantifies the proportion of actual TC occurrences that the model successfully identifies:

            <disp-formula id="Ch1.E2" content-type="numbered"><label>2</label><mml:math id="M46" display="block"><mml:mrow><mml:mtext>Recall</mml:mtext><mml:mo>=</mml:mo><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mtext>TP</mml:mtext><mml:mrow><mml:mtext>TP</mml:mtext><mml:mo>+</mml:mo><mml:mtext>FN</mml:mtext></mml:mrow></mml:mfrac></mml:mstyle><mml:mo>,</mml:mo></mml:mrow></mml:math></disp-formula>

          where FN (False Negatives) represents actual TCs that the model fails to detect. A high recall ensures that most TC events are captured, reducing the risk of missing critical storm formations.</p>
      <p id="d2e1433">To take both <inline-formula><mml:math id="M47" display="inline"><mml:mi>P</mml:mi></mml:math></inline-formula> and <inline-formula><mml:math id="M48" display="inline"><mml:mi>R</mml:mi></mml:math></inline-formula> into account, we use the <inline-formula><mml:math id="M49" display="inline"><mml:mrow><mml:msub><mml:mi>F</mml:mi><mml:mn mathvariant="normal">1</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula> score, defined as:

            <disp-formula id="Ch1.E3" content-type="numbered"><label>3</label><mml:math id="M50" display="block"><mml:mrow><mml:msub><mml:mi>F</mml:mi><mml:mn mathvariant="normal">1</mml:mn></mml:msub><mml:mtext>-score</mml:mtext><mml:mo>=</mml:mo><mml:mn mathvariant="normal">2</mml:mn><mml:mo>×</mml:mo><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mrow><mml:mtext>Precision</mml:mtext><mml:mo>×</mml:mo><mml:mtext>Recall</mml:mtext></mml:mrow><mml:mrow><mml:mtext>Precision</mml:mtext><mml:mo>+</mml:mo><mml:mtext>Recall</mml:mtext></mml:mrow></mml:mfrac></mml:mstyle><mml:mo>.</mml:mo></mml:mrow></mml:math></disp-formula>

          A high <inline-formula><mml:math id="M51" display="inline"><mml:mrow><mml:msub><mml:mi>F</mml:mi><mml:mn mathvariant="normal">1</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula> score indicates a good trade-off between false alarms and missed detections, making it a crucial metric for assessing the reliability of TC detection models. Since missed TCs can lead to severe consequences, while excessive false alarms may reduce trust in predictions, optimizing both <inline-formula><mml:math id="M52" display="inline"><mml:mi>P</mml:mi></mml:math></inline-formula> and <inline-formula><mml:math id="M53" display="inline"><mml:mi>R</mml:mi></mml:math></inline-formula> is essential in operational forecasting.</p>
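<p>Eqs. (1)–(3) can be computed directly from binary labels and predictions, as in the minimal NumPy sketch below. Returning 0 when a denominator vanishes is a common convention that we assume here; it is not specified in the text, and the function name is hypothetical.</p>

```python
import numpy as np


def precision_recall_f1(y_true, y_pred):
    """Precision, recall and F1 score (Eqs. 1-3) for binary TCG labels."""
    y_true = np.asarray(y_true).astype(bool)
    y_pred = np.asarray(y_pred).astype(bool)
    tp = int(np.sum(np.logical_and(y_true, y_pred)))    # detected TCG events
    fp = int(np.sum(np.logical_and(~y_true, y_pred)))   # false alarms
    fn = int(np.sum(np.logical_and(y_true, ~y_pred)))   # missed TCG events
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


p, r, f1 = precision_recall_f1([1, 1, 1, 0, 0, 0, 0, 0],
                               [1, 1, 0, 1, 1, 0, 0, 0])
print(round(p, 3), round(r, 3), round(f1, 3))  # 0.5 0.667 0.571
```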
      <p id="d2e1524">With the above metrics, we train the DL model to maximize its <inline-formula><mml:math id="M54" display="inline"><mml:mrow><mml:msub><mml:mi>F</mml:mi><mml:mn mathvariant="normal">1</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula> score over a validation period, which by default consists of a random 10 % of the 1980–2016 data in this study (see Table <xref ref-type="table" rid="T4"/>). Note that these categorical verification metrics are used to train the ResNet-18 model. Whether the model performs well for TCG reconstruction further depends on other climate evaluations, such as the seasonal and spatial distributions of TCG presented in Sect. <xref ref-type="sec" rid="Ch1.S4"/>.</p>

<table-wrap id="T4"><label>Table 4</label><caption><p id="d2e1544">Statistics of the training, validation, and test sets.</p></caption><oasis:table frame="topbot"><oasis:tgroup cols="3">
     <oasis:colspec colnum="1" colname="col1" align="left"/>
     <oasis:colspec colnum="2" colname="col2" align="left"/>
     <oasis:colspec colnum="3" colname="col3" align="right"/>
     <oasis:thead>
       <oasis:row>
         <oasis:entry colname="col1">Dataset</oasis:entry>
         <oasis:entry colname="col2">Period</oasis:entry>
         <oasis:entry colname="col3">Number of</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1"/>
         <oasis:entry colname="col2"/>
         <oasis:entry colname="col3">TCG events</oasis:entry>
       </oasis:row>
     </oasis:thead>
     <oasis:tbody>
       <oasis:row>
         <oasis:entry colname="col1">Training</oasis:entry>
         <oasis:entry colname="col2">1980–2016 (Random 90 %)</oasis:entry>
         <oasis:entry colname="col3">1117</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Validation</oasis:entry>
         <oasis:entry colname="col2">1980–2016 (Random 10 %)</oasis:entry>
         <oasis:entry colname="col3">124</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Test</oasis:entry>
         <oasis:entry colname="col2">2017–2023</oasis:entry>
         <oasis:entry colname="col3">188</oasis:entry>
       </oasis:row>
     </oasis:tbody>
   </oasis:tgroup></oasis:table></table-wrap>
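<p>The split in Table 4 can be reproduced schematically as below: events after 2016 form the test set, and the 1980–2016 events are randomly divided into 90 % training and 10 % validation. The function name, the fixed random seed, and splitting at the level of TCG events (so that all samples derived from one event fall into the same subset) are illustrative assumptions; the synthetic event counts are chosen to match Table 4.</p>

```python
import numpy as np


def split_events(years, last_train_year=2016, val_frac=0.10, seed=0):
    """Split TCG event indices into training, validation and test sets.

    Events after `last_train_year` form the test set; the remaining events
    are shuffled and a fraction `val_frac` of them is held out for validation.
    """
    years = np.asarray(years)
    idx = np.arange(years.size)
    test = idx[years > last_train_year]
    pool = np.random.default_rng(seed).permutation(idx[last_train_year >= years])
    n_val = int(round(val_frac * pool.size))
    return pool[n_val:], pool[:n_val], test


# Synthetic event years: 1241 events in 1980-2016 and 188 afterwards.
years = np.concatenate([np.full(1241, 2000), np.full(188, 2020)])
train, val, test = split_events(years)
print(len(train), len(val), len(test))  # 1117 124 188
```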

</sec>
<sec id="Ch1.S3.SS3">
  <label>3.3</label><title>TCG data imbalance</title>
      <p id="d2e1630">Given the uncertainty in TCG timing in the best track and the severe class imbalance caused by the limited number of TCG events, it is essential to apply additional techniques to mitigate this imbalance before training a DL model. One approach, as described in Sect. <xref ref-type="sec" rid="Ch1.S2.SS3"/>, is temporal data enrichment using past windows, which is most suitable for the PD labeling strategy. The same past-window enrichment can also be extended to the DD labeling strategy by using data from previous cycles to increase the number of positive labels. While this data enrichment helps address the uncertainty in TCG timing, the imbalance ratio for the DD labeling strategy remains <inline-formula><mml:math id="M55" display="inline"><mml:mrow><mml:mn mathvariant="normal">1</mml:mn><mml:mo>:</mml:mo><mml:mn mathvariant="normal">8</mml:mn></mml:mrow></mml:math></inline-formula>, because each additional past positive sample comes with its own eight negative neighbors. Thus, adding more past data does not reduce the imbalance ratio.</p>
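<p>Why past-window enrichment cannot change the DD imbalance follows from simple counting: every enriched time step adds one positive domain together with its eight negative neighbors. The toy calculation below assumes that no negative domain is discarded because of a co-existing TC; the function name is hypothetical.</p>

```python
def dd_counts(n_events, n_past_steps):
    """Numbers of positive and negative DD samples when each TCG event is
    enriched with `n_past_steps` earlier 6 h cycles (plus the genesis time)."""
    steps_per_event = n_past_steps + 1
    n_pos = n_events * steps_per_event
    n_neg = 8 * n_pos  # eight neighboring domains per positive domain
    return n_pos, n_neg


for n_past in (0, 4, 8):  # no enrichment, 24 h and 48 h of past data
    pos, neg = dd_counts(1117, n_past)
    print(n_past, pos, neg, neg // pos)
```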
      <p id="d2e1647">To address this class imbalance, we also introduce a complementary method, known as Random UnderSampling (RUS). Specifically, RUS controls the ratio between positive and negative TCG samples in the training dataset. For example, a RUS value of <inline-formula><mml:math id="M56" display="inline"><mml:mrow><mml:mn mathvariant="normal">1</mml:mn><mml:mo>:</mml:mo><mml:mn mathvariant="normal">4</mml:mn></mml:mrow></mml:math></inline-formula> keeps four negative samples for every positive one. Since this undersampling may still leave some imbalance, we further apply class weighting in the loss function to emphasize the minority class during training. This technique increases the loss penalty for misclassifying rare (positive) cases, encouraging the model to better capture them. Two types of class weights are used in this study: (i) the balanced class weight (default) and (ii) the proportional class weight, defined as <inline-formula><mml:math id="M57" display="inline"><mml:mrow><mml:mn mathvariant="normal">1</mml:mn><mml:mo>/</mml:mo><mml:mn mathvariant="normal">5</mml:mn></mml:mrow></mml:math></inline-formula> RUS. Together with the temporal data enrichment, the RUS-based sampling and class weighting strategies can improve the performance of our DL models for TCG reconstruction beyond what hyperparameter optimization based on the <inline-formula><mml:math id="M58" display="inline"><mml:mrow><mml:msub><mml:mi>F</mml:mi><mml:mn mathvariant="normal">1</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula>-score criterion alone achieves.</p>
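<p>Random undersampling and the balanced class weight can be sketched with NumPy as below. Here the balanced weight of class <italic>c</italic> is taken as <monospace>n_samples / (n_classes * n_c)</monospace>, the same heuristic as the <monospace>class_weight="balanced"</monospace> option of scikit-learn; this formula, the fixed random seed, and the function names are assumptions for illustration. The proportional class weight is not reproduced here.</p>

```python
import numpy as np


def random_undersample(y, neg_per_pos=4, seed=0):
    """Indices of all positives plus a random subset of negatives, keeping at
    most `neg_per_pos` negatives per positive (e.g. 4 for a 1:4 RUS ratio)."""
    y = np.asarray(y)
    pos = np.flatnonzero(y == 1)
    neg = np.flatnonzero(y == 0)
    n_keep = min(neg.size, neg_per_pos * pos.size)
    kept_neg = np.random.default_rng(seed).choice(neg, size=n_keep, replace=False)
    return np.sort(np.concatenate([pos, kept_neg]))


def balanced_class_weights(y):
    """Weight n_samples / (n_classes * n_c) for each class c in `y`."""
    classes, counts = np.unique(np.asarray(y), return_counts=True)
    weights = len(y) / (classes.size * counts)
    return dict(zip(classes.tolist(), weights.tolist()))


# A 1:8 DD-like dataset with 10 positives and 80 negatives.
y = np.array([1] * 10 + [0] * 80)
keep = random_undersample(y, neg_per_pos=4)
print(len(keep), balanced_class_weights(y[keep]))  # 50 {0: 0.625, 1: 2.5}
```

<p>After 1:4 undersampling, the positive class still receives a weight four times larger than the negative class, which is the residual imbalance that class weighting is meant to compensate.</p>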
</sec>
<sec id="Ch1.S3.SS4">
  <label>3.4</label><title>TCG detection</title>
      <p id="d2e1694">With the output from the ResNet-18 model, we can finally produce a TC probability map for climate reconstruction. Because the DD and PD labeling strategies differ, detecting a TCG event and assigning it to a point in space and time requires strategy-specific treatment.</p>
      <p id="d2e1697">The PD strategy focuses on the temporal aspect of TCG prediction and generates negative samples at the same location as positive samples. TCG detection for this approach is therefore straightforward: one can choose any fixed location at time <inline-formula><mml:math id="M59" display="inline"><mml:mi>t</mml:mi></mml:math></inline-formula>, obtain data at that same location from previous times <inline-formula><mml:math id="M60" display="inline"><mml:mrow><mml:mi>t</mml:mi><mml:mo>-</mml:mo><mml:mi>i</mml:mi></mml:mrow></mml:math></inline-formula> up to time <inline-formula><mml:math id="M61" display="inline"><mml:mi>t</mml:mi></mml:math></inline-formula>, where <inline-formula><mml:math id="M62" display="inline"><mml:mi>i</mml:mi></mml:math></inline-formula> indexes the previous time steps, and then compute the TCG probability for that fixed domain. With this strategy, one can obtain the TCG probability for any area of interest, as long as the domain contains sufficient positive TCG labels within the interval <inline-formula><mml:math id="M63" display="inline"><mml:mrow><mml:mo>[</mml:mo><mml:mo>-</mml:mo><mml:mi>k</mml:mi><mml:mo>,</mml:mo><mml:mn mathvariant="normal">0</mml:mn><mml:mo>]</mml:mo></mml:mrow></mml:math></inline-formula> for model training. In this regard, PD can provide TCG information for any location of interest.</p>
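<p>PD detection at a fixed domain amounts to sliding a window over time and querying the trained model once per time step. In the sketch below, <monospace>model</monospace> stands for any callable returning a TCG probability for a stack of past fields (in practice the trained ResNet-18); stacking the fields from <italic>t</italic> − <italic>i</italic> to <italic>t</italic> along the first axis, the 0.5 threshold, and the function names are illustrative assumptions.</p>

```python
import numpy as np


def pd_probabilities(fields, model, window):
    """TCG probability at each time t of a fixed domain.

    `fields` has shape (time, channel, lat, lon); for every t with a full
    history the model receives the steps t-window, ..., t.
    """
    return {t: float(model(fields[t - window:t + 1]))
            for t in range(window, fields.shape[0])}


def detect_times(probs, threshold=0.5):
    """Time indices whose probability reaches the detection threshold."""
    return sorted(t for t, p in probs.items() if p >= threshold)


# Toy example: a "model" that returns the mean of its input, on a series
# whose fields switch from 0 to 1 at t = 6.
fields = np.zeros((10, 2, 4, 4))
fields[6:] = 1.0
probs = pd_probabilities(fields, model=lambda x: x.mean(), window=2)
print(detect_times(probs))  # [7, 8, 9]
```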
      <p id="d2e1751">With the DD strategy, the goal is to extract TCG information over the entire WNP basin at any given time <inline-formula><mml:math id="M64" display="inline"><mml:mi>t</mml:mi></mml:math></inline-formula> so that the spatial variation of TCG can be examined. Our approach for reconstructing TCG with the DD strategy is therefore to divide the WNP basin into a grid of <inline-formula><mml:math id="M65" display="inline"><mml:mrow><mml:mn mathvariant="normal">5</mml:mn><mml:mi mathvariant="italic">°</mml:mi><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mo>×</mml:mo><mml:mn mathvariant="normal">5</mml:mn><mml:mi mathvariant="italic">°</mml:mi></mml:mrow></mml:math></inline-formula> and to apply the ResNet-18 model separately to a domain centered on each grid point. In this way, we can detect TCG over the entire grid simultaneously at time <inline-formula><mml:math id="M66" display="inline"><mml:mi>t</mml:mi></mml:math></inline-formula>. Assuming that a positive TCG event is identified when the TCG probability exceeds a given threshold (e.g., 0.5), one can reconstruct a map of TCG locations at any given time <inline-formula><mml:math id="M67" display="inline"><mml:mi>t</mml:mi></mml:math></inline-formula>. Because the PD and DD strategies extract TCG information for different purposes, our evaluation and interpretation of model performance for TCG climatology must be based on separate metrics and criteria, as presented in the next section.</p>
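<p>The DD reconstruction loop over a 5° × 5° grid can be sketched as follows. The callables <monospace>get_patch</monospace> (returning the input domain centered on a grid point) and <monospace>model</monospace> (returning a TCG probability) are hypothetical placeholders, the grid bounds are taken from the IBTrACS domain in Table 2 for illustration, and treating probabilities at or above the threshold as TCG is an assumption.</p>

```python
import numpy as np


def dd_tcg_map(lat_centers, lon_centers, get_patch, model, threshold=0.5):
    """Apply the model to a domain centered on every grid point at one time.

    Returns the probability map and a boolean map of detected TCG locations.
    """
    prob = np.zeros((len(lat_centers), len(lon_centers)))
    for i, la in enumerate(lat_centers):
        for j, lo in enumerate(lon_centers):
            prob[i, j] = model(get_patch(la, lo))
    return prob, prob >= threshold


def toy_model(center):
    """Stand-in for the trained network: one strong signal at 15N, 135E."""
    return 0.9 if center == (15.0, 135.0) else 0.1


# 5 deg x 5 deg grid over 0-60N, 100-180E.
lat_c = np.arange(0.0, 60.01, 5.0)
lon_c = np.arange(100.0, 180.01, 5.0)
prob, tcg = dd_tcg_map(lat_c, lon_c, get_patch=lambda la, lo: (la, lo),
                       model=toy_model)
print(np.argwhere(tcg).tolist())  # [[3, 7]]
```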
</sec>
</sec>
<sec id="Ch1.S4">
  <label>4</label><title>Results and Discussion</title>
<sec id="Ch1.S4.SS1">
  <label>4.1</label><title>TCG-Net benchmarking</title>
      <p id="d2e1808">To give a general picture of how the ResNet-18 model is optimized for our TCG prediction problem, Fig. <xref ref-type="fig" rid="F2"/> shows the precision, recall, and <inline-formula><mml:math id="M68" display="inline"><mml:mrow><mml:msub><mml:mi>F</mml:mi><mml:mn mathvariant="normal">1</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula> score obtained on the test set for the PD and DD tasks. For the PD strategy, we focus on a fixed domain over the part of the WNP basin whose TC activity most strongly influences the coastal region of Vietnam ([0–30° N] × [100–150° E]). This domain choice is application-specific and should be adjusted for each region of interest. Accordingly, all PD results in this section are reported over this domain.</p>

      <fig id="F2" specific-use="star"><label>Figure 2</label><caption><p id="d2e1826">Overall performance of TCG-Net in terms of Precision (blue), Recall (red), <inline-formula><mml:math id="M69" display="inline"><mml:mrow><mml:msub><mml:mi>F</mml:mi><mml:mn mathvariant="normal">1</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula> score (black), precision–recall area under the curve (PR-AUC, green), and area under the ROC curve (AUC-ROC, yellow) for TCG prediction with <bold>(a)</bold> the Past Domain and <bold>(b)</bold> the Dynamic Domain strategies.</p></caption>
          <graphic xlink:href="https://gmd.copernicus.org/articles/19/4009/2026/gmd-19-4009-2026-f02.png"/>

        </fig>

      <p id="d2e1852">As can be seen in Fig. <xref ref-type="fig" rid="F2"/>, temporal feature enrichment plays a significant role in the performance of the ResNet-18 model. For the PD task, a longer period of feature enrichment generally gives a better precision score without much reduction in recall (0.57–0.62), thus yielding a higher <inline-formula><mml:math id="M70" display="inline"><mml:mrow><mml:msub><mml:mi>F</mml:mi><mml:mn mathvariant="normal">1</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula> score for longer enrichment windows. This is consistent with the precision–recall area under the curve (PR-AUC) score (green columns), which shows better skill for longer enrichment windows even for highly imbalanced data. In addition, the area under the ROC curve (AUC-ROC) is 0.76–0.77 for all enrichment windows, indicating that the model captures TCG occurrence better than random chance (AUC-ROC of 0.5).</p>
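<p>For reference, AUC-ROC equals the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one (ties counted as one half), which is why a random classifier scores 0.5. The pairwise NumPy sketch below illustrates this definition; it uses memory proportional to the product of the class sizes, so a rank-based or library implementation is preferable for large test sets. The function name is hypothetical.</p>

```python
import numpy as np


def roc_auc(y_true, scores):
    """AUC-ROC via pairwise comparison of positive and negative scores."""
    y_true = np.asarray(y_true).astype(bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[y_true], scores[~y_true]
    diff = pos[:, None] - neg[None, :]  # every positive vs. every negative
    wins = np.sum(diff > 0) + 0.5 * np.sum(diff == 0)
    return float(wins / (pos.size * neg.size))


print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
print(roc_auc([0, 1, 0, 1], [0.5, 0.5, 0.5, 0.5]))  # 0.5
```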
      <p id="d2e1869">Physically, this behavior of the PD sampling strategy means that including more past information at a fixed location generally improves the model's accuracy in capturing TCG in that area. This is likely because the environmental fields preceding the genesis time recorded in the best track contain TCG signals that help ResNet-18 learn better. Note, however, that the overall <inline-formula><mml:math id="M71" display="inline"><mml:mi>P</mml:mi></mml:math></inline-formula> and <inline-formula><mml:math id="M72" display="inline"><mml:mi>R</mml:mi></mml:math></inline-formula> scores remain relatively low even when all past information up to 48 h is included. Thus, the overall <inline-formula><mml:math id="M73" display="inline"><mml:mrow><mml:msub><mml:mi>F</mml:mi><mml:mn mathvariant="normal">1</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula> score for detecting TCG from the MERRA-2 dataset is much lower than that obtained directly from satellite image data <xref ref-type="bibr" rid="bib1.bibx9 bib1.bibx58" id="paren.26"><named-content content-type="pre">see, e.g.,</named-content></xref>.</p>
      <p id="d2e1901">As discussed in <xref ref-type="bibr" rid="bib1.bibx33" id="text.27"/>, the low performance of DL models in detecting TCG from climate reanalysis datasets could also reflect the limited predictability of TCG, such that including more information does not improve performance; this is consistent with the high false alarm rate of physics-based models. The same limitation appears in our attempts to apply other methods, such as random forest, XGBoost, or a climatological method based on the genesis potential index (Fig. <xref ref-type="fig" rid="F3"/>). All of these attempts show similarly low skill in capturing TCG, with <inline-formula><mml:math id="M74" display="inline"><mml:mrow><mml:msub><mml:mi>F</mml:mi><mml:mn mathvariant="normal">1</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula> scores of only 0.3–0.32 even for the best-performing TCG-Net model.</p>
      <p id="d2e1920">Another potential reason for such a low <inline-formula><mml:math id="M75" display="inline"><mml:mrow><mml:msub><mml:mi>F</mml:mi><mml:mn mathvariant="normal">1</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula> score is the MERRA-2 data itself, which may not capture all possible environmental patterns associated with rare extreme events such as TCG. This highlights the difficulty of reconstructing TCG climatology from climate reanalysis data compared with satellite images.</p>
      <p id="d2e1934">For the DD strategy, the dependence of model performance on temporal enrichment is the opposite. Specifically, the <inline-formula><mml:math id="M76" display="inline"><mml:mrow><mml:msub><mml:mi>F</mml:mi><mml:mn mathvariant="normal">1</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula> score is highest for very short enrichment windows of 6 to 12 h, and it decreases quickly as the window lengthens (see Figs. <xref ref-type="fig" rid="F2"/>b and <xref ref-type="fig" rid="F4"/>). In practice, this means that adding more past information as data enrichment does not help the model explain why a TC forms at one location but not at others.</p>

      <fig id="F3" specific-use="star"><label>Figure 3</label><caption><p id="d2e1954">Comparison of the model performance for the Past Domain task between TCG-Net and traditional classification models, i.e., XGBoost and random forest, which are enhanced by including the climatological Genesis Potential Index (GPI). The thick dashed line denotes the performance of TCG-Net with the ResNet-18 backbone.</p></caption>
          <graphic xlink:href="https://gmd.copernicus.org/articles/19/4009/2026/gmd-19-4009-2026-f03.png"/>

        </fig>

      <p id="d2e1964">This result is also expected if one recalls that the main aim of the DD strategy is to locate where TCG occurs at any given time. The model is therefore trained to detect spatial TCG signals rather than temporal ones. Enriching the DD strategy with more past information introduces additional, largely irrelevant environmental information from the past (i.e., negative labels), which biases the model further towards negative samples. This explains why the <inline-formula><mml:math id="M77" display="inline"><mml:mi>R</mml:mi></mml:math></inline-formula> score (red columns in Fig. <xref ref-type="fig" rid="F2"/>b) decreases quickly with longer enrichment windows. As a result, the DD strategy exhibits a decline in both the <inline-formula><mml:math id="M78" display="inline"><mml:mrow><mml:msub><mml:mi>F</mml:mi><mml:mn mathvariant="normal">1</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula> and PR-AUC scores, indicating that the model's overall reliability in detecting TCG decreases when too much past information is used, even though the model is still able to distinguish TCG from random variation (see the AUC-ROC score in Fig. <xref ref-type="fig" rid="F2"/>b).</p>

      <fig id="F4" specific-use="star"><label>Figure 4</label><caption><p id="d2e1991">Comparison of the model performance for the Dynamic Domain task between TCG-Net and traditional classification models, i.e., XGBoost and random forest, which are enhanced by including the climatological Genesis Potential Index (GPI). The thick dashed line denotes the performance of TCG-Net with the ResNet-18 backbone.</p></caption>
          <graphic xlink:href="https://gmd.copernicus.org/articles/19/4009/2026/gmd-19-4009-2026-f04.png"/>

        </fig>

</sec>
<sec id="Ch1.S4.SS2">
  <label>4.2</label><title>TCG reconstructed climatology</title>
      <p id="d2e2008">With the optimized ResNet-18 model presented above, we next examine the seasonal distribution of TCG frequency. Here, the seasonal distribution is defined as the ratio of the number of TCG events detected over the entire WNP basin in a given month to the total number of TCG events detected in that year, averaged over all years; it can be interpreted as the monthly TCG frequency. To avoid the arbitrary choice of domain location required by the PD strategy, we compute the monthly TCG frequency only for the DD strategy in this analysis.</p>
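      <p>The monthly TCG frequency defined above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions rather than the processing code used in this study: the function name monthly_tcg_frequency and the toy event list are hypothetical, and years without any detected TCG are skipped because the monthly ratio is undefined for them.</p>
      <preformat preformat-type="code">
```python
from collections import defaultdict


def monthly_tcg_frequency(events):
    """Average over years of (TCG count in month m) / (TCG count in that year).

    Hypothetical helper. events: iterable of (year, month) pairs, one per
    detected TCG event, with month in 1..12. Years without any event never
    enter `counts`, so they are skipped (an assumption of this sketch).
    """
    counts = defaultdict(lambda: [0] * 12)
    for year, month in events:
        counts[year][month - 1] += 1
    freq = [0.0] * 12
    for monthly in counts.values():
        total = sum(monthly)
        for m in range(12):
            freq[m] += monthly[m] / total  # this year's monthly share
    return [f / len(counts) for f in freq] if counts else freq


# Toy (hypothetical) detections for two years.
events = [(2017, 8), (2017, 8), (2017, 10), (2017, 10), (2018, 7), (2018, 8)]
print(monthly_tcg_frequency(events))
# July 0.25, August 0.5, October 0.25, all other months 0.0
```
      </preformat>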

      <fig id="F5" specific-use="star"><label>Figure 5</label><caption><p id="d2e2013">Monthly distribution of TCG frequency detected in the WNP basin from the test data (2017–2022), using the best-tuned ResNet-18 model for the DD strategy with data enrichment windows from 6 to 48 h. The black solid curve denotes the TCG frequency obtained from the best track during the same time period.</p></caption>
          <graphic xlink:href="https://gmd.copernicus.org/articles/19/4009/2026/gmd-19-4009-2026-f05.png"/>

        </fig>

      <p id="d2e2022">As seen in Fig. <xref ref-type="fig" rid="F5"/>, the ResNet-18 model reproduces the overall seasonal distribution of TCG well during the 2017–2022 test period, with peak TC activity in July–October followed by an inactive period in January–April, similar to the observed TCG frequency. In addition, the ResNet-18 model appears to reproduce the double peaks of TCG frequency in August and October seen in the observed distribution, although the dip in TCG frequency in September is less pronounced than in the best track. During the peak period, training with longer enrichment windows (i.e., 36–48 h) tends to generate more positive TCG events than training with shorter windows (6–24 h), suggesting that early signals of TC formation become more detectable as more past information is included.</p>
      <p id="d2e2028">Towards the end of the peak season (November–December), TCG-Net tends to produce more TCG than observed, while it underestimates the TCG frequency during May–June. These differences reflect a common issue in TC climate research: optimizing a DL model for one specific metric, such as the <inline-formula><mml:math id="M79" display="inline"><mml:mrow><mml:msub><mml:mi>F</mml:mi><mml:mn mathvariant="normal">1</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula> score, precision, or recall, can lead to biases in other metrics <xref ref-type="bibr" rid="bib1.bibx53" id="paren.28"><named-content content-type="pre">e.g.,</named-content></xref>. Another possible reason for this discrepancy is the limited ability of large-scale environmental conditions to reproduce the seasonal variability of TCG <xref ref-type="bibr" rid="bib1.bibx48 bib1.bibx29" id="paren.29"><named-content content-type="pre">e.g.,</named-content></xref>. Regardless of these differences due to different tuning metrics, the consistency of the seasonal TCG reconstruction by the ResNet-18 model across the enrichment windows suggests that, as expected, ResNet-18 captures the seasonal changes in the large-scale environment relevant to TCG.</p>

      <fig id="F6" specific-use="star"><label>Figure 6</label><caption><p id="d2e2054"><bold>(a)</bold> The spatial distribution of the observed TCG density (shaded) during the 2017–2022 period as obtained from the best track; <bold>(b–i)</bold> average TCG probability predicted by the ResNet-18 model with different data enrichment windows from 6 to 48 h during the same test period as in <bold>(a)</bold>. Note that different shading scales are used for different data enrichment windows to better show the contrast between the areas of maximum TCG probability predicted by the ResNet-18 model.</p></caption>
          <graphic xlink:href="https://gmd.copernicus.org/articles/19/4009/2026/gmd-19-4009-2026-f06.png"/>

        </fig>

      <p id="d2e2071">Along with the seasonal TCG frequency, another important climate metric for validating the ResNet-18 model is the spatial distribution of TCG climatology. In this regard, Fig. <xref ref-type="fig" rid="F6"/> shows the horizontal map of TCG probability detected over the entire WNP basin, using the DD strategy with enrichment windows from 6 to 48 h. Here, the shading in each <inline-formula><mml:math id="M80" display="inline"><mml:mrow><mml:mn mathvariant="normal">5</mml:mn><mml:mi mathvariant="italic">°</mml:mi><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mo>×</mml:mo><mml:mn mathvariant="normal">5</mml:mn><mml:mi mathvariant="italic">°</mml:mi></mml:mrow></mml:math></inline-formula> box represents the average probability of positive TCG predictions from the ResNet-18 model over the test period (2017–2022). For the observed TCG, however, the shading denotes the number of TCG events occurring in each box divided by the total number of TCGs during 2017–2022 (commonly referred to as TCG density in the literature). Because both quantities measure the relative likelihood of genesis in each box, the predicted probability and the observed TCG density are expected to be proportional to each other, and their spatial agreement can therefore serve as a metric for evaluating our DL model's performance from a different angle.</p>
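      <p>The observed TCG density described above amounts to binning the best-track genesis positions into 5° × 5° boxes and normalizing by the total number of events. The stdlib Python sketch below is a minimal illustration, not the study's code: the grid origin, number of boxes, toy positions, and the function name tcg_density are hypothetical, and counting events that fall outside the grid in the denominator is our assumption.</p>
      <preformat preformat-type="code">
```python
import math


def tcg_density(genesis_points, lat0, lon0, nlat, nlon, box=5.0):
    """Fraction of all TCG events falling in each box-by-box degree cell.

    Hypothetical helper. genesis_points: list of (lat, lon) in degrees;
    (lat0, lon0) is the south-west corner of the grid and nlat, nlon the
    number of boxes. Events outside the grid are not mapped but still count
    in the denominator (assumption of this sketch).
    """
    grid = [[0.0] * nlon for _ in range(nlat)]
    total = len(genesis_points)
    for lat, lon in genesis_points:
        i = math.floor((lat - lat0) / box)  # row index of the box
        j = math.floor((lon - lon0) / box)  # column index of the box
        if i in range(nlat) and j in range(nlon):
            grid[i][j] += 1.0 / total
    return grid


# Toy example on a hypothetical 0-30 N, 100-150 E grid of 5-degree boxes.
density = tcg_density([(12.0, 130.0), (13.5, 131.0), (17.0, 112.0)], 0.0, 100.0, 6, 10)
print(density[2][6], density[3][2])
```
      </preformat>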
      <p id="d2e2093">Overall, ResNet-18 again reproduces the TCG density distribution in the WNP basin during 2017–2022 well, with clear TCG hotspots in both the East Philippine Sea and the South China Sea (SCS) and spatial correlations in the range of 0.75–0.83. For long enrichment windows between 18 and 42 h, the ResNet-18 model tends to extend the TCG region too far east of the East Philippine Sea compared with the observations. Despite the deterioration of ResNet-18's performance in terms of the F1 score for longer enrichment windows (Fig. <xref ref-type="fig" rid="F2"/>b), the overall spatial distribution of TCG probability remains distinct and concentrated in the central SCS, the eastern Philippine Sea, and the Vietnam coastal region. For the shorter data enrichment windows of 6–12 h, the model also provides a reasonable fit to the observed distribution (Fig. <xref ref-type="fig" rid="F6"/>b), although the fit is not as good as for the longer windows or for the seasonal distribution in Fig. <xref ref-type="fig" rid="F5"/>. The correlation between the DL-detected and best-track TCG density peaks for the 24–36 h windows, reaching its highest value of 0.83.</p>
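      <p>The spatial correlations quoted above are assumed here to be centered Pearson correlations computed over all grid boxes of the predicted and observed maps; whether any boxes are masked is not specified, so the sketch below simply uses every box. The function name pattern_correlation and the toy maps are hypothetical.</p>
      <preformat preformat-type="code">
```python
import math


def pattern_correlation(map_a, map_b):
    """Pearson correlation between two equally shaped 2-D maps (lists of rows).

    Hypothetical helper: every grid box is used (no land or empty-box masking).
    """
    x = [v for row in map_a for v in row]  # flatten predicted map
    y = [v for row in map_b for v in row]  # flatten observed map
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Toy maps: a map proportional to another has a correlation of 1.
print(pattern_correlation([[1, 2], [3, 4]], [[2, 4], [6, 8]]))
```
      </preformat>
      <p>A centered correlation is insensitive to the overall scaling of each map, which is why a probability map and a normalized count map can be compared directly even though their magnitudes differ.</p>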

      <fig id="F7" specific-use="star"><label>Figure 7</label><caption><p id="d2e2102"><bold>(a)</bold> The precision score <inline-formula><mml:math id="M81" display="inline"><mml:mi>P</mml:mi></mml:math></inline-formula> for the TCG prediction of TCG-Net using a range of RUS ratios and class weights (solid colors) for the Past Domain task, with the same sampling ratio for both the training and test sets; <bold>(b)</bold>–<bold>(e)</bold> similar to <bold>(a)</bold> but for the <inline-formula><mml:math id="M82" display="inline"><mml:mi>R</mml:mi></mml:math></inline-formula>, <inline-formula><mml:math id="M83" display="inline"><mml:mrow><mml:msub><mml:mi>F</mml:mi><mml:mn mathvariant="normal">1</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula>, AUC-ROC and PR-AUC scores, respectively. The dashed black line denotes the reference obtained from our best-tuned model. Note that <italic>weight balanced</italic> assigns fixed importance to each class based on frequency, whilst <italic>weight dynamics</italic> adaptively adjusts sample or class importance.</p></caption>
          <graphic xlink:href="https://gmd.copernicus.org/articles/19/4009/2026/gmd-19-4009-2026-f07.png"/>

        </fig>

      <p id="d2e2155">It is worth noting that while the predicted TCG probabilities consistently peak in the eastern Philippine Sea, some localized maxima in the central SCS and along the Vietnamese coastline lack consistency across data enrichment windows. This variability reflects the inherent challenge of detecting early TCG signals in the SCS, which are typically weak and highly variable. As a result, ResNet-18 struggles to capture these localized signals effectively when optimized based on the <inline-formula><mml:math id="M84" display="inline"><mml:mrow><mml:msub><mml:mi>F</mml:mi><mml:mn mathvariant="normal">1</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula> score over the broader WNP basin. Unfortunately, the amount of available TCG data in the SCS region alone is insufficient for meaningful DL model training, making it difficult to resolve such inconsistencies given the current limitations of climate datasets. In addition, the test period from 2017–2022 may possess some unique characteristics that the model trained during the 1980–2016 period could not capture. Because of this, we could only reconstruct the TCG climatology using the DD strategy in this subsection, even though the PD strategy is more directly applicable for real-time forecasting purposes.</p>
      <p id="d2e2169">Aside from these local issues, the ability of our DL model to capture the broad spatiotemporal patterns of TCG suggests that atmospheric signals associated with TCG become increasingly detectable when more relevant information is incorporated. From a practical standpoint, these results are significant because they not only validate the performance of our DL model but also demonstrate that TCG can be predicted from large-scale environmental information, even at a spatial resolution of 0.5°. This result has two important consequences: (1) as long as climate models can reliably simulate the large-scale environment, it is possible to learn TCG patterns with DL models and derive TCG climatology without resorting to more computationally expensive high-resolution dynamical downscaling, and (2) changes in TCG climatology can be captured through changes in large-scale environments, which are generally much more robust and reliable in climate projections than individual storm-scale features.</p>
</sec>
<sec id="Ch1.S4.SS3">
  <label>4.3</label><title>Sensitivity analyses</title>
      <p id="d2e2180">Given that the TCG dataset is highly imbalanced due to the rarity of positive TCG labels, it is important to examine how ResNet-18 can be optimized under various training data scenarios. For this purpose, Fig. <xref ref-type="fig" rid="F7"/> shows ResNet-18's performance with different RUS ratios as described in Sect. 3.3. For clarity, we show here the absolute RUS ratios, keeping the number of positive TCG labels (minority class) fixed while varying the number of negative labels (majority class) according to each ratio displayed in Fig. <xref ref-type="fig" rid="F7"/>. For each RUS ratio, we also vary a class weight that adjusts the loss function and controls the importance given to the positive TCG labels during training.</p>
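      <p>The RUS procedure and class weighting can be illustrated with a minimal Python sketch, here for a 1:4 ratio. The exact RUS implementation is described in Sect. 3.3 and the form of the weighted loss is not given in this section, so both functions below are assumptions: the hypothetical random_undersample keeps all positive samples and draws negatives without replacement, and the hypothetical weighted_bce scales only the positive-class term of the binary cross-entropy by pos_weight.</p>
      <preformat preformat-type="code">
```python
import math
import random


def random_undersample(samples, labels, neg_per_pos, seed=0):
    """Hypothetical RUS helper: keep every positive (TCG) sample and a random
    subset of negatives, giving a 1:neg_per_pos positive-to-negative ratio."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    n_neg = min(len(neg), neg_per_pos * len(pos))
    kept = sorted(pos + rng.sample(neg, n_neg))  # sample without replacement
    return [samples[i] for i in kept], [labels[i] for i in kept]


def weighted_bce(prob, label, pos_weight, eps=1e-7):
    """Binary cross-entropy whose positive-class term is scaled by pos_weight
    (an assumed form of the class weight, for illustration only)."""
    prob = min(max(prob, eps), 1.0 - eps)  # avoid log(0)
    return -(pos_weight * label * math.log(prob)
             + (1 - label) * math.log(1.0 - prob))


# Toy data: 5 positives and 100 negatives, undersampled to 1:4.
labels = [1] * 5 + [0] * 100
x, y = random_undersample(list(range(105)), labels, neg_per_pos=4)
print(len(x), sum(y))  # 25 5
```
      </preformat>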

      <fig id="F8" specific-use="star"><label>Figure 8</label><caption><p id="d2e2189"><bold>(a)</bold> The precision score <inline-formula><mml:math id="M85" display="inline"><mml:mi>P</mml:mi></mml:math></inline-formula> for the TCG prediction of TCG-Net using a range of RUS ratios and class weights (solid colors) for the Dynamic Domain task, with the same sampling ratio for both the training and test sets; <bold>(b)</bold>–<bold>(e)</bold> similar to <bold>(a)</bold> but for the <inline-formula><mml:math id="M86" display="inline"><mml:mi>R</mml:mi></mml:math></inline-formula>, <inline-formula><mml:math id="M87" display="inline"><mml:mrow><mml:msub><mml:mi>F</mml:mi><mml:mn mathvariant="normal">1</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula>, AUC-ROC and PR-AUC scores, respectively. The dashed black line denotes the reference obtained from our best-tuned model. Note that <italic>weight balanced</italic> assigns fixed importance to each class based on frequency, whilst <italic>weight dynamics</italic> adaptively adjusts sample or class importance.</p></caption>
          <graphic xlink:href="https://gmd.copernicus.org/articles/19/4009/2026/gmd-19-4009-2026-f08.png"/>

        </fig>

      <p id="d2e2241">Figures <xref ref-type="fig" rid="F7"/>–<xref ref-type="fig" rid="F8"/> display the <inline-formula><mml:math id="M88" display="inline"><mml:mi>R</mml:mi></mml:math></inline-formula>, <inline-formula><mml:math id="M89" display="inline"><mml:mi>P</mml:mi></mml:math></inline-formula>, and <inline-formula><mml:math id="M90" display="inline"><mml:mrow><mml:msub><mml:mi>F</mml:mi><mml:mn mathvariant="normal">1</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula> scores for different RUS ratios and class weights, using the PD and DD sampling strategies, respectively. In general, a larger RUS ratio (i.e., a better balance between positive and negative labels) tends to give higher <inline-formula><mml:math id="M91" display="inline"><mml:mi>P</mml:mi></mml:math></inline-formula> and <inline-formula><mml:math id="M92" display="inline"><mml:mi>R</mml:mi></mml:math></inline-formula> scores, thus resulting in a higher <inline-formula><mml:math id="M93" display="inline"><mml:mrow><mml:msub><mml:mi>F</mml:mi><mml:mn mathvariant="normal">1</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula> score for the PD strategy. An RUS ratio of <inline-formula><mml:math id="M94" display="inline"><mml:mrow><mml:mn mathvariant="normal">1</mml:mn><mml:mo>:</mml:mo><mml:mn mathvariant="normal">4</mml:mn></mml:mrow></mml:math></inline-formula> (i.e., four negative labels for each positive label) combined with a class weight of 0.5 provides the best performance for the PD strategy in terms of detecting TCG, and this configuration is adopted as the default in Fig. <xref ref-type="fig" rid="F2"/>.</p>
      <p id="d2e2314">For the DD strategy, the model behaves somewhat differently because of the constraint that, by design, each positive TCG location is surrounded by 8 negative labels. Therefore, the class imbalance at any given time is fixed during training. When longer enrichment windows are applied, however, the number of negative TCG labels increases rapidly because most days in the test period have no TCG, so the imbalance becomes much more severe. For reference, we show here only three RUS ratios, <inline-formula><mml:math id="M95" display="inline"><mml:mrow><mml:mn mathvariant="normal">1</mml:mn><mml:mo>:</mml:mo><mml:mn mathvariant="normal">10</mml:mn></mml:mrow></mml:math></inline-formula>, <inline-formula><mml:math id="M96" display="inline"><mml:mrow><mml:mn mathvariant="normal">1</mml:mn><mml:mo>:</mml:mo><mml:mn mathvariant="normal">20</mml:mn></mml:mrow></mml:math></inline-formula>, and <inline-formula><mml:math id="M97" display="inline"><mml:mrow><mml:mn mathvariant="normal">1</mml:mn><mml:mo>:</mml:mo><mml:mn mathvariant="normal">30</mml:mn></mml:mrow></mml:math></inline-formula>, so that their performance can be compared for the DD labeling strategy.</p>
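      <p>The fixed 1:8 imbalance of a single DD snapshot can be made concrete with a short Python sketch. We assume here, for illustration only, that the 8 negative labels are the immediately neighbouring boxes of the genesis box on the same grid, forming a 3 × 3 block; the function name dd_neighbourhood_labels and the box indices are hypothetical.</p>
      <preformat preformat-type="code">
```python
def dd_neighbourhood_labels(i, j):
    """Hypothetical DD labeling: label the 3x3 block of grid boxes centred on
    the genesis box (i, j); the centre is the positive TCG label and the
    8 surrounding boxes are negatives (assumed neighbourhood layout)."""
    return [((i + di, j + dj), int(di == 0 and dj == 0))
            for di in (-1, 0, 1) for dj in (-1, 0, 1)]


cells = dd_neighbourhood_labels(4, 7)
n_pos = sum(label for _, label in cells)
print(n_pos, len(cells) - n_pos)  # 1 8
```
      </preformat>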
      <p id="d2e2353">As seen in Fig. <xref ref-type="fig" rid="F8"/>, the DD strategy performs best with an RUS ratio of <inline-formula><mml:math id="M98" display="inline"><mml:mrow><mml:mn mathvariant="normal">1</mml:mn><mml:mo>:</mml:mo><mml:mn mathvariant="normal">30</mml:mn></mml:mrow></mml:math></inline-formula> and a class weight of 0.2. RUS ratios that are either too small or too large degrade the model performance. Consistent with the control performance shown in Fig. <xref ref-type="fig" rid="F2"/>, all RUS ratio and class weight experiments also perform best when the data enrichment window is around 0–12 h, beyond which the performance of the DD strategy decays rapidly. This behavior is likely because including more negative samples from the surrounding environment at earlier times does not help the model distinguish the positive labels, leading to a much lower <inline-formula><mml:math id="M99" display="inline"><mml:mi>R</mml:mi></mml:math></inline-formula> score. In contrast, too large an RUS ratio leaves fewer negative samples relative to the positive TCG data for training. Thus, the overall <inline-formula><mml:math id="M100" display="inline"><mml:mrow><mml:msub><mml:mi>F</mml:mi><mml:mn mathvariant="normal">1</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula> score is higher for an RUS ratio of <inline-formula><mml:math id="M101" display="inline"><mml:mrow><mml:mn mathvariant="normal">1</mml:mn><mml:mo>:</mml:mo><mml:mn mathvariant="normal">30</mml:mn></mml:mrow></mml:math></inline-formula> and a shorter time window, as seen in Fig. <xref ref-type="fig" rid="F8"/>.</p>
      <p id="d2e2405">In addition to the model optimizations based on the RUS ratio, feature enrichment window, and class weight presented above, many other factors related to model architecture and hyperparameter settings must be considered to achieve optimal performance. Many of these aspects are too granular to discuss individually here. However, it is worth noting that no single DL model is universally optimal across all weather features and spatiotemporal scales, particularly given the current limitations of available climate datasets and DL architectures.</p>
      <p id="d2e2409">Along with the above hyperparameter sensitivity, we also experimented with a range of model architectures, from a relatively simple CNN to more complex DL frameworks. However, their performance was broadly comparable in terms of <inline-formula><mml:math id="M102" display="inline"><mml:mrow><mml:msub><mml:mi>F</mml:mi><mml:mn mathvariant="normal">1</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula> score, precision, and recall (results not shown). Therefore, in this subsection, we keep the ResNet-18 architecture fixed and focus on sensitivity experiments involving different hyperparameter configurations or sampling strategies, rather than presenting results for different DL architectures.</p>
</sec>
<sec id="Ch1.S4.SS4">
  <label>4.4</label><title>Feature Importance Analysis</title>
      <p id="d2e2431">A final critical question in developing a DL model when the training data do not contain sufficient information is how to make full use of the available data to optimize the model's performance. This process, known as feature engineering, becomes more important when one tries to interpret why a DL model produces the output it does. Instead of running a DL model as a black box with all possible input data channels, exploring the importance of different inputs helps clarify the role that different physical information plays in the model prediction, which we wish to examine from a physical standpoint.</p>
      <p id="d2e2434">This subsection presents additional analyses that use different sets of input channels to assess how effectively ResNet-18 performs with limited information. Specifically, we examine two approaches that (1) use a set of features known from previous studies to be important for TCG, and (2) apply an automatic feature filter based on the ranking of input channels. While feature selection is often considered part of model tuning, we treat it separately in this subsection because choosing the right input channels has significant implications for our understanding of TCG processes.</p>

<table-wrap id="T5" specific-use="star"><label>Table 5</label><caption><p id="d2e2440">List of features selected using the feature engineering and feature ranking filter approaches for each labeling strategy during the training period. Numbers denote the selected pressure levels (hPa) for each variable.</p></caption><oasis:table frame="topbot"><oasis:tgroup cols="6">
     <oasis:colspec colnum="1" colname="col1" align="left"/>
     <oasis:colspec colnum="2" colname="col2" align="left"/>
     <oasis:colspec colnum="3" colname="col3" align="justify" colwidth="3cm"/>
     <oasis:colspec colnum="4" colname="col4" align="justify" colwidth="3cm" colsep="1"/>
     <oasis:colspec colnum="5" colname="col5" align="justify" colwidth="3cm"/>
     <oasis:colspec colnum="6" colname="col6" align="justify" colwidth="3cm"/>
     <oasis:thead>
       <oasis:row>

         <oasis:entry rowsep="1" colname="col1" morerows="1">ID</oasis:entry>

         <oasis:entry rowsep="1" colname="col2" morerows="1">Name of Features</oasis:entry>

         <oasis:entry rowsep="1" namest="col3" nameend="col4" align="center" colsep="1">Past Domain </oasis:entry>

         <oasis:entry rowsep="1" namest="col5" nameend="col6" align="center">Dynamic Domain </oasis:entry>

       </oasis:row>
       <oasis:row rowsep="1">

         <oasis:entry colname="col3">Feature Engineering</oasis:entry>

         <oasis:entry colname="col4">Feature Ranking</oasis:entry>

         <oasis:entry colname="col5">Feature Engineering</oasis:entry>

         <oasis:entry colname="col6">Feature Ranking</oasis:entry>

       </oasis:row>
     </oasis:thead>
     <oasis:tbody>
       <oasis:row rowsep="1">

         <oasis:entry colname="col1">1</oasis:entry>

         <oasis:entry colname="col2">QL</oasis:entry>

         <oasis:entry colname="col3"/>

         <oasis:entry colname="col4">400, 700, 825, 900, 950</oasis:entry>

         <oasis:entry colname="col5"/>

         <oasis:entry colname="col6">100, 150, 200, 300, 400, 500, 600, 700, 800, 875, 900, 950, 975, 1000</oasis:entry>

       </oasis:row>
       <oasis:row rowsep="1">

         <oasis:entry colname="col1">2</oasis:entry>

         <oasis:entry colname="col2"><inline-formula><mml:math id="M103" display="inline"><mml:mi>H</mml:mi></mml:math></inline-formula></oasis:entry>

         <oasis:entry colname="col3">500</oasis:entry>

         <oasis:entry colname="col4">200, 925</oasis:entry>

         <oasis:entry colname="col5">500</oasis:entry>

         <oasis:entry colname="col6">100, 550, 950</oasis:entry>

       </oasis:row>
       <oasis:row rowsep="1">

         <oasis:entry colname="col1">3</oasis:entry>

         <oasis:entry colname="col2">QI</oasis:entry>

         <oasis:entry colname="col3"/>

         <oasis:entry colname="col4">250, 450, 600, 800, 900, 925, 950, 1000</oasis:entry>

         <oasis:entry colname="col5"/>

         <oasis:entry colname="col6">100, 150, 500, 600, 700, 900, 1000</oasis:entry>

       </oasis:row>
       <oasis:row rowsep="1">

         <oasis:entry colname="col1">4</oasis:entry>

         <oasis:entry colname="col2">OMEGA</oasis:entry>

         <oasis:entry colname="col3">500</oasis:entry>

         <oasis:entry colname="col4">450, 875</oasis:entry>

         <oasis:entry colname="col5">500</oasis:entry>

         <oasis:entry colname="col6">100, 150, 250, 600, 925, 1000</oasis:entry>

       </oasis:row>
       <oasis:row rowsep="1">

         <oasis:entry colname="col1">5</oasis:entry>

         <oasis:entry colname="col2"><inline-formula><mml:math id="M104" display="inline"><mml:mi>T</mml:mi></mml:math></inline-formula></oasis:entry>

         <oasis:entry colname="col3">500, 900</oasis:entry>

         <oasis:entry colname="col4">725</oasis:entry>

         <oasis:entry colname="col5">500, 900</oasis:entry>

         <oasis:entry colname="col6">150, 200, 900</oasis:entry>

       </oasis:row>
       <oasis:row rowsep="1">

         <oasis:entry colname="col1">6</oasis:entry>

         <oasis:entry colname="col2"><inline-formula><mml:math id="M105" display="inline"><mml:mi>U</mml:mi></mml:math></inline-formula></oasis:entry>

         <oasis:entry colname="col3">200, 800</oasis:entry>

         <oasis:entry colname="col4">825, 1000</oasis:entry>

         <oasis:entry colname="col5">200, 800</oasis:entry>

         <oasis:entry colname="col6">200, 550, 1000</oasis:entry>

       </oasis:row>
       <oasis:row rowsep="1">

         <oasis:entry colname="col1">7</oasis:entry>

         <oasis:entry colname="col2"><inline-formula><mml:math id="M106" display="inline"><mml:mi>V</mml:mi></mml:math></inline-formula></oasis:entry>

         <oasis:entry colname="col3">200, 800</oasis:entry>

         <oasis:entry colname="col4">150, 550</oasis:entry>

         <oasis:entry colname="col5">200, 800</oasis:entry>

         <oasis:entry colname="col6">100, 150, 400, 600, 1000</oasis:entry>

       </oasis:row>
       <oasis:row rowsep="1">

         <oasis:entry colname="col1">8</oasis:entry>

         <oasis:entry colname="col2">RH</oasis:entry>

         <oasis:entry colname="col3">750</oasis:entry>

         <oasis:entry colname="col4">950</oasis:entry>

         <oasis:entry colname="col5">750</oasis:entry>

         <oasis:entry colname="col6">100, 200, 400, 700, 825, 875, 925, 1000</oasis:entry>

       </oasis:row>
       <oasis:row rowsep="1">

         <oasis:entry colname="col1">9</oasis:entry>

         <oasis:entry colname="col2">QV</oasis:entry>

         <oasis:entry colname="col3"/>

         <oasis:entry colname="col4"/>

         <oasis:entry colname="col5"/>

         <oasis:entry colname="col6">100, 150, 900</oasis:entry>

       </oasis:row>
       <oasis:row rowsep="1">

         <oasis:entry colname="col1">10</oasis:entry>

         <oasis:entry colname="col2">VOR</oasis:entry>

         <oasis:entry colname="col3">200, 700, 900</oasis:entry>

         <oasis:entry colname="col4"/>

         <oasis:entry colname="col5">200, 700, 900</oasis:entry>

         <oasis:entry colname="col6"/>

       </oasis:row>
       <oasis:row>

         <oasis:entry colname="col1">11</oasis:entry>

         <oasis:entry colname="col2">DIV</oasis:entry>

         <oasis:entry colname="col3">200</oasis:entry>

         <oasis:entry colname="col4"/>

         <oasis:entry colname="col5">200</oasis:entry>

         <oasis:entry colname="col6"/>

       </oasis:row>
     </oasis:tbody>
   </oasis:tgroup></oasis:table></table-wrap>

      <p id="d2e2750">For the first approach (hereafter referred to as feature engineering), input channels are selected based on their well-documented importance in previous observational and modeling studies <xref ref-type="bibr" rid="bib1.bibx15 bib1.bibx38 bib1.bibx59 bib1.bibx61 bib1.bibx3 bib1.bibx39 bib1.bibx42 bib1.bibx31 bib1.bibx34 bib1.bibx33 bib1.bibx18" id="paren.30"><named-content content-type="pre">see, e.g.,</named-content></xref>. These specific channels are useful for DL model development because not all meteorological variables contain independent information about TCG. Thus, using a subset of atmospheric variables that capture the strongest TCG signals should help DL models learn TCG processes better. Following <xref ref-type="bibr" rid="bib1.bibx33" id="text.31"/>, this feature engineering approach selects a group of variables on several low, middle, and high pressure levels, as shown in Table <xref ref-type="table" rid="T5"/>.</p>
      <p id="d2e2763">For the feature filtering based on their importance ranking (hereafter feature ranking), data channels are automatically selected by their contribution to the prediction score instead of depending on specialized knowledge as for the feature engineering approach. This can be done by ranking all input features in terms of their mean activation values of the first filter from our best-tuned ResNet-18 model. Selection is then proceeded iteratively as follows. First, the channel with the highest score in the scoreboard is chosen and removed from the pool. Next, we eliminate any remaining layers that are highly correlated with the chosen channel, based on the Pearson <inline-formula><mml:math id="M107" display="inline"><mml:mi>R</mml:mi></mml:math></inline-formula> correlation threshold. Finally, the progress is continued until all remaining features are either selected from their score or discarded, depending on the threshold that is used to stop the selection.</p>
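<p>The iterative selection described above can be sketched as a greedy loop. The version below is a minimal sketch under stated assumptions: importance scores are taken as given (for example, the mean first-filter activations), the absolute Pearson correlation is compared with the threshold, the threshold value of 0.9 is a placeholder rather than the value used in this study, and an optional cap k_max stands in for the stopping threshold (e.g. the top 10 % of channels). The function name rank_and_decorrelate is hypothetical.</p>

```python
import numpy as np


def rank_and_decorrelate(scores, data, r_max=0.9, k_max=None):
    """Greedy feature ranking with a Pearson-correlation filter.

    scores : 1-D array of importance scores, one per channel
             (e.g. mean first-layer activation; an assumption here)
    data   : 2-D array (n_samples, n_channels) of flattened channel values
    r_max  : placeholder |R| threshold above which a channel is discarded
    k_max  : optional stopping rule, e.g. the top 10 % of channels
    Returns the indices of the selected channels in selection order.
    """
    corr = np.abs(np.corrcoef(data, rowvar=False))  # channel-by-channel |R|
    pool = [int(c) for c in np.argsort(scores)[::-1]]  # best score first
    selected = []
    while pool:
        if k_max is not None and len(selected) >= k_max:
            break  # stopping threshold reached
        best = pool.pop(0)  # highest-scoring channel still in the pool
        selected.append(best)
        # drop every remaining channel that is too correlated with the pick
        pool = [c for c in pool if not corr[best, c] > r_max]
    return selected


# Toy example: channel 1 nearly duplicates channel 0; channel 2 is independent.
rng = np.random.default_rng(0)
base = rng.normal(size=500)
data = np.column_stack([base,
                        base + 0.01 * rng.normal(size=500),
                        rng.normal(size=500)])
print(rank_and_decorrelate(np.array([0.9, 0.8, 0.5]), data))  # [0, 2]
```

<p>In the example, the near-duplicate second channel is discarded because its correlation with the first exceeds the threshold, and selection moves on to the independent third channel.</p>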

      <fig id="F9"><label>Figure 9</label><caption><p id="d2e2775">Feature-importance weights of the top-10 % features in the TCG-Net model for the Past Domain task.</p></caption>
          <graphic xlink:href="https://gmd.copernicus.org/articles/19/4009/2026/gmd-19-4009-2026-f09.png"/>

        </fig>

      <fig id="F10"><label>Figure 10</label><caption><p id="d2e2786">Feature-importance weights of the top-10 % features in the TCG-Net model for the Dynamic Domain task.</p></caption>
          <graphic xlink:href="https://gmd.copernicus.org/articles/19/4009/2026/gmd-19-4009-2026-f10.png"/>

        </fig>

      <p id="d2e2795">Table <xref ref-type="table" rid="T5"/> compares the channels obtained from the two feature selection methods, using MERRA-2 data for training with the PD and DD strategies; the corresponding feature-importance weights are shown in Figs. <xref ref-type="fig" rid="F9"/>–<xref ref-type="fig" rid="F10"/>. Notably, the features selected by the two methods overlap substantially, indicating that TCG factors identified in previous studies, such as vertical wind shear (zonal wind at 800 and 200 hPa), low-level moisture (900–700 hPa relative humidity, RH), mid-level vertical motion (OMEGA at 500 hPa), and low-to-mid-level temperature, all play an important role in TCG prediction.</p>
      <p id="d2e2805">Beyond these common features, feature ranking tends to select more pressure levels than feature engineering. For example, NK2024's feature engineering uses RH at 750 hPa, whereas feature ranking selects several RH levels for the DD strategy (see the last column of Table <xref ref-type="table" rid="T5"/>). Likewise, feature engineering uses zonal wind only at 200 and 800 hPa, whereas feature ranking selects levels at 1000, 550, and 200 hPa. Despite these differences in pressure levels, the many features shared by the two approaches suggest that ResNet-18 can learn the large-scale environmental conditions relevant to TCG. In this regard, our DL model not only supports the feature engineering choices of previous studies but also offers a way to enhance our understanding of the key environmental conditions governing TCG when the ranking is expanded to include more environmental factors.</p>
      <p id="d2e2810">Among all selected features, several obtained by feature ranking are not included in feature engineering, such as the cloud liquid (QL) and cloud ice (QI) content, which represent cloud information. In their earlier study, NK2024 did not include these features because they are linked to vertical motion and relative humidity via the CAPE channel. Similarly, specific humidity (QV) was not included in feature engineering because it largely duplicates the moisture information carried by RH. These variables nevertheless emerge from feature ranking, probably because they play a strong role in capturing TCG during model training, even though they do not provide new physical insight. Note also that the original MERRA-2 data do not contain some fields used in NK2024, such as vorticity and divergence. Several of NK2024's features therefore cannot appear in the feature ranking results shown in Table <xref ref-type="table" rid="T5"/>.</p>

      <fig id="F11" specific-use="star"><label>Figure 11</label><caption><p id="d2e2817">Evaluation of the model performance for several different feature selection methods including all features (solid color columns), 13 selected features based on feature engineering in <xref ref-type="bibr" rid="bib1.bibx33" id="text.32"/> (striped columns), and feature ranking of top 10 % (dotted columns) using the Past Domain task.</p></caption>
          <graphic xlink:href="https://gmd.copernicus.org/articles/19/4009/2026/gmd-19-4009-2026-f11.png"/>

        </fig>

      <fig id="F12" specific-use="star"><label>Figure 12</label><caption><p id="d2e2831">Evaluation of the model performance for several different feature selection methods including all features (solid color columns), 13 selected features based on feature engineering in <xref ref-type="bibr" rid="bib1.bibx33" id="text.33"/> (striped columns), and feature ranking of top 10 % (dotted columns) using the Dynamic Domain task.</p></caption>
          <graphic xlink:href="https://gmd.copernicus.org/articles/19/4009/2026/gmd-19-4009-2026-f12.png"/>

        </fig>

      <p id="d2e2843">For model performance, Figs. <xref ref-type="fig" rid="F11"/>–<xref ref-type="fig" rid="F12"/> compare the <inline-formula><mml:math id="M108" display="inline"><mml:mrow><mml:msub><mml:mi>F</mml:mi><mml:mn mathvariant="normal">1</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula>, <inline-formula><mml:math id="M109" display="inline"><mml:mi>P</mml:mi></mml:math></inline-formula>, and <inline-formula><mml:math id="M110" display="inline"><mml:mi>R</mml:mi></mml:math></inline-formula> scores among the full-feature, feature engineering, and feature ranking methods for both the PD and DD labeling strategies. Among these feature selection methods, feature ranking appears to deliver the most stable and effective results for the PD labeling strategy, particularly for long enrichment windows. The feature engineering and full-feature methods also perform well, but with slightly lower overall performance than feature ranking.</p>
      <p id="d2e2876">In contrast, for the DD labeling strategy the performance of the feature engineering method declines, with the <inline-formula><mml:math id="M111" display="inline"><mml:mrow><mml:mi>P</mml:mi><mml:mo>,</mml:mo><mml:mi>R</mml:mi></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M112" display="inline"><mml:mrow><mml:msub><mml:mi>F</mml:mi><mml:mn mathvariant="normal">1</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula> scores all decreasing over longer time windows. Nevertheless, feature engineering still maintains relatively higher and more stable <inline-formula><mml:math id="M113" display="inline"><mml:mi>R</mml:mi></mml:math></inline-formula> values, indicating its potential to capture the signals most relevant to the spatial distribution of TCG without requiring all data channels.</p>
      <p id="d2e2909">Regardless of the labeling strategy, feature engineering and feature ranking provide performance similar to that of the full-feature method over a range of data enrichment windows, despite using far fewer features. For a long window of 36–48 h, feature engineering and feature ranking can even outperform the full-feature method for the PD strategy, supporting the view that TCG detection relies mostly on a subset of variables/channels rather than on the full data at all levels. Indeed, the spatial distributions of TCG climatology from both feature engineering and feature ranking (Figs. <xref ref-type="fig" rid="F13"/>–<xref ref-type="fig" rid="F14"/>) show little change in the overall patterns compared with the full-feature output (cf. Fig. <xref ref-type="fig" rid="F6"/>). This consistency among the feature selection methods also holds for the seasonal TCG density (not shown), providing a foundation for further understanding and predicting TCG from a selected set of features in future DL improvements or implementations.</p>
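<p>For reference, the three scores follow the standard definitions P = TP/(TP + FP), R = TP/(TP + FN), and F1 = 2PR/(P + R), where TP, FP, and FN are the numbers of true-positive, false-positive, and false-negative TCG predictions. Here R denotes recall, not the Pearson correlation used for feature ranking. The library-free sketch below is illustrative only: the function name is hypothetical, returning 0 when a denominator vanishes is our assumption, and it does not necessarily mirror the evaluation code in the TCG-Net repository.</p>

```python
def precision_recall_f1(y_true, y_pred):
    """Precision P, recall R, and F1 for binary TCG labels (1 = genesis)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # correct detections
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed genesis
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# 3 genesis cases among 8 samples; the model finds 2 and raises 2 false alarms.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 1, 0, 0, 0]
print([round(x, 3) for x in precision_recall_f1(y_true, y_pred)])
# [0.5, 0.667, 0.571]
```

<p>Because F1 is the harmonic mean of P and R, it stays low whenever either score is low, which is why a model with stable recall but declining precision still shows a falling F1.</p>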

      <fig id="F13" specific-use="star"><label>Figure 13</label><caption><p id="d2e2921">Similar to Fig. <xref ref-type="fig" rid="F6"/> but for the feature engineering approach.</p></caption>
          <graphic xlink:href="https://gmd.copernicus.org/articles/19/4009/2026/gmd-19-4009-2026-f13.png"/>

        </fig>

      <fig id="F14" specific-use="star"><label>Figure 14</label><caption><p id="d2e2934">Similar to Fig. <xref ref-type="fig" rid="F6"/> but for the automatic feature ranking approach.</p></caption>
          <graphic xlink:href="https://gmd.copernicus.org/articles/19/4009/2026/gmd-19-4009-2026-f14.png"/>

        </fig>


</sec>
</sec>
<sec id="Ch1.S5" sec-type="conclusions">
  <label>5</label><title>Conclusions</title>
      <p id="d2e2957">In this study, we presented a deep learning (DL) framework to reconstruct the climatology of tropical cyclone genesis (TCG) from climate reanalysis datasets. Recognizing that the definition of TCG climatology may vary depending on specific purposes and practical needs, our DL approach was designed and evaluated from multiple perspectives, based on different TCG labeling strategies for model training. Due to the limited number of TCG events available for training, we also implemented different data enrichment and feature selection methods to optimize our DL model for both TCG climatology reconstruction and potential prediction tasks.</p>
      <p id="d2e2960">Using the MERRA-2 reanalysis as training data and the ResNet-18 architecture as the backbone of our DL model, we demonstrated that ResNet-18 exhibits promising capability in detecting TCG from climate data at 0.5° resolution. Although the <inline-formula><mml:math id="M114" display="inline"><mml:mrow><mml:msub><mml:mi>F</mml:mi><mml:mn mathvariant="normal">1</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula> score for TCG prediction remains relatively low, partly owing to the inherently low predictability of TCG, the limited number of TCG samples, and the limited TCG-related information available in MERRA-2, we showed that the <inline-formula><mml:math id="M115" display="inline"><mml:mrow><mml:msub><mml:mi>F</mml:mi><mml:mn mathvariant="normal">1</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula> score can be improved through appropriate hyperparameter tuning, labeling strategies, class weighting, and feature selection.</p>
      <p id="d2e2985">Comparing the DL-reconstructed seasonality and spatial distribution of TCG with the best track during the test period yielded several noteworthy results. First, our ResNet-18 design reproduced the seasonality of monthly TCG frequency, with double peaks in August and October and an inactive period from January to May, consistent with observations. Second, ResNet-18 also recovered the spatial distribution of TCG climatology, with the main genesis areas in the eastern Philippine Sea and the SCS. While there are some fluctuations in the TCG distribution in coastal regions and across test periods, as also reported by <xref ref-type="bibr" rid="bib1.bibx19" id="text.34"/>, the overall well-recovered map of TCG in the WNP basin indicates that large-scale environments in the MERRA-2 dataset contain important hidden signals of TCG processes that DL models can be trained to learn.</p>
      <p id="d2e2991">Further sensitivity experiments with different feature selection methods revealed that reconstructing TCG climatology from reanalysis datasets is possible because a set of key channels contains the TCG signals that DL models need to learn. Specifically, feature engineering, which uses features reported in previous studies, and feature ranking, which filters input features by their impact on the model, both capture a common set of data channels relevant to TCG processes. Several of these key features from the MERRA-2 dataset are robust across sensitivity experiments, including vertical wind shear, low-to-mid-level moisture, mid-level vertical motion, and mid-level geopotential height. The feature ranking method also detected additional features, such as upper-level liquid or ice content, that may carry cloud information beyond that used in previous models.</p>
      <p id="d2e2995">The results of these feature-sensitivity experiments support the view that combining expert knowledge with automated feature ranking can help enhance feature representation when training data are insufficient or when using the full feature set for DL model development is too costly. Moreover, while feature engineering helps avoid overfitting and promotes interpretability, feature ranking can help uncover hidden signals beyond what is known from previous studies. For the TCG problem considered here, the feature engineering and feature ranking approaches share similar factors, which builds confidence in using these input features to understand the variability of TCG climatology. Future improvements could involve integrating ranking mechanisms, attention-based models, or dimensionality reduction to retain a compact yet informative feature set for TCG prediction tasks.</p>
      <p id="d2e2999">Beyond reconstructing TCG climatology, this study suggests that our approach may hold potential for real-time TCG prediction in operational settings. Depending on the forecasting objective, such as predicting TCG at a fixed location (as in the PD labeling strategy) or deriving the spatial distribution of TCG at any given time (as in the DD labeling strategy), a separate DL model can be designed and trained for each task. While the relatively low <inline-formula><mml:math id="M116" display="inline"><mml:mrow><mml:msub><mml:mi>F</mml:mi><mml:mn mathvariant="normal">1</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula> scores observed for both strategies indicate that real-time TCG forecasts from climate or global model output remain considerably uncertain, our DL approach offers a valuable, independent alternative to traditional physics-based models, with the capability of providing early warnings of TCG 1–3 d in advance. In this context, our work contributes to a deeper understanding of TCG processes and offers practical guidance for improving DL models in real-time applications, particularly for extreme events where minority classes are the main focus.</p>
      <p id="d2e3013">Despite these promising capabilities, this study also reveals several key challenges in applying DL models to TCG research. First, DL performance is highly sensitive to data preprocessing, particularly the labeling of negative TCG events, an issue exacerbated by the limited number of TCG occurrences. Second, the pronounced class imbalance between positive and negative TCG labels during training remains a significant barrier. Addressing this challenge requires careful integration of undersampling techniques, data augmentation strategies, or dynamic class weighting to ensure more robust and consistent performance across evaluation metrics. Last, model design and performance depend strongly on the training dataset, which makes it hard to generalize from one resolution to another. These challenges highlight the complexity of applying DL to extreme weather events, which must be fully accounted for in any DL model development.</p>
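<p>As a minimal sketch of two of the remedies mentioned above, the snippet below computes static inverse-frequency class weights, w_c = N/(K N_c) for N samples, K classes, and N_c samples in class c (the same formula as the "balanced" option of scikit-learn's compute_class_weight), and randomly undersamples negative samples to a fixed ratio. Both helpers, the label convention (1 = TCG, 0 = non-TCG), and the 3:1 ratio are hypothetical illustrations rather than the weighting or sampling scheme used in this study; a dynamic scheme would instead update such weights during training.</p>

```python
import numpy as np


def balanced_class_weights(labels):
    """Inverse-frequency weights N / (K * N_c) for each class c."""
    classes, counts = np.unique(labels, return_counts=True)
    weights = len(labels) / (len(classes) * counts)
    return dict(zip(classes.tolist(), weights.tolist()))


def undersample_negatives(labels, ratio, rng):
    """Keep all positives (1) and a random subset of negatives (0).

    ratio : hypothetical number of negatives kept per positive
    Returns the sorted indices of the retained samples.
    """
    labels = np.asarray(labels)
    pos = np.flatnonzero(labels == 1)
    neg = np.flatnonzero(labels == 0)
    n_keep = min(len(neg), int(ratio * len(pos)))
    kept_neg = rng.choice(neg, size=n_keep, replace=False)
    return np.sort(np.concatenate([pos, kept_neg]))


labels = np.array([1] * 10 + [0] * 90)  # 10 % positive (genesis) samples
print(balanced_class_weights(labels))   # {0: 0.5555555555555556, 1: 5.0}
idx = undersample_negatives(labels, ratio=3, rng=np.random.default_rng(0))
print(len(idx), labels[idx].sum())      # 40 10
```

<p>Weighting keeps every sample but rescales the loss, whereas undersampling discards negatives; the two can be combined, at the cost of seeing fewer non-genesis environments during training.</p>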
</sec>

      
      </body>
    <back><notes notes-type="codedataavailability"><title>Code and data availability</title>

      <p id="d2e3020">The source code of the TCG-Net framework and the pipeline used for reconstructing TCG climatology in this study are publicly available in a Zenodo repository <xref ref-type="bibr" rid="bib1.bibx26" id="paren.35"/> with DOI <ext-link xlink:href="https://doi.org/10.5281/zenodo.17459622" ext-link-type="DOI">10.5281/zenodo.17459622</ext-link> under a CC BY 4.0 licence. The repository includes all scripts for data preprocessing, model training, evaluation metrics, sensitivity analysis, and visualization. The code is structured to support not only the reproducibility of the results herein but also further experimentation with different climate datasets or TCG-related tasks. For reproducibility, this research uses two publicly available datasets: the NASA Modern-Era Retrospective Analysis for Research and Applications, Version 2 (MERRA-2, <uri>https://disc.gsfc.nasa.gov/datasets?project=MERRA-2</uri>, last access: 27 October 2025) and the International Best Track Archive for Climate Stewardship (IBTrACS, <uri>https://www.ncei.noaa.gov/products/international-best-track-archive</uri>, last access: 28 May 2024). Owing to the large size of the MERRA-2 dataset (approximately 20 TB), only a reference link to the raw data is provided. The complete workflow, from data collection and preprocessing to model training and testing, is described in detail in the accompanying Zenodo repository <xref ref-type="bibr" rid="bib1.bibx26" id="paren.36"/>, which also contains the IBTrACS dataset as a compressed file.</p>
  </notes><notes notes-type="authorcontribution"><title>Author contributions</title>

      <p id="d2e3041">DTL: Formal analysis, Conceptualization, Methodology, Validation, Supervision, Writing – review &amp; editing. DTB: Investigation, Software, Validation, Writing – original draft. HGAD: Methodology, Software, Writing – original draft. DHN: Software, Validation. MHT: Software, Validation. XTN: Software, Validation. QTL: Software, Validation, Writing – review &amp; editing. QLL: Software, Validation. THN: Validation, Supervision. TTNN: Formal analysis, Conceptualization, Funding acquisition, Methodology, Supervision. CK: Conceptualization, Methodology, Validation, Supervision, Writing – review &amp; editing.</p>
  </notes><notes notes-type="competinginterests"><title>Competing interests</title>

      <p id="d2e3047">The contact author has declared that none of the authors has any competing interests.</p>
  </notes><notes notes-type="disclaimer"><title>Disclaimer</title>

      <p id="d2e3053">Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. The authors bear the ultimate responsibility for providing appropriate place names. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.</p>
  </notes><ack><title>Acknowledgements</title><p id="d2e3059">We thank the editor and two anonymous reviewers for their very constructive comments and suggestions, which have helped improve our work significantly.</p></ack><notes notes-type="financialsupport"><title>Financial support</title>

      <p id="d2e3064">This research was funded by Vingroup Innovation Foundation (VinIF) under project code VINIF.2023.DA019.</p>
  </notes><notes notes-type="reviewstatement"><title>Review statement</title>

      <p id="d2e3070">This paper was edited by Tao Zhang and reviewed by three anonymous referees.</p>
  </notes><ref-list>
    <title>References</title>

      <ref id="bib1.bibx1"><label>Bengtsson et al.(2007)Bengtsson, Hodges, Esch, Keenlyside, Kornblueh, Luo, and Yamagata</label><mixed-citation>Bengtsson, L., Hodges, K. I., Esch, M., Keenlyside, N., Kornblueh, L., Luo, J.-J., and Yamagata, T.: How may tropical cyclones change in a warmer climate?, Tellus A, 59A, 539–561, <ext-link xlink:href="https://doi.org/10.1111/j.1600-0870.2007.00251.x" ext-link-type="DOI">10.1111/j.1600-0870.2007.00251.x</ext-link>, 2007.</mixed-citation></ref>
      <ref id="bib1.bibx2"><label>Bi et al.(2023)Bi, Xie, Zhang, Chen, Gu, and Tian</label><mixed-citation>Bi, K., Xie, L., Zhang, H., Chen, X., Gu, X., and Tian, Q.: Accurate medium-range global weather forecasting with 3D neural networks, Nature, 619, 533–538, <ext-link xlink:href="https://doi.org/10.1038/s41586-023-06185-3" ext-link-type="DOI">10.1038/s41586-023-06185-3</ext-link>, 2023.</mixed-citation></ref>
      <ref id="bib1.bibx3"><label>Bister and Emanuel(1997)</label><mixed-citation>Bister, M. and Emanuel, K. A.: The Genesis of Hurricane Guillermo: TEXMEX Analyses and a Modeling Study, Mon. Weather Rev., 125, 2662–2682, <ext-link xlink:href="https://doi.org/10.1175/1520-0493(1997)125&lt;2662:TGOHGT&gt;2.0.CO;2" ext-link-type="DOI">10.1175/1520-0493(1997)125&lt;2662:TGOHGT&gt;2.0.CO;2</ext-link>, 1997. </mixed-citation></ref>
      <ref id="bib1.bibx4"><label>Brigato and Iocchi(2020)</label><mixed-citation>Brigato, L. and Iocchi, L.: A Close Look at Deep Learning with Small Data, arXiv [preprint], <ext-link xlink:href="https://doi.org/10.48550/arXiv.2003.12843" ext-link-type="DOI">10.48550/arXiv.2003.12843</ext-link>, 2020.</mixed-citation></ref>
      <ref id="bib1.bibx5"><label>Camargo and Zebiak(2002)</label><mixed-citation>Camargo, S. J. and Zebiak, S. E.: Improving the Detection and Tracking of Tropical Cyclones in Atmospheric General Circulation Models, Weather Forecast., 17, 1152–1162, <ext-link xlink:href="https://doi.org/10.1175/1520-0434(2002)017&lt;1152:ITDATO&gt;2.0.CO;2" ext-link-type="DOI">10.1175/1520-0434(2002)017&lt;1152:ITDATO&gt;2.0.CO;2</ext-link>, 2002.</mixed-citation></ref>
      <ref id="bib1.bibx6"><label>Camargo et al.(2023)Camargo, Murakami, Bloemendaal, Chand, Deshpande, Dominguez-Sarmiento, González-Alemán, Knutson, Lin, Moon, Patricola, Reed, Roberts, Scoccimarro, Tam, Wallace, Wu, Yamada, Zhang, and Zhao</label><mixed-citation>Camargo, S. J., Murakami, H., Bloemendaal, N., Chand, S. S., Deshpande, M. S., Dominguez-Sarmiento, C., González-Alemán, J. J., Knutson, T. R., Lin, I.-I., Moon, I.-J., Patricola, C. M., Reed, K. A., Roberts, M. J., Scoccimarro, E., Tam, C. Y. F., Wallace, E. J., Wu, L., Yamada, Y., Zhang, W., and Zhao, H.: An update on the influence of natural climate variability and anthropogenic climate change on tropical cyclones, Tropical Cyclone Research and Review, 12, 216–239, <ext-link xlink:href="https://doi.org/10.1016/j.tcrr.2023.10.001" ext-link-type="DOI">10.1016/j.tcrr.2023.10.001</ext-link>, 2023.</mixed-citation></ref>
      <ref id="bib1.bibx7"><label>Cha et al.(2020)Cha, Knutson, Lee, Ying, and Nakaegawa</label><mixed-citation>Cha, E. J., Knutson, T. R., Lee, T.-C., Ying, M., and Nakaegawa, T.: Third assessment on impacts of climate change on tropical cyclones in the Typhoon Committee Region – Part II: Future projections, Tropical Cyclone Research and Review, 9, 75–86, <ext-link xlink:href="https://doi.org/10.1016/j.tcrr.2020.04.005" ext-link-type="DOI">10.1016/j.tcrr.2020.04.005</ext-link>, 2020.</mixed-citation></ref>
      <ref id="bib1.bibx8"><label>Chen and Yuan(2024)</label><mixed-citation>Chen, A. and Yuan, C.: Deep learning-based spatial downscaling and its application for tropical cyclone detection in the western North Pacific, Front. Earth Sci., 12, 1345714, <ext-link xlink:href="https://doi.org/10.3389/feart.2024.1345714" ext-link-type="DOI">10.3389/feart.2024.1345714</ext-link>, 2024.</mixed-citation></ref>
      <ref id="bib1.bibx9"><label>Chen et al.(2020)Chen, Zhang, and Wang</label><mixed-citation>Chen, R., Zhang, W., and Wang, X.: Machine Learning in Tropical Cyclone Forecast Modeling: A Review, Atmosphere, 11, 676, <ext-link xlink:href="https://doi.org/10.3390/atmos11070676" ext-link-type="DOI">10.3390/atmos11070676</ext-link>, 2020.</mixed-citation></ref>
      <ref id="bib1.bibx10"><label>Defforge and Merlis(2017)</label><mixed-citation>Defforge, C. L. and Merlis, T. M.: Observed warming trend in sea surface temperature at tropical cyclone genesis, Geophys. Res. Lett., 44, 1034–1040, <ext-link xlink:href="https://doi.org/10.1002/2016GL071045" ext-link-type="DOI">10.1002/2016GL071045</ext-link>, 2017.</mixed-citation></ref>
      <ref id="bib1.bibx11"><label>Gao et al.(2018a)Gao, Zhao, Pan, Li, Zhou, Xu, Zhong, and Shi</label><mixed-citation> Gao, S., Zhao, P., Pan, B., Li, Y., Zhou, M., Xu, J., Zhong, S., and Shi, Z.: A nowcasting model for the prediction of typhoon tracks based on a long short term memory neural network, Acta Oceanol. Sin., 37, 8–12, 2018a.</mixed-citation></ref>
      <ref id="bib1.bibx12"><label>Gao et al.(2018b)Gao, Zhao, Pan, Li, Zhou, Xu, Zhong, and Shi</label><mixed-citation> Gao, S., Zhao, P., Pan, B., Li, Y., Zhou, M., Xu, J., Zhong, S., and Shi, Z.: A nowcasting model for the prediction of typhoon tracks based on a long short term memory neural network, Acta Oceanol. Sin., 37, 8–12, 2018b.</mixed-citation></ref>
      <ref id="bib1.bibx13"><label>Gelaro et al.(2017)</label><mixed-citation>Gelaro, R., McCarty, W., Suárez, M. J., Todling, R., Molod, A., Takacs, L., Randles, C. A., Darmenov, A., Bosilovich, M. G., Reichle, R., Wargan, K., Coy, L., Cullather, R., Draper, C., Akella, S., Buchard, V., Conaty, A., da Silva, A. M., Gu, W., Kim, G.-K., Koster, R., Lucchesi, R., Merkova, D., Nielsen, J. E., Partyka, G., Pawson, S., Putman, W., Rienecker, M., Schubert, S. D., Sienkiewicz, M., and Zhao, B.: The Modern-Era Retrospective Analysis for Research and Applications, Version 2 (MERRA-2), J. Climate, 30, 5419–5454, <ext-link xlink:href="https://doi.org/10.1175/JCLI-D-16-0758.1" ext-link-type="DOI">10.1175/JCLI-D-16-0758.1</ext-link>, 2017.</mixed-citation></ref>
      <ref id="bib1.bibx14"><label>Giffard-Roisin et al.(2020)</label><mixed-citation>Giffard-Roisin, S., Yang, M., Charpiat, G., Kumler Bonfanti, C., Kégl, B., and Monteleoni, C.: Tropical cyclone track forecasting using fused deep learning from aligned reanalysis data, Front. Big Data, 3, 1–13, <ext-link xlink:href="https://doi.org/10.3389/fdata.2020.00001" ext-link-type="DOI">10.3389/fdata.2020.00001</ext-link>, 2020.</mixed-citation></ref>
      <ref id="bib1.bibx15"><label>Gray(1968)</label><mixed-citation>Gray, W. M.: Global View of The Origin of Tropical Disturbances, Mon. Weather Rev., 96, 669–700, <ext-link xlink:href="https://doi.org/10.1175/1520-0493(1968)096&lt;0669:GVOTOO&gt;2.0.CO;2" ext-link-type="DOI">10.1175/1520-0493(1968)096&lt;0669:GVOTOO&gt;2.0.CO;2</ext-link>, 1968.</mixed-citation></ref>
      <ref id="bib1.bibx16"><label>Horn et al.(2014)</label><mixed-citation>Horn, M., Walsh, K., Zhao, M., Camargo, S. J., Scoccimarro, E., Murakami, H., Wang, H., Ballinger, A., Kumar, A., Shaevitz, D. A., Jonas, J. A., and Oouchi, K.: Tracking Scheme Dependence of Simulated Tropical Cyclone Response to Idealized Climate Simulations, J. Climate, 27, 9197–9213, <ext-link xlink:href="https://doi.org/10.1175/JCLI-D-14-00200.1" ext-link-type="DOI">10.1175/JCLI-D-14-00200.1</ext-link>, 2014.</mixed-citation></ref>
      <ref id="bib1.bibx17"><label>Kieu and Nguyen(2024)</label><mixed-citation>Kieu, C. and Nguyen, Q.: Binary dataset for machine learning applications to tropical cyclone formation prediction, Sci. Data, 11, 446, <ext-link xlink:href="https://doi.org/10.1038/s41597-024-03281-5" ext-link-type="DOI">10.1038/s41597-024-03281-5</ext-link>, 2024.</mixed-citation></ref>
      <ref id="bib1.bibx18"><label>Kieu et al.(2023)Kieu, Zhao, Tan, Zhang, and Knutson</label><mixed-citation>Kieu, C., Zhao, M., Tan, Z., Zhang, B., and Knutson, T.: On the Role of Sea Surface Temperature in the Clustering of Global Tropical Cyclone Formation, J. Climate, 1–39, <ext-link xlink:href="https://doi.org/10.1175/JCLI-D-22-0623.1" ext-link-type="DOI">10.1175/JCLI-D-22-0623.1</ext-link>, 2023.</mixed-citation></ref>
      <ref id="bib1.bibx19"><label>Kieu et al.(2025)</label><mixed-citation>Kieu, C., Nguyen, T. T., Le, D.-T., Hoang, D. G.-A., Luu, Q.-L., Dang, B. T., Ngo, T. X., Luu, Q.-T., Du, T. D., and Mai, K. V.: Reconstructing Pre-Satellite Tropical Cyclogenesis Climatology Using Deep Learning, arXiv [preprint], <ext-link xlink:href="https://doi.org/10.48550/arXiv.2512.17711" ext-link-type="DOI">10.48550/arXiv.2512.17711</ext-link>, 2025.</mixed-citation></ref>
      <ref id="bib1.bibx20"><label>Kim et al.(2019a)</label><mixed-citation>Kim, M., Park, M.-S., Im, J., Park, S., and Lee, M.-I.: Machine Learning Approaches for Detecting Tropical Cyclone Formation Using Satellite Data, Remote Sens., 11, 1195, <ext-link xlink:href="https://doi.org/10.3390/rs11101195" ext-link-type="DOI">10.3390/rs11101195</ext-link>, 2019a.</mixed-citation></ref>
      <ref id="bib1.bibx21"><label>Kim et al.(2019b)</label><mixed-citation>Kim, M., Park, M.-S., Im, J., Park, S., and Lee, M.-I.: Machine Learning Approaches for Detecting Tropical Cyclone Formation Using Satellite Data, Remote Sens., 11, 1195, <ext-link xlink:href="https://doi.org/10.3390/rs11101195" ext-link-type="DOI">10.3390/rs11101195</ext-link>, 2019b.</mixed-citation></ref>
      <ref id="bib1.bibx22"><label>Knapp et al.(2010)Knapp, Kruk, Levinson, Diamond, and Neumann</label><mixed-citation>Knapp, K. R., Kruk, M. C., Levinson, D. H., Diamond, H. J., and Neumann, C. J.: The International Best Track Archive for Climate Stewardship (IBTrACS): Unifying Tropical Cyclone Data, B. Am. Meteorol. Soc., 91, 363–376, <ext-link xlink:href="https://doi.org/10.1175/2009BAMS2755.1" ext-link-type="DOI">10.1175/2009BAMS2755.1</ext-link>, 2010.</mixed-citation></ref>
      <ref id="bib1.bibx23"><label>Knutson et al.(1998)Knutson, Tuleya, and Kurihara</label><mixed-citation>Knutson, T. R., Tuleya, R. E., and Kurihara, Y.: Simulated increase of hurricane intensities in a CO<sub>2</sub>-warmed climate, Science, 279, 1018–1021, <ext-link xlink:href="https://doi.org/10.1126/science.279.5353.1018" ext-link-type="DOI">10.1126/science.279.5353.1018</ext-link>, 1998.</mixed-citation></ref>
      <ref id="bib1.bibx24"><label>Kossin et al.(2016)Kossin, Emanuel, and Camargo</label><mixed-citation>Kossin, J. P., Emanuel, K. A., and Camargo, S. J.: Past and Projected Changes in Western North Pacific Tropical Cyclone Exposure, J. Climate, 29, 5725–5739, <ext-link xlink:href="https://doi.org/10.1175/JCLI-D-16-0076.1" ext-link-type="DOI">10.1175/JCLI-D-16-0076.1</ext-link>, 2016.</mixed-citation></ref>
      <ref id="bib1.bibx25"><label>Lam et al.(2023)</label><mixed-citation>Lam, R., Sanchez-Gonzalez, A., Willson, M., Wirnsberger, P., Fortunato, M., Alet, F., Ravuri, S., Ewalds, T., Eaton-Rosen, Z., Hu, W., Merose, A., Hoyer, S., Holland, G., Vinyals, O., Stott, J., Pritzel, A., Mohamed, S., and Battaglia, P.: Learning skillful medium-range global weather forecasting, Science, 382, 1416–1421, <ext-link xlink:href="https://doi.org/10.1126/science.adi2336" ext-link-type="DOI">10.1126/science.adi2336</ext-link>, 2023.</mixed-citation></ref>
      <ref id="bib1.bibx26"><label>Le et al.(2025)Le, Dang, Hoang Gia, Nguyen, Tien, Luu, Luu, Nguyen, Nguyen, and Kieu</label><mixed-citation>Le, D.-T., Dang, T.-B., Hoang Gia, A.-D., Nguyen, D.-H., Tien, M.-H., Luu, Q.-T., Luu, Q.-L., Nguyen, T.-H., Nguyen, T. N. T., and Kieu, C.: From Reanalysis to Climatology: Deep Learning Reconstruction of Tropical Cyclogenesis in the Western North Pacific, Zenodo [code], <ext-link xlink:href="https://doi.org/10.5281/zenodo.17459622" ext-link-type="DOI">10.5281/zenodo.17459622</ext-link>, 2025.</mixed-citation></ref>
      <ref id="bib1.bibx27"><label>Lee et al.(2020)Lee, Camargo, Vitart, Sobel, Camp, Wang, Tippett, and Yang</label><mixed-citation>Lee, C.-Y., Camargo, S. J., Vitart, F., Sobel, A. H., Camp, J., Wang, S., Tippett, M. K., and Yang, Q.: Subseasonal Predictions of Tropical Cyclone Occurrence and ACE in the S2S Dataset, Weather Forecast., 35, 921–938, <ext-link xlink:href="https://doi.org/10.1175/WAF-D-19-0217.1" ext-link-type="DOI">10.1175/WAF-D-19-0217.1</ext-link>, 2020.</mixed-citation></ref>
      <ref id="bib1.bibx28"><label>Matsuoka et al.(2018)Matsuoka, Nakano, Sugiyama, and Uchida</label><mixed-citation>Matsuoka, D., Nakano, M., Sugiyama, D., and Uchida, S.: Deep learning approach for detecting tropical cyclones and their precursors in the simulation by a cloud-resolving global nonhydrostatic atmospheric model, Prog. Earth Planet. Sci., <ext-link xlink:href="https://doi.org/10.1186/s40645-018-0245-y" ext-link-type="DOI">10.1186/s40645-018-0245-y</ext-link>, 2018.</mixed-citation></ref>
      <ref id="bib1.bibx29"><label>Menkes et al.(2012)Menkes, Lengaigne, Marchesiello, Jourdain, Vincent, Lefèvre, Chauvin, and Royer</label><mixed-citation> Menkes, C. E., Lengaigne, M., Marchesiello, P., Jourdain, N. C., Vincent, E. M., Lefèvre, J., Chauvin, F., and Royer, J.-F.: Comparison of tropical cyclogenesis indices on seasonal to interannual timescales, Clim. Dynam., 38, 301–321, 2012.</mixed-citation></ref>
      <ref id="bib1.bibx30"><label>Miller et al.(2017)Miller, Maskey, and Berendes</label><mixed-citation>Miller, J., Maskey, M., and Berendes, T.: Using deep learning for tropical cyclone intensity estimation, in: AGU Fall Meeting Abstracts, vol. 2017, IN11E–05, <uri>https://ntrs.nasa.gov/api/citations/20170011716/downloads/20170011716.pdf</uri> (last access: 7 May 2026), 2017.</mixed-citation></ref>
      <ref id="bib1.bibx31"><label>Molinari et al.(2000)Molinari, Vollaro, Skubis, and Dickinson</label><mixed-citation>Molinari, J., Vollaro, D., Skubis, S., and Dickinson, M.: Origins and Mechanisms of Eastern Pacific Tropical Cyclogenesis: A Case Study, Mon. Weather Rev., 128, 125–139, <ext-link xlink:href="https://doi.org/10.1175/1520-0493(2000)128&lt;0125:OAMOEP&gt;2.0.CO;2" ext-link-type="DOI">10.1175/1520-0493(2000)128&lt;0125:OAMOEP&gt;2.0.CO;2</ext-link>, 2000.</mixed-citation></ref>
      <ref id="bib1.bibx32"><label>Nguyen(2023)</label><mixed-citation> Nguyen, Q.: Deep Learning for Tropical Cyclone Formation Detection, ProQuest Dissertations Publishing, Indiana University, 120 pp., 2023.</mixed-citation></ref>
      <ref id="bib1.bibx33"><label>Nguyen and Kieu(2024)</label><mixed-citation>Nguyen, Q. and Kieu, C.: Predicting Tropical Cyclone Formation with Deep Learning, Weather Forecast., 39, 241–258, <ext-link xlink:href="https://doi.org/10.1175/WAF-D-23-0103.1" ext-link-type="DOI">10.1175/WAF-D-23-0103.1</ext-link>, 2024.</mixed-citation></ref>
      <ref id="bib1.bibx34"><label>Nolan et al.(2007)Nolan, Rappin, and Emanuel</label><mixed-citation> Nolan, D., Rappin, E. D., and Emanuel, K. A.: Tropical cyclogenesis sensitivity to environmental parameters in radiative–convective equilibrium, Q. J. Roy. Meteor. Soc., 133, 2085–2107, 2007.</mixed-citation></ref>
      <ref id="bib1.bibx35"><label>Park et al.(2016)Park, Kim, Lee, Im, and Park</label><mixed-citation> Park, M.-S., Kim, M., Lee, M.-I., Im, J., and Park, S.: Detection of tropical cyclone genesis via quantitative satellite ocean surface wind pattern and intensity analyses using decision trees, Remote Sens. Environ., 183, 205–214, 2016.</mixed-citation></ref>
      <ref id="bib1.bibx36"><label>Pathak et al.(2022)Pathak, Subramanian, Harrington, Raja, Chattopadhyay, Mardani, Kurth, Hall, Li, Azizzadenesheli, Hassanzadeh, Kashinath, and Anandkumar</label><mixed-citation>Pathak, J., Subramanian, S., Harrington, P., Raja, S., Chattopadhyay, A., Mardani, M., Kurth, T., Hall, D., Li, Z., Azizzadenesheli, K., Hassanzadeh, P., Kashinath, K., and Anandkumar, A.: FourCastNet: A Global Data-driven High-resolution Weather Model using Adaptive Fourier Neural Operators, arXiv [preprint], <uri>https://arxiv.org/pdf/2202.11214</uri> (last access: 7 May 2026), 2022.</mixed-citation></ref>
      <ref id="bib1.bibx37"><label>Peduzzi et al.(2012)Peduzzi, Chatenoux, Dao, Bono, Herold, Kossin, Mouton, and Nordbeck</label><mixed-citation>Peduzzi, P., Chatenoux, B., Dao, H., Bono, A. D., Herold, C., Kossin, J., Mouton, F., and Nordbeck, O.: Tropical cyclones: Global trends in human exposure, vulnerability and risk, Nat. Clim. Change, 2, 289–294, <ext-link xlink:href="https://doi.org/10.1038/NCLIMATE1410" ext-link-type="DOI">10.1038/NCLIMATE1410</ext-link>, 2012.</mixed-citation></ref>
      <ref id="bib1.bibx38"><label>Riehl and Malkus(1958)</label><mixed-citation> Riehl, H. and Malkus, J. S.: On the heat balance in the equatorial trough zone, Geophysica, 6, 503–538, 1958.</mixed-citation></ref>
      <ref id="bib1.bibx39"><label>Ritchie and Holland(1997)</label><mixed-citation>Ritchie, E. A. and Holland, G. J.: Scale Interactions during the Formation of Typhoon Irving, Mon. Weather Rev., 125, 1377–1396, <ext-link xlink:href="https://doi.org/10.1175/1520-0493(1997)125&lt;1377:SIDTFO&gt;2.0.CO;2" ext-link-type="DOI">10.1175/1520-0493(1997)125&lt;1377:SIDTFO&gt;2.0.CO;2</ext-link>, 1997.</mixed-citation></ref>
      <ref id="bib1.bibx40"><label>Scher and Messori(2019)</label><mixed-citation>Scher, S. and Messori, G.: Weather and climate forecasting with neural networks: using general circulation models (GCMs) with different complexity as a study ground, Geosci. Model Dev., 12, 2797–2809, <ext-link xlink:href="https://doi.org/10.5194/gmd-12-2797-2019" ext-link-type="DOI">10.5194/gmd-12-2797-2019</ext-link>, 2019.</mixed-citation></ref>
      <ref id="bib1.bibx41"><label>Schultz et al.(2021)Schultz, Betancourt, Gong, Kleinert, Langguth, Leufen, Mozaffari, and Stadtler</label><mixed-citation>Schultz, M. G., Betancourt, C., Gong, B., Kleinert, F., Langguth, M., Leufen, L. H., Mozaffari, A., and Stadtler, S.: Can deep learning beat numerical weather prediction?, Philos. T. Roy. Soc. A, 379, 20200097, <ext-link xlink:href="https://doi.org/10.1098/rsta.2020.0097" ext-link-type="DOI">10.1098/rsta.2020.0097</ext-link>, 2021.</mixed-citation></ref>
      <ref id="bib1.bibx42"><label>Simpson et al.(1997)Simpson, Ritchie, Holland, Halverson, and Stewart</label><mixed-citation>Simpson, J., Ritchie, E., Holland, G. J., Halverson, J., and Stewart, S.: Mesoscale Interactions in Tropical Cyclone Genesis, Mon. Weather Rev., 125, 2643–2661, <ext-link xlink:href="https://doi.org/10.1175/1520-0493(1997)125&lt;2643:MIITCG&gt;2.0.CO;2" ext-link-type="DOI">10.1175/1520-0493(1997)125&lt;2643:MIITCG&gt;2.0.CO;2</ext-link>, 1997.</mixed-citation></ref>
      <ref id="bib1.bibx43"><label>Strachan et al.(2013)Strachan, Vidale, Hodges, Roberts, and Demory</label><mixed-citation>Strachan, J., Vidale, P. L., Hodges, K., Roberts, M., and Demory, M.-E.: Investigating Global Tropical Cyclone Activity with a Hierarchy of AGCMs: The Role of Model Resolution, J. Climate, 26, 133–152, <ext-link xlink:href="https://doi.org/10.1175/JCLI-D-12-00012.1" ext-link-type="DOI">10.1175/JCLI-D-12-00012.1</ext-link>, 2013.</mixed-citation></ref>
      <ref id="bib1.bibx44"><label>Su et al.(2020)Su, Wu, Jiang, Pai, Liu, Zhai, Tavallali, and DeMaria</label><mixed-citation>Su, H., Wu, L., Jiang, J. H., Pai, R., Liu, A., Zhai, A. J., Tavallali, P., and DeMaria, M.: Applying satellite observations of tropical cyclone internal structures to rapid intensification forecast with machine learning, Geophys. Res. Lett., 47, e2020GL089102, <ext-link xlink:href="https://doi.org/10.1029/2020GL089102" ext-link-type="DOI">10.1029/2020GL089102</ext-link>, 2020.</mixed-citation></ref>
      <ref id="bib1.bibx45"><label>Tan et al.(2015)Tan, Long, Hai, and Kieu</label><mixed-citation>Tan, P.-V., Long, T.-T., Hai, B.-H., and Kieu, C.: Seasonal forecasting of tropical cyclone activity in the coastal region of Vietnam using RegCM4.2, Climate Res., 62, 115–129, <ext-link xlink:href="https://doi.org/10.3354/cr01267" ext-link-type="DOI">10.3354/cr01267</ext-link>, 2015.</mixed-citation></ref>
      <ref id="bib1.bibx46"><label>Thanh et al.(2020)Thanh, Cuong, Hien, and Kieu</label><mixed-citation>Thanh, N. T., Cuong, H. D., Hien, N. X., and Kieu, C.: Relationship between sea surface temperature and the maximum intensity of tropical cyclones affecting Vietnam's coastline, Int. J. Climatol., 40, 2527–2538, <ext-link xlink:href="https://doi.org/10.1002/joc.6348" ext-link-type="DOI">10.1002/joc.6348</ext-link>, 2020.</mixed-citation></ref>
      <ref id="bib1.bibx47"><label>Tien et al.(2020)Tien, Hoa, Thanh, and Kieu</label><mixed-citation>Tien, T. T., Hoa, D. N.-Q., Thanh, C., and Kieu, C.: Assessing the Impacts of Augmented Observations on the Forecast of Typhoon Wutip’s (2013) Formation Using the Ensemble Kalman Filter, Weather Forecast., 35, 1483–1503, <ext-link xlink:href="https://doi.org/10.1175/WAF-D-20-0001.1" ext-link-type="DOI">10.1175/WAF-D-20-0001.1</ext-link>, 2020.</mixed-citation></ref>
      <ref id="bib1.bibx48"><label>Tippett et al.(2011)Tippett, Camargo, and Sobel</label><mixed-citation> Tippett, M. K., Camargo, S. J., and Sobel, A. H.: A Poisson regression index for tropical cyclone genesis and the role of large-scale vorticity in genesis, J. Climate, 24, 2335–2357, 2011.</mixed-citation></ref>
      <ref id="bib1.bibx49"><label>Tran-Quang et al.(2020)Tran-Quang, Pham-Thanh, Vu, Kieu, and Phan-Van</label><mixed-citation>Tran-Quang, D., Pham-Thanh, H., Vu, T.-A., Kieu, C., and Phan-Van, T.: Climatic Shift of the Tropical Cyclone Activity Affecting Vietnam’s Coastal Region, J. Appl. Meteor. Clim., 59, 1755–1768, <ext-link xlink:href="https://doi.org/10.1175/JAMC-D-20-0021.1" ext-link-type="DOI">10.1175/JAMC-D-20-0021.1</ext-link>, 2020.</mixed-citation></ref>
      <ref id="bib1.bibx50"><label>Trinh et al.(2021)Trinh, Cuong, Kham, and Kieu</label><mixed-citation>Trinh, D. H., Cuong, H. D., Kham, D. V., and Kieu, C.: Remote Control of Sea Surface Temperature on the Variability of Tropical Cyclone Activity Affecting Vietnam’s Coastline, J. Appl. Meteorol. Clim., 60, 323–339, <ext-link xlink:href="https://doi.org/10.1175/JAMC-D-20-0170.1" ext-link-type="DOI">10.1175/JAMC-D-20-0170.1</ext-link>, 2021.</mixed-citation></ref>
      <ref id="bib1.bibx51"><label>Ullrich and Zarzycki(2017)</label><mixed-citation>Ullrich, P. A. and Zarzycki, C. M.: TempestExtremes: a framework for scale-insensitive pointwise feature tracking on unstructured grids, Geosci. Model Dev., 10, 1069–1090, <ext-link xlink:href="https://doi.org/10.5194/gmd-10-1069-2017" ext-link-type="DOI">10.5194/gmd-10-1069-2017</ext-link>, 2017.</mixed-citation></ref>
      <ref id="bib1.bibx52"><label>Vu et al.(2021)Vu, Kieu, Chavas, and Wang</label><mixed-citation>Vu, T.-A., Kieu, C., Chavas, D., and Wang, Q.: A Numerical Study of the Global Formation of Tropical Cyclones, J. Adv. Model. Earth Sy., 13, <ext-link xlink:href="https://doi.org/10.1029/2020MS002207" ext-link-type="DOI">10.1029/2020MS002207</ext-link>, 2021.</mixed-citation></ref>
      <ref id="bib1.bibx53"><label>Vu et al.(2025)Vu, Kieu, Robeson, Staten, and Kravitz</label><mixed-citation> Vu, T.-A., Kieu, C., Robeson, S. M., Staten, P., and Kravitz, B.: Climate projection of tropical cyclone lifetime in the western north Pacific basin, J. Climate, 38, 181–201, 2025.</mixed-citation></ref>
      <ref id="bib1.bibx54"><label>Walsh et al.(2007)Walsh, Fiorino, Landsea, and McInnes</label><mixed-citation>Walsh, K. J. E., Fiorino, M., Landsea, C. W., and McInnes, K. L.: Objectively Determined Resolution-Dependent Threshold Criteria for the Detection of Tropical Cyclones in Climate Models and Reanalyses, J. Climate, 20, 2307–2314, <ext-link xlink:href="https://doi.org/10.1175/JCLI4074.1" ext-link-type="DOI">10.1175/JCLI4074.1</ext-link>, 2007. </mixed-citation></ref>
      <ref id="bib1.bibx55"><label>Walsh et al.(2015)Walsh, Camargo, Vecchi, Daloz, Elsner, Emanuel, Horn, Lim, Roberts, Patricola, Scoccimarro, Sobel, Strazzo, Villarini, Wehner, Zhao, Kossin, LaRow, Oouchi, Schubert, Wang, Bacmeister, Chang, Chauvin, Jablonowski, Kumar, Murakami, Ose, Reed, Saravanan, Yamada, Zarzycki, Vidale, Jonas, and Henderson</label><mixed-citation>Walsh, K. J. E., Camargo, S. J., Vecchi, G. A., Daloz, A. S., Elsner, J., Emanuel, K., Horn, M., Lim, Y.-K., Roberts, M., Patricola, C., Scoccimarro, E., Sobel, A. H., Strazzo, S., Villarini, G., Wehner, M., Zhao, M., Kossin, J. P., LaRow, T., Oouchi, K., Schubert, S., Wang, H., Bacmeister, J., Chang, P., Chauvin, F., Jablonowski, C., Kumar, A., Murakami, H., Ose, T., Reed, K. A., Saravanan, R., Yamada, Y., Zarzycki, C. M., Vidale, P. L., Jonas, J. A., and Henderson, N.: Hurricanes and Climate: The U.S. CLIVAR Working Group on Hurricanes, B. Am. Meteorol. Soc., 96, 997–1017, <ext-link xlink:href="https://doi.org/10.1175/BAMS-D-13-00242.1" ext-link-type="DOI">10.1175/BAMS-D-13-00242.1</ext-link>, 2015.</mixed-citation></ref>
      <ref id="bib1.bibx56"><label>Wang et al.(2022)Wang, Zhao, Huang, and Wang</label><mixed-citation>Wang, Z., Zhao, J., Huang, H., and Wang, X.: A Review on the Application of Machine Learning Methods in Tropical Cyclone Forecasting, Front. Earth Sci., 10, 902596, <ext-link xlink:href="https://doi.org/10.3389/feart.2022.902596" ext-link-type="DOI">10.3389/feart.2022.902596</ext-link>, 2022.</mixed-citation></ref>
      <ref id="bib1.bibx57"><label>Weyn et al.(2021)Weyn, Durran, Caruana, and Cresswell-Clay</label><mixed-citation>Weyn, J. A., Durran, D. R., Caruana, R., and Cresswell-Clay, N.: Sub-Seasonal Forecasting With a Large Ensemble of Deep-Learning Weather Prediction Models, J. Adv. Model. Earth Sy., 13, e2021MS002502, <ext-link xlink:href="https://doi.org/10.1029/2021MS002502" ext-link-type="DOI">10.1029/2021MS002502</ext-link>, 2021.</mixed-citation></ref>
      <ref id="bib1.bibx58"><label>Wimmers et al.(2019)Wimmers, Velden, and Cossuth</label><mixed-citation>Wimmers, A., Velden, C., and Cossuth, J. H.: Using Deep Learning to Estimate Tropical Cyclone Intensity from Satellite Passive Microwave Imagery, Mon. Weather Rev., 147, 2261–2282, <ext-link xlink:href="https://doi.org/10.1175/MWR-D-18-0391.1" ext-link-type="DOI">10.1175/MWR-D-18-0391.1</ext-link>, 2019.</mixed-citation></ref>
      <ref id="bib1.bibx59"><label>Yanai(1964)</label><mixed-citation> Yanai, M.: Formation of tropical cyclones, Rev. Geophys., 2, 367–414, 1964.</mixed-citation></ref>
      <ref id="bib1.bibx60"><label>Zarzycki and Ullrich(2017)</label><mixed-citation> Zarzycki, C. and Ullrich, P.: Assessing sensitivities in algorithmic detection of tropical cyclones in climate data, Geophys. Res. Lett., 44, 1141–1149, 2017.</mixed-citation></ref>
      <ref id="bib1.bibx61"><label>Zhang and Bao(1996)</label><mixed-citation>Zhang, D.-L. and Bao, N.: Oceanic Cyclogenesis as Induced by a Mesoscale Convective System Moving Offshore. Part I: A 90-h Real-Data Simulation, Mon. Weather Rev., 124, 1449–1469, <ext-link xlink:href="https://doi.org/10.1175/1520-0493(1996)124&lt;1449:OCAIBA&gt;2.0.CO;2" ext-link-type="DOI">10.1175/1520-0493(1996)124&lt;1449:OCAIBA&gt;2.0.CO;2</ext-link>, 1996.</mixed-citation></ref>
      <ref id="bib1.bibx62"><label>Zhang et al.(2019)Zhang, Lin, Lin, Zhang, Yu, Cao, and Xue</label><mixed-citation>Zhang, T., Lin, W., Lin, Y., Zhang, M., Yu, H., Cao, K., and Xue, W.: Prediction of Tropical Cyclone Genesis from Mesoscale Convective Systems Using Machine Learning, Weather Forecast., 34, 1035–1049, <ext-link xlink:href="https://doi.org/10.1175/WAF-D-18-0201.1" ext-link-type="DOI">10.1175/WAF-D-18-0201.1</ext-link>, 2019.</mixed-citation></ref>
      <ref id="bib1.bibx63"><label>Zhang et al.(2015)Zhang, Fu, Peng, and Li</label><mixed-citation>Zhang, W., Fu, B., Peng, M. S., and Li, T.: Discriminating developing versus nondeveloping tropical disturbances in the western North Pacific through decision tree analysis, Weather Forecast., 30, 446–454, 2015.</mixed-citation></ref>
      <ref id="bib1.bibx64"><label>Zhao et al.(2009)Zhao, Held, Lin, and Vecchi</label><mixed-citation>Zhao, M., Held, I. M., Lin, S.-J., and Vecchi, G. A.: Simulations of Global Hurricane Climatology, Interannual Variability, and Response to Global Warming Using a 50-km Resolution GCM, J. Climate, 22, 6653–6678, <ext-link xlink:href="https://doi.org/10.1175/2009JCLI3049.1" ext-link-type="DOI">10.1175/2009JCLI3049.1</ext-link>, 2009.</mixed-citation></ref>

  </ref-list></back>
    <!--<article-title-html>From reanalysis to climatology: deep learning reconstruction of tropical cyclogenesis in the western North Pacific</article-title-html>
<abstract-html/>
<ref-html id="bib1.bib1"><label>Bengtsson et al.(2007)Bengtsson, Hodges, Esch, Keenlyside, Kornblueh,
Luo, and Yamagata</label><mixed-citation>
      
Bengtsson, L., Hodges, K. I., Esch, M., Keenlyside, N., Kornblueh, L., Luo,
J.-J., and Yamagata, T.: How may tropical cyclones change in a warmer
climate?, Tellus A, 59A, 539–561,
<a href="https://doi.org/10.1111/j.1600-0870.2007.00251.x" target="_blank">https://doi.org/10.1111/j.1600-0870.2007.00251.x</a>, 2007.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib2"><label>Bi et al.(2023)Bi, Xie, Zhang, Chen, Gu, and Tian</label><mixed-citation>
      
Bi, K., Xie, L., Zhang, H., Chen, X., Gu, X., and Tian, Q.: Accurate medium-range global weather forecasting with 3D neural networks, Nature, 619, 533–538, <a href="https://doi.org/10.1038/s41586-023-06185-3" target="_blank">https://doi.org/10.1038/s41586-023-06185-3</a>, 2023.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib3"><label>Bister and Emanuel(1997)</label><mixed-citation>
      
Bister, M. and Emanuel, K. A.: The Genesis of Hurricane Guillermo: TEXMEX
Analyses and a Modeling Study, Mon. Weather Rev., 125, 2662–2682,
<a href="https://doi.org/10.1175/1520-0493(1997)125&lt;2662:TGOHGT&gt;2.0.CO;2" target="_blank">https://doi.org/10.1175/1520-0493(1997)125&lt;2662:TGOHGT&gt;2.0.CO;2</a>, 1997.


    </mixed-citation></ref-html>
<ref-html id="bib1.bib4"><label>Brigato and Iocchi(2020)</label><mixed-citation>
      
Brigato, L. and Iocchi, L.: A Close Look at Deep Learning with Small Data,
arXiv [preprint],
<a href="https://doi.org/10.48550/arXiv.2003.12843" target="_blank">https://doi.org/10.48550/arXiv.2003.12843</a>, 2020.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib5"><label>Camargo and Zebiak(2002)</label><mixed-citation>
      
Camargo, S. J. and Zebiak, S. E.: Improving the Detection and Tracking of
Tropical Cyclones in Atmospheric General Circulation Models, Weather
Forecast., 17, 1152–1162,
<a href="https://doi.org/10.1175/1520-0434(2002)017&lt;1152:ITDATO&gt;2.0.CO;2" target="_blank">https://doi.org/10.1175/1520-0434(2002)017&lt;1152:ITDATO&gt;2.0.CO;2</a>, 2002.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib6"><label>Camargo et al.(2023)Camargo, Murakami, Bloemendaal, Chand, Deshpande,
Dominguez-Sarmiento, González-Alemán, Knutson, Lin, Moon, Patricola, Reed,
Roberts, Scoccimarro, Tam, Wallace, Wu, Yamada, Zhang, and
Zhao</label><mixed-citation>
      
Camargo, S. J., Murakami, H., Bloemendaal, N., Chand, S. S., Deshpande, M. S.,
Dominguez-Sarmiento, C., González-Alemán, J. J., Knutson, T. R., Lin,
I.-I., Moon, I.-J., Patricola, C. M., Reed, K. A., Roberts, M. J.,
Scoccimarro, E., Tam, C. Y. F., Wallace, E. J., Wu, L., Yamada, Y., Zhang,
W., and Zhao, H.: An update on the influence of natural climate variability
and anthropogenic climate change on tropical cyclones, Tropical Cyclone
Research and Review, 12, 216–239,
<a href="https://doi.org/10.1016/j.tcrr.2023.10.001" target="_blank">https://doi.org/10.1016/j.tcrr.2023.10.001</a>, 2023.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib7"><label>Cha et al.(2020)Cha, Knutson, Lee, Ying, and
Nakaegawa</label><mixed-citation>
      
Cha, E. J., Knutson, T. R., Lee, T.-C., Ying, M., and Nakaegawa, T.: Third
assessment on impacts of climate change on tropical cyclones in the Typhoon
Committee Region – Part II: Future projections, Tropical Cyclone Research
and Review, 9, 75–86, <a href="https://doi.org/10.1016/j.tcrr.2020.04.005" target="_blank">https://doi.org/10.1016/j.tcrr.2020.04.005</a>,
2020.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib8"><label>Chen and Yuan(2024)</label><mixed-citation>
      
Chen, A. and Yuan, C.: Deep learning-based spatial downscaling and its
application for tropical cyclone detection in the western North Pacific,
Front. Earth Sci., 12, 1345714,
<a href="https://doi.org/10.3389/feart.2024.1345714" target="_blank">https://doi.org/10.3389/feart.2024.1345714</a>, 2024.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib9"><label>Chen et al.(2020)Chen, Zhang, and Wang</label><mixed-citation>
      
Chen, R., Zhang, W., and Wang, X.: Machine Learning in Tropical Cyclone
Forecast Modeling: A Review, Atmosphere, 11, 676, <a href="https://doi.org/10.3390/atmos11070676" target="_blank">https://doi.org/10.3390/atmos11070676</a>,
2020.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib10"><label>Defforge and Merlis(2017)</label><mixed-citation>
      
Defforge, C. L. and Merlis, T. M.: Observed warming trend in sea surface
temperature at tropical cyclone genesis, Geophys. Res. Lett., 44,
1034–1040, <a href="https://doi.org/10.1002/2016GL071045" target="_blank">https://doi.org/10.1002/2016GL071045</a>, 2017.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib11"><label>Gao et al.(2018a)Gao, Zhao, Pan, Li, Zhou, Xu, Zhong,
and Shi</label><mixed-citation>
      
Gao, S., Zhao, P., Pan, B., Li, Y., Zhou, M., Xu, J., Zhong, S., and Shi, Z.: A
nowcasting model for the prediction of typhoon tracks based on a long short
term memory neural network, Acta Oceanol. Sin., 37, 8–12,
2018a.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib12"><label>Gao et al.(2018b)Gao, Zhao, Pan, Li, Zhou, Xu, Zhong,
and Shi</label><mixed-citation>
      
Gao, S., Zhao, P., Pan, B., Li, Y., Zhou, M., Xu, J., Zhong, S., and Shi, Z.: A
nowcasting model for the prediction of typhoon tracks based on a long short
term memory neural network, Acta Oceanol. Sin., 37, 8–12,
2018b.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib13"><label>Gelaro et al.(2017)</label><mixed-citation>
      
Gelaro, R., McCarty, W., Suárez, M. J., Todling, R., Molod, A., Takacs, L.,
Randles, C. A., Darmenov, A., Bosilovich, M. G., Reichle, R., Wargan, K.,
Coy, L., Cullather, R., Draper, C., Akella, S., Buchard, V., Conaty, A.,
da Silva, A. M., Gu, W., Kim, G.-K., Koster, R., Lucchesi, R., Merkova, D.,
Nielsen, J. E., Partyka, G., Pawson, S., Putman, W., Rienecker, M., Schubert,
S. D., Sienkiewicz, M., and Zhao, B.: The Modern-Era Retrospective Analysis
for Research and Applications, Version 2 (MERRA-2), J. Climate, 30,
5419–5454, <a href="https://doi.org/10.1175/JCLI-D-16-0758.1" target="_blank">https://doi.org/10.1175/JCLI-D-16-0758.1</a>, 2017.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib14"><label>Giffard-Roisin et al.(2020)</label><mixed-citation>
      
Giffard-Roisin, S., Yang, M., Charpiat, G., Kumler Bonfanti, C., Kégl, B., and Monteleoni, C.: Tropical cyclone track forecasting using fused deep
learning from aligned reanalysis data, Front. Big Data, 3, 1–13, <a href="https://doi.org/10.3389/fdata.2020.00001" target="_blank">https://doi.org/10.3389/fdata.2020.00001</a>, 2020.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib15"><label>Gray(1968)</label><mixed-citation>
      
Gray, W. M.: Global View of The Origin of Tropical Disturbances, Mon. Weather
Rev., 96, 669–700, <a href="https://doi.org/10.1175/1520-0493(1968)096&lt;0669:GVOTOO&gt;2.0.CO;2" target="_blank">https://doi.org/10.1175/1520-0493(1968)096&lt;0669:GVOTOO&gt;2.0.CO;2</a>,
1968.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib16"><label>Horn et al.(2014)</label><mixed-citation>
      
Horn, M., Walsh, K., Zhao, M., Camargo, S. J., Scoccimarro, E., Murakami, H.,
Wang, H., Ballinger, A., Kumar, A., Shaevitz, D. A., Jonas, J. A., and
Oouchi, K.: Tracking Scheme Dependence of Simulated Tropical Cyclone Response
to Idealized Climate Simulations, J. Climate, 27, 9197–9213,
<a href="https://doi.org/10.1175/JCLI-D-14-00200.1" target="_blank">https://doi.org/10.1175/JCLI-D-14-00200.1</a>, 2014.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib17"><label>Kieu and Nguyen(2024)</label><mixed-citation>
      
Kieu, C. and Nguyen, Q.: Binary dataset for machine learning applications to
tropical cyclone formation prediction, Sci. Data, 11, 446, <a href="https://doi.org/10.1038/s41597-024-03281-5" target="_blank">https://doi.org/10.1038/s41597-024-03281-5</a>, 2024.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib18"><label>Kieu et al.(2023)Kieu, Zhao, Tan, Zhang, and Knutson</label><mixed-citation>
      
Kieu, C., Zhao, M., Tan, Z., Zhang, B., and Knutson, T.: On the Role of Sea
Surface Temperature in the Clustering of Global Tropical Cyclone Formation,
J. Climate, 1–39, <a href="https://doi.org/10.1175/JCLI-D-22-0623.1" target="_blank">https://doi.org/10.1175/JCLI-D-22-0623.1</a>, 2023.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib19"><label>Kieu et al.(2025)</label><mixed-citation>
      
Kieu, C., Nguyen, T. T., Le, D.-T., Hoang, D. G.-A., Luu, Q.-L., Dang, B. T.,
Ngo, T. X., Luu, Q.-T., Du, T. D., and Mai, K. V.: Reconstructing
Pre-Satellite Tropical Cyclogenesis Climatology Using Deep Learning, arXiv
[preprint],
<a href="https://doi.org/10.48550/arXiv.2512.17711" target="_blank">https://doi.org/10.48550/arXiv.2512.17711</a>, 2025.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib20"><label>Kim et al.(2019a)</label><mixed-citation>
      
Kim, M., Park, M.-S., Im, J., Park, S., and Lee, M.-I.: Machine Learning
Approaches for Detecting Tropical Cyclone Formation Using Satellite Data,
Remote Sens., 11, 1195, <a href="https://doi.org/10.3390/rs11101195" target="_blank">https://doi.org/10.3390/rs11101195</a>, 2019a.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib21"><label>Kim et al.(2019b)</label><mixed-citation>
      
Kim, M., Park, M.-S., Im, J., Park, S., and Lee, M.-I.: Machine Learning
Approaches for Detecting Tropical Cyclone Formation Using Satellite Data,
Remote Sens., 11, 1195, <a href="https://doi.org/10.3390/rs11101195" target="_blank">https://doi.org/10.3390/rs11101195</a>, 2019b.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib22"><label>Knapp et al.(2010)Knapp, Kruk, Levinson, Diamond, and
Neumann</label><mixed-citation>
      
Knapp, K. R., Kruk, M. C., Levinson, D. H., Diamond, H. J., and Neumann, C. J.:
The International Best Track Archive for Climate Stewardship (IBTrACS):
Unifying Tropical Cyclone Data, B. Am. Meteorol. Soc., 91, 363–376, <a href="https://doi.org/10.1175/2009BAMS2755.1" target="_blank">https://doi.org/10.1175/2009BAMS2755.1</a>, 2010.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib23"><label>Knutson et al.(1998)Knutson, Tuleya, and Kurihara</label><mixed-citation>
      
Knutson, T. R., Tuleya, R. E., and Kurihara, Y.: Simulated increase of
hurricane intensities in a CO<sub>2</sub>-warmed climate, Science, 279, 1018–1021,
<a href="https://doi.org/10.1126/science.279.5353.1018" target="_blank">https://doi.org/10.1126/science.279.5353.1018</a>, 1998.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib24"><label>Kossin et al.(2016)Kossin, Emanuel, and Camargo</label><mixed-citation>
      
Kossin, J. P., Emanuel, K. A., and Camargo, S. J.: Past and Projected Changes
in Western North Pacific Tropical Cyclone Exposure, J. Climate, 29,
5725–5739, <a href="https://doi.org/10.1175/JCLI-D-16-0076.1" target="_blank">https://doi.org/10.1175/JCLI-D-16-0076.1</a>, 2016.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib25"><label>Lam et al.(2023)</label><mixed-citation>
      
Lam, R., Sanchez-Gonzalez, A., Willson, M., Wirnsberger, P., Fortunato, M.,
Alet, F., Ravuri, S., Ewalds, T., Eaton-Rosen, Z., Hu, W., Merose, A., Hoyer,
S., Holland, G., Vinyals, O., Stott, J., Pritzel, A., Mohamed, S., and
Battaglia, P.: Learning skillful medium-range global weather forecasting,
Science, 382, 1416–1421, <a href="https://doi.org/10.1126/science.adi2336" target="_blank">https://doi.org/10.1126/science.adi2336</a>, 2023.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib26"><label>Le et al.(2025)Le, Dang, Hoang Gia, Nguyen, Tien, Luu, Luu, Nguyen,
Nguyen, and Kieu</label><mixed-citation>
      
Le, D.-T., Dang, T.-B., Hoang Gia, A.-D., Nguyen, D.-H., Tien, M.-H., Luu,
Q.-T., Luu, Q.-L., Nguyen, T.-H., Nguyen, T. N. T., and Kieu, C.: From
Reanalysis to Climatology: Deep Learning Reconstruction of Tropical
Cyclogenesis in the Western North Pacific, Zenodo [code],
<a href="https://doi.org/10.5281/zenodo.17459622" target="_blank">https://doi.org/10.5281/zenodo.17459622</a>, 2025.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib27"><label>Lee et al.(2020)Lee, Camargo, Vitart, Sobel, Camp, Wang, Tippett, and
Yang</label><mixed-citation>
      
Lee, C.-Y., Camargo, S. J., Vitart, F., Sobel, A. H., Camp, J., Wang, S.,
Tippett, M. K., and Yang, Q.: Subseasonal Predictions of Tropical Cyclone
Occurrence and ACE in the S2S Dataset, Weather Forecast., 35, 921–938, <a href="https://doi.org/10.1175/WAF-D-19-0217.1" target="_blank">https://doi.org/10.1175/WAF-D-19-0217.1</a>, 2020.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib28"><label>Matsuoka et al.(2018)Matsuoka, Nakano, Sugiyama, and
Uchida</label><mixed-citation>
      
Matsuoka, D., Nakano, M., Sugiyama, D., and Uchida, S.: Deep learning approach
for detecting tropical cyclones and their precursors in the simulation by a
cloud-resolving global nonhydrostatic atmospheric model, Prog. Earth
Planet. Sci., <a href="https://doi.org/10.1186/s40645-018-0245-y" target="_blank">https://doi.org/10.1186/s40645-018-0245-y</a>, 2018.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib29"><label>Menkes et al.(2012)Menkes, Lengaigne, Marchesiello, Jourdain,
Vincent, Lefèvre, Chauvin, and Royer</label><mixed-citation>
      
Menkes, C. E., Lengaigne, M., Marchesiello, P., Jourdain, N. C., Vincent,
E. M., Lefèvre, J., Chauvin, F., and Royer, J.-F.: Comparison of tropical
cyclogenesis indices on seasonal to interannual timescales, Clim. Dynam.,
38, 301–321, 2012.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib30"><label>Miller et al.(2017)Miller, Maskey, and
Berendes</label><mixed-citation>
      
Miller, J., Maskey, M., and Berendes, T.: Using deep learning for tropical
cyclone intensity estimation, in: AGU Fall Meeting Abstracts, vol. 2017,
IN11E–05, <a href="https://ntrs.nasa.gov/api/citations/20170011716/downloads/20170011716.pdf" target="_blank">https://ntrs.nasa.gov/api/citations/20170011716/downloads/20170011716.pdf</a> (last access: 7 May 2026), 2017.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib31"><label>Molinari et al.(2000)Molinari, Vollaro, Skubis, and
Dickinson</label><mixed-citation>
      
Molinari, J., Vollaro, D., Skubis, S., and Dickinson, M.: Origins and
Mechanisms of Eastern Pacific Tropical Cyclogenesis: A Case Study, Mon.
Weather Rev., 128, 125–139,
<a href="https://doi.org/10.1175/1520-0493(2000)128&lt;0125:OAMOEP&gt;2.0.CO;2" target="_blank">https://doi.org/10.1175/1520-0493(2000)128&lt;0125:OAMOEP&gt;2.0.CO;2</a>, 2000.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib32"><label>Nguyen(2023)</label><mixed-citation>
      
Nguyen, Q.: Deep Learning for Tropical Cyclone Formation Detection, ProQuest
Dissertations Publishing, Indiana University, 120&thinsp;pp., 2023.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib33"><label>Nguyen and Kieu(2024)</label><mixed-citation>
      
Nguyen, Q. and Kieu, C.: Predicting Tropical Cyclone Formation with Deep
Learning, Weather Forecast., 39, 241–258,
<a href="https://doi.org/10.1175/WAF-D-23-0103.1" target="_blank">https://doi.org/10.1175/WAF-D-23-0103.1</a>, 2024.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib34"><label>Nolan et al.(2007)Nolan, Rappin, and Emanuel</label><mixed-citation>
      
Nolan, D., Rappin, E. D., and Emanuel, K. A.: Tropical cyclogenesis sensitivity
to environmental parameters in radiative–convective equilibrium, Q. J.
Roy. Meteor. Soc., 133, 2085–2107, 2007.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib35"><label>Park et al.(2016)Park, Kim, Lee, Im, and Park</label><mixed-citation>
      
Park, M.-S., Kim, M., Lee, M.-I., Im, J., and Park, S.: Detection of tropical
cyclone genesis via quantitative satellite ocean surface wind pattern and
intensity analyses using decision trees, Remote Sens. Environ., 183,
205–214, 2016.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib36"><label>Pathak et al.(2022)Pathak, Subramanian, Harrington, Raja,
Chattopadhyay, Mardani, Kurth, Hall, Li, Azizzadenesheli, Hassanzadeh,
Kashinath, and Anandkumar</label><mixed-citation>
      
Pathak, J., Subramanian, S., Harrington, P., Raja, S., Chattopadhyay, A.,
Mardani, M., Kurth, T., Hall, D., Li, Z., Azizzadenesheli, K., Hassanzadeh,
P., Kashinath, K., and Anandkumar, A.: FourCastNet: A Global Data-driven
High-resolution Weather Model using Adaptive Fourier Neural Operators, arXiv [preprint], <a href="https://arxiv.org/pdf/2202.11214" target="_blank">https://arxiv.org/pdf/2202.11214</a> (last access: 7 May 2026), 2022.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib37"><label>Peduzzi et al.(2012)Peduzzi, Chatenoux, Dao, Bono, Herold, Kossin,
Mouton, and Nordbeck</label><mixed-citation>
      
Peduzzi, P., Chatenoux, B., Dao, H., Bono, A. D., Herold, C., Kossin, J.,
Mouton, F., and Nordbeck, O.: Tropical cyclones: Global trends in human
exposure, vulnerability and risk, Nat. Clim. Change, 2, 289–294,
<a href="https://doi.org/10.1038/NCLIMATE1410" target="_blank">https://doi.org/10.1038/NCLIMATE1410</a>, 2012.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib38"><label>Riehl and Malkus(1958)</label><mixed-citation>
      
Riehl, H. and Malkus, J. S.: On the heat balance in the equatorial trough zone,
Geophysica, 6, 503–538, 1958.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib39"><label>Ritchie and Holland(1997)</label><mixed-citation>
      
Ritchie, E. A. and Holland, G. J.: Scale Interactions during the Formation of
Typhoon Irving, Mon. Weather Rev., 125, 1377–1396,
<a href="https://doi.org/10.1175/1520-0493(1997)125&lt;1377:SIDTFO&gt;2.0.CO;2" target="_blank">https://doi.org/10.1175/1520-0493(1997)125&lt;1377:SIDTFO&gt;2.0.CO;2</a>, 1997.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib40"><label>Scher and Messori(2019)</label><mixed-citation>
      
Scher, S. and Messori, G.: Weather and climate forecasting with neural networks: using general circulation models (GCMs) with different complexity as a study ground, Geosci. Model Dev., 12, 2797–2809, <a href="https://doi.org/10.5194/gmd-12-2797-2019" target="_blank">https://doi.org/10.5194/gmd-12-2797-2019</a>, 2019.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib41"><label>Schultz et al.(2021)Schultz, Betancourt, Gong, Kleinert, Langguth,
Leufen, Mozaffari, and Stadtler</label><mixed-citation>
      
Schultz, M. G., Betancourt, C., Gong, B., Kleinert, F., Langguth, M., Leufen,
L. H., Mozaffari, A., and Stadtler, S.: Can deep learning beat numerical
weather prediction?, Philos. T. Roy. Soc. A, 379, 20200097,
<a href="https://doi.org/10.1098/rsta.2020.0097" target="_blank">https://doi.org/10.1098/rsta.2020.0097</a>, 2021.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib42"><label>Simpson et al.(1997)Simpson, Ritchie, Holland, Halverson, and
Stewart</label><mixed-citation>
      
Simpson, J., Ritchie, E., Holland, G. J., Halverson, J., and Stewart, S.:
Mesoscale Interactions in Tropical Cyclone Genesis, Mon. Weather Rev.,
125, 2643–2661, <a href="https://doi.org/10.1175/1520-0493(1997)125&lt;2643:MIITCG&gt;2.0.CO;2" target="_blank">https://doi.org/10.1175/1520-0493(1997)125&lt;2643:MIITCG&gt;2.0.CO;2</a>, 1997.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib43"><label>Strachan et al.(2013)Strachan, Vidale, Hodges, Roberts, and
Demory</label><mixed-citation>
      
Strachan, J., Vidale, P. L., Hodges, K., Roberts, M., and Demory, M.-E.:
Investigating Global Tropical Cyclone Activity with a Hierarchy of AGCMs: The
Role of Model Resolution, J. Climate, 26, 133–152,
<a href="https://doi.org/10.1175/JCLI-D-12-00012.1" target="_blank">https://doi.org/10.1175/JCLI-D-12-00012.1</a>, 2013.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib44"><label>Su et al.(2020)Su, Wu, Jiang, Pai, Liu, Zhai, Tavallali, and
DeMaria</label><mixed-citation>
      
Su, H., Wu, L., Jiang, J. H., Pai, R., Liu, A., Zhai, A. J., Tavallali, P., and DeMaria, M.: Applying satellite observations of tropical cyclone internal
structures to rapid intensification forecast with machine learning, Geophys. Res. Lett., 47, e2020GL089102, <a href="https://doi.org/10.1029/2020GL089102" target="_blank">https://doi.org/10.1029/2020GL089102</a>, 2020.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib45"><label>Tan et al.(2015)Tan, Long, Hai, and Chanh</label><mixed-citation>
      
Tan, P.-V., Long, T.-T., Hai, B.-H., and Chanh, K.: Seasonal forecasting of tropical cyclone activity in the coastal region of Vietnam using RegCM4.2, Climate Res., 62, 115–129, <a href="https://doi.org/10.3354/cr01267" target="_blank">https://doi.org/10.3354/cr01267</a>, 2015.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib46"><label>Thanh et al.(2020)Thanh, Cuong, Hien, and Kieu</label><mixed-citation>
      
Thanh, N. T., Cuong, H. D., Hien, N. X., and Kieu, C.: Relationship between sea
surface temperature and the maximum intensity of tropical cyclones affecting
Vietnam's coastline, Int. J. Climatol., 40, 2527–2538,
<a href="https://doi.org/10.1002/joc.6348" target="_blank">https://doi.org/10.1002/joc.6348</a>, 2020.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib47"><label>Tien et al.(2020)Tien, Hoa, Thanh, and Kieu</label><mixed-citation>
      
Tien, T. T., Hoa, D. N.-Q., Thanh, C., and Kieu, C.: Assessing the Impacts of
Augmented Observations on the Forecast of Typhoon Wutip’s (2013) Formation
Using the Ensemble Kalman Filter, Weather Forecast., 35, 1483–1503,
<a href="https://doi.org/10.1175/WAF-D-20-0001.1" target="_blank">https://doi.org/10.1175/WAF-D-20-0001.1</a>, 2020.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib48"><label>Tippett et al.(2011)Tippett, Camargo, and Sobel</label><mixed-citation>
      
Tippett, M. K., Camargo, S. J., and Sobel, A. H.: A Poisson regression index
for tropical cyclone genesis and the role of large-scale vorticity in
genesis, J. Climate, 24, 2335–2357, 2011.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib49"><label>Tran-Quang et al.(2020)Tran-Quang, Pham-Thanh, Vu, Kieu, and
Phan-Van</label><mixed-citation>
      
Tran-Quang, D., Pham-Thanh, H., Vu, T.-A., Kieu, C., and Phan-Van, T.: Climatic
Shift of the Tropical Cyclone Activity Affecting Vietnam’s Coastal Region,
J. Appl. Meteorol. Clim., 59, 1755–1768,
<a href="https://doi.org/10.1175/JAMC-D-20-0021.1" target="_blank">https://doi.org/10.1175/JAMC-D-20-0021.1</a>, 2020.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib50"><label>Trinh et al.(2021)Trinh, Cuong, Kham, and Kieu</label><mixed-citation>
      
Trinh, D. H., Cuong, H. D., Kham, D. V., and Kieu, C.: Remote Control of Sea
Surface Temperature on the Variability of Tropical Cyclone Activity Affecting
Vietnam’s Coastline, J. Appl. Meteorol. Clim., 60,
323–339, <a href="https://doi.org/10.1175/JAMC-D-20-0170.1" target="_blank">https://doi.org/10.1175/JAMC-D-20-0170.1</a>, 2021.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib51"><label>Ullrich and Zarzycki(2017)</label><mixed-citation>
      
Ullrich, P. A. and Zarzycki, C. M.: TempestExtremes: a framework for scale-insensitive pointwise feature tracking on unstructured grids, Geosci. Model Dev., 10, 1069–1090, <a href="https://doi.org/10.5194/gmd-10-1069-2017" target="_blank">https://doi.org/10.5194/gmd-10-1069-2017</a>, 2017.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib52"><label>Vu et al.(2021)Vu, Kieu, Chavas, and Wang</label><mixed-citation>
      
Vu, T.-A., Kieu, C., Chavas, D., and Wang, Q.: A Numerical Study of the Global
Formation of Tropical Cyclones, J. Adv. Model. Earth
Sy., 13, <a href="https://doi.org/10.1029/2020MS002207" target="_blank">https://doi.org/10.1029/2020MS002207</a>, 2021.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib53"><label>Vu et al.(2025)Vu, Kieu, Robeson, Staten, and Kravitz</label><mixed-citation>
      
Vu, T.-A., Kieu, C., Robeson, S. M., Staten, P., and Kravitz, B.: Climate
projection of tropical cyclone lifetime in the western North Pacific basin,
J. Climate, 38, 181–201, 2025.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib54"><label>Walsh et al.(2007)Walsh, Fiorino, Landsea, and
McInnes</label><mixed-citation>
      
Walsh, K. J. E., Fiorino, M., Landsea, C. W., and McInnes, K. L.: Objectively
Determined Resolution-Dependent Threshold Criteria for the Detection of
Tropical Cyclones in Climate Models and Reanalyses, J. Climate, 20,
2307–2314, <a href="https://doi.org/10.1175/JCLI4074.1" target="_blank">https://doi.org/10.1175/JCLI4074.1</a>, 2007.


    </mixed-citation></ref-html>
<ref-html id="bib1.bib55"><label>Walsh et al.(2015)Walsh, Camargo, Vecchi, Daloz, Elsner, Emanuel,
Horn, Lim, Roberts, Patricola, Scoccimarro, Sobel, Strazzo, Villarini,
Wehner, Zhao, Kossin, LaRow, Oouchi, Schubert, Wang, Bacmeister, Chang,
Chauvin, Jablonowski, Kumar, Murakami, Ose, Reed, Saravanan, Yamada,
Zarzycki, Vidale, Jonas, and Henderson</label><mixed-citation>
      
Walsh, K. J. E., Camargo, S. J., Vecchi, G. A., Daloz, A. S., Elsner, J.,
Emanuel, K., Horn, M., Lim, Y.-K., Roberts, M., Patricola, C., Scoccimarro,
E., Sobel, A. H., Strazzo, S., Villarini, G., Wehner, M., Zhao, M., Kossin,
J. P., LaRow, T., Oouchi, K., Schubert, S., Wang, H., Bacmeister, J., Chang,
P., Chauvin, F., Jablonowski, C., Kumar, A., Murakami, H., Ose, T., Reed,
K. A., Saravanan, R., Yamada, Y., Zarzycki, C. M., Vidale, P. L., Jonas,
J. A., and Henderson, N.: Hurricanes and Climate: The U.S. CLIVAR Working
Group on Hurricanes, B. Am. Meteorol. Soc., 96,
997–1017, <a href="https://doi.org/10.1175/BAMS-D-13-00242.1" target="_blank">https://doi.org/10.1175/BAMS-D-13-00242.1</a>, 2015.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib56"><label>Wang et al.(2022)Wang, Zhao, Huang, and Wang</label><mixed-citation>
      
Wang, Z., Zhao, J., Huang, H., and Wang, X.: A Review on the Application of
Machine Learning Methods in Tropical Cyclone Forecasting, Front. Earth Sci., 10, 902596, <a href="https://doi.org/10.3389/feart.2022.902596" target="_blank">https://doi.org/10.3389/feart.2022.902596</a>, 2022.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib57"><label>Weyn et al.(2021)Weyn, Durran, Caruana, and
Cresswell-Clay</label><mixed-citation>
      
Weyn, J. A., Durran, D. R., Caruana, R., and Cresswell-Clay, N.: Sub-Seasonal
Forecasting With a Large Ensemble of Deep-Learning Weather Prediction Models,
J. Adv. Model. Earth Sy., 13, e2021MS002502,
<a href="https://doi.org/10.1029/2021MS002502" target="_blank">https://doi.org/10.1029/2021MS002502</a>, 2021.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib58"><label>Wimmers et al.(2019)Wimmers, Velden, and Cossuth</label><mixed-citation>
      
Wimmers, A., Velden, C., and Cossuth, J. H.: Using Deep Learning to Estimate
Tropical Cyclone Intensity from Satellite Passive Microwave Imagery, Mon.
Weather Rev., 147, 2261–2282, <a href="https://doi.org/10.1175/MWR-D-18-0391.1" target="_blank">https://doi.org/10.1175/MWR-D-18-0391.1</a>, 2019.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib59"><label>Yanai(1964)</label><mixed-citation>
      
Yanai, M.: Formation of tropical cyclones, Rev. Geophys., 2, 367–414, 1964.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib60"><label>Zarzycki and Ullrich(2017)</label><mixed-citation>
      
Zarzycki, C. M. and Ullrich, P. A.: Assessing sensitivities in algorithmic detection
of tropical cyclones in climate data, Geophys. Res. Lett., 44, 1141–1149,
2017.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib61"><label>Zhang and Bao(1996)</label><mixed-citation>
      
Zhang, D.-L. and Bao, N.: Oceanic Cyclogenesis as Induced by a Mesoscale
Convective System Moving Offshore. Part I: A 90-h Real-Data Simulation,
Mon. Weather Rev., 124, 1449–1469,
<a href="https://doi.org/10.1175/1520-0493(1996)124&lt;1449:OCAIBA&gt;2.0.CO;2" target="_blank">https://doi.org/10.1175/1520-0493(1996)124&lt;1449:OCAIBA&gt;2.0.CO;2</a>, 1996.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib62"><label>Zhang et al.(2019)Zhang, Lin, Lin, Zhang, Yu, Cao, and
Xue</label><mixed-citation>
      
Zhang, T., Lin, W., Lin, Y., Zhang, M., Yu, H., Cao, K., and Xue, W.:
Prediction of Tropical Cyclone Genesis from Mesoscale Convective Systems
Using Machine Learning, Weather Forecast., 34, 1035–1049,
<a href="https://doi.org/10.1175/WAF-D-18-0201.1" target="_blank">https://doi.org/10.1175/WAF-D-18-0201.1</a>, 2019.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib63"><label>Zhang et al.(2015)Zhang, Fu, Peng, and Li</label><mixed-citation>
      
Zhang, W., Fu, B., Peng, M. S., and Li, T.: Discriminating developing versus
nondeveloping tropical disturbances in the western North Pacific through
decision tree analysis, Weather Forecast., 30, 446–454, 2015.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib64"><label>Zhao et al.(2009)Zhao, Held, Lin, and Vecchi</label><mixed-citation>
      
Zhao, M., Held, I. M., Lin, S.-J., and Vecchi, G. A.: Simulations of Global
Hurricane Climatology, Interannual Variability, and Response to Global
Warming Using a 50-km Resolution GCM, J. Climate, 22, 6653–6678,
<a href="https://doi.org/10.1175/2009JCLI3049.1" target="_blank">https://doi.org/10.1175/2009JCLI3049.1</a>, 2009.

    </mixed-citation></ref-html>--></article>
