Intercomparison of four algorithms for detecting tropical cyclones using ERA5

Bourdin, Stella; Fromang, Sébastien; Dulac, William; Cattiaux, Julien; Chauvin, Fabrice

doi:https://doi.org/10.5194/gmd-15-6759-2022

Articles | Volume 15, issue 17


Methods for assessment of models

06 Sep 2022


Abstract

The assessment of tropical cyclone (TC) statistics requires the direct, objective, and automatic detection and tracking of TCs in reanalyses and model simulations. Research groups have independently developed numerous algorithms during recent decades in order to answer that need. Today, there is a large number of trackers that aim to detect the positions of TCs in gridded datasets. The questions we ask here are the following: does the choice of tracker impact the climatology obtained? And, if it does, how should we deal with this issue?

This paper compares four trackers with very different formulations in detail. We assess their performances by tracking TCs in the ERA5 reanalysis and by comparing the outcome to the IBTrACS observations database.

We find typical detection rates of the trackers around 80 %. At the same time, false alarm rates (FARs) greatly vary across the four trackers and can sometimes exceed the number of genuine cyclones detected. Based on the finding that many of these false alarms (FAs) are extra-tropical cyclones (ETCs), we adapt two existing filtering methods common to all trackers. Both post-treatments dramatically impact FARs, which range from 9 % to 36 % in our final catalogs of TC tracks. We then show that different traditional metrics can be very sensitive to the particular choice of tracker, which is particularly true for the TC frequencies and their durations. By contrast, all trackers identify a robust negative bias in ERA5 TC intensities, a result already noted in previous studies.

We conclude by advising against using as many trackers as possible and averaging the results. A more efficient approach would involve selecting one or a few trackers with well-known and complementary properties.


Received: 08 Apr 2022 – Discussion started: 11 Apr 2022 – Revised: 29 Jul 2022 – Accepted: 29 Jul 2022 – Published: 06 Sep 2022

1 Introduction

Assessing whether and how tropical cyclone (TC) activity will evolve with climate change is a crucial but difficult question to tackle. Since the theoretical understanding of these events remains incomplete, and the observations' time span is too short to infer robust trends in their properties, projections of TC activity typically rely on model simulations (Knutson et al., 2019, 2020). In this realm, the main impediment is the limited spatial resolution of the models, which is currently around 100 km for the vast majority of CMIP6 models. This resolution is still too low to simulate realistic TCs (Camargo and Wing, 2016; Roberts et al., 2020a). However, with the recent advances in computational resources, global simulations with atmospheric spatial resolutions that reach 50–25 km are now feasible and will become increasingly common. The few high-resolution model results already published clearly demonstrate a dramatic improvement in simulating TCs (Manganello et al., 2012; Murakami et al., 2015; Walsh et al., 2015; Roberts et al., 2020a). This avenue raises hopes for our capacity to better understand these storms and to better predict their future evolution.

Studying TCs in global simulations spanning several decades requires their objective and automatic detection and tracking, which is accomplished by so-called TC trackers. Trackers are algorithms that are able to detect cyclonic structures associated with a warm core in a gridded dataset and link them together into a trajectory. Many modeling and operational centers have developed such trackers independently, and there is now a wealth of such algorithms available to the community and described in the literature (see for example the list compiled by Zarzycki and Ullrich, 2017, in the Appendix of their paper). Broadly speaking, TC trackers can be divided into two main categories: “physics-based” and “dynamics-based” trackers. The former rely on thermodynamical variables. They are based on the detection of a local minimum sea-level pressure (SLP) combined with a warm-core criterion – usually expressed as a temperature anomaly or a geopotential thickness – on top of which discriminating intensity criteria are applied based on surface winds or vorticity. This category includes, for example, the trackers from Camargo and Zebiak (2002), Zhao et al. (2009), Murakami (2014), Horn et al. (2014), or Chauvin et al. (2006) and Zarzycki and Ullrich (2017), hereafter referred to as CNRM and UZ, respectively. “Dynamics-based” trackers, on the other hand, rely on dynamical variables such as vorticity or other derivatives of the velocity. They include the TRACK method (Strachan et al., 2013; Hodges et al., 2017) and the OWZ algorithm (Tory et al., 2013b). Trackers in the latter category often claim to be resolution-independent (Tory et al., 2013a). By contrast, the physics-based trackers usually embed a threshold on the 10 m wind, a parameter known to be very sensitive to resolution (Walsh et al., 2007).

Despite this diversity, only a few studies explicitly aim to compare different TC trackers. Horn et al. (2014) were the first to put forward the question of tracker comparison. The authors showed that the results obtained using four physics-based trackers could vary significantly because of the different thresholds and criterion variables used by the different algorithms. Raavi and Walsh (2020) later performed a similar comparison between the CSIRO and OWZ trackers. The OWZ tracker was found to produce better results across a wide range of resolutions, while the CSIRO tracker performed better for the high-resolution datasets.

These studies confirm the naive expectation that different tracking algorithms inevitably have different TC detection skills. As a result, it is often difficult to compare different studies because they use different trackers. For example, future projections of TC frequencies in CMIP5 as reported by Tory et al. (2013b) and Camargo (2013) are difficult to compare because they used the OWZ tracker and that of Camargo and Zebiak (2002), respectively. Two recent papers have tried to circumvent this problem by using multiple trackers when analyzing a given dataset and checking whether the result is robust, i.e., independent of the tracker (Roberts et al., 2020a, b). These intercomparisons of a series of HighResMIP simulations (Haarsma et al., 2016) use TRACK and UZ. In both papers, the authors reported large differences between the two trackers in the frequencies of TCs. Nevertheless, they also confirmed robust improvements in TC statistics with spatial resolution regardless of the tracking algorithm they considered. However, a detailed comparison of the two trackers' properties is still lacking at these high spatial resolutions and would improve interpretations of modeling results. The present paper performs such a comparison in order to document the relative strengths and weaknesses of the large variety of trackers presented above, as well as provide guidelines for the use of TC trackers in climate simulation outputs.

This paper reports the results of an intercomparison of four different trackers with properties as different as possible from one another in terms of their formulation. The report is based on a comparison between the tracks detected by these trackers in a reanalysis (ERA5, Hersbach et al., 2020) and those recorded in an observation database, i.e., the International Best Track Archive for Climate Stewardship (IBTrACS, Knapp et al., 2010). This study uses the reanalysis as a bridge between observations and simulation. Our main goal is not to provide an assessment of ERA5 performance in reproducing a given TC climatology but to compare the trackers with one another. Numerous studies have undertaken such an assessment on several other reanalyses, including ERA5's predecessor ERA-Interim (Hodges et al., 2017; Schenkel and Hart, 2012; Murakami, 2014; Bell et al., 2018). Only recently, Zarzycki et al. (2021) presented an evaluation of ERA5's TCs against other reanalyses. The study shows that ERA5 performs as well as reanalyses that include specific TC assimilation techniques such as JRA and NCEP, and that a significant improvement is brought about by the increase in resolution between ERA-Interim and ERA5. A comprehensive assessment of TCs in ERA5 will be presented in future work.

The paper is organized as follows: after a description of the classification and datasets, we detail the algorithms of the four trackers as well as our track-matching method (Sect. 2). We then use the four trackers to track TCs in ERA5 and to match the detected tracks with IBTrACS tracks, and we present a detailed analysis of the population of missing and false alarm (FA) tracks so obtained (Sect. 3.1). This knowledge is taken into account to develop two methods common to all trackers that aim to filter extra-tropical FAs from the results (Sect. 3.2 and 3.3). The filtered datasets are then used to analyze the sensitivity of traditional metrics to the choice of the trackers (Sect. 4). Finally, we gather the insight gained from this analysis to consider the complementarity of different trackers and provide some guidelines for applying TC trackers to model results (Sect. 5). The conclusion gives a summary of the trackers' common points and differences (Sect. 6).


Table 1. Tropical cyclone (TC) intensity classification. Saffir–Simpson Hurricane Scale (SSHS) thresholds are converted into 10 min sustained wind using a 1.12 conversion coefficient.

* This threshold is not in the original classification but has been derived by us using the same method.


2 Data and methods

Our analysis combines resources available for both the database of observed TCs, namely IBTrACS (Knapp et al., 2010), and the ERA5 reanalysis (Hersbach et al., 2020). Before describing these two datasets in detail, we first outline our procedure to classify TCs according to their intensities. We next describe the specifics of the four trackers we compare in this paper and explain our track-matching method.

2.1 Tropical cyclone (TC) intensities and classification

TCs are commonly classified on the Saffir–Simpson Hurricane Scale (SSHS) using the peak 1 min near-surface wind (generally at 10 m above the surface). This differs from the World Meteorological Organization (WMO) standard, which reports the 10 min near-surface sustained wind u₁₀. For that reason, we have chosen to systematically convert 1 min sustained winds to 10 min sustained winds. To do so, we applied the 1.12 coefficient provided by the IBTrACS documentation (Knapp et al., 2010), although we note there are some ambiguities in the precise value one should use for that purpose (Harper et al., 2010). As a result, u₁₀ must exceed 29 m s⁻¹ for a given structure to be classified as a TC, while tropical storms (TS) are defined as storms for which $16\,\mathrm{m\,s^{-1}} < u_{10} < 29\,\mathrm{m\,s^{-1}}$. The threshold values of u₁₀ for each TC category are reported in Table 1.
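The conversion and the two thresholds quoted above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, dividing by 1.12 is our reading of how the coefficient is applied (1 min winds being larger than 10 min winds), and the handling of values exactly equal to a threshold is an assumption.

```python
# Illustrative sketch only; names are hypothetical and not taken from the paper.
CONVERSION_1MIN_TO_10MIN = 1.12  # coefficient quoted from the IBTrACS documentation
TS_THRESHOLD = 16.0  # m/s, 10 min sustained wind
TC_THRESHOLD = 29.0  # m/s, 10 min sustained wind


def to_10min_wind(u_1min: float) -> float:
    """Convert a 1 min sustained wind (m/s) to a 10 min sustained wind (m/s).

    Assumption: the coefficient is applied as a division.
    """
    return u_1min / CONVERSION_1MIN_TO_10MIN


def classify(u10: float) -> str:
    """Return a coarse class from the 10 min sustained wind (m/s).

    Assumption: a wind exactly equal to a threshold falls in the lower class.
    """
    if u10 > TC_THRESHOLD:
        return "TC"
    if u10 > TS_THRESHOLD:
        return "TS"
    return "below TS"
```

For instance, a 1 min wind of 33.6 m s⁻¹ corresponds to a 10 min wind of about 30 m s⁻¹ under this assumption, which falls in the TC class.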

In the present paper, we will evaluate TC intensities using their minimum SLP. As discussed in the literature in the past few years, the rationale behind this practice is twofold. First, minimum SLP is easier to measure than u₁₀ (Klotzbach et al., 2020), thereby reducing the uncertainty associated with its evaluation. It is also uniformly defined among the different forecast agencies (Knapp et al., 2010), thereby removing the uncertainties associated with the conversion between winds obtained for different averaging periods, as described above. Second, models tend to be able to reproduce the observed range of the minimum SLP of TCs but fail to simulate the largest wind speeds (Knutson et al., 2015; Chavas et al., 2017). The minimum SLP is therefore a more reliable indicator of TC intensities than wind speeds. This is true in models, but also for ERA5, as recently shown by Zarzycki et al. (2021). Finally, even though we do not tackle TC damage in this study, it has also been argued that minimum SLP is a better predictor of TC damage than maximum wind speed (Klotzbach et al., 2020).

Simpson and Saffir (1974) provided a version of the SSHS categorization in terms of pressure, but it does not preserve the proportion in categories of the wind scale. Therefore, we rather use the classification from Klotzbach et al. (2020) to compute TC intensity categories. It is reported in Table 1 for completeness.

2.2 Datasets

2.2.1 IBTrACS

The IBTrACS (Knapp et al., 2010) version 4 is the most comprehensive database of observed TCs. We used the “since 1980” subset in the present paper (Knapp et al., 2018). It combines data provided by TC centers of WMO, namely the Regional Specialized Meteorological Centers (RSMCs) and Tropical Cyclone Warning Centers (TCWCs), as well as non-WMO centers, such as the China Meteorological Administration, the Hong Kong Observatory, and the Joint Typhoon Warning Center. Since IBTrACS sources are so diverse, the database is heterogeneous and requires careful treatment before one can safely use it. The steps we followed are summarized below and detailed in a workflow chart (Fig. B1).

This study considers the cyclonic seasons from 1980 to 2019 in the Northern Hemisphere (NH, 40 seasons) and from 1981 to 2019 in the Southern Hemisphere (SH, 39 seasons). We removed seasons after 2019 because they contain provisional tracks. We also filtered out all tracks labeled as “spur” since they correspond to “usually short-lived tracks associated with main track and often represent alternate positions at the beginning of a system [or] actual system interactions”¹. In the remaining tracks, we only kept 6-hourly time steps for consistency with ERA5. Winds and sea-level pressure (SLP) data were retrieved when available, prioritizing the WMO center responsible for the relevant region. Tracks lacking wind data (0.5 % of all tracks) were dropped. Tracks lacking SLP data (7 % of TS intensity tracks) were kept but not included in those parts of the analysis for which storm intensities are needed. Finally, we removed tracks that do not reach the TS stage (16 m s⁻¹) and those that last less than 1 d.
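The selection steps listed above can be sketched as follows. The `Track` container and its field names are hypothetical (they do not reflect the IBTrACS file layout), the hemisphere-specific first season and the SLP bookkeeping are left out, and a track lasting exactly 24 h is assumed to be kept.

```python
from dataclasses import dataclass
from typing import List, Optional

TS_THRESHOLD = 16.0  # m/s, 10 min sustained wind
MIN_DURATION_H = 24  # tracks lasting less than 1 d are removed


@dataclass
class Track:
    track_type: str               # hypothetical field, e.g. "main" or "spur"
    season: int
    hours: List[int]              # hours since a 00:00 UTC reference, one per record
    winds: List[Optional[float]]  # 10 min wind (m/s), None when missing


def keep_6hourly(track: Track) -> Track:
    """Keep only the records that fall on 6-hourly time steps."""
    idx = [i for i, h in enumerate(track.hours) if h % 6 == 0]
    return Track(track.track_type, track.season,
                 [track.hours[i] for i in idx],
                 [track.winds[i] for i in idx])


def is_selected(track: Track, last_season: int = 2019) -> bool:
    """Apply the selection rules to a track already reduced to 6-hourly steps."""
    if track.season > last_season or track.track_type == "spur":
        return False
    winds = [w for w in track.winds if w is not None]
    if not winds:                  # tracks lacking wind data are dropped
        return False
    if max(winds) < TS_THRESHOLD:  # the track never reaches the TS stage
        return False
    duration = track.hours[-1] - track.hours[0]
    return duration >= MIN_DURATION_H
```

A typical use would be `[t for t in map(keep_6hourly, tracks) if is_selected(t)]`, where `tracks` is a list of such objects.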

Hereafter, our selection of IBTrACS data will be referred to as IB-TS. We also define IB-TC as the subset of IB-TS tracks that reached the TC intensity ($u_{10} > 29\,\mathrm{m\,s^{-1}}$). IB-TS (resp. IB-TC) contains 3519 (resp. 1938) tracks.

2.2.2 ERA5

We retrieved data from the fifth generation of ECMWF Reanalysis (ERA5, Hersbach et al., 2020). Hourly estimates of atmospheric variables are provided by ERA5 on a grid with 0.25° horizontal resolution from 1979 to the present day. For the purpose of this paper, we only used 6-hourly data from 1980 to 2019 (as in IBTrACS). We chose 6-hourly data because our final objective is to use the trackers on simulations, for which, as is customary, only 6-hourly data are available. We nevertheless checked that this choice makes little difference by running part of the tracking on 1-hourly data.

Unlike other reanalyses such as JRA-55 or NCEP-CFSR, ERA5 does not perform any specific assimilation for TCs (Hodges et al., 2017). Nevertheless, ERA5 has recently been assessed as performing similarly to JRA-55 or NCEP-CFSR for a range of metrics (Zarzycki et al., 2021; Roberts et al., 2020a). These results motivated our choice to use ERA5 as a test bed to benchmark the detection skills of the four different TC trackers we will now describe.

2.3 TC trackers

Table B1 provides a synthesis of the trackers' criteria and thresholds presented below.

2.3.1 TempestExtremes

TempestExtremes (see https://climate.ucdavis.edu/tempestextremes.php, last access: 22 August 2022) has been developed by Ullrich and Zarzycki (2017) as a command-line software enabling a fast and versatile implementation of TC trackers.

For the tracking of pointwise features, such as TCs, it provides two functions: (i) DetectNodes finds candidate “nodes” corresponding to local extrema of a given variable, and optionally satisfying a set of additional criteria (closed contours, thresholds); and (ii) StitchNodes links candidates within a given distance of one another into a track. In this paper, we use TempestExtremes to implement two vastly different TC trackers, UZ and OWZ, respectively described by Ullrich et al. (2021) and Tory et al. (2013c). We describe both algorithms below and provide the associated codes in Appendix C.

2.3.2 UZ algorithm

We implemented the physics-based UZ algorithm in TempestExtremes as described by Ullrich et al. (2021). The thresholds were calibrated by Zarzycki and Ullrich (2017) using a sensitivity analysis based on several metrics and the data of four reanalysis products. This tracker was referred to as “TempestExtremes” in Roberts et al. (2020a, b), but we prefer to distinguish between the framework and the tracker formulation itself.

Candidate detection. The first step consists in finding the local minima of SLP, which define a series of candidate points. In a second step, only those candidates that satisfy the following two closed-contour criteria are retained:

(i) SLP must increase by 200 Pa over a distance of 5.5° great-circle distance (GCD) from the candidate point;
(ii) Z_300−500 – the geopotential thickness between 300 and 500 hPa – must decrease by 58.8 m² s⁻² over a distance of 6.5° GCD, using the maximum value of Z_300−500 within 1° GCD of the minimum SLP as a reference.

Criterion (i) ensures that the low-pressure region is of sufficient magnitude and coherent. Criterion (ii) verifies that there is an upper-level warm core associated with the local depression. Finally, candidates for which a stronger SLP minimum exists within 6° GCD are eliminated.

Stitching TC tracks. Consecutive candidates are linked together if they lie within 8° GCD of one another. A maximum 24 h gap is allowed in a track, and tracks must last for at least 54 h. Ten 6-hourly time steps (54 h) must also satisfy the following additional thresholds: $u_{10} \geq 10\,\mathrm{m\,s^{-1}}$, $|\phi| \leq 50^{\circ}$, and $z_{\mathrm{surf}} \leq 150$ m, where $\phi$ and $z_{\mathrm{surf}}$ stand for the latitude and the altitude, respectively. These thresholds respectively ensure that the track is of sufficient intensity, located close enough to the Equator, and spends a significant fraction of its lifetime over oceans.
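As an illustration of the stitching step, the sketch below computes the great-circle distance in degrees with the haversine formula and checks the 8° linking range and the ten-step thresholds. It is not the TempestExtremes implementation: the function names and the dictionary keys are hypothetical, the 24 h gap handling is omitted, and counting each threshold separately (rather than requiring all three at the same time steps) is an assumption.

```python
import math


def gcd_degrees(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in degrees of arc (haversine)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlat = p2 - p1
    dlon = math.radians(lon2 - lon1)
    a = math.sin(dlat / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlon / 2) ** 2
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(a))))


def within_reach(prev, cand, max_dist_deg=8.0):
    """prev and cand are (lat, lon) pairs; True if they may be linked."""
    return gcd_degrees(prev[0], prev[1], cand[0], cand[1]) <= max_dist_deg


def passes_thresholds(points, min_steps=10):
    """points: list of dicts with keys 'u10' (m/s), 'lat' (deg), 'z_surf' (m).

    Assumption: each threshold must hold for at least `min_steps` points,
    counted independently of the other thresholds.
    """
    checks = (
        lambda p: p["u10"] >= 10.0,
        lambda p: abs(p["lat"]) <= 50.0,
        lambda p: p["z_surf"] <= 150.0,
    )
    return all(sum(1 for p in points if check(p)) >= min_steps for check in checks)
```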

2.3.3 OWZ algorithm

The OWZ algorithm, presented in Tory et al. (2013c) and assessed using ERA-Interim data by Bell et al. (2018), is based on evaluating the eponymous Okubo–Weiss–Zeta (OWZ) quantity, defined according to

$$ \mathrm{OWZ} = \max(\mathrm{OW}_{\mathrm{norm}}, 0) \times \eta \times \mathrm{sign}(f), \tag{1} $$

where η is the absolute vorticity, i.e., the sum of the relative vorticity ζ and the Coriolis parameter f, and OW_norm stands for the normalized Okubo–Weiss parameter:

$$ \mathrm{OW}_{\mathrm{norm}} = \frac{\zeta^{2} - (E^{2} + F^{2})}{\zeta^{2}}, \tag{2} $$

in which E and F are the stretching and the shearing deformation, respectively, and are given by

$$ E = \frac{\partial u}{\partial x} - \frac{\partial v}{\partial y}, \qquad F = \frac{\partial v}{\partial x} + \frac{\partial u}{\partial y}. $$
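To make Eqs. (1) and (2) concrete, here is a minimal pointwise sketch that evaluates OWZ from the four horizontal velocity derivatives and the Coriolis parameter. The function name and argument names are hypothetical, the relative vorticity is taken as the standard ζ = ∂v/∂x − ∂u/∂y, and returning zero when ζ vanishes (where OW_norm is undefined) is an assumption of the sketch.

```python
import math


def owz(du_dx, du_dy, dv_dx, dv_dy, f):
    """Evaluate OWZ (s^-1) at one grid point from velocity derivatives (s^-1)."""
    zeta = dv_dx - du_dy   # relative vorticity
    e = du_dx - dv_dy      # stretching deformation E
    s = dv_dx + du_dy      # shearing deformation F
    if zeta == 0.0:
        return 0.0         # OW_norm is undefined here; assumed to give no signal
    ow_norm = (zeta ** 2 - (e ** 2 + s ** 2)) / zeta ** 2
    eta = zeta + f         # absolute vorticity
    # copysign returns +1 for f = 0, which is a convention of this sketch
    return max(ow_norm, 0.0) * eta * math.copysign(1.0, f)
```

For solid-body rotation (u = −Ωy, v = Ωx) the deformation vanishes, OW_norm equals 1, and OWZ reduces to the absolute vorticity 2Ω + f, whereas for a pure shear flow OW_norm is zero and so is OWZ.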

Candidate detection. Our implementation of OWZ in TempestExtremes first identifies local maxima of OWZ at 850 hPa. Candidates for which a stronger OWZ maximum exists within 5° GCD are eliminated. Next, only those candidates that satisfy the following six conditions within a distance of 2° GCD of that maximum are retained (where r and q are the relative and specific humidity, respectively, and vws denotes the vertical wind shear between 200 and 850 hPa):

$$
\begin{cases}
\mathrm{OWZ}_{850\,\mathrm{hPa}} & \geq 5 \times 10^{-5}\,\mathrm{s^{-1}} \\
\mathrm{OWZ}_{500\,\mathrm{hPa}} & \geq 4 \times 10^{-5}\,\mathrm{s^{-1}} \\
r_{950\,\mathrm{hPa}} & \geq 70\,\% \\
r_{700\,\mathrm{hPa}} & \geq 50\,\% \\
q_{950\,\mathrm{hPa}} & \geq 10\,\mathrm{g\,kg^{-1}} \\
\mathrm{vws} & \leq 25\,\mathrm{m\,s^{-1}}.
\end{cases}
$$
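The six conditions above amount to a table of thresholds that can be checked in one pass. The sketch below assumes that the relevant values have already been extracted within 2° GCD of the OWZ maximum. The dictionary keys and the function name are hypothetical.

```python
import operator

# (variable, comparison, limit); units follow the conditions listed above
INITIAL_THRESHOLDS = [
    ("owz_850", operator.ge, 5e-5),  # s^-1
    ("owz_500", operator.ge, 4e-5),  # s^-1
    ("r_950", operator.ge, 70.0),    # %
    ("r_700", operator.ge, 50.0),    # %
    ("q_950", operator.ge, 10.0),    # g/kg
    ("vws", operator.le, 25.0),      # m/s
]


def is_candidate(values, thresholds=INITIAL_THRESHOLDS):
    """values: dict mapping each variable name to its value near the candidate."""
    return all(compare(values[name], limit) for name, compare, limit in thresholds)
```

Keeping the thresholds in a list rather than hard-coding them makes it straightforward to pass a different set of limits to the same function.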

Stitching TC tracks. Consecutive TC points are stitched together when they lie within a maximum distance of 5° GCD from one another, allowing for a maximum 24 h gap. Additional core thresholds must be reached for at least nine time steps (48 h):

$$
\begin{cases}
\mathrm{OWZ}_{850\,\mathrm{hPa}} & \geq 6 \times 10^{-5}\,\mathrm{s^{-1}} \\
\mathrm{OWZ}_{500\,\mathrm{hPa}} & \geq 5 \times 10^{-5}\,\mathrm{s^{-1}} \\
r_{950\,\mathrm{hPa}} & \geq 85\,\% \\
r_{700\,\mathrm{hPa}} & \geq 70\,\% \\
q_{950\,\mathrm{hPa}} & \geq 14\,\mathrm{g\,kg^{-1}} \\
\mathrm{vws} & \leq 12.5\,\mathrm{m\,s^{-1}}.
\end{cases}
$$

Finally, tracks that do not reach TS intensity ($u_{10} = 16\,\mathrm{m\,s^{-1}}$) for at least one time step are filtered out.

Due to the specifics of the TempestExtremes framework, we note that our implementation differs slightly from the original algorithm described by Tory et al. (2013c). These modifications, along with the results of a sensitivity study justifying our choices for r_threshold and r_range, are further discussed in Appendix C.

2.3.4 TRACK algorithm

TRACK derives from an extra-tropical cyclone (ETC) tracking algorithm (Hodges, 1994). It is versatile and has since been used to study many types of weather systems, including the detection and tracking of TCs (Bengtsson et al., 2007; Hodges et al., 2017; Roberts et al., 2020a). The rationale behind TRACK is different from that of the previously described trackers: because it aims to track all vorticity perturbations, it does not embed any warm-core criterion in its initial detection. The TC selection, including the warm-core test, is only performed in the last step, independently of the tracking. In the present paper, we used the database of trajectories detected by TRACK in ERA5 that was recently published by Roberts et al. (2020a) without any modification. For completeness, we detail below the thresholds used in that case.

The algorithm is based on $\zeta_{T63}(P)$, which is the relative vorticity at pressure level P, spectrally filtered to retain total wavenumbers 6–63 only, as well as its vertical average from 850 to 600 hPa, hereafter referred to as $\overline{\zeta}_{T63}$. Local extrema of $\overline{\zeta}_{T63}$ are detected, and the ones for which $\overline{\zeta}_{T63} > 5 \times 10^{-6}\,\mathrm{s^{-1}}$ define a series of candidate points. Neighboring candidates are then stitched together by minimizing a cost function for track smoothness (Hodges, 1995, 1999). The tracks so obtained must last for at least 2 d and start between 30° S and 30° N.

The presence of a warm core is diagnosed according to the following criteria that must be satisfied for at least 1 d over the ocean:

$\zeta_{T63}(850\,\mathrm{hPa}) > 6 \times 10^{-5}\,\mathrm{s^{-1}}$.
$\zeta_{T63}(850\,\mathrm{hPa}) - \zeta_{T63}(250\,\mathrm{hPa}) > 6 \times 10^{-5}\,\mathrm{s^{-1}}$.
A local maximum of $\zeta_{T63}(P)$ exists at each pressure level.

2.3.5 CNRM algorithm

The CNRM algorithm was developed by Chauvin et al. (2006), and later used in Chauvin et al. (2020) and Cattiaux et al. (2020).

Candidate points are first tracked with the following criteria:

The SLP displays a local minimum which defines the center of the system.
The 850 hPa relative vorticity is larger than $1.5 \times 10^{-4}$ s⁻¹.
The 850 hPa wind intensity is larger than 5 m s⁻¹.
The sum of the temperature anomalies averaged over the 700, 500, 300 hPa pressure levels is larger than 1 K.
The difference between the 850 and 300 hPa temperature anomalies is smaller than 1 K.
The difference between the 300 and 850 hPa wind intensity is smaller than 5 m s⁻¹.

This detection step is followed by a stitching procedure adapted from Hodges (1994) and detailed in Ayrault (1998). Tracks shorter than 1 d are eliminated. Once TC tracks are obtained, a relaxation step is performed to complete the track life cycle and to detect tracks that were cut into two or more pieces (for example, because of a temporary weakening). This relaxation step is done with an 850 hPa relative vorticity threshold equal to $2.5 \times 10^{-4}$ s⁻¹.

2.4 Tracks matching

When using reanalysis products like ERA5, detected tracks can tentatively be associated with observed tracks (Murakami, 2014; Hodges et al., 2017; Ullrich et al., 2021). We derived the following matching algorithm: consider the case of a given detected track D composed of n points ($d_{1}, d_{2}, \dots, d_{n}$) defined at times ($t_{1}, t_{2}, \dots, t_{n}$). The observations O consist of a database of tracks and can be seen as a collection of points at given times. For each point $d_{i}(t_{i})$ of track D, we associated those points of O at time $t_{i}$ that are located closer than 300 km from the point $d_{i}$. Of course, it is possible that such points do not exist in O. The subset of points of O that have been associated with any point in D is denoted as $O_{D\text{-paired}}$. It is composed of $|O_{D\text{-paired}}|$ elements. There are three possibilities:

$|O_{D\text{-paired}}| = 0$: none of the points of D has been paired to a point in O, and D is considered to be an FA.
$|O_{D\text{-paired}}| > 0$ and all the points in $O_{D\text{-paired}}$ belong to the same track $D_{O}$ in O: $D_{O}$ is considered to be the match of D.
$|O_{D\text{-paired}}| > 0$ and the points in $O_{D\text{-paired}}$ belong to more than one track in O: the observed track having the largest number of points paired with D is considered the match of D.

After this matching is completed for all detected tracks, a final treatment is performed: if an observed track is paired with two or more detected tracks, these detected tracks are merged into a single track. Such cases arise when the detected tracks correspond to different parts of the same observed track and occur when, for example, the TC temporarily weakened while going over an island before strengthening again. In Appendix D, we present a brief analysis that validates our method.
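The matching and merging procedure described above can be sketched as follows. This is not the authors' code: the data layout (tracks as lists of `(time, lat, lon)` tuples, observations as a dictionary keyed by a track identifier), the function names, the mean Earth radius, and the arbitrary resolution of ties are all assumptions made for illustration.

```python
import math
from collections import Counter

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an assumption of this sketch


def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km (haversine formula)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    a = math.sin((p2 - p1) / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def match_track(detected, observations, max_dist_km=300.0):
    """Return the id of the observed track matching `detected`, or None for an FA.

    detected: list of (time, lat, lon); observations: {track_id: [(time, lat, lon), ...]}.
    """
    paired = Counter()
    for t, lat, lon in detected:
        for track_id, points in observations.items():
            for ot, olat, olon in points:
                if ot == t and distance_km(lat, lon, olat, olon) < max_dist_km:
                    paired[track_id] += 1
    if not paired:
        return None
    # The observed track with the most paired points wins; ties are broken arbitrarily.
    return paired.most_common(1)[0][0]


def group_by_match(detected_tracks, observations):
    """Merge detected tracks sharing the same match and collect the false alarms."""
    merged, false_alarms = {}, []
    for track in detected_tracks:
        match = match_track(track, observations)
        if match is None:
            false_alarms.append(track)
        else:
            merged.setdefault(match, []).extend(track)
    return {m: sorted(points) for m, points in merged.items()}, false_alarms
```

Observed tracks that never appear as a key of the merged dictionary are the Misses. The triple loop is quadratic in the number of points and would need a time index for a full 40-year catalog.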

This matching procedure enables us to label tracks as “Hits” (H), “Misses” (M), and “False Alarms” (FAs). Hits are tracks present in IB-TS and detected in ERA5. Misses are tracks present in IB-TS that were not detected in ERA5. False Alarms are tracks detected in ERA5 that do not correspond to any track in IB-TS. We then used this labeling to define two detection skills metrics, the Probability of Detection (POD, sometimes also presented as HR for “Hit Rate”) and the False Alarm Rate (FAR):

$$ \mathrm{POD} = \frac{H}{H + M}, \tag{3} $$

$$ \mathrm{FAR} = \frac{\mathrm{FA}}{H + \mathrm{FA}}. \tag{4} $$
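The two scores follow directly from the counts of Hits, Misses, and False Alarms. A minimal sketch, with hypothetical function names and no guard against empty catalogs:

```python
def probability_of_detection(hits: int, misses: int) -> float:
    """POD = H / (H + M): fraction of observed tracks that were detected."""
    return hits / (hits + misses)


def false_alarm_rate(hits: int, false_alarms: int) -> float:
    """FAR = FA / (H + FA): fraction of detected tracks without an observed match."""
    return false_alarms / (hits + false_alarms)
```

Note that the two scores have different denominators: the POD is normalized by the number of observed tracks, while the FAR is normalized by the number of detected tracks.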

3 A common post-treatment for trackers

We used Eqs. (3) and (4) to calculate the POD and FAR of the four trackers with respect to IB-TS. For UZ, we found a POD of 75 % and an FAR equal to 18 %. These values are close to those of Zarzycki et al. (2021), who report 78 % and 14 % for their POD and FAR, respectively. Subtle differences in the pre-processing of the IBTrACS data account for this difference (Colin Zarzycki, personal communication, 2022), but the close agreement of both PODs and FARs validates our implementation of that tracker. For TRACK, we found a POD of 85 % and an FAR equal to 50 %. Both scores are comparable to the values reported by Hodges et al. (2017), who applied TRACK to other reanalyses. We note that the POD we report here is on the higher end of the values found by Hodges et al. (2017), which is consistent with our filtering of IBTrACS being more restrictive than theirs. The OWZ and CNRM trackers display PODs similar to UZ, but their FARs are more heterogeneous and amount to 28 % for OWZ and 60 % for the CNRM tracker.

Overall, the results demonstrate that all trackers can capture most of the observed TCs. Although this is satisfying, we note that a given tracker can miss up to one-fourth of the existing tracks. In addition, as stated above, the FARs are more heterogeneous, and FAs can account for more than half of the detected trajectories. These two caveats call for a better understanding of the properties of both populations. This is the purpose of the following section.
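To relate the FAR values to the statement that FAs can outnumber genuine detections, Eq. (4) can be rearranged to give the number of false alarms per Hit. The short derivation below uses only the definition of the FAR and the 50 % and 60 % values quoted in the text:

```latex
\begin{align*}
\mathrm{FAR} = \frac{\mathrm{FA}}{H + \mathrm{FA}}
\quad &\Longrightarrow \quad
\frac{\mathrm{FA}}{H} = \frac{\mathrm{FAR}}{1 - \mathrm{FAR}}, \\
\mathrm{FAR} = 0.50 \quad &\Longrightarrow \quad \frac{\mathrm{FA}}{H} = 1.0, \\
\mathrm{FAR} = 0.60 \quad &\Longrightarrow \quad \frac{\mathrm{FA}}{H} = 1.5 .
\end{align*}
```

An FAR of 50 % therefore means one false alarm per Hit, and an FAR of 60 % means three false alarms for every two Hits.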


Figure 1. Histograms representing the properties of the Hits, the Misses, and the False Alarm (FA) tracks for each tracking algorithm. From left to right, the columns correspond to UZ, OWZ, TRACK, and the CNRM tracker, respectively. The rows correspond from top to bottom to the minimum sea-level pressure (SLP, with the storm categories as defined according to Table 1 shown with vertical gray lines), the latitude at which that value is reached, the month at which that value is reached (solid line in the Northern Hemisphere, and dashed line in the Southern Hemisphere), and finally the track duration. The blue and green colors correspond to the Hits and the Misses, respectively, for all plots. Raw FAs are shown in orange while we plot the FAs that remain after the post-treatment in red (see Sect. 3 for details). The histograms display counts that have not been normalized. Hence, the area under each curve is proportional to the number of tracks in each ensemble.

TC	tropical cyclone
TS	tropical storm
SLP	sea-level pressure
SSHS	Saffir–Simpson Hurricane Scale
NH	Northern Hemisphere
SH	Southern Hemisphere
IBTrACS	International Best Track Archive for Climate Stewardship
IB-TS	Tropical storm subset of IBTrACS
IB-TC	Tropical cyclone subset of IBTrACS
ERA5	Fifth European ReAnalysis
UZ	Ullrich & Zarzycki
OWZ	Okubo–Weiss–Zeta
CNRM	Centre National de Recherches Météorologiques
FA	false alarm
FAR	false alarm rate
POD	probability of detection

ETC	extra-tropical cyclone
NATL	North Atlantic
WNP	western North Pacific
ENP	eastern North Pacific
SP	South Pacific
NI	North Indian
SI	South Indian
ACE	accumulated cyclonic energy
