ImpactETC1.0: impact-oriented tracking of extratropical cyclones with global optimisation and track reconciliation

Agertoft, Niels; Su, Jian; Pedersen, Jonas Wied; Ringgaard, Ida Margrethe; Larsen, Morten Andreas Dahl

doi:10.5194/gmd-19-5641-2026

Articles | Volume 19, issue 12

https://doi.org/10.5194/gmd-19-5641-2026

© Author(s) 2026. This work is distributed under
the Creative Commons Attribution 4.0 License.


Model description paper

29 Jun 2026

Final revised paper (published on 29 Jun 2026)
Preprint (discussion started on 30 Oct 2025)

Interactive discussion

Status: closed

RC1:
'Comment on egusphere-2025-4466', Anonymous Referee #1, 19 Nov 2025

The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2025/egusphere-2025-4466/egusphere-2025-4466-RC1-supplement.pdf

Citation: https://doi.org/10.5194/egusphere-2025-4466-RC1
- AC2: 'Reply on RC1', Niels Agertoft, 02 Feb 2026
  
  The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2025/egusphere-2025-4466/egusphere-2025-4466-AC2-supplement.pdf
  
  Citation: https://doi.org/10.5194/egusphere-2025-4466-AC2
RC2:
'Comment on egusphere-2025-4466', Anonymous Referee #2, 13 Dec 2025

In this paper, the authors introduce ImpactETC1.0, software that tracks extratropical cyclones (ETCs). They specifically focus on linking detected/tracked storms to observed regional impacts such as storm surges. The authors argue three key novelties to this work: (1) the application of the Hungarian Algorithm to globally optimize the "correspondence problem" of connecting storm centers across timesteps, which reduced "suboptimal connections" compared to traditional greedy algorithms; (2) a "BLOB" analysis technique for track reconciliation that merges broken tracks when storms cross orographically complex regions, and (3) an automated calibration procedure for post-processing parameters using a new metric they termed "Single Storm Score." Applied to historical storm surge events in Denmark using the high-resolution CERRA reanalysis dataset (1991-2020), the code successfully identified a series of impact-relevant ETCs.
In general, the paper is interesting and well-written (some minor typos are noted below). The material is well-suited for GMD. I would contend that the authors slightly overemphasize certain aspects of the work, and there is some additional room for contextualization and framing. More details regarding that are below. However, this critique aside, the ideas are worth considering, and the field of storm tracking more broadly lacks papers that discuss the "nuts and bolts" algorithmic performance and combine that with a discussion of tradeoffs, which this paper does. I would suggest some form of revisions based on the itemized feedback below. However, pending those revisions, I think the paper could be suitable for publication (and relevant for interested researchers) in GMD with some additional work.
Note, I have not tested the software/data, but I have verified that the DOI linking to the Zenodo is valid, and it contains code that appears sound, so this would adhere to EGU's data requirements.
---
Major Theme #1, tracking mechanics during ETC trajectory building
The Hungarian algorithm implementation is interesting, and the argument for potential use cases for it over a greedy nearest neighbor is compelling computationally. As the authors do note, most of the wall clock time of their software is tied up in data ingestion, so even an order of magnitude slower during the "stitching" step is unlikely to have deleterious consequences on the overall workflow.
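To make the contrast between the two assignment strategies concrete, here is a minimal sketch on invented coordinates (arbitrary units). It is not the ImpactETC1.0 code. For brevity the global optimum is found by exhaustive search over permutations, which returns the same minimum-cost assignment the Hungarian algorithm would, only much more slowly; a library routine such as `scipy.optimize.linear_sum_assignment` would be the practical choice. All function names are hypothetical.

```python
from itertools import permutations
from math import hypot


def cost_matrix(prev, curr):
    """Euclidean distance between every previous and every current centre."""
    return [[hypot(px - cx, py - cy) for cx, cy in curr] for px, py in prev]


def greedy_assign(cost):
    """Each previous centre, in order, claims its nearest unclaimed current centre."""
    taken, pairs = set(), []
    for i, row in enumerate(cost):
        j = min((j for j in range(len(row)) if j not in taken), key=row.__getitem__)
        taken.add(j)
        pairs.append((i, j))
    return pairs


def optimal_assign(cost):
    """Minimum-total-cost assignment by exhaustive search (square matrices only)."""
    n = len(cost)
    best = min(permutations(range(n)),
               key=lambda p: sum(cost[i][p[i]] for i in range(n)))
    return list(enumerate(best))


def total(cost, pairs):
    """Summed cost of a list of (previous, current) index pairs."""
    return sum(cost[i][j] for i, j in pairs)


# Invented example: two centres at time t, two at time t+1.
prev_centres = [(0.0, 0.0), (2.0, 0.0)]
curr_centres = [(1.0, 0.0), (-3.0, 0.0)]
cost = cost_matrix(prev_centres, curr_centres)

print("greedy :", greedy_assign(cost), total(cost, greedy_assign(cost)))
print("optimal:", optimal_assign(cost), total(cost, optimal_assign(cost)))
```

In this toy case the first centre grabs the closest candidate, which forces the second centre into a long jump; the global solution accepts a slightly worse first match to lower the total. With well-separated centres the two strategies would agree.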
Regarding the physical motivation for the algorithm application: from a synoptic meteorology perspective, the greedy algorithm seems to fail (Table 2) for small pruning radii. This makes intuitive sense because there are far more potential "local minimum" low-pressure centers with a smaller radius with which to merge/eliminate nearby ones.
However, the realized physical spatial scales for ETCs are ~1000 km (if not larger, this can be deduced from observation or scale analysis of the Navier-Stokes equations). Given this, I would struggle to find a practical reason why an ETC tracker would need to have such a small pruning radius. If we agree on this, the Hungarian algorithm doesn't objectively buy a lot of skill, because with larger pruning radii (physically consistent with ETC scales), the deviation between the nearest neighbor and the Hungarian Algorithm narrows significantly. However, as noted by the authors, it doesn't cost a lot more, and I think they do a good job of arguing that, without the cost being prohibitive, it is a physically defensible choice. The action item here may be for the authors to discuss how the scales of motion in weather phenomena dictate hyperparameter settings such as the pruning radius. I could imagine a situation where such an algorithm may be more beneficial for "noisier" fields (e.g., tracking individual thunderstorms, cloud tracing, etc.), so perhaps this can be noted as a target for future application.
Finally, the authors discuss "higher-resolution for ETCs" in the final paragraphs. While local scale impacts (e.g., precipitation, specific winds in coastal channels, etc.) may be better resolved, I'd argue that ETC tracking doesn't benefit from high-resolution (and may be better served by coarsening data anyways).
While not necessary for the paper, it would be interesting to see how the algorithm compares to a more established ETC tracker such as TRACK (Hodges) or TempestExtremes (Ullrich). Given the synoptic scales mentioned above, I would assume there is likely minimal difference, particularly for well-defined tracks, between the methods.
Major Theme #2, tracking variable choices and broken track evaluation
For the BLOB analysis, I interpret the code to work such that if I have a point that exists at t = t_1 and another point exists at t = t_3 (or thereabouts) with a large spatial gap, the BLOB area operator allows them to be "glued" together as a single track based on an overlap strategy. Frankly, in many ways, this seems to behave somewhat like Ullrich et al. (2021) and Pérez-Alarcón et al. (2024), both of which the authors mention. Can the authors comment as to the specific, unique benefits of applying such analysis versus allowing time-varying gaps in the "gluing" stage? I.e., does the BLOB method provide materially different results than the simpler logic during stitching of "any ETCs within nhrs of model time and (nhrs * max_travel_dist/hr) in space are considered the same system"?
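The "simpler logic" referred to above can be sketched as a single predicate. This is only an illustration of the gap-tolerant stitching rule as worded in this comment, not code from ImpactETC1.0 or from any of the cited trackers; the function names, the tuple layout and the example thresholds are all assumptions.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in km between two points given in degrees."""
    p1, p2 = radians(lat1), radians(lat2)
    dphi = p2 - p1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(p1) * cos(p2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def same_system(track_end, track_start, max_gap_hours, max_speed_km_per_hour):
    """Decide whether two track fragments may be glued together.

    track_end and track_start are (time_hours, lat_deg, lon_deg) tuples.
    The fragments are joined if the time gap is positive, no longer than
    max_gap_hours, and the spatial gap could be covered at the assumed
    maximum travel speed within that time.
    """
    gap_hours = track_start[0] - track_end[0]
    if gap_hours <= 0 or gap_hours > max_gap_hours:
        return False
    distance = great_circle_km(track_end[1], track_end[2],
                               track_start[1], track_start[2])
    return distance <= gap_hours * max_speed_km_per_hour


# Invented example: a track ends west of a mountain range and a new one
# starts 6 h later roughly 390 km further east.
last_point = (0.0, 60.0, 5.0)
first_point = (6.0, 60.0, 12.0)
print(same_system(last_point, first_point, max_gap_hours=6, max_speed_km_per_hour=80))
```

A rule of this kind uses only the end points of the fragments, whereas an overlap-based approach also uses the shape of the pressure field, which is where any material difference between the two would have to come from.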
The reasoning behind the BLOB logic is to deal with the common challenge of the land surface and orography influencing near-surface quantities (particularly problematic for mean sea level pressure (MSLP), as it's a derived quantity based on the elevation and surface pressure in the model). It is my understanding that the design choices of alternative tracking fields (i.e., see the contributions to the Neu BAMS paper that the authors cite) are chosen in large part because of some of the orography-induced challenges the authors point out here. The authors then go on to use MSLP as a core tracking variable (in conjunction with 500mb vorticity). It appears this was done because low-level vorticity fields are noisy (aside: this is not unexpected in high-resolution data, although this can be smoothed as in many of Hodges' papers). However, this does introduce a bit of what I would consider semi-circular logic ("we have a problem of broken tracks to solve, but we choose a tracking variable that has been shown to be prone to broken tracks"). I'll admit I'm playing a bit of devil's advocate here since I prefer tracking on MSLP myself (and MSLP is probably most tied to surface wind/wave forcing relevant for surge, which may be worth pointing out more aggressively in the manuscript), but some context in the manuscript might help smooth out the rationale.
Major Theme #3: Impacts and compound events
It is my understanding that the core of the ETC (i.e., the sea level pressure minimum) must be in the AOR for it to be classified as an impactful storm. However, it is well known that mid-latitude cyclones can have impacts extending far from the core of the storm (e.g., cold fronts). While I assume the authors are most concerned with the surge associated with the wind field near the comma head, it would be good to discuss this a bit more. For example, such a technique may struggle if applied to ETCs that produce precipitation impacts, as such impacts can be rather disconnected from the storm's dynamical core (e.g., see atmospheric rivers tied to mid-latitude cyclones).
Along the same line, I was a bit disappointed that an impact-focused paper didn't really discuss the hazards themselves. E.g., in this high-resolution model, one could easily analyze the wind field that is the driver of the surge in the region and see how this is tied to the ETC centers that are detected. The authors wouldn't necessarily need to run a hydrodynamical model, but just evaluate the magnitude and spatial extent of the wind impinging on the shoreline. In fact, I would argue that this is a key drawback of pointwise Lagrangian analysis; 1-D tracks lack information about the spatiotemporal evolution of the 3D atmosphere, land, and ocean during relevant extreme weather events. I think it would be a nice addition to the paper (a qualitative case analysis, not unlike what was taken with a handful of events) and could improve the "impact" (no pun intended) the work makes on the community. However, if the authors do not wish to undertake such an endeavor, it should be discussed more at the end of the manuscript.
Major Theme #4: Semantical accuracy
I have slight issues with phrasing like "several innovative components." A definition of "innovative" is "introducing new, valuable ideas, methods, products, or services that are original, solve problems, and improve upon the status quo, often by creatively combining existing elements or challenging norms to deliver fresh solutions and create impact." I will admit, I find the Hungarian optimization the most novel component of the work. I consider the BLOB analysis (see above) to be a distinctive slant on previous strategies aimed at addressing track gaps in space and time. The post-processing optimization can broadly be considered a basic grid-search hyperparameter tuning that prioritizes having a singular event in the impacts domain (i.e., it's computationally efficient because it's only modifying a small subset of hyperparameters). I'll also point out that, tied to the above, the hypothesis implicit in the optimization is to enforce the notion that a singular event leads to specific hazards. For very local scales, that may be the case, but there is a large volume of literature surrounding compound extremes, including multiple events/stressors amplifying a hazard (e.g., flooding).
While I think all of the above are worth publishing, I think the authors should take care not to deemphasize existing work in the space via the use of what I would consider somewhat loaded terms like "novel" and "innovative."
---
Minor comments:
In general, I found the paper to be relatively free of typographical errors. I did see some spellings like "artefacts" that I think are less common usage, but Google assures me they are valid. As always, I encourage the authors to do a thorough read-through after revision, but I commend them on the attention to detail in the initial submission.
Figure 1. Some more geographic information (e.g., lat/lon ticks) would help in the left panel.
Line 137. Figure 3 should be capitalized.
Is Fig. 5a missing some of the inland points over Norway? (e.g., black 16/17 from Fig. 6)
There are many examples where the possessive is applied to plural acronyms/words (e.g., ETC's in lines 380-384). These should not have an apostrophe (apostrophe only used for possessive or contractions).
In Figs. 5 and 6, the colormap is inappropriately washed out on the strong cyclone in the NE corner. Adjust the color scale.
For Figs. 5 and 6, I also wonder if a different colormap would be beneficial (perhaps a perceptually uniform colormap to avoid color-blind issues). Red-blue implies biases or differences visually.
Fig 10. The inset is a bit small; I would try to make it larger. Also, it may be worth trying to add some more color contrast (e.g., coloring the land mass in the inset brown instead of gray or something to highlight it's a distinct figure from the parent it overlays).

Citation: https://doi.org/10.5194/egusphere-2025-4466-RC2
- AC1: 'Reply on RC2', Niels Agertoft, 02 Feb 2026
  
  The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2025/egusphere-2025-4466/egusphere-2025-4466-AC1-supplement.pdf
  
  Citation: https://doi.org/10.5194/egusphere-2025-4466-AC1

Peer review completion

AR – Author's response | RR – Referee report | ED – Editor decision | EF – Editorial file upload

AR by Niels Agertoft on behalf of the Authors (17 Mar 2026) Author's response Author's tracked changes Manuscript

ED: Referee Nomination & Report Request started (21 Mar 2026) by Paul Ullrich

RR by Anonymous Referee #1 (03 Apr 2026)

RR by Anonymous Referee #2 (27 Apr 2026)

Suggestions for revision or reasons for rejection

I appreciate the authors taking the time to consider the reviews and provide both a response and an updated manuscript. The revised manuscript is (in my eyes) materially improved. In particular, I believe the authors have strengthened the discussion around impact attribution and compound-event ambiguity, clarified the purpose and limitations of their BLOB reconciliation approach for terrain-driven track fragmentation, and improved the algorithmic framing of assignment choices (Hungarian vs greedy/nearest-neighbor) by focusing more on practical performance and behavior rather than implying meteorological "ground truth."

The added gradient tracing description is interesting; I like the authors exploring different ways for defining cyclones driving impacts. I could still see failure modes here (e.g., noisy sea level pressure fields which can occur in less diffusive models, sub-cyclone mesoscale pressure minima in very high-res simulations), but the authors do discuss some of these at the end of the manuscript, which I thought framed things well.

I still have some remaining comments below, mostly around further phrasing and clarification. These are primarily about claim discipline, a clear definition of what TA measures, a small amount of additional guidance for tuning Dmax, and some other small typos/phrasing tweaks. I broke them down into "broader comments" and "smaller ones". I do think they should be considered, but assuming they are addressed in a reasonable fashion, I would expect the manuscript to be close to publishable.

*** Broader comments

BC1: "Accuracy" framing

I still think some moderation of remaining claims of "accuracy" without independent track validation is warranted.

While several sections are now more careful, some headline statements still imply that the framework delivers "accurate" tracking in a general meteorological sense. The evaluation presented primarily demonstrates:
- Differences in assignment behavior (Hungarian vs greedy),
- Changes in continuity/fragmentation and track statistics (including reconciliation effects),
- Improved event association in the sense of returning a plausible number of storms per impact event (your TA-like framing, see below).

These are valuable, but they do not constitute a direct validation of trajectory-level accuracy (i.e., position, intensity evolution, lifecycle) against an external reference (e.g., manual analysis, independent tracker comparison, or reanalysis-based synoptic verification). I would recommend replacing language such as "accurate tracking" with more precise claims consistent with what is demonstrated, for example, "more continuous / less fragmented tracks" or "improved practical association of impacts to candidate storms."

BC2: Clarify what "True Accuracy" (TA) measures and what it does not

The TA-style metric (as described in the revision) appears to quantify whether the method returns the correct number of storms per impact event (or within an AoR and time window), based on manual labeling of storm count. This is a useful metric for impact association. However, most readers (see above) will interpret "accuracy" as track-path correctness or physical attribution of the forcing mechanism. I suggest adding a few lines at the first introduction of TA, being very explicit that it evaluates storm-count attribution (number of relevant tracks per event) rather than track-path accuracy or causal hazard attribution. Also, ensure subsequent discussion uses terminology consistent with this meaning (for example, "count correctness" or "event association accuracy" rather than "tracking accuracy").
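To illustrate what a count-based metric does and does not see, the sketch below implements one possible reading of the description above: the fraction of impact events for which the number of returned tracks equals the manually labelled storm count. This is an assumption based on this comment, not the manuscript's definition of TA, and the event identifiers and counts are invented.

```python
def count_correctness(returned_counts, labelled_counts):
    """Fraction of labelled events whose returned track count matches the label.

    returned_counts: event id -> number of tracks the tracker associates with it
    labelled_counts: event id -> manually labelled number of storms
    Events missing from returned_counts are treated as having zero tracks.
    """
    if not labelled_counts:
        raise ValueError("at least one labelled event is required")
    hits = sum(1 for event, expected in labelled_counts.items()
               if returned_counts.get(event, 0) == expected)
    return hits / len(labelled_counts)


# Invented example with four impact events.
labelled = {"event_a": 1, "event_b": 1, "event_c": 2, "event_d": 1}
returned = {"event_a": 1, "event_b": 2, "event_c": 2}
print(count_correctness(returned, labelled))
```

Note that nothing in this calculation looks at track positions or intensities: a method could score perfectly while returning a track in the wrong place, which is why a term like "count correctness" describes it more precisely than "accuracy".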

BC3: Provide clearer guidance or minimal sensitivity regarding Dmax (and other key hyperparameters)

The revision improves the narrative around parameter choices and how they might impact results (i.e., the "tuning" problem). However, I still have some questions about Dmax: it is simultaneously justified as a pragmatic allowance for terrain-driven discontinuities but later shown to be insufficient in some cases. This is not a flaw, as the authors note, but it means the manuscript should offer readers practical guidance for tuning Dmax in other datasets and time steps. Frankly, this will greatly increase the reproducibility and portability of the framework if people download the code and use it themselves for their applications. My gut tells me just acknowledging this is worthwhile, although a minimal sensitivity analysis (even for a subset), showing fragmentation rates or track continuity metrics for Dmax = 200, 300, 400 km (or similar), would be interesting.
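The shape of such a sensitivity test can be shown with a toy example. The sketch assumes that Dmax acts as a maximum linking distance between detections at consecutive time steps, and it follows a single invented storm along a one-dimensional path; the real framework links many centres per time step, so this is only meant to show the form of the output table, not realistic numbers.

```python
def count_fragments(positions_km, dmax_km):
    """Number of track pieces when consecutive detections link only if
    their separation does not exceed dmax_km."""
    fragments = 1
    for previous, current in zip(positions_km, positions_km[1:]):
        if abs(current - previous) > dmax_km:
            fragments += 1
    return fragments


# Invented along-track positions (km) of one storm at successive time steps.
# The steps of 250 km and 350 km mimic jumps of the pressure minimum across terrain.
positions = [0, 150, 310, 560, 700, 1050, 1200]

sweep = {dmax: count_fragments(positions, dmax) for dmax in (200, 300, 400)}
for dmax, pieces in sweep.items():
    print(f"Dmax = {dmax} km -> {pieces} fragment(s)")
```

Repeating this kind of sweep on a subset of real events, and reporting the fragment counts per Dmax, would give users of the code a starting point for other datasets and time steps.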

*** Typos and formatting suggestions.

Line 6: "includes several novel" -- again, I am not sure I'd call these features "novel". Might just say "... includes several algorithmic..."

Lines 93-94: Similarly, novel is used twice in succession: "Motivated by these challenges, we introduce a novel ETC tracking framework designed to enhance the relevance of ETC tracks for on-the-ground impact assessments. The new framework contains several scientific novelties:" -- I might replace the first novel with "new," maybe the second one could be "developments".

Line 111: Is the native CERRA grid something like Lambert Conformal rather than regular lat/lon? Might be worth just adding what type of grid is being interpolated from.

I think it's probably also worth pointing out that the method (as described) requires a Cartesian grid. The authors may feel this is self-evident, but with the growing adoption of unstructured meshes in the climate modeling community, it is worth noting. Developing a method that performs well on unstructured meshes (i.e., without the need for regular latitude-longitude data) might be a useful target for future work.

Line 155: For this step, would it be possible to make the feature detection stage embarrassingly parallel since the correspondence problem is only solved after all timesteps have been analyzed? While I can imagine 5km cells to be "expensive," with a standard HPC for the current year, this might be more feasible.
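The independence of the per-time-step detection can be sketched as follows. The detector here is a deliberately simple strict local-minimum search on a small nested-list grid, not the ImpactETC1.0 feature detection, and all names are hypothetical. A thread pool keeps the example self-contained; for real speed-ups on large grids one would more likely use a process pool or separate HPC jobs, since the structure of the mapping is the same.

```python
from concurrent.futures import ThreadPoolExecutor


def local_minima(field):
    """Return (row, col) of interior cells strictly lower than all 8 neighbours."""
    found = []
    for r in range(1, len(field) - 1):
        for c in range(1, len(field[0]) - 1):
            centre = field[r][c]
            neighbours = [field[r + dr][c + dc]
                          for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                          if (dr, dc) != (0, 0)]
            if all(centre < value for value in neighbours):
                found.append((r, c))
    return found


def detect_all(fields, workers=4):
    """Run the detector on every time step independently; results keep input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(local_minima, fields))


# Invented pressure fields (hPa) for two time steps.
step_0 = [[1010, 1010, 1010],
          [1010, 990, 1010],
          [1010, 1010, 1010]]
step_1 = [[1010, 1010, 1010],
          [1010, 1010, 1010],
          [1010, 1010, 1010]]

print(detect_all([step_0, step_1]))
```

Because no time step needs results from any other, the correspondence problem can be solved afterwards on the collected lists of centres.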

Line 185: I might call this "small pruning radii" instead of "lax pruning."

Line 187: I am surprised the local minima values are exactly the same, which is difficult even with single precision. I suppose keeping both in these cases is fine, but functionally (and from a meteorological perspective), I do not see how it is different than applying a random choice.

Line 243: The first track-breaking mechanism can sometimes be mitigated by allowing temporal gaps during stitching; see Ullrich et al. (2021), which is already cited.

Line 253: I am not sure the word "hypothetical" is needed here, since this is commonly how sea level pressure correction is applied operationally.

Line 254: Consider adding a reference supporting why/when SLP reduction breaks down over complex terrain (physical reasoning and prior documentation).

Line 316: "We note that the AoR is centred on 60° lat, 15°N and initially spans from 50–70° lat and 0–30°E." This seems wrong for a few reasons. One, I think they mean 60N, 15E, but also it should be "N" and not "lat."

Figure 5. Consider slightly reducing the contour density, which makes a lot of noise over the Alps, Turkey, N. Africa, etc. I would also suggest the storm track centers be made a different color (blue? purple?) to better stand out against the underlying shading. That or the points should be larger with a bolder outline.

Figs 6b and 7: Why is there a white patch in the middle of the ETC in the top right (northeast) corner?

Table 5: Consider reducing precision (fewer decimal places) to improve readability.

ED: Reconsider after major revisions (28 Apr 2026) by Paul Ullrich

AR by Niels Agertoft on behalf of the Authors (21 May 2026) Author's response Author's tracked changes Manuscript

ED: Publish as is (23 May 2026) by Paul Ullrich

AR by Niels Agertoft on behalf of the Authors (02 Jun 2026) Author's response Manuscript

Short summary

Extratropical cyclones (ETCs) drive severe weather and cause significant socio-economic impacts. We present ImpactETC1.0, a framework that identifies ETC tracks and links them to local impacts, here storm surges. It uses global optimisation, BLOB analysis, and several post-processing options to improve tracking quality and identify impact-relevant tracks. Results show ImpactETC1.0 enables efficient, impact-focused ETC tracking.