I appreciate the authors taking the time to consider the reviews and provide both a response and an updated manuscript. The revised manuscript is (in my eyes) materially improved. In particular, I believe the authors have strengthened the discussion around impact attribution and compound-event ambiguity, clarified the purpose and limitations of their BLOB reconciliation approach for terrain-driven track fragmentation, and improved the algorithmic framing of assignment choices (Hungarian vs greedy/nearest-neighbor) by focusing more on practical performance and behavior rather than implying meteorological "ground truth."
The added gradient tracing description is interesting; I like that the authors explore different ways of defining the cyclones that drive impacts. I can still see failure modes here (e.g., noisy sea level pressure fields, which can occur in less diffusive models, or sub-cyclone mesoscale pressure minima in very high-resolution simulations), but the authors do discuss some of these at the end of the manuscript, which I thought framed things well.
I still have some remaining comments below, mostly around further phrasing and clarification. These are primarily about claim discipline, a clear definition of what TA measures, a small amount of additional guidance for tuning Dmax, and some other small typos/phrasing tweaks. I have split them into broader comments and smaller typo/formatting suggestions. I do think they should be considered, but assuming they are addressed in a reasonable fashion, I would expect the manuscript to be close to publishable.
*** Broader comments
BC1: "Accuracy" framing
I still think some moderation of remaining claims of "accuracy" without independent track validation is warranted.
While several sections are now more careful, some headline statements still imply that the framework delivers "accurate" tracking in a general meteorological sense. The evaluation presented primarily demonstrates:
- Differences in assignment behavior (Hungarian vs greedy),
- Changes in continuity/fragmentation and track statistics (including reconciliation effects),
- Improved event association in the sense of returning a plausible number of storms per impact event (the authors' TA-like framing, see below).
These are valuable, but they do not constitute a direct validation of trajectory-level accuracy (i.e., position, intensity evolution, lifecycle) against an external reference (e.g., manual analysis, independent tracker comparison, or reanalysis-based synoptic verification). I would recommend replacing language such as "accurate tracking" with more precise claims consistent with what is demonstrated, for example, "more continuous / less fragmented tracks" or "improved practical association of impacts to candidate storms."
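To illustrate the first bullet above, here is a minimal sketch (not the authors' code) of how greedy and minimum-total-cost assignment can disagree on the same candidates. The cost matrix is hypothetical, and the brute-force search stands in for the Hungarian algorithm, which returns the same minimum-cost matching for a square matrix. Neither result is validated against a reference track, which is the point of this comment.

```python
from itertools import permutations


def greedy_assignment(cost):
    """Repeatedly accept the cheapest remaining (row, col) pair."""
    pairs = sorted(
        (cost[i][j], i, j)
        for i in range(len(cost))
        for j in range(len(cost[0]))
    )
    used_rows, used_cols, result = set(), set(), {}
    for _, i, j in pairs:
        if i not in used_rows and j not in used_cols:
            result[i] = j
            used_rows.add(i)
            used_cols.add(j)
    return result


def optimal_assignment(cost):
    """Brute-force minimum total cost (stand-in for the Hungarian algorithm)."""
    n = len(cost)
    best = min(
        permutations(range(n)),
        key=lambda p: sum(cost[i][p[i]] for i in range(n)),
    )
    return dict(enumerate(best))


def total_cost(cost, assignment):
    """Sum the cost of every accepted (row, col) link."""
    return sum(cost[i][j] for i, j in assignment.items())


# Hypothetical distances (km): rows are centres at t, columns at t+1.
cost = [[100, 120],
        [110, 400]]
greedy = greedy_assignment(cost)
optimal = optimal_assignment(cost)
print(greedy, total_cost(cost, greedy))
print(optimal, total_cost(cost, optimal))
```

The two methods link the centres differently and give different total costs, which demonstrates a difference in behavior but says nothing about which set of links is meteorologically correct.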
BC2: Clarify what "True Accuracy" (TA) measures and what it does not
The TA-style metric (as described in the revision) appears to quantify whether the method returns the correct number of storms per impact event (or within an AoR and time window), based on manual labeling of storm count. This is a useful metric for impact association. However, most readers (see above) will interpret "accuracy" as track-path correctness or physical attribution of the forcing mechanism. I suggest adding a few lines at the first introduction of TA that state explicitly that it evaluates storm-count attribution (number of relevant tracks per event) rather than track-path accuracy or causal hazard attribution. Also, ensure that subsequent discussion uses terminology consistent with this meaning (for example, "count correctness" or "event association accuracy" rather than "tracking accuracy").
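To make my reading of TA concrete, a minimal sketch follows. It assumes TA is the fraction of impact events for which the returned storm count matches the manually labeled count. This is my interpretation, not the authors' stated definition, and the function name and example counts are hypothetical.

```python
def count_correctness(returned_counts, labelled_counts):
    """Fraction of impact events whose returned storm count equals the manual label."""
    if len(returned_counts) != len(labelled_counts):
        raise ValueError("one returned count is needed per labelled event")
    if not labelled_counts:
        raise ValueError("at least one event is required")
    hits = sum(r == m for r, m in zip(returned_counts, labelled_counts))
    return hits / len(labelled_counts)


# Hypothetical example with five impact events.
labelled = [1, 1, 2, 1, 3]   # manually labelled storms per event
returned = [1, 2, 2, 1, 1]   # storms returned by the tracker per event
score = count_correctness(returned, labelled)
print(score)
```

Nothing in this calculation uses track positions or intensities, which is why I would describe it as count correctness rather than tracking accuracy.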
BC3: Provide clearer guidance or minimal sensitivity regarding Dmax (and other key hyperparameters)
The revision improves the narrative around parameter choices and how they might impact results (i.e., the "tuning" problem). However, I still have some questions about Dmax: it is justified as a pragmatic allowance for terrain-driven discontinuities but is later shown to be insufficient in some cases. This is not a flaw, as the authors note, but it means the manuscript should offer readers practical guidance for tuning Dmax for other datasets and time steps. Frankly, this would greatly increase the reproducibility and portability of the framework for people who download the code and apply it to their own applications. My gut tells me that simply acknowledging this is worthwhile, although a minimal sensitivity analysis (even for a subset of cases) showing fragmentation rates or track continuity metrics for Dmax = 200, 300, 400 km (or similar) would be interesting.
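The kind of sweep I have in mind could be as small as the sketch below. It is a toy under stated assumptions, not the authors' tracker: it follows a single synthetic storm with one centre per timestep, breaks the track whenever consecutive centres are farther apart than Dmax, and reports the number of fragments. The function names and the synthetic positions (including the roughly 360 km jump) are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(p, q):
    """Great-circle distance between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def count_fragments(centres, dmax_km):
    """Count track segments; a new one starts when a step exceeds dmax_km."""
    fragments = 1
    for prev, curr in zip(centres, centres[1:]):
        if haversine_km(prev, curr) > dmax_km:
            fragments += 1
    return fragments


# Synthetic storm moving east along 60N; the third step is a jump of about
# 360 km, standing in for a terrain-driven discontinuity.
centres = [(60.0, 0.0), (60.0, 2.0), (60.0, 4.0), (60.0, 10.5), (60.0, 12.5)]

sweep = {dmax: count_fragments(centres, dmax) for dmax in (200, 300, 400)}
for dmax, n in sweep.items():
    print(f"Dmax = {dmax} km -> {n} fragment(s)")
```

Reporting a table like this for a subset of the real data, alongside the time step used, would give readers a starting point for choosing Dmax on other datasets.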
*** Typos and formatting suggestions.
Line 6: "includes several novel" -- again, I am not sure I'd call these features "novel". Might just say "... includes several algorithmic..."
Lines 93-94: Similarly, novel is used twice in succession: "Motivated by these challenges, we introduce a novel ETC tracking framework designed to enhance the relevance of ETC tracks for on-the-ground impact assessments. The new framework contains several scientific novelties:" -- I might replace the first novel with "new," maybe the second one could be "developments".
Line 111: Is the native CERRA grid a projected grid (e.g., Lambert Conformal) rather than regular lat/lon? It might be worth stating what type of grid is being interpolated from.
I think it's probably also worth pointing out that the method (as described) requires a Cartesian grid. The authors may feel this is self-evident, but with the growing adoption of unstructured meshes in the climate modeling community, it is worth noting. Developing a method that performs well on unstructured meshes (i.e., without the need for regular latitude-longitude data) might be a useful target for future work.
Line 155: Would it be possible to make the feature detection stage embarrassingly parallel, since the correspondence problem is only solved after all timesteps have been analyzed? While I can imagine 5 km cells being "expensive," this seems feasible on a current standard HPC system.
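As a sketch of what I mean (assuming detection at one timestep needs no information from any other timestep, which is my reading of the description), each field can be handed to a separate worker and the results collected in order before the correspondence step. The detector below is a deliberately simple strict-local-minimum search on toy 3x3 fields; the function names and values are hypothetical and do not represent the authors' detection scheme.

```python
from concurrent.futures import ThreadPoolExecutor


def local_minima(field):
    """Return (row, col) of interior points strictly below all 8 neighbours."""
    minima = []
    for i in range(1, len(field) - 1):
        for j in range(1, len(field[0]) - 1):
            centre = field[i][j]
            neighbours = [
                field[i + di][j + dj]
                for di in (-1, 0, 1)
                for dj in (-1, 0, 1)
                if (di, dj) != (0, 0)
            ]
            if all(centre < value for value in neighbours):
                minima.append((i, j))
    return minima


def detect_all(fields, executor):
    """Run detection independently per timestep; output order matches input."""
    return list(executor.map(local_minima, fields))


# Hypothetical SLP fields (hPa) for two timesteps.
fields = [
    [[1010, 1008, 1010], [1008, 990, 1008], [1010, 1008, 1010]],
    [[1010, 1009, 1010], [1009, 1009, 1009], [1010, 1009, 1010]],
]

# Threads keep the sketch portable; for CPU-bound pure-Python work a process
# pool or a scheduler job array would be the more realistic choice.
with ThreadPoolExecutor(max_workers=2) as pool:
    candidates = detect_all(fields, pool)
print(candidates)
```

Because `detect_all` accepts any executor, the same call pattern would carry over to a process pool or a per-timestep batch job, with the assignment step run once all candidates are available.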
Line 185: I might call this "small pruning radii" instead of "lax pruning."
Line 187: I am surprised the local minima values are exactly the same, which is unlikely even in single precision. I suppose keeping both in these cases is fine, but functionally (and from a meteorological perspective), I do not see how it is different from applying a random choice.
Line 243: The first track-breaking mechanism can sometimes be mitigated by allowing temporal gaps during stitching; see Ullrich et al. (2021), which is already cited.
Line 253: I am not sure the word "hypothetical" is needed here, since this is commonly how sea level pressure correction is applied operationally.
Line 254: Consider adding a reference supporting why/when SLP reduction breaks down over complex terrain (physical reasoning and prior documentation).
Line 316: "We note that the AoR is centred on 60° lat, 15°N and initially spans from 50 − 70° lat and 0 − 30°E." This looks wrong in two ways: I think the centre should be 60°N, 15°E, and the latitude values should be labelled "N" rather than "lat."
Figure 5. Consider slightly reducing the contour density, which creates a lot of visual noise over the Alps, Turkey, N. Africa, etc. I would also suggest making the storm track centers a different color (blue? purple?) so they stand out better against the underlying shading. Alternatively, the points could be made larger with a bolder outline.
Figs 6b and 7: Why is there a white patch in the middle of the ETC in the top right (northeast) corner?
Table 5: Consider reducing precision (fewer decimal places) to improve readability.