the Creative Commons Attribution 4.0 License.
Explaining neural networks for detection of tropical cyclones and atmospheric rivers in gridded atmospheric simulation data
Abstract. Detection of atmospheric features in gridded datasets from numerical simulation models is typically done by means of rule-based algorithms. Recently, also the feasibility of learning feature detection tasks using supervised learning with convolutional neural networks (CNNs) has been demonstrated. This approach corresponds to semantic segmentation tasks widely investigated in computer vision. However, while in recent studies the performance of CNNs was shown to be comparable to human experts, CNNs are largely treated as a “black box”, and it remains unclear whether they learn the features for the correct reasons. Here we build on the recently published “ClimateNet” dataset that contains features of tropical cyclones and atmospheric rivers as detected by human experts. We adapt the explainable artificial intelligence technique “Layer-wise Relevance Propagation” (LRP) to the feature detection task and investigate which input information CNNs with the Context-Guided Network (CG-Net) and U-Net architectures use for feature detection. We find that both CNNs indeed consider plausible patterns in the input fields of atmospheric variables, which helps to build trust in the approach. We also demonstrate application of the approach for finding the most relevant input variables and evaluating detection robustness when changing the input domain. However, LRP in its current form cannot explain shape information used by the CNNs, and care needs to be taken regarding the normalization of input values, as LRP cannot explain the contribution of bias neurons, accounting for inputs close to zero. These shortcomings need to be addressed by future work to obtain a more complete explanation of CNNs for geoscientific feature detection.
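To make the limitation named at the end of the abstract concrete, the following is a minimal sketch of relevance redistribution for a single linear neuron. It assumes the widely used LRP-epsilon rule; the exact LRP variant adapted in the manuscript is not stated on this page, and the function name `lrp_epsilon` and all numbers are hypothetical. It shows that an input equal to zero receives zero relevance, and that the share of the output attributable to the bias is not passed on to any input.

```python
def lrp_epsilon(a, w, b, relevance_out, eps=1e-9):
    """Redistribute the relevance of one output neuron to its inputs.

    Assumed rule (LRP-epsilon): R_j = a_j * w_j / (z + eps * sign(z)) * R_out,
    where z = sum_j a_j * w_j + b is the pre-activation including the bias.
    """
    z = sum(ai * wi for ai, wi in zip(a, w)) + b
    denom = z + (eps if z >= 0 else -eps)  # stabilizer avoids division by zero
    return [ai * wi / denom * relevance_out for ai, wi in zip(a, w)]


# Hypothetical neuron: the first input is exactly zero (e.g. after normalization).
a = [0.0, 2.0]
w = [1.5, 0.5]
b = 1.0
z = sum(ai * wi for ai, wi in zip(a, w)) + b  # pre-activation, used as output relevance

R = lrp_epsilon(a, w, b, relevance_out=z)
absorbed_by_bias = z - sum(R)  # relevance that reaches no input

print("input relevances:", R)
print("relevance absorbed by bias:", absorbed_by_bias)
```

In this toy case half of the output relevance is absorbed by the bias, and the zero-valued input is assigned no relevance regardless of its weight.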
Status: final response (author comments only)
-
CEC1: 'Comment on gmd-2024-60', Juan Antonio Añel, 14 Jun 2024
Dear authors,
After checking your manuscript, we have detected a problem in the compliance of your submission with the Code and Data policy of the journal.
https://www.geoscientific-model-development.net/policies/code_and_data_policy.html
Your work relies heavily on the ClimateNet dataset; however, this dataset is made available through a link to a webpage that does not meet the minimum requirements to be considered a trustworthy long-term repository. We therefore must ask you to store the ClimateNet data that you use for your work in one of the repositories acceptable under our policy, with a DOI, and to make it available there. Please publish it and reply to this comment with the relevant information (link and DOI) as soon as possible, as we cannot accept manuscripts in Discussions that do not comply with our policy.
Also, you must include in a potentially reviewed version of your manuscript the modified 'Code and Data Availability' section, including this new information.
Regards,
Juan A. Añel
Geosci. Model Dev. Executive Editor
Citation: https://doi.org/10.5194/gmd-2024-60-CEC1
-
AC1: 'Reply on CEC1', Tim Radke, 16 Jul 2024
Dear Juan A. Añel,
Thank you for your comment. We have been trying to contact the authors of the original ClimateNet GMD paper, as we think they should have the opportunity to publish their own data under a DOI themselves. Unfortunately, we have not been able to reach them. If they do not reply by the end of this discussion period, we will make the data available under a DOI ourselves.
Best Regards,
Tim Radke, on behalf of the authors
Citation: https://doi.org/10.5194/gmd-2024-60-AC1
-
RC1: 'Comment on gmd-2024-60', Anonymous Referee #1, 27 Aug 2024
Summary
The authors adapt the explainable AI technique "Layer-wise Relevance Propagation" (LRP) to the semantic segmentation task of detecting atmospheric rivers and tropical cyclones from atmospheric data using convolutional neural networks. LRP enables the authors to assign a relevance score that weights the contribution of each pixel in the image to classifying an area as a certain type, assessing whether the pixel contributed towards the classification output, against it, or was irrelevant. The usage of LRP was explored through a case study, an analysis of results over the dataset, and a few examples of applications. The performed analysis was interesting and the research questions were clearly formulated. The demonstrated application areas are relevant to geosciences and clearly add to the value of the manuscript. Although the analysis is generally satisfying, there are some inconsistencies and issues that demand attention before publication.
General comments
- Inset plots are confusing to the eye, as one tends not to see the bounding boxes drawn because of the grid lines. Also, coastlines are drawn with the same width in the inset and outer plot, reducing the impression that one is looking at a zoomed panel. Perhaps the plots could be made a bit easier to look at.
- 219 time steps leading to 459 total mappings is a very small dataset size for deep learning, that will definitely limit the patterns that are learnable without over-fitting a neural network consisting of millions of parameters. Although this issue has probably already been talked about in preliminary works using the same dataset, I believe it merits at least a short discussion in the context of analyzing the spatial patterns relevant to decisions made.
- Since the authors motivate xAI methods for deep learning with the network learning spurious patterns leading to generalization errors, it would have been interesting to look at a case where a detected area is a complete false positive / false negative in the test set, or if there were none, maybe also at how the uncertainty in mappings for a specific area relates to the certainty of the network, especially as it was claimed that absolute network output magnitude relates to network certainty (L217-218).
- The U-Net and CG-Net implementations both use batch normalization after convolutional layers, at the very least in most of the modules. This has the effect of negating the bias in convolutional layers (check Ioffe and Szegedy (2015), p. 5). In addition, the CG-Net implementation doesn't apply biases to any of its convolutional layers. This is inconsistent with the manuscript making it seem like the networks used are heavily reliant on the bias terms of layers. Could the authors provide an explanation and a corrected reasoning for their results?
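The batch normalization point in the last comment can be checked numerically. The sketch below is illustrative only: it uses hypothetical values for one channel, normalizes with batch statistics (training mode), and omits the learnable scale and shift of batch normalization. Under those assumptions, adding a constant bias before normalization leaves the normalized output unchanged, because the mean subtraction removes it.

```python
from statistics import fmean, pvariance


def batch_norm(values, eps=1e-5):
    """Normalize one channel with batch statistics (no learnable scale/shift)."""
    mean = fmean(values)
    var = pvariance(values, mu=mean)
    return [(v - mean) / (var + eps) ** 0.5 for v in values]


# Hypothetical convolution outputs (without bias) for one channel across a batch.
pre_activations = [0.3, -1.2, 2.5, 0.7]
bias = 4.0  # hypothetical convolution bias

without_bias = batch_norm(pre_activations)
with_bias = batch_norm([v + bias for v in pre_activations])

# Largest element-wise difference between the two normalized outputs.
max_diff = max(abs(p - q) for p, q in zip(without_bias, with_bias))
print("max difference:", max_diff)
```

The difference is at the level of floating-point rounding. In a full batch normalization layer the learnable shift parameter takes over the role of the bias, so an explanation method would need to account for that shift rather than for the convolution bias.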
Specific comments
- L50: Some clarification would be welcome to what is meant by "correct patterns in data".
- L212: Could it be a problem to sum the contributions of different pixels in a shape when some of these pixels may have relevance scores of opposite sign for a specific relevance location? Is there a risk of losing information here?
- L217-218: A good reference to the claim that unnormalized network output magnitude is related to network certainty is needed, as this is used to justify a central design decision.
- Regarding the selection of most relevant input variables (Application 1), it likely is a problem that there is no separate validation and test set. A more correct experimental setup would have assessed the relevance of each variable on a validation set and assessed the retrained networks on a separate test set. However, it can be understood that this was not done because of the already limited size of the dataset. A few sentences on the potential implications of this on the validity of the results could be added.
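The concern raised for L212 above, that summing signed relevance over a shape may hide information, can be illustrated with a small sketch. The relevance values and the function name `summarize_relevance` are hypothetical and are not taken from the manuscript; the example only shows that a near-zero net sum is compatible with strong positive and negative contributions inside the same shape.

```python
def summarize_relevance(scores):
    """Return net, positive-only, and negative-only relevance sums."""
    net = sum(scores)
    positive = sum(r for r in scores if r > 0)
    negative = sum(r for r in scores if r < 0)
    return net, positive, negative


# Hypothetical per-pixel relevance scores inside one detected feature shape.
relevance_in_shape = [0.8, -0.6, 0.5, -0.7]

net, positive, negative = summarize_relevance(relevance_in_shape)
print("net:", net, "positive:", positive, "negative:", negative)
```

Reporting the positive and negative sums separately, or the sum of absolute values, is one possible way to keep this information visible alongside the net sum.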
Citation: https://doi.org/10.5194/gmd-2024-60-RC1
-
RC2: 'Comment on gmd-2024-60', Anonymous Referee #2, 30 Aug 2024
Publisher’s note: this comment was edited on 2 September 2024. The following text is not identical to the original comment, but the adjustments were minor without effect on the scientific meaning.
General comments
The paper "Explaining neural networks for detection of tropical cyclones and atmospheric rivers in gridded atmospheric simulation data" is a valuable contribution to the field of geoscientific machine learning. Despite the suggested typographical and stylistic corrections, the core of the research is solid and presents meaningful advances in the understanding and interpretation of CNNs for meteorological applications, as well as incorporating another AI technique, "layer-wise relevance propagation" (LRP). The document often uses complex sentences that could be simplified for better readability. The use of acronyms for references within the text is difficult to follow, even though they are introduced, as the reader may struggle to distinguish between acronyms of mathematical techniques, software, and references. Check the in-text citations section (https://www.geoscientific-model-development.net/submission.html ) to follow the format for this type of citation.
Ensure that italics are used sparingly throughout the text.
The abstract effectively summarises the paper but could better highlight the novel contributions. Mentioning specific findings or results in more detail would strengthen the impact, e.g. stating which were the most relevant input variables found. Consider adding some quantitative elements in the concluding sentences to replace general sentences such as "...which helps to build trust in the approach" with a relevant statistic.
The introduction is well structured but somewhat dense. It could be improved by breaking up longer sentences into shorter, more digestible parts for the reader.
Specific comments
L12 Recently, also the feasibility of learning → Recently, the feasibility of learning feature detection tasks using
L33 “Features are typically objectively detected based on a set of physical and mathematical rules-” → “Features are typically detected based on a set of physical and mathematical rules”
L34 For example, cyclones can be identified by means of searching for minima or maxima in variables including mean sea level pressure and lower-tropospheric vorticity → For example, cyclones can be identified by searching for minima or maxima in variables including mean sea level pressure and lower-tropospheric vorticity
L38 Recent research, however, has shown that given a pre-defined labelled dataset… → Consider "Recent research has shown that, given a pre-defined labelled dataset…"
Line 50: CNN --> CNNs
Line 120: "contained the P21" should be "contained in the P21."
Line 410: "linearly scales all inputs to the positive range [0..1]" should be "linearly scales all inputs to the range [0, 1]."
L165 "For instance, a batch size of 10 reduces training time for a single run by 10% while achieving similar evaluation results.": The phrase "achieving similar evaluation results" could be clearer; a suggestion would be "without significantly deviating from the evaluation results."
Line 195: "In our case, activation 𝑎 and relevance 𝑅 are 3-D grids with size of the current layer-dependent horizontal grid times the number of classes" is a confusing sentence. Consider rephrasing for clarity.
L203 referred to by M22 as the 'ignorant-to-zero-input issue
L601 geoscientific science community → geoscientific community
Acknowledgements
Base this section on the author contribution format from:
https://www.geoscientific-model-development.net/submission.html. Names in acronyms, i.e. John Smith → (JS)
About reference section
Add at least one reference from 2024
Citation: https://doi.org/10.5194/gmd-2024-60-RC2
Viewed
HTML | PDF | XML | Total | Supplement | BibTeX | EndNote |
---|---|---|---|---|---|---|
484 | 170 | 25 | 679 | 36 | 17 | 20 |