Autoencoder-based feature extraction for the automatic detection of snow avalanches in seismic data
Abstract. Monitoring snow avalanche activity is essential for operational avalanche forecasting and the successful implementation of mitigation measures to ensure safety in mountain regions. To facilitate and automate the monitoring process, avalanche detection systems equipped with seismic sensors can provide a cost-effective solution. Still, automatically differentiating avalanche signals from other sources in seismic data remains challenging, mainly due to the complexity of seismic signals generated by avalanches, the complex signal transmission through the ground, the relatively rare occurrence of avalanches, and the presence of multiple sources in the continuous seismic data. One approach to automating avalanche detection is to apply machine learning methods. So far, research in this area has mainly focused on extracting standard domain-specific signal attributes in the time and frequency domains as input features for statistical models. In this study, we propose a novel application of deep learning autoencoder models for the automatic and unsupervised extraction of features from seismic recordings. These new features are then fed into classifiers for discriminating snow avalanches. To this end, we trained three random forest classifiers based on different feature extraction approaches. The first set of 32 features was automatically extracted from the time-series signals by an autoencoder consisting of convolutional layers and a recurrent long short-term memory unit. The second autoencoder applies a series of fully connected layers to extract 16 features from the spectrum of the signals. As a benchmark, a third random forest was trained with typical waveform, spectral and spectrogram attributes used to discriminate seismic events. We extracted all these features from 10-second windows of the seismograms recorded with an array of five seismometers installed at an avalanche test site located above Davos, Switzerland.
The database used to train and test the models contained 84 avalanches and 828 noise events (unrelated to avalanches) recorded during the winter seasons of 2020–2021 and 2021–2022. Finally, we assessed the performance of each classifier, compared the results, and proposed different aggregation methods to improve the predictive performance of the developed seismic detection algorithms. The classifiers achieved an avalanche F1-score of 0.61 (seismic attributes), 0.49 (temporal autoencoder) and 0.60 (spectral autoencoder) and an avalanche recall of 0.68, 0.71 and 0.71, respectively. Overall, the macro F1-score ranged from 0.70 (temporal autoencoder) to 0.78 (seismic attributes). After applying a post-processing step to event-based predictions, the avalanche recall of the three models significantly increased, reaching values between 0.82 and 0.91. The developed approach could potentially be used as an operational, near-real-time avalanche detection system. Yet, given the relatively high number of false alarms, the current automated seismic classification algorithms need further development before they can be used as the sole method to detect avalanches effectively.
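For readers less familiar with the metrics quoted above, the following is a minimal Python sketch of how a per-class recall, a per-class F1-score and a macro F1-score are conventionally computed for a two-class problem. It is not the authors' evaluation code: the helper names `class_scores` and `macro_f1` are hypothetical, and the toy labels are invented purely for illustration and are unrelated to the study's data.

```python
from typing import Dict, Sequence


def class_scores(y_true: Sequence[str], y_pred: Sequence[str], label: str) -> Dict[str, float]:
    """Precision, recall and F1 for one class, treating `label` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)  # correctly detected
    fp = sum(t != label and p == label for t, p in pairs)  # false alarms
    fn = sum(t == label and p != label for t, p in pairs)  # missed events
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str], labels: Sequence[str]) -> float:
    """Unweighted mean of the per-class F1-scores."""
    return sum(class_scores(y_true, y_pred, lab)["f1"] for lab in labels) / len(labels)


# Invented toy labels (assumption, not the study's data):
# 4 avalanche windows and 16 noise windows.
y_true = ["avalanche"] * 4 + ["noise"] * 16
# The hypothetical classifier finds 3 of 4 avalanches and raises 3 false alarms.
y_pred = ["avalanche"] * 3 + ["noise"] + ["avalanche"] * 3 + ["noise"] * 13

avalanche = class_scores(y_true, y_pred, "avalanche")
macro = macro_f1(y_true, y_pred, ["avalanche", "noise"])
print(avalanche, round(macro, 3))
```

In this toy case the avalanche recall is 0.75 while the avalanche F1-score is only 0.60, because the false alarms lower the precision of the rare class. The same mechanism lets a detector combine a fairly high recall with a moderate F1-score.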
Dear authors,
Unfortunately, after checking your manuscript, it has come to our attention that it does not comply with our "Code and Data Policy".
https://www.geoscientific-model-development.net/policies/code_and_data_policy.html
The problem is that you have not published the code and data necessary to replicate your manuscript. Our policy clearly states that all the code and data used in a manuscript must be published at submission time in one of the acceptable repositories listed in our policy, and that the Code and Data Availability section must contain the details (links and DOIs) for such repositories. Instead, this section in your manuscript reads "The code and data to develop the final models used in this study will be made available on GitLab and EnviDat".
You have privately provided an internet address containing part of these assets (not all of them, according to my understanding). This is not enough. First, the WSL server is not a repository that complies with the standards required for scientific publication; second, all the information must be available to every potential reader in Discussions to facilitate the peer review and comments by the community, and sharing it privately with the editors fails to comply with the Discussions peer-review process.
Therefore, please publish your code in one of the appropriate repositories and reply to this comment with the relevant information (link and DOI) as soon as possible, as we cannot accept manuscripts in Discussions that do not comply with our policy. The current situation with your manuscript is irregular.
Consequently, if you do not fix this problem, we will have to reject your manuscript for publication in our journal.
Also, in any revised version of your manuscript, you must include the modified 'Code and Data Availability' section, containing the links and DOI of the repository containing code and data.
Finally, in the current Git repository that you have provided for the assets, there is no license listed. If you do not include a license, the code is not "free software/open-source"; it continues to be your property and nobody can use it, even though you have made it public. Therefore, when uploading the code and data to the new repository, you should add a license. You may want to choose a free software/open-source (FLOSS) license. We recommend the GPLv3. You only need to include the file 'https://www.gnu.org/licenses/gpl-3.0.txt' as LICENSE.txt with your code. Alternatively, you can choose one of the other options that acceptable repositories provide: GPLv2, Apache License, MIT License, etc.
Juan A. Añel
Geosci. Model Dev. Executive Editor