This work is distributed under the Creative Commons Attribution 4.0 License.
A deep learning method for convective weather forecasting: CNN-BiLSTM-AM (version 1.0)
Abstract. This work developed a CNN-BiLSTM-AM model for convective weather forecasting using deep learning algorithms, based on reanalysis and forecast data from the NCEP GFS, and evaluated the model's performance. The results show that: (1) Compared to traditional machine learning algorithms, the CNN-BiLSTM-AM model can automatically learn deeper nonlinear features of convective weather and therefore achieves higher forecasting accuracy on the convective weather dataset. Furthermore, as the forecast lead time increases, the information value provided by the model also changes. (2) Compared with the subjective forecasts of forecasters, the objective forecasts of the CNN-BiLSTM-AM model show advantages in metrics such as Probability of Detection (POD), False Alarm Rate (FAR), Threat Score (TS), and Missing Alarm Rate (MAR). Specifically, the average TS score for heavy precipitation reaches 0.336, a 33.2 % improvement over the forecasters' score of 0.252. Moreover, because the CNN-BiLSTM-AM model automatically extracts classification features from a large sample dataset and considers a comprehensive range of convective parameters, it effectively reduces the FAR. (3) The interpretability study of the machine-learning-based convective weather mechanism shows that the importance ranking of convective weather forecasting factors produced by the machine learning methods largely aligns with forecasters' subjective understanding. For example, total precipitable water (PWAT) is identified as a critical factor for short-term heavy precipitation forecasting, regional factors have significant impacts on convective weather, and vertical motion at 300 hPa provides the dynamic lifting conditions for convection. This objective ranking of factors not only further confirms the effectiveness of machine learning in automatically extracting convective weather features but also validates the rationality of the sample set construction. Overall, the CNN-BiLSTM-AM model demonstrates superior performance in convective weather forecasting compared to traditional machine learning algorithms and subjective forecasting methods.
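For reference, the categorical scores quoted above (POD, FAR, TS, MAR) follow the standard 2x2 contingency-table definitions. The short Python sketch below only illustrates those standard definitions; it is not the authors' code, and the 20 mm event threshold is an illustrative assumption, not a value taken from the paper.

```python
import numpy as np

def categorical_scores(forecast, observed, threshold=20.0):
    """Standard 2x2 contingency-table scores for event forecasts.

    forecast, observed : array-like precipitation amounts (e.g. mm)
    threshold          : event threshold; 20 mm is illustrative only.
    """
    f = np.asarray(forecast) >= threshold
    o = np.asarray(observed) >= threshold

    hits = np.sum(f & o)           # event forecast and observed
    misses = np.sum(~f & o)        # event observed but not forecast
    false_alarms = np.sum(f & ~o)  # event forecast but not observed

    pod = hits / (hits + misses)                  # Probability of Detection
    far = false_alarms / (hits + false_alarms)    # False Alarm Rate (ratio of forecasts that missed)
    ts = hits / (hits + misses + false_alarms)    # Threat Score (a.k.a. CSI)
    mar = misses / (hits + misses)                # Missing Alarm Rate = 1 - POD
    return pod, far, ts, mar
```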
Withdrawal notice
This preprint has been withdrawn.
-
Preprint (2376 KB)
Interactive discussion
Status: closed
-
RC1: 'Comment on gmd-2023-187', Anonymous Referee #1, 05 Dec 2023
-
CC1: 'Reply on RC1', Yubin Li, 26 Dec 2023
Dear Reviewer:
Thank you very much for your valuable comments and suggestions. Your insightful feedback has been instrumental in improving the overall quality of our work.
We have thoroughly considered each comment, and our detailed responses to each one can be found in the attached document.
Once again, thank you very much for the time and effort you dedicated to reviewing our article.
Kind regards,
Yubin Li
-
RC2: 'Comment on gmd-2023-187', Anonymous Referee #2, 25 Dec 2023
In this work, the authors use deep learning algorithms to develop a framework combining CNN, BiLSTM, and AM for convective weather forecasting, called the CNN-BiLSTM-AM method. The approach is novel. The NCEP FNL analysis datasets were used as inputs, from which several factors related to meteorology, convective physical quantities, and geographical variables were selected by experts. In addition, station measurements were collected and used for model development (?) and validation purposes. The training and testing datasets cover 2015–2020, with data from 72 randomly selected days used for testing and the remaining data used for training. To evaluate the proposed method's performance, several machine learning methods (KNN, RF, GBDT, SVM) and a physical model (WRF) were implemented, and the results were compared on the testing dataset and on an individual case. The comparison showed that CNN-BiLSTM-AM outperformed the other algorithms in the test cases.
However, I have the following major comments:
- The problem statement is not clear. Convective weather forecasting covers many types of events. As I understand it, the evaluation focuses mainly on forecasting precipitation rate (Sections 4.2, 4.4, 5.1) and precipitation occurrence (Section 4.3). These are two different problems: regression and classification. Please state clearly what the output of this work is.
- It is also not clear whether the authors develop one, two, or many models to solve the given problems, or how the models are built. A general deep learning framework as presented in Section 3.1 is fine; however, loss functions, which are very important in deep learning, depend on the output design and the number of models. How many models were developed, and what were the corresponding outputs and loss functions in this work?
- There is also inconsistency in the construction of the training and testing datasets, which is a key point for deep learning/machine learning algorithms. In Section 3.2, the authors only mention the classification problem. In addition, it is not clear what is used as the ground truth to label a grid as a positive/negative sample. If measurements from all 2400 stations in China were collected, the model should be built for all of China; however, the experimental area is the Henan region (Section 2.3) with only 12 stations. Did the authors use station precipitation measurements to label positive/negative samples? How many stations were used, and what are their locations and distribution? Please also report the exact numbers of training and testing samples for this case (lines 223-225).
- The division of the training and testing datasets is not suitable for this problem. Forecasting concerns the future, so holding out one or two full years of data for testing is more independent. For example, the 2015–2020 data can be divided into a training dataset (2015–2019) and a testing dataset (2020); a minimal sketch of such a split is given after these comments. The current division based on random selection can yield very good results during modelling but is not appropriate for future, independent datasets.
- In Section 3.2, the questions raised above also apply to the training and testing datasets for the regression problem (precipitation estimation). The authors need to add this information to this section.
- In Section 4.1, several evaluation indices for regression are not defined, for example R2 and RMSE.
- In Section 4.2, the test dataset covers 2015–2017 (line 243), which differs from the description of the training/testing datasets in Section 3.2 (2015–2020, line 222). Also, please state whether the predicted parameter is hourly or daily precipitation and give the forecast duration. How many ground stations provide the observation data, and where are they located?
- The biggest concern is the inconsistent and loose logic of the article's results.
- In Section 4.2, Figure 8 uses boxplots to present the predicted precipitation. However, this presentation cannot show that the proposed method outperforms RF, because the two look similar. Moreover, the results in Figure 8 conflict with those in Figure 5: in Figure 5 the model underestimates precipitation, while in Figure 8 it overestimates precipitation. Please explain why.
- In Figure 9, it is not clear whether the diurnal variation is calculated at one station or averaged over the 12 stations in Henan. If the dataset is the same as in Figure 8, the boxplot and the line chart disagree on the performance of CNN-BiLSTM-AM: in Figure 9 the error of the proposed method is almost zero, while in Figure 8 the boxplot shows a clear difference between prediction and observation. Please explain why.
- In Figure 10, RF (b) appears to have a lower error than CNN-BiLSTM-AM (a) based on the colour scales, which conflicts with the conclusions above. Following this concern, why is RF not used for the stability analysis in Section 5.1?
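A minimal sketch of the year-based split suggested in the comment on dataset division above, assuming each sample carries a timestamp; the file name "samples.csv" and the "time" column are hypothetical names for illustration, not the authors' actual data layout.

```python
import pandas as pd

# Hypothetical sample table: "samples.csv" and its "time" column are assumed
# names, not the authors' actual files or fields.
samples = pd.read_csv("samples.csv", parse_dates=["time"])

# Train on 2015-2019 and hold out all of 2020, so the test period lies
# strictly after the training period (an out-of-time evaluation).
train = samples[samples["time"].dt.year.between(2015, 2019)]
test = samples[samples["time"].dt.year == 2020]
```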
Citation: https://doi.org/10.5194/gmd-2023-187-RC2
-
CC2: 'Reply on RC2', Yubin Li, 28 Dec 2023
Dear Reviewer:
Thank you very much for your helpful comments. We have replied to each of the review comments; please find our responses in the attached file.
Additionally, we will continue to revise the paper based on these review comments.
Once again, we deeply appreciate your valuable time and effort.
Kind regards,
Yubin Li
-
AC1: 'Comment on gmd-2023-187', Zhiqiu Gao, 08 Jan 2024
Dear editor,
Thank you for your time and effort in handling our manuscript.
Previously, in December 2023, we provided preliminary replies to the issues raised by the reviewers within each thread of their comments through Dr. Yubin Li's account.
We have now prepared a fully revised version of the manuscript that incorporates all of the reviewers' suggestions, along with point-by-point responses.
Following the review process, we will upload the revised manuscript and the point-by-point responses in the next step.
Once again, we sincerely appreciate your valuable time and effort.
Best regards,
Zhiqiu Gao
On behalf of all authors
Citation: https://doi.org/10.5194/gmd-2023-187-AC1
Viewed
- HTML: 553
- PDF: 260
- XML: 52
- Total: 865
- BibTeX: 31
- EndNote: 44