This work is distributed under the Creative Commons Attribution 4.0 License.
A deep learning method for convective weather forecasting: CNN-BiLSTM-AM (version 1.0)
Abstract. In this work, a CNN-BiLSTM-AM model for convective weather forecasting was developed using deep learning algorithms, based on reanalysis and forecast data from the NCEP GFS, and its performance was evaluated. The results show that: (1) Compared to traditional machine learning algorithms, the CNN-BiLSTM-AM model is able to automatically learn deeper nonlinear features of convective weather and therefore achieves higher forecasting accuracy on the convective weather dataset. Furthermore, as the forecast lead time increases, the information value provided by the model also changes. (2) Compared with subjective forecasts by forecasters, the objective forecasting approach of the CNN-BiLSTM-AM model shows advantages in metrics such as the Probability of Detection (POD), False Alarm Rate (FAR), Threat Score (TS), and Missed Alarm Rate (MAR). Specifically, the average TS score for heavy precipitation reaches 0.336, a 33.2 % improvement over the forecasters' score of 0.252. Moreover, because the CNN-BiLSTM-AM model automatically extracts classification features from a large sample dataset and considers a comprehensive range of convective parameters, it effectively reduces the FAR. (3) The interpretability study of the machine-learning-based convective weather mechanism reveals that the importance ranking of convective weather forecasting factors produced by the machine learning methods largely aligns with forecasters' subjective understanding. For example, the total precipitable water (PWAT) is identified as a critical factor for short-term heavy precipitation forecasting, regional factors have significant impacts on convective weather, and vertical motion at 300 hPa provides dynamic lifting conditions for convection. This objective ranking of factors not only further confirms the effectiveness of machine learning in automatically extracting convective weather features but also validates the rationality of the sample set construction. Overall, the CNN-BiLSTM-AM model demonstrates superior performance in convective weather forecasting compared to traditional machine learning algorithms and subjective forecasting methods.
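For reference, the sketch below spells out the contingency-table definitions behind the POD, FAR, TS, and MAR values quoted in the abstract, assuming the common convention in which FAR is computed as false alarms / (hits + false alarms); the function name and example counts are illustrative and are not taken from the paper.

```python
# Minimal sketch of standard contingency-table verification metrics
# (POD, FAR, TS, MAR). The example counts below are hypothetical
# placeholders, not values from the paper.

def verification_scores(hits, misses, false_alarms):
    """Compute POD, FAR, TS, and MAR from 2x2 contingency-table counts."""
    pod = hits / (hits + misses)                 # Probability of Detection
    far = false_alarms / (hits + false_alarms)   # False Alarm Rate (ratio convention)
    ts = hits / (hits + misses + false_alarms)   # Threat Score (critical success index)
    mar = misses / (hits + misses)               # Missed Alarm Rate (equals 1 - POD)
    return {"POD": pod, "FAR": far, "TS": ts, "MAR": mar}

if __name__ == "__main__":
    # Hypothetical counts for a heavy-precipitation category
    print(verification_scores(hits=42, misses=58, false_alarms=25))
```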
Withdrawal notice
This preprint has been withdrawn.
Interactive discussion
Status: closed
-
RC1: 'Comment on gmd-2023-187', Anonymous Referee #1, 05 Dec 2023
-
CC1: 'Reply on RC1', Yubin Li, 26 Dec 2023
Dear Reviewer:
Thank you very much for your valuable comments and suggestions. Your insightful feedback has been instrumental in improving the overall quality of our work.
We have thoroughly considered each comment, and our detailed responses to each one can be found in the attached document.
Once again, thank you very much for the time and effort you dedicated to reviewing our article.
Kind regards,
Yubin Li
-
RC2: 'Comment on gmd-2023-187', Anonymous Referee #2, 25 Dec 2023
In this work, the authors use deep learning algorithms to develop a framework combining a CNN, a BiLSTM, and an AM for convective weather forecasting, called the CNN-BiLSTM-AM method. The approach is novel. The NCEP FNL analysis datasets were exploited as inputs, from which several factors related to meteorology, convective physical quantities, and geographical variables were selected by experts. In addition, station measurements were collected and used for model development (?) and validation purposes. The training and testing datasets cover 2015-2020, with data from 72 randomly selected days used for testing and the remaining data used for training. To evaluate the proposed method's performance, several machine learning methods (KNN, RF, GBDT, SVM) and a physical model (WRF) were implemented, and the results were compared on the testing dataset and on an individual case. The comparison results showed that CNN-BiLSTM-AM outperformed the other algorithms in the test cases.
However, I have major comments as follows:
- The problem statement is not clear. Convective weather forecasting covers many types of events. As I understand it, the evaluation mainly focuses on forecasting the precipitation rate (Sections 4.2, 4.4, 5.1) and precipitation occurrence (Section 4.3). These are in fact two different problems: classification and regression. Please state clearly what the output of this work is.
- It is also unclear whether the authors developed one, two, or many models to solve the given problems, and how the models were built. There is no problem with the general deep learning framework presented in Section 3.1. However, loss functions, which are very important in DL, depend on the output design and the number of models. How many models were developed, and what were the corresponding outputs and loss functions in this work?
- There is also an inconsistency in the construction of the training and testing datasets, which is a key point for deep learning/machine learning algorithms. In Section 3.2, the authors only mention the classification problem. In addition, it is not clear what is used as ground truth to label a grid as a positive/negative sample. If measurements from all 2400 stations in China were collected, the model should be built for all of China; however, the experimental area is the Henan region (Section 2.3) with only 12 stations. Did the authors use station precipitation measurements to label positive/negative samples? How many stations were used, and what are their locations and distribution? Please also report the exact numbers of training and testing samples for this case (lines 223-225).
- The division of the training and testing datasets is not suitable for this problem. Forecasting concerns the future, so using one or two full years of data for testing is more independent. For example, the data from 2015-2020 could be divided into a training dataset (2015-2019) and a testing dataset (2020); see the sketch after this list. The current division based on random selection can give very good results during modelling but is not correct for future, independent datasets.
- In Section 3.2, the questions mentioned above also arise for the training and testing datasets of the regression problem (precipitation estimation). The authors need to add this information to the section.
- In Section 4.1, several evaluation indices for regression are not defined, for example R2 and RMSE.
- In Section 4.2, the test dataset is from 2015-2017 (line 243), which differs from the description of the training/testing datasets in Section 3.2 (2015-2020, line 222). Please also state whether the predicted parameter is hourly or daily precipitation, and give the forecast duration. How many ground stations provide the observation data, and where are they located?
- The biggest concern is the inconsistent and loose logic of the article's results.
- In Section 4.2, boxplots are used in Figure 8 to present the predicted precipitation. However, this presentation cannot demonstrate that the proposed method outperforms RF, because the two look similar. Moreover, the results of Figure 8 conflict with those of Figure 5: in Figure 5 the model underestimates precipitation, while in Figure 8 it overestimates precipitation. Please explain why.
- In Figure 9, it is not clear whether the diurnal variation is calculated at one station or averaged over the 12 stations in Henan. If the dataset is the same as in Figure 8, the boxplot and the line chart, which both present the performance of CNN-BiLSTM-AM, disagree: in Figure 9 the error of the proposed method is almost zero, while in Figure 8 the boxplot shows a difference between prediction and observation. Please explain why.
- In Figure 10, RF (b) appears to have a lower error than CNN-BiLSTM-AM (a) based on the colour scales, which conflicts with the conclusions above. Following this concern, why is RF not used for the stability analysis in Section 5.1?
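To make the year-based split suggested above concrete, here is a minimal sketch assuming each sample carries a timestamp; the column name "time" and the DataFrame layout are illustrative assumptions, not the paper's actual data structure.

```python
import pandas as pd

# Minimal sketch of a year-based train/test split: hold out whole years
# for testing instead of randomly selected days. The "time" column name
# is a hypothetical placeholder.

def split_by_year(df: pd.DataFrame, test_years=(2020,)):
    """Split samples so that the test years are fully unseen during training."""
    years = pd.to_datetime(df["time"]).dt.year
    test_mask = years.isin(test_years)
    train = df[~test_mask]   # e.g. 2015-2019
    test = df[test_mask]     # e.g. 2020, an independent future period
    return train, test

# Usage (with a hypothetical DataFrame `samples` containing a "time" column):
# train_df, test_df = split_by_year(samples, test_years=(2020,))
```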
Citation: https://doi.org/10.5194/gmd-2023-187-RC2
-
CC2: 'Reply on RC2', Yubin Li, 28 Dec 2023
Dear Reviewer:
Thank you very much for your helpful comments. We have replied to each of the review comments; please find our responses in the attached file.
Additionally, we will continue to revise the paper based on these review comments.
Once again, we deeply appreciate your valuable time and effort.
Kind regards,
Yubin Li
-
AC1: 'Comment on gmd-2023-187', Zhiqiu Gao, 08 Jan 2024
Dear editor,
Thank you for your time and effort in handling our manuscript.
Previously, in December 2023, we provided preliminary replies to the issues raised by the reviewers within each thread of their comments through Dr. Yubin Li's account.
We have now prepared a fully revised version of the manuscript incorporating all of the reviewers' suggestions, together with point-by-point responses.
Following the review process, we will upload the revised manuscript and the point-by-point responses in the next step.
Once again, we sincerely appreciate your valuable time and effort.
Best regards,
Zhiqiu Gao
On behalf of all authors
Citation: https://doi.org/10.5194/gmd-2023-187-AC1
Viewed
HTML | PDF | XML | Total | BibTeX | EndNote
---|---|---|---|---|---
536 | 247 | 52 | 835 | 31 | 44