Articles | Volume 18, issue 17
https://doi.org/10.5194/gmd-18-5781-2025
© Author(s) 2025. This work is distributed under
the Creative Commons Attribution 4.0 License.
An extension of WeatherBench 2 to binary hydroclimatic forecasts
Tongtiegang Zhao
CORRESPONDING AUTHOR
Southern Marine Science and Engineering Guangdong Laboratory (Zhuhai), School of Civil Engineering, Sun Yat-Sen University, Guangzhou 510275, China
Tongbi Tu
Southern Marine Science and Engineering Guangdong Laboratory (Zhuhai), School of Civil Engineering, Sun Yat-Sen University, Guangzhou 510275, China
Xiaohong Chen
Southern Marine Science and Engineering Guangdong Laboratory (Zhuhai), School of Civil Engineering, Sun Yat-Sen University, Guangzhou 510275, China
Related authors
Tongtiegang Zhao, Zecong Chen, Yongyong Zhang, Bingyao Zhang, and Yu Li
Hydrol. Earth Syst. Sci., 29, 2429–2443, https://doi.org/10.5194/hess-29-2429-2025, 2025
Short summary
The classic logistic function characterizes the stationary relationship between drought loss and intensity. This paper accounts for time in the magnitude, shape and location parameters of the logistic function and derives nonstationary intensity loss functions. A case study is designed to test the functions for drought-affected populations by province in mainland China from 2006 to 2023. Overall, the nonstationary intensity loss functions are shown to be a useful tool for drought management.
Tongtiegang Zhao, Zexin Chen, Yu Tian, Bingyao Zhang, Yu Li, and Xiaohong Chen
Hydrol. Earth Syst. Sci., 28, 3597–3611, https://doi.org/10.5194/hess-28-3597-2024, 2024
Short summary
Local performance plays a critical part in practical applications of global streamflow reanalysis. This paper develops a decomposition approach to evaluating streamflow reanalysis at different timescales. The reanalysis is observed to be more effective in characterizing seasonal, annual and multi-annual features than daily, weekly and monthly features. Also, the local performance is shown to be primarily influenced by precipitation seasonality, longitude, mean precipitation and mean slope.
Qiang Li and Tongtiegang Zhao
EGUsphere, https://doi.org/10.5194/egusphere-2024-1449, 2024
Preprint withdrawn
Short summary
This paper focuses on the effect of the water balance constraint on the robustness of the long short-term memory (LSTM) network in learning rainfall-runoff relationships. Through large-sample tests, it is found that incorporating this constraint into the LSTM improves the robustness, while the improvement tends to decrease as the amount of training data increases. The results point to the compensation effects between training data and process knowledge on the LSTM’s performance.
Qiang Li and Tongtiegang Zhao
EGUsphere, https://doi.org/10.5194/egusphere-2023-2841, 2024
Preprint archived
Short summary
The lack of a physical mechanism is a critical issue for the use of popular deep learning models. This paper presents an in-depth investigation of the fundamental mass balance constraint for deep learning-based rainfall-runoff prediction. The robustness against data sparsity, random parameter initialization and contrasting climate conditions is detailed. The results highlight that the water balance constraint improves the robustness, in particular when training data are limited.
Huayang Cai, Bo Li, Junhao Gu, Tongtiegang Zhao, and Erwan Garel
Ocean Sci., 19, 603–614, https://doi.org/10.5194/os-19-603-2023, 2023
Short summary
For many problems concerning water resource utilization in estuaries, it is essential to be able to express observed salinity distributions based on simple theoretical models. In this study, we propose an analytical salt intrusion model inspired by a theory for predictions of flood hydrographs in watersheds. The newly developed model can be well calibrated using a minimum of three salinity measurements along the estuary and has been successfully applied in 21 estuaries worldwide.
Huayang Cai, Hao Yang, Pascal Matte, Haidong Pan, Zhan Hu, Tongtiegang Zhao, and Guangliang Liu
Ocean Sci., 18, 1691–1702, https://doi.org/10.5194/os-18-1691-2022, 2022
Short summary
Quantifying spatial–temporal water level dynamics is essential for water resources management in estuaries. In this study, we propose a simple yet powerful regression model to examine the influence of the world’s largest dam, the Three Gorges Dam (TGD), on the spatial–temporal water level dynamics within the Yangtze River estuary. The presented method is particularly useful for determining scientific strategies for sustainable water resources management in dam-controlled estuaries worldwide.
Tongtiegang Zhao, Haoling Chen, Yu Tian, Denghua Yan, Weixin Xu, Huayang Cai, Jiabiao Wang, and Xiaohong Chen
Hydrol. Earth Syst. Sci., 26, 4233–4249, https://doi.org/10.5194/hess-26-4233-2022, 2022
Short summary
This paper develops a novel set operations of coefficients of determination (SOCD) method to explicitly quantify the overlapping and differing information for GCM forecasts and ENSO teleconnection. Specifically, the intersection operation of the coefficient of determination derives the overlapping information for GCM forecasts and the Niño3.4 index, and then the difference operation determines the differing information in GCM forecasts (Niño3.4 index) from the Niño3.4 index (GCM forecasts).
Tongtiegang Zhao, Haoling Chen, Quanxi Shao, Tongbi Tu, Yu Tian, and Xiaohong Chen
Hydrol. Earth Syst. Sci., 25, 5717–5732, https://doi.org/10.5194/hess-25-5717-2021, 2021
Short summary
This paper develops a novel approach to attributing correlation skill of dynamical GCM forecasts to statistical El Niño–Southern Oscillation (ENSO) teleconnection using the coefficient of determination. Three cases of attribution are effectively facilitated, which are significantly positive anomaly correlation attributable to positive ENSO teleconnection, attributable to negative ENSO teleconnection and not attributable to ENSO teleconnection.
Hailong Wang, Kai Duan, Bingjun Liu, and Xiaohong Chen
Hydrol. Earth Syst. Sci., 25, 4741–4758, https://doi.org/10.5194/hess-25-4741-2021, 2021
Short summary
Using remote sensing and reanalysis data, we examined the relationships between vegetation development and water resource availability in a humid subtropical basin. We found overall increases in total water storage and surface greenness and vegetation production, and the changes were particularly profound in cropland-dominated regions. Correlation analysis implies water availability leads the variations in greenness and production, and irrigation may improve production during dry periods.
Jun Li, Zhaoli Wang, Xushu Wu, Jakob Zscheischler, Shenglian Guo, and Xiaohong Chen
Hydrol. Earth Syst. Sci., 25, 1587–1601, https://doi.org/10.5194/hess-25-1587-2021, 2021
Short summary
We introduce a daily-scale index, termed the standardized compound drought and heat index (SCDHI), to measure the key features of compound dry-hot conditions. SCDHI can not only monitor long-term compound dry-hot events but also capture such events at the sub-monthly scale and reflect the related impacts on vegetation activity. The index provides a new tool to quantify sub-monthly characteristics of compound dry-hot events, which are vital for issuing early and timely warnings.
Short summary
The recent WeatherBench 2 provides a versatile framework for the verification of deterministic and ensemble forecasts. In this paper, we present an explicit extension to binary forecasts of hydroclimatic extremes. Seventeen verification metrics for binary forecasts are employed, and scorecards are generated to showcase the predictive performance. The extension facilitates more comprehensive comparisons of hydroclimatic forecasts and provides useful information for forecast applications.
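As a rough illustration of what verifying a binary forecast involves, the sketch below converts continuous forecasts and observations into yes/no events with a threshold, counts the four cells of the 2×2 contingency table, and derives three widely used scores (probability of detection, false alarm ratio and critical success index). This is not the paper's code. The function names, the threshold and the sample values are hypothetical, and the three scores are chosen only as familiar examples; the summary does not list which seventeen metrics are used.

```python
import numpy as np


def contingency_table(forecast, observed, threshold):
    """Count the 2x2 contingency table for events defined as value > threshold.

    Returns (hits, false_alarms, misses, correct_negatives).
    """
    f = np.asarray(forecast, dtype=float) > threshold  # forecast says "event"
    o = np.asarray(observed, dtype=float) > threshold  # event actually occurred
    hits = int(np.sum(f & o))
    false_alarms = int(np.sum(f & ~o))
    misses = int(np.sum(~f & o))
    correct_negatives = int(np.sum(~f & ~o))
    return hits, false_alarms, misses, correct_negatives


def _ratio(numerator, denominator):
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator > 0 else float("nan")


def binary_scores(hits, false_alarms, misses, correct_negatives):
    """Compute three common scores from the contingency table counts."""
    return {
        # Probability of detection: fraction of observed events that were forecast.
        "POD": _ratio(hits, hits + misses),
        # False alarm ratio: fraction of forecast events that did not occur.
        "FAR": _ratio(false_alarms, hits + false_alarms),
        # Critical success index: hits relative to all forecast or observed events.
        "CSI": _ratio(hits, hits + misses + false_alarms),
    }


if __name__ == "__main__":
    # Hypothetical 24 h precipitation values (mm) and a hypothetical 10 mm threshold.
    forecast = [12.0, 3.0, 25.0, 0.0, 8.0, 30.0]
    observed = [15.0, 0.0, 5.0, 0.0, 11.0, 40.0]
    table = contingency_table(forecast, observed, threshold=10.0)
    print(table)
    print(binary_scores(*table))
```

With the hypothetical sample above, the table holds two hits, one false alarm, one miss and two correct negatives. Scores that ignore correct negatives, such as the critical success index, are often preferred for rare extremes because correct negatives dominate the table when events are infrequent.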