https://doi.org/10.5194/gmd-2023-83
Submitted as: model evaluation paper | 08 Jun 2023
Status: a revised version of this preprint was accepted for the journal GMD and is expected to appear here in due course.

Ensemble of optimised machine learning algorithms for predicting surface soil moisture content at global scale

Qianqian Han, Yijian Zeng, Lijie Zhang, Calimanut-Ionut Cira, Egor Prikaziuk, Ting Duan, Chao Wang, Brigitta Szabó, Salvatore Manfreda, Ruodan Zhuang, and Bob Su

Abstract. Accurate information on surface soil moisture (SSM) content at a global scale under different climatic conditions is important for hydrological and climatological applications. Machine learning (ML) based systematic integration of in-situ hydrological measurements, complex environmental and climate data, and satellite observations facilitates the generation of the best data products to monitor and analyse the exchanges of water, energy and carbon in the Earth system at an appropriate space-time resolution. This study investigates the estimation of daily SSM using eight optimised ML algorithms and ten ensemble models (constructed via model bootstrap aggregating techniques and five-fold cross-validation). The algorithmic implementations were trained and tested using International Soil Moisture Network (ISMN) data collected from 1722 stations distributed across the world. The results showed that the K-neighbours Regressor (KNR) performs best on the “test_random” set, while the Random Forest Regressor (RFR) performs best on the “test_temporal” and “test_independent-stations” sets. Independent evaluation on novel stations across different climate zones was also conducted. For the optimised ML algorithms, the median RMSEs were below 0.1 cm³/cm³. GradientBoosting (GB), Multi-layer Perceptron Regressor (MLPR), Stochastic Gradient Descent Regressor (SGDR), and RFR achieved a median r score of 0.6 in twelve, eleven, nine and nine of the fifteen climate zones, respectively. The performance of the ensemble models improved significantly, with median RMSE values below 0.075 cm³/cm³ for all climate zones. All voting regressors achieved r scores above 0.6 in thirteen climate zones; the exceptions were BSh and BWh, because of the sparse distribution of training stations. The evaluation metrics showed that ensemble models can improve on the performance of single ML algorithms and achieve more stable results. Based on the results computed for the three test sets, the ensemble model combining KNR, RFR and XB (extreme gradient boosting) performed best. Overall, our investigation shows that ensemble ML algorithms have a greater capability for predicting SSM than the optimised, or base, ML algorithms, and indicates their considerable potential for estimating water cycle budgets, managing irrigation and predicting crop yields.
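
The abstract evaluates voting ensembles with RMSE and the correlation coefficient r. The following minimal Python sketch illustrates only that general idea: it averages the predictions of three members and scores the result. It is not the authors' code. The function names and all numeric values are hypothetical, and unweighted averaging is an assumption about how the voting regressors combine their members.

```python
import math


def rmse(observed, predicted):
    """Root-mean-square error, in the units of the inputs."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def pearson_r(observed, predicted):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_o * var_p)


def voting_average(member_predictions):
    """Unweighted mean of the members' predictions for each sample."""
    return [sum(values) / len(values) for values in zip(*member_predictions)]


# Hypothetical soil moisture values (volumetric), invented for illustration.
observed = [0.10, 0.20, 0.30, 0.40]
member_a = [0.14, 0.18, 0.33, 0.36]  # stand-in for one base model
member_b = [0.06, 0.23, 0.26, 0.45]  # stand-in for a second base model
member_c = [0.12, 0.16, 0.34, 0.37]  # stand-in for a third base model

ensemble = voting_average([member_a, member_b, member_c])

for name, pred in [("a", member_a), ("b", member_b), ("c", member_c),
                   ("ensemble", ensemble)]:
    print(f"{name}: RMSE={rmse(observed, pred):.4f}, r={pearson_r(observed, pred):.3f}")
```

In this toy case the members' errors partly cancel, so the averaged prediction has a lower RMSE than any single member. Averaging can never be worse than the mean of the members' RMSEs, but how much it helps depends on how correlated their errors are.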

Status: closed

Comment types: AC – author | RC – referee | CC – community | EC – editor | CEC – chief editor
  • RC1: 'Comment on gmd-2023-83', Anonymous Referee #1, 12 Jul 2023
    • AC2: 'Reply on RC1', Qianqian Han, 04 Aug 2023
      • AC4: 'Reply on AC2', Qianqian Han, 15 Aug 2023
    • AC6: 'Reply on RC1', Qianqian Han, 15 Aug 2023
  • RC2: 'Comment on gmd-2023-83', Anonymous Referee #2, 25 Jul 2023
    • AC3: 'Reply on RC2', Qianqian Han, 04 Aug 2023
      • AC5: 'Reply on AC3', Qianqian Han, 15 Aug 2023
    • AC7: 'Reply on RC2', Qianqian Han, 15 Aug 2023
  • CEC1: 'Comment on gmd-2023-83', Juan Antonio Añel, 31 Jul 2023
    • AC1: 'Reply on CEC1', Qianqian Han, 31 Jul 2023
      • CEC2: 'Reply on AC1', Juan Antonio Añel, 31 Jul 2023

Viewed

Total article views: 785 (including HTML, PDF, and XML)
  • HTML: 611
  • PDF: 142
  • XML: 32
  • Total: 785
  • Supplement: 31
  • BibTeX: 6
  • EndNote: 6
Views and downloads are calculated since 08 Jun 2023.

Viewed (geographical distribution)

Total article views: 767 (including HTML, PDF, and XML), of which 767 have a defined geography and 0 are of unknown origin.
Latest update: 28 Sep 2023
Short summary
Using machine learning, we estimated global surface soil moisture for understanding water, energy, and carbon exchanges. Ensemble models outperformed individual algorithms in predicting soil moisture under different climates. The best performing ensemble model included K-nearest neighbors, random forest, and extreme gradient boosting. These findings have implications for hydrological and climatological applications such as water cycle monitoring, irrigation management, and crop yield prediction.