the Creative Commons Attribution 4.0 License.
Ensemble of optimised machine learning algorithms for predicting surface soil moisture content at global scale
Qianqian Han
Yijian Zeng
Lijie Zhang
Calimanut-Ionut Cira
Egor Prikaziuk
Ting Duan
Chao Wang
Brigitta Szabó
Salvatore Manfreda
Ruodan Zhuang
Bob Su
Abstract. Accurate information on surface soil moisture (SSM) content at a global scale under different climatic conditions is important for hydrological and climatological applications. Machine learning (ML) based systematic integration of in-situ hydrological measurements, complex environmental and climate data, and satellite observations facilitates the generation of the best data products to monitor and analyse the exchanges of water, energy and carbon in the Earth system at a proper space-time resolution. This study investigates the estimation of daily SSM using eight optimised ML algorithms and ten ensemble models (constructed via model bootstrap aggregating techniques and five-fold cross-validation). The algorithmic implementations were trained and tested using International Soil Moisture Network (ISMN) data collected from 1722 stations distributed across the world. The results showed that the K-neighbours Regressor (KNR) performs best on the “test_random” set, while the Random Forest Regressor (RFR) performs best on the “test_temporal” and “test_independent-stations” sets. Independent evaluation on novel stations across different climate zones was conducted. For the optimised ML algorithms, the median RMSEs were below 0.1 cm³/cm³. GradientBoosting (GB), Multi-layer Perceptron Regressor (MLPR), Stochastic Gradient Descent Regressor (SGDR), and Random Forest Regressor (RFR) achieved a median r score of 0.6 in twelve, eleven, nine and nine climate zones, respectively, out of fifteen climate zones. The performance of the ensemble models improved significantly, with the median RMSE below 0.075 cm³/cm³ for all climate zones. All voting regressors achieved r scores above 0.6 in thirteen climate zones; the exceptions were BSh and BWh, because of the sparse distribution of training stations. The metrical evaluation showed that ensemble models can improve the performance of single ML algorithms and achieve more stable results.
Based on the results computed for three different test sets, the ensemble model with KNR, RFR and XB performed the best. Overall, our investigation shows that ensemble machine learning algorithms have a greater capability for predicting SSM than the optimised or base ML algorithms, and indicates their considerable potential for estimating water cycle budgets, managing irrigation and predicting crop yields.
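The two evaluation metrics named in the abstract (RMSE and the correlation coefficient r) and the idea of a voting ensemble can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that a voting regressor is an unweighted average of its members' predictions, and the helper names (`rmse`, `pearson_r`, `voting_average`) and all numbers are hypothetical, chosen only for illustration.

```python
import math
from statistics import mean


def rmse(observed, predicted):
    """Root-mean-square error between two equally long sequences."""
    return math.sqrt(mean((o - p) ** 2 for o, p in zip(observed, predicted)))


def pearson_r(observed, predicted):
    """Pearson correlation coefficient r between two sequences."""
    mo, mp = mean(observed), mean(predicted)
    cov = sum((o - mo) * (p - mp) for o, p in zip(observed, predicted))
    var_o = sum((o - mo) ** 2 for o in observed)
    var_p = sum((p - mp) ** 2 for p in predicted)
    return cov / math.sqrt(var_o * var_p)


def voting_average(*member_predictions):
    """Unweighted average of member predictions (assumed voting rule)."""
    return [mean(values) for values in zip(*member_predictions)]


# Made-up soil moisture values in cm3/cm3; NOT data from the study.
observed = [0.10, 0.20, 0.30, 0.40]
model_a = [0.15, 0.25, 0.35, 0.45]  # hypothetical member, biased high
model_b = [0.05, 0.15, 0.25, 0.35]  # hypothetical member, biased low
model_c = [0.12, 0.18, 0.33, 0.38]  # hypothetical member, noisy

ensemble = voting_average(model_a, model_b, model_c)

for name, pred in [("a", model_a), ("b", model_b),
                   ("c", model_c), ("ensemble", ensemble)]:
    print(f"{name}: RMSE={rmse(observed, pred):.4f}, "
          f"r={pearson_r(observed, pred):.4f}")
```

In this toy case the members' errors partly cancel, so the averaged prediction has a lower RMSE than any single member. That is one intuition for why an ensemble can be more stable, although it says nothing about the size of the effect reported in the study.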
Qianqian Han et al.
Status: closed
-
RC1: 'Comment on gmd-2023-83', Anonymous Referee #1, 12 Jul 2023
The comment was uploaded in the form of a supplement: https://gmd.copernicus.org/preprints/gmd-2023-83/gmd-2023-83-RC1-supplement.pdf
-
AC2: 'Reply on RC1', Qianqian Han, 04 Aug 2023
Thanks for your helpful comments and suggestions. We are working on these and will submit the revised manuscript and point-by-point response soon.
Citation: https://doi.org/10.5194/gmd-2023-83-AC2
AC4: 'Reply on AC2', Qianqian Han, 15 Aug 2023
Publisher’s note: the content of this comment was removed on 15 August 2023 since the comment was posted by mistake.
Citation: https://doi.org/10.5194/gmd-2023-83-AC4
- AC6: 'Reply on RC1', Qianqian Han, 15 Aug 2023
-
RC2: 'Comment on gmd-2023-83', Anonymous Referee #2, 25 Jul 2023
Based on global-scale data, this study considered the use of integrated machine learning models to investigate the estimation of daily soil moisture (SSM). The results of the study demonstrate that the integrated machine learning algorithm outperforms both the optimization and base machine learning algorithms in predicting SSM. The topic of SSM estimation is of significant importance. I have the following suggestions and questions to further enhance the current manuscript:
- I would suggest including a literature review on data selection in the introduction, while reserving the Data section solely for describing the data used and the data preprocessing.
- This study investigates the utilization of integrated machine learning models. However, it raises the question of whether three base models in the integrated model are optimal. Have the authors considered the possibility of adjusting the number of models, such as exploring whether a model integrated with two machine learning models may yield better performance?
- Page 3, Line 80: “(ii)justify … model”. I would suggest adding experimental results or analyses in this area, such as manipulating the effect of a predictor on the results by increasing or decreasing its influence.
- Page 12, Line 255-271. The authors chose eight machine learning models based on their popularity and performance, but it appears that there is a significant amount of research applying artificial neural networks (ANN) or using them as a baseline (Uthayakumar et al., 2022; Senyurek et al., 2020; Liu et al., 2020...). Surprisingly, the authors did not include ANN in their selection.
- The manuscript is too long and needs to be shortened, and the figures and tables need to be revised and optimized.
Citation: https://doi.org/10.5194/gmd-2023-83-RC2
AC3: 'Reply on RC2', Qianqian Han, 04 Aug 2023
Thanks for your helpful comments and suggestions. We are working on these and will submit the revised manuscript and point-by-point response soon.
Citation: https://doi.org/10.5194/gmd-2023-83-AC3
AC5: 'Reply on AC3', Qianqian Han, 15 Aug 2023
Publisher’s note: the content of this comment was removed on 15 August 2023 since the comment was posted by mistake.
Citation: https://doi.org/10.5194/gmd-2023-83-AC5
- AC7: 'Reply on RC2', Qianqian Han, 15 Aug 2023
-
CEC1: 'Comment on gmd-2023-83', Juan Antonio Añel, 31 Jul 2023
Dear authors,
Unfortunately, after checking your manuscript, it has come to our attention that the code that you have stored in the Zenodo repository does not include a license. In the repository, it reads that the license is "other"; however, the license is not included among the uploaded files. If you do not include a license, the code continues to be your property and cannot be used by others. As a consequence, nobody can try to reproduce your work. Therefore, you need to include a license in the repository. You may want to choose a free software/open-source (FLOSS) license. We recommend the GPLv3. You only need to include the file 'https://www.gnu.org/licenses/gpl-3.0.txt' as LICENSE.txt with your code. Alternatively, you can choose other options that Zenodo provides: GPLv2, Apache License, MIT License, etc.
Therefore, please, solve it and reply to this comment when you have done it.
Juan A. Añel
Geosci. Model Dev. Executive Editor
Citation: https://doi.org/10.5194/gmd-2023-83-CEC1
AC1: 'Reply on CEC1', Qianqian Han, 31 Jul 2023
Dear Juan A. Añel,
Thanks for helping us check this. We added "https://www.gnu.org/licenses/gpl-3.0.txt" as LICENSE.txt, and the updated DOI is https://doi.org/10.5281/zenodo.8198978. We will update the DOI in our revised manuscript as well.
Best wishes
Qianqian Han on behalf of all Co-Authors
Citation: https://doi.org/10.5194/gmd-2023-83-AC1
CEC2: 'Reply on AC1', Juan Antonio Añel, 31 Jul 2023
Dear authors,
Many thanks for addressing the issue expeditiously.
Regards,
Juan A. Añel
Geosci. Model Dev. Exec. Editor
Citation: https://doi.org/10.5194/gmd-2023-83-CEC2
Viewed
HTML | PDF | XML | Total | Supplement | BibTeX | EndNote |
---|---|---|---|---|---|---|
611 | 142 | 32 | 785 | 31 | 6 | 6 |