https://doi.org/10.5194/gmd-2022-206
Submitted as: model evaluation paper
21 Nov 2022
Status: this preprint is currently under review for the journal GMD.

Modeling river water temperature with limiting forcing data: air2stream v1.0.0, machine learning and multiple regression

Manuel C. Almeida and Pedro S. Coelho
  • Faculdade de Ciências e Tecnologia, Universidade Nova de Lisboa, Mare – Centro de Ciências do Mar e do Ambiente, Lisboa, 2825 - 516, Portugal

Abstract. The prediction of river water temperature (WT) is of key importance in environmental science. Water temperature datasets for low-order rivers are often in short supply, leaving lake/reservoir water quality modelers with the challenge of extracting as much information as possible from existing datasets, usually without the use of physically based models because of the significant amount of data they require (e.g., river morphology, degree of shading, wind velocity). In this study, five models are used to predict the water temperature of 83 rivers (with 98 % missing data): three machine learning (ML) algorithms (Random Forest, Artificial Neural Network and Support Vector Regression), the hybrid air2stream model with all available parameterizations, and a multiple regression model. The ML hyperparameters were optimized with a Tree-structured Parzen Estimator (TPE) algorithm, and the results of each model are presented as an ensemble of 12 individually optimized model runs. The meteorological datasets were obtained from the fifth-generation atmospheric reanalysis, ERA5. In general terms, the results demonstrate the vital importance of hyperparameter optimization and suggest that, from a practical modeling perspective, when the number of predictor variables and observed river WT values is limited, applying all the models considered in this study is relevant (mean annual results of the models ensemble: root mean square error (RMSE) 2.75 °C ± 1.00; Nash-Sutcliffe efficiency (NSE) 0.56 ± 0.48). The best-performing model was Random Forest (annual mean RMSE: 3.18 °C ± 1.06; NSE: 0.52 ± 0.23). The results also revealed a logarithmic correlation between the RMSE of the observed versus predicted river WT and the watershed time of concentration: the RMSE increases by an average of 0.1 °C for each one-hour increase in the watershed time of concentration (watershed area: μ = 106 km², σ = 153 km²).
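RMSE and NSE are the two skill scores quoted in the abstract. The sketch below shows their standard definitions in Python. How the study handled missing observations and how the "models ensemble" was formed are not stated here, so skipping `None` gaps and averaging the runs' predictions element-wise are assumptions, and all function names are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple


def paired_observations(
    observed: Sequence[Optional[float]], predicted: Sequence[float]
) -> List[Tuple[float, float]]:
    """Keep only time steps that have an observation (None marks a gap).

    Assumption: with sparse records, scores are computed on observed days only.
    """
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    pairs = [(o, p) for o, p in zip(observed, predicted) if o is not None]
    if not pairs:
        raise ValueError("no observed values to compare against")
    return pairs


def rmse(observed: Sequence[Optional[float]], predicted: Sequence[float]) -> float:
    """Root mean square error, in the units of the inputs (here °C)."""
    pairs = paired_observations(observed, predicted)
    return math.sqrt(sum((o - p) ** 2 for o, p in pairs) / len(pairs))


def nse(observed: Sequence[Optional[float]], predicted: Sequence[float]) -> float:
    """Nash-Sutcliffe efficiency: 1 - SSE / SST about the observed mean.

    1 is a perfect fit; 0 means no better than predicting the observed mean;
    negative values mean worse than the observed mean.
    """
    pairs = paired_observations(observed, predicted)
    mean_obs = sum(o for o, _ in pairs) / len(pairs)
    sse = sum((o - p) ** 2 for o, p in pairs)
    sst = sum((o - mean_obs) ** 2 for o, _ in pairs)
    if sst == 0:
        raise ValueError("NSE is undefined when all observations are equal")
    return 1.0 - sse / sst


def ensemble_mean(runs: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise mean of several model runs (hypothetical ensemble rule)."""
    if not runs:
        raise ValueError("need at least one run")
    return [sum(values) / len(runs) for values in zip(*runs)]


# Usage with toy numbers (not data from the study):
obs = [10.0, 12.0, None, 14.0, 16.0]       # None = missing observation
runs = [[11.0, 12.0, 15.0, 13.0, 16.0],
        [11.0, 12.0, 13.0, 13.0, 16.0]]
pred = ensemble_mean(runs)                  # [11.0, 12.0, 14.0, 13.0, 16.0]
print(round(rmse(obs, pred), 4))            # 0.7071
print(round(nse(obs, pred), 4))             # 0.9
```

Because NSE normalizes the squared error by the variance of the observations, the same RMSE can yield very different NSE values on rivers with small versus large seasonal WT ranges, which is one reason to report both.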


Status: open (until 16 Jan 2023)



Latest update: 06 Dec 2022
Short summary
Water temperature (WT) datasets for low-order rivers are commonly scarce. In this study, five different models are used to predict the WT of 83 rivers. Overall, the results show that hyperparameter optimization of the models is essential and that, to minimize the prediction error, it is worthwhile to apply all the models considered in this study. The results also show a logarithmic correlation between the error of the predicted river WT and the watershed time of concentration.
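The logarithmic correlation can be illustrated with a least-squares fit of RMSE against ln(T_c), where T_c is the watershed time of concentration. The functional form RMSE = a + b·ln(T_c), the function names and the synthetic numbers below are assumptions for illustration only, not the authors' fitting procedure or data. One consequence of this form is that the marginal change dRMSE/dT_c = b/T_c shrinks as T_c grows, so a single "average increase per hour" summarizes a rate that varies across watersheds.

```python
import math
from typing import Sequence, Tuple


def fit_log_relation(
    tc_hours: Sequence[float], rmse_values: Sequence[float]
) -> Tuple[float, float, float]:
    """Ordinary least-squares fit of rmse = a + b * ln(tc) (assumed form).

    Returns (a, b, r), where r is the Pearson correlation between ln(tc)
    and the RMSE values.
    """
    if len(tc_hours) != len(rmse_values) or len(tc_hours) < 2:
        raise ValueError("need at least two paired values")
    if any(t <= 0 for t in tc_hours):
        raise ValueError("time of concentration must be positive to take ln")
    x = [math.log(t) for t in tc_hours]
    y = list(rmse_values)
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("ln(tc) and RMSE must both vary")
    b = sxy / sxx                  # slope per unit of ln(tc)
    a = mean_y - b * mean_x        # intercept: RMSE at tc = 1 h
    r = sxy / math.sqrt(sxx * syy)
    return a, b, r


def marginal_increase(b: float, tc_hours: float) -> float:
    """d(rmse)/d(tc) = b / tc: under a log law the per-hour increase shrinks as tc grows."""
    return b / tc_hours


# Usage with synthetic watersheds (not data from the study):
tc = [2.0, 5.0, 10.0, 20.0, 40.0]                 # hours
err = [1.5 + 0.6 * math.log(t) for t in tc]       # °C, exact log law
a, b, r = fit_log_relation(tc, err)
print(round(a, 3), round(b, 3), round(r, 3))      # 1.5 0.6 1.0
print(round(marginal_increase(b, 3.0), 3))        # 0.2
```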