Articles | Volume 19, issue 7
https://doi.org/10.5194/gmd-19-2657-2026
© Author(s) 2026. This work is distributed under
the Creative Commons Attribution 4.0 License.
Validation strategies for deep learning-based groundwater level time series prediction using exogenous meteorological input features
Institute of Applied Geosciences, Division of Hydrogeology, Karlsruhe Institute of Technology, Karlsruhe, Germany
Tanja Liesch
Institute of Applied Geosciences, Division of Hydrogeology, Karlsruhe Institute of Technology, Karlsruhe, Germany
Maria Wetzel
Federal Institute for Geosciences and Natural Resources, Berlin, Germany
Stefan Kunz
Federal Institute for Geosciences and Natural Resources, Berlin, Germany
Stefan Broda
Federal Institute for Geosciences and Natural Resources, Berlin, Germany
Related authors
Marc Ohmer, Tanja Liesch, Bastian Habbel, Benedikt Heudorfer, Mariana Gomez, Patrick Clos, Maximilian Nölscher, and Stefan Broda
Earth Syst. Sci. Data, 18, 77–95, https://doi.org/10.5194/essd-18-77-2026, 2026
Short summary
We present a public dataset of weekly groundwater levels from more than 3000 wells across Germany, spanning 32 years. It combines weather data and site-specific environmental information to support forecasting groundwater changes. Three benchmark models of varying complexity show how data and modeling approaches influence predictions. This resource promotes open, reproducible research and helps guide future water management decisions.
Tanja Liesch and Marc Ohmer
EGUsphere, https://doi.org/10.5194/egusphere-2025-4048, 2025
Short summary
We studied how to add site information to deep learning models that predict groundwater levels at many wells at once. Using data from Germany, we compared four simple ways to combine time-varying weather data with time-invariant site characteristics. All methods gave similar average accuracy. Repeating site data at each time step was slightly best but required more computing power. The quality of site information mattered more than the method, which can guide future model design.
Marc Ohmer and Tanja Liesch
EGUsphere, https://doi.org/10.5194/egusphere-2025-4055, 2025
Short summary
We compared global vs. local deep learning models for groundwater level prediction using ~3,000 wells. Unlike surface water, groundwater is complex and data-scarce. Results: global models show no systematic accuracy advantage over local ones. Data similarity matters more than quantity for better predictions. Successful groundwater modeling requires strategies tailored to these unique complexities, not just larger datasets.
Stefan Kunz, Alexander Schulz, Maria Wetzel, Maximilian Nölscher, Teodor Chiaburu, Felix Biessmann, and Stefan Broda
Hydrol. Earth Syst. Sci., 29, 3405–3433, https://doi.org/10.5194/hess-29-3405-2025, 2025
Short summary
Accurate groundwater level predictions are crucial for sustainable management. This study applies two machine learning models – Neural Hierarchical Interpolation for Time Series Forecasting (N-HiTS) and the Temporal Fusion Transformer (TFT) – to forecast seasonal groundwater levels for 5288 wells across Germany. N-HiTS outperformed TFT, with both models performing well in diverse hydrogeological settings, particularly in lowlands with distinct seasonal dynamics.
Raoul A. Collenteur, Ezra Haaf, Mark Bakker, Tanja Liesch, Andreas Wunsch, Jenny Soonthornrangsan, Jeremy White, Nick Martin, Rui Hugman, Ed de Sousa, Didier Vanden Berghe, Xinyang Fan, Tim J. Peterson, Jānis Bikše, Antoine Di Ciacca, Xinyue Wang, Yang Zheng, Maximilian Nölscher, Julian Koch, Raphael Schneider, Nikolas Benavides Höglund, Sivarama Krishna Reddy Chidepudi, Abel Henriot, Nicolas Massei, Abderrahim Jardani, Max Gustav Rudolph, Amir Rouhani, J. Jaime Gómez-Hernández, Seifeddine Jomaa, Anna Pölz, Tim Franken, Morteza Behbooei, Jimmy Lin, and Rojin Meysami
Hydrol. Earth Syst. Sci., 28, 5193–5208, https://doi.org/10.5194/hess-28-5193-2024, 2024
Short summary
We show the results of the 2022 Groundwater Time Series Modelling Challenge; 15 teams applied data-driven models to simulate hydraulic heads, and three model groups were identified: lumped, machine learning, and deep learning. For all wells, reasonable performance was obtained by at least one team from each group. There was not one team that performed best for all wells. In conclusion, the challenge was a successful initiative to compare different models and learn from each other.
Mariana Gomez, Maximilian Nölscher, Andreas Hartmann, and Stefan Broda
Hydrol. Earth Syst. Sci., 28, 4407–4425, https://doi.org/10.5194/hess-28-4407-2024, 2024
Short summary
To understand the impact of external factors on groundwater level modelling using a 1-D convolutional neural network (CNN) model, we train, validate, and tune individual CNN models for 505 wells distributed across Lower Saxony, Germany. We then evaluate the performance of these models against available geospatial and time series features. This study provides new insights into the relationship between these factors and the accuracy of groundwater modelling.
Andreas Wunsch, Tanja Liesch, and Nico Goldscheider
Hydrol. Earth Syst. Sci., 28, 2167–2178, https://doi.org/10.5194/hess-28-2167-2024, 2024
Short summary
Seasons have a strong influence on groundwater levels, but relationships are complex and partly unknown. Using data from wells in Germany and an explainable machine learning approach, we showed that summer precipitation is the key factor that controls the severity of a low-water period in fall; high summer temperatures do not per se cause stronger decreases. Preceding winters have only a minor influence on such low-water periods in general.
Benedikt Heudorfer, Tanja Liesch, and Stefan Broda
Hydrol. Earth Syst. Sci., 28, 525–543, https://doi.org/10.5194/hess-28-525-2024, 2024
Short summary
We build a neural network to predict groundwater levels from monitoring wells. We predict all wells at the same time, by learning the differences between wells with static features, making it an entity-aware global model. This works, but we also test different static features and find that the model does not use them to learn exactly how the wells are different, but only to uniquely identify them. As this model class is not actually entity aware, we suggest further steps to make it so.
Guillaume Cinkus, Naomi Mazzilli, Hervé Jourde, Andreas Wunsch, Tanja Liesch, Nataša Ravbar, Zhao Chen, and Nico Goldscheider
Hydrol. Earth Syst. Sci., 27, 2397–2411, https://doi.org/10.5194/hess-27-2397-2023, 2023
Short summary
The Kling–Gupta Efficiency (KGE) is a performance criterion extensively used to evaluate hydrological models. We conduct a critical study on the KGE and its variant to examine counterbalancing errors. Results show that, when assessing a simulation, concurrent over- and underestimation of discharge can lead to an overall higher criterion score without an associated increase in model relevance. We suggest that one carefully choose performance criteria and use scaling factors.
Guillaume Cinkus, Andreas Wunsch, Naomi Mazzilli, Tanja Liesch, Zhao Chen, Nataša Ravbar, Joanna Doummar, Jaime Fernández-Ortega, Juan Antonio Barberá, Bartolomé Andreo, Nico Goldscheider, and Hervé Jourde
Hydrol. Earth Syst. Sci., 27, 1961–1985, https://doi.org/10.5194/hess-27-1961-2023, 2023
Short summary
Numerous modelling approaches can be used for studying karst water resources, which can make it difficult for a stakeholder or researcher to choose the appropriate method. We conduct a comparison of two widely used karst modelling approaches: artificial neural networks (ANNs) and reservoir models. Results show that ANN models are very flexible and seem great for reproducing high flows. Reservoir models can work with relatively short time series and seem to accurately reproduce low flows.
Maria Wetzel, Thomas Kempka, and Michael Kühn
Adv. Geosci., 58, 1–10, https://doi.org/10.5194/adgeo-58-1-2022, 2022
Short summary
Porosity-permeability relations are simulated for a precipitation-dissolution cycle in a virtual sandstone. A hysteresis in permeability is observed depending on the geochemical process and dominating reaction regime, whereby permeability varies by more than two orders of magnitude. Controlling parameters for this hysteresis phenomenon are the closure and re-opening of micro-scale flow channels, derived from changes in pore throat diameter and connectivity of the pore network.
Marc Ohmer, Tanja Liesch, and Andreas Wunsch
Hydrol. Earth Syst. Sci., 26, 4033–4053, https://doi.org/10.5194/hess-26-4033-2022, 2022
Short summary
We present a data-driven approach to select optimal locations for groundwater monitoring wells. The approach can optimize the number and location of wells for a network reduction (by ranking wells by their information content and removing redundant ones), a network extension (by finding sites with a high information gain), or both. It allows us to include a cost function to account for areas that are more or less suitable for new wells, and it can help to obtain the maximum information content for a given budget.
Andreas Wunsch, Tanja Liesch, Guillaume Cinkus, Nataša Ravbar, Zhao Chen, Naomi Mazzilli, Hervé Jourde, and Nico Goldscheider
Hydrol. Earth Syst. Sci., 26, 2405–2430, https://doi.org/10.5194/hess-26-2405-2022, 2022
Short summary
Modeling complex karst water resources is difficult enough, but often there are no or too few climate stations available within or close to the catchment to deliver input data for modeling purposes. We apply image recognition algorithms to time-distributed, spatially gridded meteorological data to simulate karst spring discharge. Our models can also learn the approximate catchment location of a spring independently.
Morgan Tranter, Maria Wetzel, Marco De Lucia, and Michael Kühn
Adv. Geosci., 56, 57–65, https://doi.org/10.5194/adgeo-56-57-2021, 2021
Short summary
Barite formation is an important factor for many use cases of the geological subsurface because it may change the rock. In this modelling study, the replacement reaction of celestite to barite is investigated. The steps identified as playing a role are celestite dissolution followed by two-step precipitation of barite: spontaneous formation of small crystals and their subsequent growth. Explicitly including these processes improves the usability of the models for quantitative prediction.
Short summary
With the growing use of machine learning for groundwater level (GWL) prediction, proper performance estimation is crucial. This study compares three validation strategies for 1D-CNN and LSTM models driven by meteorological inputs: blocked cross-validation (bl-CV), repeated out-of-sample (repOOS), and out-of-sample (OOS). Results show that bl-CV offers the most reliable performance estimates, while OOS is the most uncertain, highlighting the need for careful method selection.
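To make the difference between the three strategies concrete, the following is a minimal sketch of how the corresponding train/test index splits could be generated for a single time series. It is an illustration under stated assumptions, not the authors' implementation: the function names (`oos_split`, `rep_oos_splits`, `blocked_cv_splits`), the window fractions, the number of repetitions, and the number of folds are all hypothetical choices, and the study's actual configuration may differ.

```python
import numpy as np


def oos_split(n, test_frac=0.2):
    """Single out-of-sample split: the final part of the series is held out.

    test_frac is an assumed value, not taken from the study.
    """
    idx = np.arange(n)
    cut = n - int(round(n * test_frac))
    return [(idx[:cut], idx[cut:])]


def rep_oos_splits(n, n_reps=5, train_frac=0.6, test_frac=0.1, seed=0):
    """Repeated out-of-sample: several randomly placed cut points.

    Each repetition trains on the window directly before the cut and
    tests on the window directly after it (window sizes are assumptions).
    """
    rng = np.random.default_rng(seed)
    idx = np.arange(n)
    n_train = int(n * train_frac)
    n_test = int(n * test_frac)
    # Cut points are drawn so that both windows fit inside the series.
    cuts = rng.integers(n_train, n - n_test + 1, size=n_reps)
    return [(idx[c - n_train:c], idx[c:c + n_test]) for c in cuts]


def blocked_cv_splits(n, k=5):
    """Blocked cross-validation: k contiguous, unshuffled blocks.

    Each block serves once as the test set; all other blocks form the
    training set, so training data may lie before and after the test block.
    Requires k >= 2.
    """
    idx = np.arange(n)
    blocks = np.array_split(idx, k)
    return [
        (np.concatenate(blocks[:i] + blocks[i + 1:]), blocks[i])
        for i in range(k)
    ]
```

In this sketch, OOS yields one performance estimate from one test period, repOOS yields several estimates from differently placed test periods, and bl-CV uses every time step exactly once for testing, which is one plausible reason its estimates are less sensitive to the choice of a single test period.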