https://doi.org/10.5194/gmd-2020-429

Submitted as: development and technical paper | 27 Jan 2021

Review status: this preprint is currently under review for the journal GMD.

GP-SWAT (v1.0): a two-layer graph-based parallel simulation framework for the SWAT model

Dejian Zhang (2, 3, *), Bingqing Lin (1, *), Jiefeng Wu (4), and Qiaoying Lin (1)
  • 1 Department of Resources and Environmental Sciences, Quanzhou Normal University, Donghai Street 398, Quanzhou, Fujian 362000, China
  • 2 College of Computer and Information Engineering, Xiamen University of Technology, Ligong Road 600, Xiamen, Fujian 361024, China
  • 3 Digital Fujian Institute of Big Data for Natural Hazards Monitor, Ligong Road 600, Xiamen, Fujian 361024, China
  • 4 School of Hydrology and Water Resources, Nanjing University of Information Science and Technology, Nanjing 210000, China
  • * These authors contributed equally to this work.

Abstract. High-fidelity and large-scale hydrological models are increasingly used to investigate the impacts of human activities and climate change on water availability and quality. However, the detailed representations of real-world systems and processes contained in these models inevitably lead to long execution times, ranging from minutes to days. Such run times become prohibitive, or even make a study infeasible, when a large number of iterative model simulations is involved. In this study, we propose a generic two-layer model parallelization scheme to reduce the run time of computationally expensive model applications through a combination of model spatial decomposition and the graph-parallel Pregel algorithm. Taking the Soil and Water Assessment Tool (SWAT) as an example, we implemented a generic tool named GP-SWAT, enabling model-level and subbasin-level model parallelization on a Spark computer cluster. We then evaluated GP-SWAT in two sets of experiments to demonstrate its potential to accelerate single and iterative model simulations and to run in different environments. In each test set, GP-SWAT was applied for the parallel simulation of eight synthetic hydrological models with different input/output (I/O) burdens and river network characteristics. The experimental results indicate that GP-SWAT can effectively solve high-computational-demand problems of the SWAT model. In addition, as a scalable and flexible tool, it can be run in diverse environments, from a commodity computer running the Microsoft Windows operating system to a Spark cluster consisting of a large number of computational nodes. Moreover, it is possible to apply this generic scheme to other subbasin-based hydrological models or even acyclic models in other domains to alleviate I/O demands and optimize model computational performance.
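The subbasin-level layer described in the abstract can be illustrated with a minimal sketch. It is not the GP-SWAT implementation: it uses plain Python rather than Spark, the function name `pregel_route` and the toy river network are hypothetical, and summing flows stands in for running the SWAT model on a subbasin. It only shows the Pregel-style idea that subbasins whose upstream inputs are complete form one superstep, can be processed independently, and then pass messages to their downstream neighbours.

```python
from collections import defaultdict


def pregel_route(downstream, local_runoff):
    """Pregel-style routing over an acyclic river network (illustrative only).

    downstream:   dict mapping each subbasin to its downstream subbasin,
                  or None for the outlet.
    local_runoff: dict mapping each subbasin to its locally generated flow.
    Returns (outflow per subbasin, list of supersteps).
    """
    # Count how many upstream messages each subbasin must receive.
    pending = {sub: 0 for sub in downstream}
    for sub, down in downstream.items():
        if down is not None:
            pending[down] += 1

    inflow = defaultdict(float)
    outflow = {}
    supersteps = []

    # Headwater subbasins have no upstream inputs and start immediately.
    active = sorted(sub for sub, count in pending.items() if count == 0)
    while active:
        supersteps.append(active)

        # Compute phase: subbasins in one superstep are independent of each
        # other, so a cluster could process them in parallel.
        messages = []
        for sub in active:
            outflow[sub] = local_runoff[sub] + inflow[sub]
            if downstream[sub] is not None:
                messages.append((downstream[sub], outflow[sub]))

        # Message phase: deliver outflows and activate any subbasin whose
        # upstream inputs are now complete.
        next_active = []
        for target, flow in messages:
            inflow[target] += flow
            pending[target] -= 1
            if pending[target] == 0:
                next_active.append(target)
        active = sorted(next_active)

    return outflow, supersteps


if __name__ == "__main__":
    # Hypothetical network: 1 and 2 drain to 3; 3 and 4 drain to outlet 5.
    network = {1: 3, 2: 3, 3: 5, 4: 5, 5: None}
    runoff = {1: 1.0, 2: 2.0, 3: 0.5, 4: 1.5, 5: 0.25}
    flows, steps = pregel_route(network, runoff)
    print(steps)
    print(flows[5])
```

In this toy network the number of supersteps equals the longest path from a headwater to the outlet, which is one way to see why river network characteristics can affect the achievable speedup.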

Status: open (until 04 Apr 2021)

Data sets

Test models used to assess the performance of GP-SWAT. Dejian Zhang, Bingqing Lin, Jiefeng Wu, and Qiaoying Lin. https://doi.org/10.5281/zenodo.4447969

Model code and software

Source code of GP-SWAT. Dejian Zhang, Bingqing Lin, Jiefeng Wu, and Qiaoying Lin. https://doi.org/10.5281/zenodo.4447969

Short summary
GP-SWAT is a two-layer model parallelization tool for the SWAT model based on the graph-parallel Pregel algorithm. It can parallelize both individual and iterative model simulations, which gives it a range of possible applications and considerable flexibility in maximizing performance. As a flexible and scalable tool, it can run in diverse environments, ranging from a commodity computer with a Microsoft Windows, Mac, or Linux OS to a Spark cluster consisting of a large number of nodes.