A methodological framework for improving the performance of data-driven models, a case study for daily runoff prediction in the Maumee domain, U.S.

Geoscientific models are simplified representations of complex earth and environmental systems (EESs). Compared with physics-based numerical models, data-driven modeling has gained popularity due mainly to data proliferation in EESs and the ability to perform prediction without requiring explicit mathematical representation of complex biophysical processes. However, because of the black-box nature of data-driven models, their performance cannot be guaranteed. To address this issue, we developed a generalizable framework for improving the efficiency and effectiveness of model training and reducing model overfitting. This framework consists of two parts: hyperparameter selection based on Sobol global sensitivity analysis, and hyperparameter tuning using a Bayesian optimization approach. We demonstrated the framework efficacy through a case study of daily edge-of-field (EOF) runoff predictions by a tree-based data-driven model using the eXtreme Gradient Boosting (XGBoost) algorithm in the Maumee domain, U.S. This framework contributes towards improving the performance of a variety of data-driven models and can thus help promote their applications in EESs.

Such complexity limits the wide application of physics-based models. In contrast, data-driven models do not require an explicit mathematical formulation of all underlying complex processes to perform predictive analysis. Thus, the development of data-driven models is often less involved. Moreover, the proliferation of data further leads to the rise of data-driven modeling in EESs (Willard et al., 2020).
For data-driven modeling, model performance relies heavily on the capability of the underlying machine learning (ML) algorithms to retrieve information from data; this capability is controlled by the complexity of the ML algorithms and their associated parameters, that is, hyperparameters (Yang and Shami, 2020; Hutter et al., 2015). When the underlying ML algorithm is too simple to learn complex patterns from the data, we see large biases in the training phase (i.e., underfitting; Jabbar and Khan, 2015; Koehrsen, 2018). In contrast, model overfitting occurs when the ML algorithm is so complicated that it captures the random noise in the training data; the resulting model performs very well in training but poorly in test (i.e., variance error). As such, to improve model performance, we need to determine appropriate ML algorithms that can balance model bias and variance error (Koehrsen, 2018).
There are various rules of thumb for choosing an appropriate ML algorithm for data-driven modeling. When the model is underfitting, we can choose more complex ML algorithms (e.g., moving from linear regression models to tree-based regression models). However, in practice it can be more challenging to reduce overfitting; model overfitting is often associated with a long training time and poor performance on test sets. Because of the black-box nature of data-driven models, only a handful of approaches are available to deal with overfitting. One such approach is random sampling with or without replacement (Gimenez-Nadal et al., 2019), in which data points are randomly selected for training and test. This approach attempts to ensure that the data samples are uniformly distributed: both the training and test sets contain data points that represent the entire domain space. Combined with this approach, other techniques such as early stopping (Yao et al., 2007), cross-validation (Fushiki, 2011), and regularization (Zhu et al., 2018) address overfitting by tuning hyperparameters to balance model performance between training and test sets.
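The early-stopping idea mentioned above can be sketched as a generic monitoring loop (a minimal illustration; the helper names `update_step` and `val_loss` and the `patience` value are hypothetical, not from any particular library):

```python
def train_with_early_stopping(update_step, val_loss, max_iters=1000, patience=10):
    """Generic early-stopping loop: stop when the validation loss has not
    improved for `patience` consecutive iterations (illustrative sketch)."""
    best, best_iter = float("inf"), 0
    for t in range(max_iters):
        update_step(t)        # one training update (e.g., one boosting round)
        loss = val_loss()     # evaluate on held-out validation data
        if loss < best:
            best, best_iter = loss, t
        elif t - best_iter >= patience:
            break             # no improvement for `patience` iterations
    return best, best_iter
```

The returned `best_iter` marks the iteration at which training should effectively stop, which is the point where the bias-variance trade-off mentioned above is approximately balanced.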
Although hyperparameters are external to data-driven models, they affect model performance through the ML algorithms during model training. However, not all hyperparameters have the same level of impact on model performance, as they affect different aspects of how ML algorithms retrieve data patterns. For example, some hyperparameters control the algorithm complexity, while others are used to reduce overfitting, as mentioned above. By tuning these hyperparameters, we aim to identify optimal hyperparameter values for the ML algorithm. We can then apply the optimized ML algorithm to maximize model performance during training.
Tuning hyperparameters manually becomes infeasible as the number of hyperparameters associated with the ML algorithm increases. Hyperparameter optimization algorithms have been developed to automatically identify the optimal hyperparameters that maximize model performance by minimizing a predefined objective function (i.e., loss function) of a data-driven model. A variety of optimization approaches are available, categorized by the mechanisms used to search for the optimal hyperparameter values: 1) exhaustive search using grid or random search (Liashchynskyi and Liashchynskyi, 2019; Bergstra et al., 2011), and 2) surrogate models using sequential model-based optimization (SMBO) methods (Bergstra et al., 2011). The choice of tuning approach is affected by several factors, such as the number of hyperparameters, the different ranges of their values, and the complexity of the ML algorithms. In general, compared with the approaches in the first category, those in the second category are more suitable for data-driven models with complex ML algorithms (Bergstra et al., 2011).
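As a baseline for the first category, random search can be sketched in a few lines of standard Python (illustrative only; the hyperparameter name and range below are made up, not the ones used in this study):

```python
import random

def random_search(objective, space, n_trials=50, seed=0):
    """Random search over a hyperparameter space given as
    {name: (low, high)} ranges; returns the best trial found."""
    rng = random.Random(seed)
    best_h, best_score = None, float("inf")
    for _ in range(n_trials):
        # draw one hyperparameter set uniformly at random
        h = {k: rng.uniform(lo, hi) for k, (lo, hi) in space.items()}
        score = objective(h)  # e.g., validation loss of the trained model
        if score < best_score:
            best_h, best_score = h, score
    return best_h, best_score
```

Random search treats the objective as a black box and scales poorly with dimension, which is why the surrogate-based SMBO methods of the second category are preferred for complex algorithms.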
Rather than tuning all hyperparameters, it is expected to be more efficient and effective to tune only a subset of them while achieving similar or better model performance. Similar to assessing the overall impact of model parameters on the predictions of physics-based models, we can use global sensitivity analysis to identify the hyperparameters critical for model performance based on sensitivity scores (Sobol, 2001); hyperparameters with high sensitivity scores are considered influential, while those with low sensitivity scores are considered to have no or negligible influence on model performance.
Additionally, since fewer influential hyperparameters are involved in model training, less training time is required to achieve maximum model performance. This is particularly useful when data-driven models need to be trained with streaming data for real-time predictions (Gomes et al., 2017).
With the proliferation of data in EESs, we expect more EES applications to use data-driven models. In this study, we present a new framework for data-driven modeling that combines hyperparameter selection and tuning to minimize training time, reduce overfitting, and maximize overall model performance. The fundamental contribution of our work is a framework that can 1) identify a subset of hyperparameters critical for model performance through hyperparameter selection using a variance-based sensitivity analysis approach, and 2) provide optimal values for the selected hyperparameters through an optimization-based hyperparameter tuning approach. As such, we can improve the overall efficiency and effectiveness of model training, leading to better model performance. In turn, this can further promote the use of data-driven models in EESs.
The efficacy of the framework is evaluated using data-driven models developed to predict the magnitudes of daily surface runoff at a farm scale in the Maumee domain, U.S.

Method
In this study, we developed a framework to improve the performance of data-driven models by reducing their training time and overfitting; the framework comprises two modules, hyperparameter selection (HS) and hyperparameter tuning (HT; Figure 1). To use the framework, we first choose a machine learning algorithm and its associated hyperparameters. Then, we feed the initial hyperparameters (1) to the hyperparameter selection (HS) module to determine the influential hyperparameters (2). Once initial values are assigned to the influential hyperparameters (4), we use the hyperparameter tuning (HT) module to identify their optimal values (3), which allows the algorithm to achieve optimal performance in training.
In the following sections, we discuss the framework in detail, including the use of a global sensitivity analysis approach to select the hyperparameters critical for model performance and an optimization approach for hyperparameter tuning to identify the optimal values of these critical hyperparameters for model training. A data-driven model using the eXtreme Gradient Boosting (XGBoost) algorithm serves as the case study.

Hyperparameter Selection

Following the Sobol variance-based sensitivity analysis (Sobol, 2001), the variance of the objective function score, V(O(Y)), is decomposed over the hyperparameters h_1, ..., h_n as

V(O(Y)) = Σ_i V_i + Σ_{i<j} V_ij + ...,

where V_i is the first-order contribution of h_i to V(O(Y)), and V_ij denotes the variance arising from the interactions between two hyperparameters, h_i and h_j. We can then measure the influence of a hyperparameter by its contribution to V(O(Y)) using the sensitivity scores of the first-order (S) and total-order (ST) indices:

S_i = V_i / V(O(Y)),    ST_i = 1 - V_{~i} / V(O(Y)),

where V_{~i} indicates the contribution to V(O(Y)) by all hyperparameters except h_i; S_i measures the direct contribution to V(O(Y)) by h_i; ST_i measures the contribution by h_i and its interactions, of any order, with all other hyperparameters.
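The first- and total-order indices can be estimated by Monte Carlo. Below is a minimal NumPy sketch using the Saltelli first-order and Jansen total-order estimators (a stand-in for a library such as SALib, which would normally be used; the function and argument names are our own):

```python
import numpy as np

def sobol_indices(model, bounds, n=1024, seed=0):
    """Monte Carlo estimates of first-order (S) and total-order (ST)
    Sobol indices. `model` maps an (n, k) sample matrix to n scores."""
    rng = np.random.default_rng(seed)
    k = len(bounds)
    lo, hi = np.array(bounds).T
    A = lo + (hi - lo) * rng.random((n, k))  # two independent sample matrices
    B = lo + (hi - lo) * rng.random((n, k))
    fA, fB = model(A), model(B)
    var = np.var(np.concatenate([fA, fB]))   # total variance V(O(Y))
    S, ST = np.empty(k), np.empty(k)
    for i in range(k):
        ABi = A.copy()
        ABi[:, i] = B[:, i]                  # A with column i taken from B
        fABi = model(ABi)
        S[i] = np.mean(fB * (fABi - fA)) / var        # Saltelli (2010) estimator
        ST[i] = 0.5 * np.mean((fA - fABi) ** 2) / var # Jansen estimator
    return S, ST
```

For an additive test function dominated by one input, the estimated ST of the dominant input approaches one while the others stay near zero, which is exactly the ranking behavior used by the HS module.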
To estimate S and ST, we first generated sufficient samples of the hyperparameters to represent the sample space well. We chose the Quasi-Monte Carlo sampling method (Owen, 2020), which uses quasi-random numbers (i.e., low-discrepancy sequences) to place new sample points far away from the existing ones. As such, the sample points cover the sample space more evenly and quickly, with a faster convergence rate of O((log N)^k N^{-1}), where N and k are the number and dimension of samples (Campolongo et al., 2011). In total, we generated m samples for n hyperparameters. We then fed the samples into the data-driven model, M, to obtain the corresponding O(Y). Next, we estimated the variance components in the equations above.
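As an illustration of low-discrepancy sampling, a Halton sequence can be generated in a few lines of standard Python (a stand-in for the Quasi-Monte Carlo sampler cited above; production code would typically use SciPy's `scipy.stats.qmc` Sobol' sampler instead):

```python
def halton(n, dim, primes=(2, 3, 5, 7, 11, 13)):
    """First n points of a Halton low-discrepancy sequence in [0, 1)^dim."""
    def radical_inverse(i, base):
        # reverse the base-`base` digits of i across the radix point
        f, r = 1.0, 0.0
        while i > 0:
            f /= base
            r += f * (i % base)
            i //= base
        return r
    return [[radical_inverse(i, primes[d]) for d in range(dim)]
            for i in range(1, n + 1)]
```

Unlike pseudo-random draws, consecutive Halton points deliberately fill the gaps left by earlier points, which is what yields the even coverage and faster convergence described above.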

Finally, we selected the hyperparameters with high scores of the total order index as influential hyperparameters.

Hyperparameter Tuning
After hyperparameter selection, we expect to tune fewer hyperparameters through hyperparameter optimization, which is the process of maximizing or minimizing the score of the objective function, O(Y), of a data-driven model, M, over the sample space of its hyperparameters, H. As such, we can identify the optimal values of the hyperparameters, which are then used for training data-driven models.
Rather than manually tuning these hyperparameters, we chose an automated optimization approach, Bayesian hyperparameter optimization (Bergstra et al., 2011). This approach creates a surrogate model of the objective function using a probabilistic model. The surrogate model avoids relying solely on local gradient and Hessian approximations by tracking the paired values of the hyperparameters and the corresponding objective function scores from previous trials, and it proposes new hyperparameters that can improve the score based on Bayes' rule. This automated approach requires far less time to identify the optimal hyperparameters, as the objective function converges to a better score faster.
To describe the Bayesian optimization approach in more detail, let us assume that we have evaluated the objective function, O(Y), for n sets of hyperparameters, {h^(1), ..., h^(n)}. Based on the pairs of hyperparameter sets and their corresponding scores, (h, O(y)), from the n evaluations, we applied a sequential model-based optimization (SMBO) method, the Tree-structured Parzen Estimator (TPE) (Bergstra et al., 2011), to develop a surrogate model for O(Y). The TPE defines the conditional probability p(h | O(y)) using two densities:

p(h | O(y)) = l(h) if O(y) < O(y*), and g(h) if O(y) >= O(y*),

where O(y*) is a threshold score, and l(h) and g(h) can be modeled using different probability density or mass functions for continuous or discrete hyperparameters. For example, l(h) and g(h) can be a uniform, a Gaussian, or a log-uniform distribution for continuous hyperparameters.

The following step is to decide the next hyperparameter values that may give a better score, O(y), given the corresponding uncertainty measured by the surrogate model (Frazier, 2018). To do so, a selection function is defined based on the Expected Improvement (EI):

EI(h) = Integral over O(y) < O(y*) of (O(y*) - O(y)) p(O(y) | h) dO(y),

where p(O(y) | h) is parameterized using p(h | O(y)) p(O(y)). The integrand is set to zero when O(y) yields no improvement over O(y*), so that hyperparameters that cannot improve the score are neglected. By maximizing EI, a better set of hyperparameters is identified.
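One TPE-style suggestion step can be sketched as follows. This is a deliberately simplified illustration: it fits single Gaussian densities in place of Parzen estimators, the `gamma` quantile and candidate count are arbitrary choices, and lower scores are treated as better; it is not the Hyperopt implementation.

```python
import numpy as np

def tpe_suggest(history, bounds, gamma=0.25, n_candidates=200, seed=0):
    """One TPE-like suggestion for a single scalar hyperparameter.
    `history` is a list of (h, score) pairs; lower scores are better."""
    rng = np.random.default_rng(seed)
    h = np.array([p[0] for p in history])
    s = np.array([p[1] for p in history])
    cut = np.quantile(s, gamma)          # threshold score O(y*)
    good, bad = h[s <= cut], h[s > cut]  # samples behind l(h) and g(h)
    def logpdf(x, obs):
        # crude Gaussian stand-in for a Parzen density estimate
        mu, sd = obs.mean(), obs.std() + 1e-6
        return -0.5 * ((x - mu) / sd) ** 2 - np.log(sd)
    cand = rng.uniform(bounds[0], bounds[1], n_candidates)
    ratio = logpdf(cand, good) - logpdf(cand, bad)  # maximizing l/g maximizes EI
    return cand[np.argmax(ratio)]
```

Maximizing the density ratio l(h)/g(h) is equivalent to maximizing EI under the TPE parameterization, which is why the sketch only needs the two densities rather than the full surrogate.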

Together with the corresponding score, the new set of hyperparameters is used to update the TPE densities, l(h) and g(h), for the next maximization of EI. The iterative process continues until the maximum allowed number of iterations is reached.

Case Study
The Maumee River Watershed (Figure 2) drains into the Western Basin of Lake Erie. The soil in the watershed is mainly composed of glacial till, a mixture of clay, silt, sand, and gravel deposited by glaciers; this type of soil is highly fertile but prone to erosion if not managed properly (USDA, 2013). Owing to these excellent geophysical and humid continental climate conditions, over 70% of the watershed is dedicated to agriculture, growing row crops such as corn, soybeans, and wheat (Kalcic et al., 2016). As such, fertilizer applied for crop growth in the watershed contributes over 77% of the Total Phosphorus (TP) entering the Western Basin of Lake Erie through the Maumee River (Kast et al., 2021; Maccoux et al., 2016).
Agricultural runoff is the main source of non-point source pollution in the Maumee domain. The high nutrient load carried by edge-of-field (EOF) runoff from agricultural fields in the watershed has had detrimental effects on aquatic ecosystems, such as harmful algal blooms and hypoxia in Lake Erie (Scavia et al., 2019; Stackpoole et al., 2019). The occurrence and magnitude of EOF runoff can be influenced by many factors but are mainly driven by precipitation and snowmelt (Ford et al., 2022; Hu et al., 2021; Hamlin et al., 2020). An early warning system that forecasts runoff risk can assist agricultural producers in timing fertilizer application to retain more nutrients on the land; this also reduces the nutrient transport carried by runoff to nearby water bodies. To design such an early warning system, in a previous study (Hu et al., 2021) we developed a hybrid model to predict the magnitude of daily EOF runoff for all EOF sites in the domain (Figure 2); the model combines the National Oceanic and Atmospheric Administration's (NOAA) National Water Model (NWM) with a data-driven model based on the eXtreme Gradient Boosting algorithm (XGBoost; Chen and Guestrin, 2016). In this study, we demonstrate the efficiency and effectiveness of training XGBoost models using the proposed framework (Figure 1).

Data preparation
In this study, we used two types of datasets to train the XGBoost models preceded by the different approaches illustrated in the framework (Figure 1) and evaluated their performance for daily EOF runoff prediction at each EOF site in the study area. The two datasets are: 1) observations of daily EOF runoff at the EOF sites within the Maumee domain (Figure 2), obtained from conservation partners in the previous study (Hu et al., 2021); and 2) daily values of the influential NWM model outputs. Based on previous work on hybrid modeling using directed information for causal inference (Hu et al., 2021), we first calculated daily values of 72 NWM model outputs on the 1 km × 1 km grids where the EOF sites are located and then identified seven influential outputs for the Maumee domain (Tables S1 and S2 in the Supporting Information). As we did not separate the winter season (i.e., November to April) from the rest of the year when selecting influential variables, some of the selected influential outputs represent the driving forces of daily EOF runoff during the winter season, such as snowmelt and soil temperature (Table S2 in the Supporting Information).

Implementation
eXtreme Gradient Boosting (XGBoost) is a tree-based ensemble machine learning algorithm designed for high convergence speed through optimal use of memory resources, as well as good predictability through ensemble learning that leverages the combined predictive power of multiple tree models (Chen and Guestrin, 2016). Using the gradient boosting technique, XGBoost repeatedly incorporates a new tree model (i.e., a weak learner) into the tree ensemble obtained from previous iterations. At the tth iteration, an objective function, J, is defined as

J = Σ_{i=1}^{n} L(y_i, ŷ_i) + Σ_{k=1}^{m} R(f_k),

where n is the number of samples and L is the training loss function; ŷ_i is the prediction from the tree ensemble models F; m is the number of tree models f_k; and R is the regularization function used to penalize the complexity of the tree ensemble models to reduce model overfitting. All these functions are characterized by a set of hyperparameters, e.g., the learning rate (LR) and maximum tree depth (MD). Through optimizing J, an XGBoost model can be obtained with locally optimal parameter values, which gives the best predictive performance at the tth iteration. The process iterates for a defined number of repetitions to train an XGBoost model that balances model bias and variance error.
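The additive training described above can be illustrated with a toy gradient-boosting loop over depth-one threshold stumps (a didactic sketch with a squared-error loss and no regularization term R; it is not the XGBoost implementation):

```python
import numpy as np

def boost_stumps(X, y, n_rounds=50, learning_rate=0.3):
    """Minimal gradient boosting: each round fits a one-split stump to the
    current residuals and adds it to the ensemble, shrunk by the learning
    rate (the LR hyperparameter discussed above)."""
    pred = np.full(len(y), y.mean())
    stumps = []
    for _ in range(n_rounds):
        resid = y - pred  # negative gradient of the squared-error loss
        best = None
        for j in range(X.shape[1]):          # search all features ...
            for t in np.unique(X[:, j]):     # ... and all split points
                left = X[:, j] <= t
                if left.all() or not left.any():
                    continue
                lv, rv = resid[left].mean(), resid[~left].mean()
                sse = ((resid[left] - lv) ** 2).sum() + ((resid[~left] - rv) ** 2).sum()
                if best is None or sse < best[0]:
                    best = (sse, j, t, lv, rv)
        _, j, t, lv, rv = best
        stumps.append((j, t, lv, rv))
        pred += learning_rate * np.where(X[:, j] <= t, lv, rv)  # add shrunk tree
    return pred, stumps
```

The shrinkage applied to each new tree is exactly what the LR hyperparameter controls: smaller values slow convergence but make the ensemble less prone to overfitting.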
The XGBoost algorithm has been shown to be effective for a wide range of regression and classification problems, including those involving overfitting and imbalanced datasets (Dong et al., 2020). In this study, we used XGBoost models to predict the magnitude of daily EOF runoff.

After the influential hyperparameters were identified, the next step was to search for the optimal values of these hyperparameters through hyperparameter tuning (i.e., the HT approach). To do so, we first randomly selected 70% of the EOF datasets within the domain. Based on the selected data, we then used the Bayesian optimization (BO) approach implemented via the Python Hyperopt library (Bergstra et al., 2013) to identify the optimal hyperparameter values. Given these optimal values, we trained and evaluated XGBoost models in predicting the magnitude of daily EOF runoff at the EOF sites in the domain (Hu et al., 2021). Additionally, to evaluate the impact of hyperparameter selection on model performance, we trained and evaluated XGBoost models without hyperparameter selection, that is, using only hyperparameter tuning on the initial set of hyperparameters. To mitigate the impact of the imbalanced runoff data, we used stratified K-fold cross-validation across the different scenarios to ensure that the training and test datasets follow a similar distribution, and we defined a loss function that more heavily penalizes missed predictions of non-zero runoff events, that is, the minority class in this study.
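The idea of penalizing missed non-zero runoff events can be sketched as a weighted MAE (the weight value below is illustrative; the exact loss used in the study is not reproduced here):

```python
import numpy as np

def weighted_mae(y_true, y_pred, event_weight=5.0):
    """MAE that weights errors on non-zero runoff events more heavily,
    so that missing a runoff event costs more than a small error on a
    zero-runoff day (the weight of 5.0 is an arbitrary example)."""
    y_true = np.asarray(y_true, float)
    y_pred = np.asarray(y_pred, float)
    w = np.where(y_true > 0, event_weight, 1.0)  # upweight the minority class
    return float(np.sum(w * np.abs(y_true - y_pred)) / np.sum(w))
```

With such a loss, a model that predicts all zeros on a zero-inflated series is no longer rewarded, which counteracts the class imbalance described above.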

Evaluation Metrics
In this study, we used the mean absolute error (MAE) to measure the score of the objective function, O(Y), and R-squared (R²) to measure the level of agreement between the predictions from the XGBoost models and the observations of daily EOF runoff:

MAE = (1/n) Σ_{i=1}^{n} |y_i − ŷ_i|,

R² = [Σ_{i=1}^{n} (y_i − ȳ)(ŷ_i − ŷ̄)]² / [Σ_{i=1}^{n} (y_i − ȳ)² Σ_{i=1}^{n} (ŷ_i − ŷ̄)²],

where n is the sample size; y_i and ŷ_i are the observed and predicted values of daily EOF runoff for a specific EOF site, respectively; and ȳ and ŷ̄ are the mean values of y_i and ŷ_i, respectively. An MAE of zero is the perfect score, whereas an R² value closer to one indicates better agreement between predictions and observations.
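The two metrics can be implemented directly in NumPy (R² here is the squared Pearson correlation between observations and predictions, matching the agreement measure described above):

```python
import numpy as np

def mae(y, yhat):
    """Mean absolute error; zero is a perfect score."""
    y, yhat = np.asarray(y, float), np.asarray(yhat, float)
    return float(np.mean(np.abs(y - yhat)))

def r_squared(y, yhat):
    """Squared Pearson correlation between observed and predicted values;
    one indicates perfect (linear) agreement."""
    y, yhat = np.asarray(y, float), np.asarray(yhat, float)
    cov = np.sum((y - y.mean()) * (yhat - yhat.mean()))
    return float(cov ** 2 / (np.sum((y - y.mean()) ** 2) * np.sum((yhat - yhat.mean()) ** 2)))
```

Note that this correlation form of R² rewards any linear agreement, so it should be read alongside MAE, which penalizes the actual magnitude of the errors.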

Results
The ability of the selected samples to represent the search space of all nine hyperparameters is critical for estimating their influence on model performance through the sensitivity analysis approach. In our case, we selected 4,000 samples in total. As shown by the histogram plots on the diagonal in Figure 3(a), the samples cover the search space of each hyperparameter evenly.

Through hyperparameter selection, the influence of the hyperparameters is ranked by their contributions to the variance of the objective function, characterized by the sensitivity score of the total-order index, ST (Figure 3(b)). The higher the score, the more influential the hyperparameter. Among the nine hyperparameters of the XGBoost model, the subsample ratio of the training data, SS, is the most influential hyperparameter, with the highest sensitivity score for both S and ST, followed by the maximum tree depth, MD, and the learning rate, LR (Figure 3(b)). We noticed small differences between the S and ST sensitivity scores for the two most influential hyperparameters, SS and MD, indicating that their ST scores stem mostly from the variation of these hyperparameters per se. In contrast, for the hyperparameter LR, a large portion of its high ST score is contributed through interactions between LR and the other hyperparameters. As a result, instead of tuning all hyperparameters, we only need to search for the optimal values of these three influential hyperparameters through hyperparameter tuning.
We trained XGBoost models for the prediction of daily EOF runoff events in the study domain (Figure 2) with a fixed number of iterations (i.e., 8,000 in our case). We noticed that training the XGBoost model preceded by our proposed framework, the combination of hyperparameter selection and tuning (i.e., the HS-HT approach), took the least time (0.7 s; Figure 4(c)) compared with training the models preceded by either the HS or the HT approach alone (1.4 s and 61.6 s; Figure 4(a) and (b)). Similar to the case with only the HS approach, the performance of the XGBoost model steadily improved during training when the HS-HT approach was used, i.e., R² increased from 0.52 to 0.68. In contrast, when only the HT approach was used, the model quickly achieved almost perfect training performance (i.e., close to R² = 1) but gained only small improvements over the rest of the training.

In this regard, the training process with the HT approach was not as effective as that using the HS-HT approach, although the former achieved better training performance. However, better training performance cannot guarantee better test performance, due to the risk of overfitting. For the Maumee domain (Figure 2), the XGBoost model achieved an almost perfect agreement with the observations in training (R² = 1.0) when preceded by the HT approach, while having a relatively poor test performance of R² = 0.31 (Figure 5(a)), representing a 69.0% reduction in performance. On the contrary, the XGBoost model performed worse in training when preceded by the proposed framework (i.e., the HS-HT approach), with R² = 0.67, but produced a better test performance (R² = 0.40), resulting in a smaller performance reduction of 40.3%.
Similarly, we also evaluated the overfitting of the resulting XGBoost models by directly measuring the gaps between the model performances in training at different numbers of iterations and their test performances (Figure 5(b)). Note that once the XGBoost models are trained, their test performances are independent of the number of iterations during the training process and thus stay constant (Figure 5(b)). We noticed that the model preceded by the HT approach was more prone to overfitting, since the gaps measured by the differences in R² values were always larger than the gaps for the case with the proposed framework. This demonstrates that the proposed framework can help reduce model overfitting.
As shown in Figure 6(a), using the proposed framework (i.e., the HS-HT approach), the resulting XGBoost model achieved a better agreement with the observations (R² = 0.40) than the corresponding XGBoost models preceded by the other approaches (R² = 0.36 for the HS approach and R² = 0.31 for the HT approach). We noticed that the relative difference in model performance (in terms of R² value) between the HS and HS-HT approaches is smaller than that between the HT and HS-HT approaches.
In this regard, the XGBoost model had the worst performance when only the HT approach was used to search for the optimal values of all hyperparameters. This is further demonstrated by the comparison of residual errors between the observations and the predictions by the XGBoost models preceded by the different approaches (Figure 6(b)): for the HS-HT approach, the residual errors are more concentrated around zero compared with the wider scatter of errors resulting from using only the HS or HT approach. As such, the XGBoost models can often achieve better test performance when preceded by the proposed framework.

Discussion
In this section, we will discuss the effects of the proposed framework using hyperparameter selection and tuning on model training and the overall performance of XGBoost models. Through the discussion below, we aim to demonstrate that the results gained from this study are generally applicable to other data-driven models.

Influence of hyperparameters
In this study, we conducted the Sobol-based global sensitivity analysis (i.e., the HS approach) to identify the influential hyperparameters of XGBoost models. We identified three influential hyperparameters for the XGBoost model based on their total-order sensitivity scores (i.e., ST) and their relative differences from the first-order index (i.e., S). Among them, the maximum tree depth, MD, and the learning rate, LR, are often considered important hyperparameters for XGBoost models, since LR is associated with model convergence and MD controls the depth of the tree model. As the depth of a tree increases, the tree model contains more inner layers, enabling it to better learn complex, nonlinear patterns from the data.

However, tree models with greater depth are also more prone to overfitting.
For the learning rate (LR), a higher learning rate often leads to faster training, but the resulting tree models are more likely to reach sub-optimal solutions. In contrast, models with a low learning rate converge slowly but are likely to achieve good performance with optimal hyperparameter values. Additionally, around half of the influence of LR measured by ST results from interactions with other hyperparameters (Figure 3). For such a hyperparameter, we need to investigate whether other hyperparameters with low ST scores should also be considered influential due to their interactions with the target hyperparameter. In our case, we decided not to consider the others, mainly because the S score of the learning rate is already much higher than the ST score of the next hyperparameter in the ranking, i.e., the number of tree estimators, ET.
Although these two hyperparameters are considered influential in the current study, the most influential hyperparameter is the subsample ratio (SS) of the training data, which determines the sample size used to grow a new tree model in each boosting iteration. This is possibly due to the imbalanced data of the target variable, the daily EOF runoff, which is often zero-inflated with sparsely distributed runoff events over a long time horizon. The number of non-zero EOF runoff events in the training set, determined by the subsample ratio, can affect model performance: with more zero values included in the dataset, fewer non-zero EOF runoff events are available to support model training, and vice versa. As such, the subsample ratio appears to be the most critical hyperparameter for the performance of the XGBoost model in this study. Similar to the sensitivity analysis of physics-based models, the analysis results depend on the characteristics of the target variable (e.g., the daily EOF runoff in our case). As such, for applications involving data-driven models, we can first rely on our experience to select the hyperparameters and then refine the list of influential hyperparameters using the proposed HS approach.

Algorithm complexity and model training
Data-driven models perform differently in training with and without hyperparameter selection. In general, models with more hyperparameters are more capable of learning complex, nonlinear relationships from data. In our case study, the XGBoost models were initially set up with nine hyperparameters (Figure 3) to account for the complexity of daily EOF runoff prediction (Hu et al., 2021). This explains why the XGBoost model without hyperparameter selection can often achieve very good training performance (Figure 4). However, fast convergence to good training performance indicates that the data patterns may be too easy for the model to learn. Furthermore, after the initial significant improvement, the performance of the XGBoost model levels off for the majority of the training time. In this regard, the training as a whole is not effective, spending additional training time on almost negligible improvements.
After hyperparameter selection, three out of the nine hyperparameters are considered influential for the prediction of daily EOF runoff, which allows model training with a less complex XGBoost algorithm for the search of optimal model parameter values. For this reason, given the same number of iterations for training, it is more efficient in terms of training time to train the model after hyperparameter selection (Figure 4). Meanwhile, as guaranteed by the HS approach, the removal of non-influential hyperparameters has no or limited impact on model performance in the EOF runoff prediction. Training can also be more effective, as demonstrated by the steady improvement of the XGBoost model during the training period. As such, through hyperparameter selection, the resulting XGBoost model, equipped with fewer but influential hyperparameters, can be trained more efficiently and effectively to predict the target variable, e.g., the daily EOF runoff over the Maumee domain.

Meanwhile, XGBoost models also perform differently in training with and without hyperparameter tuning. When training an XGBoost model without the HT approach, we assign values to the hyperparameters by trial and error. The resulting XGBoost algorithm is likely not optimal and can thus take a longer time to search for the optimal model parameter values compared with the case using hyperparameter optimization; this is demonstrated by the faster convergence to better performance when training is preceded by the HS-HT approach rather than by the HS approach alone (Figure 4(a) and (c)).

Nevertheless, model training preceded by the HS approach is still more effective than that using the HT approach alone (Figure 4(a) and (b)). This may be because the XGBoost algorithm with more hyperparameters (i.e., without hyperparameter selection) can more easily learn the patterns from the data, resulting in no improvement in training for the majority of the training time. As such, the combination of the HS and HT approaches, as proposed by the framework, can most effectively improve search efficiency.

Model overfitting and performance
The complexity of the underlying machine learning algorithm can be characterized by the number of hyperparameters and their values, which are critical to model performance. High algorithm complexity can often result in overfitted models, as demonstrated by the large gap between model performance in training and test (Figure 5(a) and (b)). Through the identification of influential hyperparameters, the HS approach helps reduce the algorithm complexity by using an appropriate number of hyperparameters that can balance the prediction errors and the variance in the dataset. As a result, reducing algorithm complexity by removing non-influential hyperparameters can effectively reduce model overfitting without compromising model performance, which is further guaranteed by the use of the HT approach to search for the optimal values of these influential hyperparameters.

The framework is designed to reduce model training time and improve model performance through the identification of influential hyperparameters and their optimal values. Note that the specific results for hyperparameter selection and tuning are data- and domain-specific, and the impact of data size, quality, and location has not yet been fully explored in this study.
Additionally, previous work (Hu et al., 2018) has demonstrated the importance of feature selection for model performance in terms of model training time and overfitting. Thus, it is worth investigating the performance of data-driven models when the framework is combined with feature selection.

Conclusions
In this paper, we developed a framework composed of hyperparameter selection and tuning, which can effectively improve the performance of data-driven models by reducing both model training time and model overfitting. We demonstrated the framework efficacy using a case study of daily EOF runoff prediction by XGBoost models in the Maumee domain, U.S.

Through the use of Sobol-based global sensitivity analysis, hyperparameter selection enables the reduction in complexity of the XGBoost algorithm without compromising its performance in model training. This further allows hyperparameter tuning using a Bayesian optimization approach to be more effective by searching for the optimal values of only the influential hyperparameters.
The resulting optimized XGBoost algorithm can effectively reduce model overfitting and improve the overall performance of XGBoost models in the prediction of daily EOF runoff. This framework can thus serve as a useful tool for the application of 345 data-driven models in EESs.
Code and data availability. Input data and codes to reproduce the study can be found at https://doi.org/10.5281/zenodo.7026695.

Author contributions. The study and the analysis of results were carried out by CG and YH. SME produced the map of the case study site. The original draft of the paper was written by CG and YH, with edits, suggestions, and revisions provided by SME.
Competing interests. The contact author has declared that neither they nor their co-authors have any competing interests.